Thursday, August 04, 2011

Server scripting with NodeJs: Tainting LDAP data

This is a real-world example of using nodejs for server-side scripting. It shows how to deal with command line arguments, invoke shell commands and parse their responses, interact with MySQL, and use the library synchronously where that is possible and makes sense.

Node (also known as nodejs) is a JavaScript runtime that lets you run JavaScript code outside the browser. It favors event-driven programming, which can be difficult for backend developers used to languages like Java. At the same time it is straightforward for front-end developers, who are already used to the JavaScript callback functions that UI programming requires.

If you haven't installed it yet, go ahead and do it now:

  1. I have tested all this with node-v0.4.10 as you see below.
    cd ~/Downloads
    wget http://nodejs.org/dist/node-v0.4.10.tar.gz
    tar -zxvf node-v0.4.10.tar.gz
    cd node-v0.4.10
    ./configure
    make
    sudo make install
    node --version
    
  2. Install npm (package manager for nodejs) in case you need extra modules (and you will when you get serious about nodejs development)
    curl http://npmjs.org/install.sh | sudo sh
    
  3. If you use MySQL here is how to install one library that just works (whole API: https://github.com/felixge/node-mysql)
    npm install mysql
    
  4. You should of course test with the simplest possible 'Hello World' script
    $ vi hello.js
    console.log('Hello World');
    $ node hello.js
    

The Node project is making a big effort to enforce the use of non blocking code (code that will be triggered but will not block the current thread). That is the reason why most of the code you will see is written with nested callback functions.

Event programming is a good paradigm for certain problems. The responsiveness of a program can be improved a lot without having to deal with blocking in multithreaded programming. Those of us who have worked with threads know how hard it can be. I am not saying nodejs is better than Java or any other language at dealing with multiprocessing. It is just a different paradigm, and I would appreciate it if you do not start ranting against or in favor of any programming language in this post.

Here is one extract from the process api: http://nodejs.org/docs/v0.3.1/api/process.html
process.argv.forEach(function (val, index, array) {
  console.log(index + ': ' + val);
});
console.log

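The previous code uses the callback style that non-blocking Node APIs rely on. Note that forEach itself calls its function synchronously; what makes code asynchronous is deferring the work, which the example below simulates with setTimeout. Whenever a result arrives in a deferred callback, you need to work with it inside the anonymous function. After running the code below you will see that the first console.log prints its message before any of the asynchronous callbacks run: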
var asyncArg = 'Async Arguments:';
var syncArg = 'Sync Arguments:';

var argv = process.argv;
argv.forEach(function (val, index, array) {
 setTimeout(function() {asyncArg += ' ' + val; console.log( asyncArg ); }, 0);
 
});
console.log( asyncArg );

for( var argIndex in argv ) {
 syncArg += ' ' + argv[argIndex];
}
console.log( syncArg );
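
As a minimal sketch of the synchronous route, the positional arguments can be copied into a plain object in one pass. The helper name readArgs and the argument names are hypothetical, chosen only for illustration. process.argv holds the node binary and the script path in its first two slots, so the real arguments start at index 2.

```javascript
// Hypothetical helper: maps positional command line arguments to names.
// argv is expected to look like process.argv: [node, script, arg1, arg2, ...].
function readArgs(argv, names) {
  var values = argv.slice(2); // skip the node binary and the script path
  if (values.length !== names.length) {
    return null; // wrong number of arguments; the caller decides what to do
  }
  var args = {};
  for (var i = 0; i < names.length; i++) {
    args[names[i]] = values[i];
  }
  return args;
}

// Example with a made-up argument vector instead of the real process.argv
var sample = ['node', 'script.js', 'ldap://localhost:10389', 'uid=admin,ou=system'];
var parsed = readArgs(sample, ['url', 'user']);
console.log(parsed.url + ' ' + parsed.user);
```

In a real script you would pass process.argv and print a usage message when the function returns null.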

So to work with command line arguments you are probably better off staying out of the asynchronous flow. One might expect that there is always a way to go synchronous with nodejs, but that is not the case. As I said, the nodejs library pushes for non-blocking (asynchronous) code. Look at the code below, whose comments should be self-explanatory:
var sys = require('sys')
var exec = require('child_process').exec;

var child = exec("ldapsearch -x -v -H 'ldap://nestorurquiza:10389' -D 'uid=admin,ou=system' -w 'secret' -b 'o=nestorurquiza'", function (error, stdout, stderr) {
  console.log(stdout); //Will print the output of the command
  if (error !== null) {
    console.log('exec error: ' + error);
  }
});
console.log(child.stdout); //Will not print the output of the command but rather the stream object, because the command has not finished yet
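
Since the output is only available inside the callback, any parsing has to start there. The sketch below assumes you only want the distinguished names from an LDIF response. extractDns is a hypothetical helper, and the printf command merely stands in for a real ldapsearch call so the example runs without an LDAP server. It does not handle LDIF line folding or base64-encoded dn:: values.

```javascript
var exec = require('child_process').exec;

// Hypothetical helper: returns the value of every "dn:" line in LDIF text.
// Simplification: folded lines and base64 "dn::" values are skipped.
function extractDns(ldifText) {
  var dns = [];
  var lines = ldifText.split('\n');
  for (var i = 0; i < lines.length; i++) {
    var found = /^dn:(?!:)\s*(.+)$/.exec(lines[i]);
    if (found) {
      dns.push(found[1]);
    }
  }
  return dns;
}

// printf stands in for ldapsearch here; swap in the real command as needed.
var command = "printf 'dn: uid=jdoe,o=example\\ncn: John\\n\\ndn: uid=asmith,o=example\\n'";
exec(command, function (error, stdout, stderr) {
  if (error) {
    console.log('exec error: ' + error);
    return;
  }
  console.log(extractDns(stdout)); // the parsing happens inside the callback
});
```

Keeping the parsing in a separate function leaves the callback short and lets you check the parser without running any command.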

Here is a script that clones the data from one LDAP server (tested with ApacheDS) and imports it into a second LDAP server. The script taints the emails to ensure we do not send messages to real production users while testing. It leaves certain domains untouched, and it shows how to interact with MySQL to pull a whitelist of emails that should not be tainted. It also changes every password to a well-known test password so the team can debug what happens when different users interact with the application. BTW, this task is better done from Talend, but I needed a real problem to demonstrate that nodejs is ready for server-side scripting while I was figuring out how to make it happen from Talend:

#!/usr/local/bin/node
/*
** WARNING: This program will wipe out the specified BASE_DN from the target LDAP Server
**
** taintLdap.js A nodejs script to taint data from ldap. Use it to change passwords for all users to a known value and to change their emails to avoid sending messages from testing environment to real users
**
** Example: node taintLdap.js 'ldap://jnestorurquiza:10389' 'uid=admin,ou=system' 'secret' 'ldap://localhost:10389' 'uid=admin,ou=system' 'secret'
**
** @Author: Nestor Urquiza
** @Date: 08/02/2011
**
*/

/*
** Imports
*/
var sys = require('sys')
var exec = require('child_process').exec;
var Client = require('mysql').Client;
var client = new Client();

/*
** Constants
*/
var BASE_DN = "o=nestorurquiza";
var EXCLUSION_DOMAINS = ['nestorurquiza.com','nestoru.com'];
var APPEND_DOMAIN = 'nestorurquiza.com'
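var COMMON_PASSWORD = 'e1NIQX1JcnFMUVdMT1o3ZXF0WHRBdUlFSFRlUnZkRFk9' //Testtest1 after SHA1: base64 of the '{SHA}' prefixed, base64 encoded digest. To generate a different password use: echo -n "{SHA}$(echo -n "mypassword" | openssl dgst -sha1 -binary | openssl base64)" | openssl base64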

/*
** Arguments
*/
var argv = process.argv;
if( argv.length != 8 ) {
 usage();
 process.exit(1);
}
var fromUrl = argv[2];
var fromUser = argv[3];
var fromPasword = argv[4];
var toUrl = argv[5];
var toUser = argv[6];
var toPasword = argv[7];

client.user = 'root';
client.password = 'root';
client.host = 'localhost';
client.port = '3306';
client.database = 'nestorurquiza'

/*
** Main
*/
//console.log('Cloning and tainting from ' + fromUrl + ' to ' + toUrl);
var ldif;
var authorizedEmails = new Array();
var pattern = /[^\s=,]*@[^\s=,]*/g
var child = exec("ldapsearch -x -v -H '" + fromUrl + "' -D '" + fromUser + "' -w '" + fromPasword + "' -b '" + BASE_DN + "'", function (error, stdout, stderr) {
  if (error) {
    throw error;
  }
  ldif = stdout;
  client.connect();
  var authorizedEmailQuery = client.query(
  'SELECT name FROM authorized_test_email',
  function (error, results, fields) {
    if (error) {
      throw error;
    }
    for (var resultIndex in results){
      var result = results[resultIndex];
      authorizedEmails[resultIndex] = result.name;
    }
    client.end();
    //console.log('****************'  + authorizedEmails);
    ldif = taintLdifEmail();
 
    child = exec("ldapdelete -r -x -H '" + toUrl + "' -D '" + toUser + "' -w '" + toPasword + "' '" + BASE_DN + "'", function (error, stdout, stderr) {
      if (error) {
        //throw error;
        console.log("WARNING: Could not delete " + BASE_DN);
      }
      var command = "echo '" + ldif + "' | ldapmodify -x -c -a -H '" + toUrl + "' -D '" + toUser + "' -w '" + toPasword + "'";
      //console.log(command);
      child = exec(command, function (error, stdout, stderr) {
        if (error) {
          throw error;
        }
      });
    });
    //ldapmodify -x -c -a -H ldap://localhost:10389 -D "uid=admin,ou=system" -w 'secret' < ~/Downloads/taintedLdap.ldif
  }
);
});

/*
** Functions
*/
function taintLdifEmail() {
 var matches = ldif.match(pattern);

 for ( var matchIndex in matches ) {
  var taint = true;
  var match = matches[matchIndex];
  for( var exclusionDomainIndex in EXCLUSION_DOMAINS ) {
   if( match.indexOf(EXCLUSION_DOMAINS[exclusionDomainIndex]) >= 0 ) {
    taint = false;
    break;
   }
  }
  if( !taint ) {
   continue;
  }
  for ( var authorizedEmailIndex in authorizedEmails ) {
   var authorizedEmail = authorizedEmails[authorizedEmailIndex];
   if( match == authorizedEmail ) {
    taint = false;
    break;
   }
  }
  if( taint ) {
   var replacement = match.replace('.', '') + APPEND_DOMAIN;
   //console.log( match + " >>> " + replacement );
   ldif = ldif.replace(match, replacement);
  }
 }
 ldif = ldif.replace(/userPassword.*/g, 'userPassword:: ' + COMMON_PASSWORD);
 return ldif;
}

function usage() {
 console.log("Usage: " + "./taintLdap.js <fromUrl> <fromUser> <fromPasword> <toUrl> <toUser> <toPasword>");
}
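
The COMMON_PASSWORD constant in the script is the base64 form of a string that starts with {SHA}, which is how an LDIF userPassword:: line carries a SHA-1 hashed password. As a sketch, a value of the same shape can be produced from JavaScript with the built-in crypto module. The function name ldifShaPassword is hypothetical, and Buffer.from assumes a Node release newer than the v0.4.10 used for the script.

```javascript
var crypto = require('crypto');

// Hypothetical helper: builds the value for an LDIF "userPassword::" line.
// Step 1: SHA-1 digest of the clear text, base64 encoded, prefixed with {SHA}.
// Step 2: base64 of that whole string, because "::" marks a base64 value.
function ldifShaPassword(clearText) {
  var digest = crypto.createHash('sha1').update(clearText).digest('base64');
  return Buffer.from('{SHA}' + digest).toString('base64');
}

console.log('userPassword:: ' + ldifShaPassword('Testtest1'));
```

Unsalted SHA-1 is only reasonable here because the value is a shared, well-known test password.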

Why am I trying to use nodejs if I already have bash, awk, perl, python, ruby and whatnot? I am building a team with strong separation of concerns. While we know we cannot be as good as the guy who spends 100% of his time writing SQL stored procedures, the whole team should be able to cover for him for a few days, during vacation time for example, so yes, SQL is a mandatory skill. JavaScript is a mandatory skill as well, and if I can script with it, anybody on the team can take care of some of those scripting needs. I am just trying to keep the number of technologies and languages we use really low.

Isn't it better to use RhinoJs? Probably yes, but I am tempted to use something outside the JVM that runs faster and consumes fewer resources. I have recently decommissioned a whole CLI project based on Java just because it was really resource intensive. I have favored the use of Controllers in our Business Hub, which are called from simple CURL or WGET statements. I am not claiming RhinoJs is unacceptably slow, nor that NodeJs is better. I see value in both of them.

Why am I considering scripting at all if I promote the idea of a Business Hub? There are cases in which Unix Power Tools, existing CLIs etc. definitely do the job more quickly and more reliably.

Do I think NodeJs is a better answer for server-side logic than Java? At the moment I am happy with plain Java for my backend. The amount of existing code available for free is amazing. Whether NodeJs gets there in the future will depend on the open source community. At least for my current project I am sticking with Java.
