Parallel Routines in Datastage

I was trying to do some string manipulation in a Datastage job. I should mention the version: it is Datastage 7.5.1.

As we know, Datastage supports basic functions and a few stages for data manipulation. Still, not every manipulation can be achieved with those built-in functions. For example, in some cases we may need to iterate through the input string and find a pattern. In such cases we can extend Datastage with C++ (CPP). Writing a routine in CPP and linking it to a Datastage project is a simple task, as follows:

  1. Write the CPP code.
  2. Compile it with the required flags.
  3. Put the output file in a shared directory.
  4. Link it in Datastage.
  5. Use it in a transformer like any other function.

Writing the CPP code:

We will write a function to add two numbers. This requirement may sound silly, but let's use it for simplicity.

#include <string>
#include <iostream>

using namespace std;

// Returns the sum of the two integer arguments.
int addNumber(int a, int b)
{
     return a + b;
}

Make sure that your cpp file does not contain a main function, as routines are not supposed to have one.
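A routine closer to the string use case mentioned at the start (iterating through the input and finding a pattern) might look like the sketch below. The function name countPattern and its behaviour (counting non-overlapping matches) are my own illustration, not something Datastage provides. It takes char* arguments rather than string, and like any routine it has no main function.

```cpp
#include <cstring>

// Hypothetical routine: counts non-overlapping occurrences of
// `pattern` inside `input`. Returns 0 for null or empty arguments.
int countPattern(char* input, char* pattern)
{
    if (input == 0 || pattern == 0 || *pattern == '\0')
        return 0;

    int count = 0;
    const std::size_t len = std::strlen(pattern);
    const char* pos = input;

    // strstr returns a pointer to the next match, or null when none is left.
    while ((pos = std::strstr(pos, pattern)) != 0) {
        ++count;
        pos += len;  // skip past this match so matches do not overlap
    }
    return count;
}
```

Because it returns an int, this version avoids the question of who owns a returned string buffer.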

Compiling & Linking

  • Log in to Datastage Administrator.
  • Go to Projects tab -> (select your project) Properties -> General tab -> Environment -> Compiler node under the Parallel root node.
  • Copy the values of APT_COMPILER and APT_COMPILEOPT and form a command as follows:

                     APT_COMPILER APT_COMPILEOPT addNumber.cpp
                    (Ex) /usr/bin/CC -o -i <iostream.h> addNumber.cpp

  • After running the above command you will have the output file in the same path.
  • It will be named addNumber.o. Copy it to a shared directory.
  • Open Datastage Manager, right-click the Routines tree node and select New Parallel Routine.
  • Give the routine any name, say AddNumber. Select the type External Function, and set the external subroutine name to the name of the function we want to call, here addNumber. Select the proper return type and provide the complete path of the .o file, say /usr/home/addNumber.o
  • Go to the Arguments tab and add two arguments, Number1 and Number2. Specify the correct data type for each, otherwise the routine will fail.
  • Now you are all done. Go to your job, open any transformer, and in any expression click the ellipsis button […]. From the list that appears, select the routine option. Our new routine will be listed there.
  • Select it and supply the proper arguments. That's it.

But remember a few points while writing the function, i.e. the cpp code:

  • Datastage cannot accept string as a return type, so we need to design our function to return char* instead.
  • The same applies to input arguments. Our function can accept only char*, not string. Inside the cpp code, though, we can convert it to a string.
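Here is a minimal sketch of those two points: the function takes and returns char*, and works with a string only internally. The name reverseString, the 1024-byte limit and the static buffer are my own assumptions for illustration. How Datastage treats the returned pointer is not covered in this post, and a static buffer is not thread-safe, so treat this as a starting point only.

```cpp
#include <algorithm>
#include <cstring>
#include <string>

// Hypothetical routine: returns the input reversed.
// char* at the boundary, std::string inside.
char* reverseString(char* input)
{
    // Assumption: a fixed static buffer holds the result so the pointer
    // stays valid after return. Longer results are truncated.
    static char buffer[1024];

    std::string s(input ? input : "");   // convert char* to string
    std::reverse(s.begin(), s.end());    // do the work on the string

    std::strncpy(buffer, s.c_str(), sizeof(buffer) - 1);
    buffer[sizeof(buffer) - 1] = '\0';   // always null-terminate
    return buffer;
}
```

Each call overwrites the previous result, so the caller should use or copy the value before calling the function again.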
