Before we begin
A DataStage job is an executable object that performs ETL (Extract, Transform, Load) operations. A job contains various stages interconnected by links; each link acts as a pipe that carries data from one stage to another.
Designing a simple job
Let's start by designing a simple job with the following requirement. We have an input file containing a mark list, as follows:
NAME | TEST1 | TEST2 | TEST3 | TEST4 |
MANO | 89 | 87 | 68 | 77 |
ROHIT | 76 | 78 | 67 | 90 |
VINO | 88 | 99 | 76 | 89 |
TARUN | 65 | 76 | 98 | 78 |
This is the input file. It holds only the raw marks; we need to produce a report with the following layout:
NAME | PERCENTAGE |
Assuming each test is marked out of 100, the percentage is the sum of the four marks divided by 4 (that is, sum / 400 × 100). So we can define the mapping as follows,
INPUT FILE COLUMN | OUTPUT FILE COLUMN | COMMENTS |
NAME | NAME | (Direct mapping) |
TEST1 | (Not mapped directly) | (Used in the PERCENTAGE derivation) |
TEST2 | (Not mapped directly) | (Used in the PERCENTAGE derivation) |
TEST3 | (Not mapped directly) | (Used in the PERCENTAGE derivation) |
TEST4 | (Not mapped directly) | (Used in the PERCENTAGE derivation) |
(Derived) | PERCENTAGE | ((TEST1 + TEST2 + TEST3 + TEST4) / 4) |
The above file (typically an Excel spreadsheet), which records how each input column maps to each output column, is called the mapping document. It can be kept under version control, so that changes to the job can be traced through changes to this document.
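For example, taking the first data row, MANO's percentage works out to (89 + 87 + 68 + 77) / 4 = 321 / 4 = 80.25.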
Now we have all the necessary things to start designing the job. The logic can be represented as a simple flow: read the input file, derive the percentage for each row, and write the output file.
Now let us see the components required for designing this job.
Reading and writing files – Sequential File stage
Selecting the required columns – Transformer stage
Performing the calculation – Transformer stage
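Before we do, it may help to see the same logic expressed outside DataStage. Below is a minimal Python sketch of the read-transform-write flow, for illustration only; the file names marks.txt and report.txt and the pipe-delimited layout are assumptions based on the sample above. In the actual job, each of these steps is handled by a stage rather than hand-written code.

```python
# Illustrative sketch of the job's logic (not how DataStage itself works):
# read the mark list, derive PERCENTAGE, and write the report.
# File names "marks.txt" and "report.txt" are assumptions for this example.

def derive_percentage(row: dict) -> dict:
    """Transformer logic: keep NAME, derive PERCENTAGE from the four tests
    (assuming each test is marked out of 100)."""
    total = sum(int(row[col]) for col in ("TEST1", "TEST2", "TEST3", "TEST4"))
    return {"NAME": row["NAME"], "PERCENTAGE": total / 4}

with open("marks.txt") as src, open("report.txt", "w") as dst:
    # The first line is the header; split the pipe-delimited columns.
    header = [col.strip() for col in next(src).strip().strip("|").split("|")]
    dst.write("NAME | PERCENTAGE |\n")
    for line in src:
        values = [v.strip() for v in line.strip().strip("|").split("|")]
        out = derive_percentage(dict(zip(header, values)))
        dst.write(f"{out['NAME']} | {out['PERCENTAGE']} |\n")
```

Here the two open() calls play the role of the Sequential File stages, and derive_percentage() plays the role of the Transformer stage.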
Let us look into each of these components.