DataStage Developer Checklist

The checklist below covers job performance and best practices to apply while designing and coding DataStage jobs.

A.        Partitioning:
Ø  Join, Aggregator, Sort: Hash Partitioning
Ø  Lookup: 
ü  If the data on the main link is already hash-partitioned on the lookup key, use Hash partitioning (on the lookup key) on the reference link instead of Entire partitioning. It will be faster than Entire partitioning.
ü  If the data on the main link is not hash-partitioned on the lookup key (or carries some other partitioning), use Entire partitioning on the reference link and leave the main link's partitioning as it is.

Ø  Aggregator: If the number of distinct grouping-key combinations in the data is small (roughly under 1000), use Method = Hash (the stage is capable of holding that many groups in memory without sorted input). If the distinct combinations exceed roughly 1000, use Method = Sort (the stage then requires the data to be partition-sorted on the grouping keys).
Example: If the data is grouped on 3 keys and those keys together yield about 1500 distinct value combinations, the group count exceeds 1000, so use Method = Sort.

Ø  Always prefer Same partitioning over Auto. Same preserves the existing partitioning and so avoids the overhead of repartitioning, giving better performance than Auto.

Ø  Avoid repartitioning the data unless it is necessary.

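The partitioning rules above can be sketched conceptually in plain Python (this is an illustration of what the engine does for you, not DataStage code; the partition count and sample rows are hypothetical). Hash partitioning routes rows with the same key to the same partition, which is why a hash-partitioned reference link can join with a hash-partitioned main link; Entire partitioning instead copies the whole reference table to every partition.

```python
# Conceptual sketch of Hash vs. Entire partitioning (plain Python, hypothetical data).
NUM_PARTITIONS = 4  # assumed degree of parallelism

def hash_partition(rows, key_index):
    """Route each row to a partition by hashing its key, so rows
    sharing a key value always land in the same partition."""
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        parts[hash(row[key_index]) % NUM_PARTITIONS].append(row)
    return parts

def entire_partition(rows):
    """Replicate the full reference data into every partition."""
    return [list(rows) for _ in range(NUM_PARTITIONS)]

main = [("K1", "a"), ("K2", "b"), ("K1", "c")]   # main link rows
ref  = [("K1", "x"), ("K2", "y")]                # reference (lookup) rows

main_parts = hash_partition(main, 0)
ref_hash   = hash_partition(ref, 0)   # reference keys co-located with main rows
ref_entire = entire_partition(ref)    # every partition holds the whole table
```

Because both links hash on the same key, a lookup can be satisfied within each partition; Entire partitioning achieves the same correctness by replication, at the cost of copying the reference data N times.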
B.    Sorting: Use in-line (link) sorting instead of a Sort stage unless you need the Sort stage's extra functionality, such as the key-change column or duplicate removal.

C.    The Transformer is a heavyweight stage in DataStage that requires a C++ compiler to compile the job. Use the Modify, Column Generator, Row Generator, or Filter stage instead of a Transformer wherever one of those stages can do the same work.

D.   As Datasets retain partitioning and sort order, take advantage of this feature and do not partition and sort the data again when it is used in another job.

E.    In the Sort stage, keep Stable Sort = False unless you have a requirement to preserve the original input order of rows with equal keys (i.e., keep the non-key attributes untouched).
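The stable-sort trade-off can be seen with a small illustration in plain Python, whose built-in sort happens to be stable (the rows here are made up). A stable sort preserves the relative input order of rows that compare equal on the sort key; that guarantee costs extra work, which is why leaving Stable Sort = False is faster when you do not need it.

```python
# Illustration of what "stable" means: rows with equal keys
# keep their original relative order after sorting.
rows = [("B", 1), ("A", 2), ("B", 3), ("A", 4)]  # (key, payload)
stable = sorted(rows, key=lambda r: r[0])
# The two "A" rows stay in input order (2 before 4), as do the "B" rows.
```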

F.    Use Datasets to pass data from one job to another: the Dataset is a parallel stage, well suited to DataStage, and it carries the partitioning and sort order through to the next job.

G.    Use Database Connector stages rather than Enterprise stages: Connector stages are more efficient, and the logs they create are more meaningful.

H.    Create a Shared Container when multiple jobs have to perform the same activity, e.g. a reject handler or a log handler.

I.    Give meaningful names to stages and links, and keep the job design as simple as possible.
