The checklist below covers job performance and best practices to follow when
designing and coding a DataStage job.
A. Partitioning:
➢ Join, Aggregator, Sort: use Hash partitioning.
➢ Lookup:
✓ If the data on the main link is already hash partitioned on the lookup key,
use Hash partitioning on the lookup key for the reference link instead of
Entire partitioning. It will be faster than Entire partitioning.
✓ If the data on the main link is not hash partitioned on the lookup key (or
has some other partitioning), use Entire partitioning on the reference link
and leave the main link's partitioning as it is.
➢ Aggregator: if the data has fewer than 1000 possible grouping combinations,
use Method = Hash (the stage can handle this amount of data by itself, without
pre-sorted input); if it has more than 1000 possible grouping combinations, use
Method = Sort (the stage requires data that is partitioned and sorted on the
grouping keys).
Example: if the data is grouped on 3 keys and there are 500 rows in the file,
the possible grouping combinations are estimated as 3 * 500 = 1500, so this
needs Method = Sort.
➢ Always prefer Same partitioning to Auto. Same keeps the rows in the
partitions they already occupy, so it performs better than Auto.
➢ Avoid repartitioning the data unless it is necessary.
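
The lookup rule above can be pictured with a small sketch. This is plain Python, not DataStage code; the partition count, the `cust_id` key, and the helper functions are all hypothetical and only model the idea. When both links are hash partitioned on the lookup key, matching rows land in the same partition, so each partition needs only its own slice of the reference data. With Entire partitioning, every partition receives a full copy.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical number of processing nodes


def hash_partition(rows, key, n=NUM_PARTITIONS):
    """Send each row to the partition chosen by hashing its key value."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def entire_partition(rows, n=NUM_PARTITIONS):
    """Copy the full row set to every partition."""
    return {p: list(rows) for p in range(n)}


def lookup(main_parts, ref_parts, key):
    """Per-partition lookup: a partition sees only its own reference rows."""
    out = []
    for p, main_rows in main_parts.items():
        ref_index = {r[key]: r for r in ref_parts.get(p, [])}
        for row in main_rows:
            match = ref_index.get(row[key])
            if match is not None:
                out.append({**row, **match})
    return out


# Illustrative data only.
main = [{"cust_id": i, "amount": i * 10} for i in range(8)]
ref = [{"cust_id": i, "name": f"cust{i}"} for i in range(8)]

main_hashed = hash_partition(main, "cust_id")
ref_hashed = hash_partition(ref, "cust_id")
ref_entire = entire_partition(ref)

hash_result = lookup(main_hashed, ref_hashed, "cust_id")
entire_result = lookup(main_hashed, ref_entire, "cust_id")

# Total reference rows held across all partitions under each scheme.
ref_rows_hash = sum(len(v) for v in ref_hashed.values())
ref_rows_entire = sum(len(v) for v in ref_entire.values())
```

Both schemes produce the same matches here, but the hash-partitioned reference link holds each reference row once in total, while the Entire-partitioned link holds one copy per partition.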
B. Sorting: use in-line (link) sorting instead of a Sort stage unless the data
needs other Sort stage functionality such as key change, remove duplicates,
etc.
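
To illustrate what a key change column provides, here is a minimal Python sketch. It assumes the usual behaviour of flagging the first row of each key group in sorted input with 1 and every other row with 0; the function name `add_key_change` and the sample data are hypothetical, and this is not DataStage code.

```python
def add_key_change(sorted_rows, key):
    """Add a keyChange flag: 1 on the first row of each key group, else 0.

    Assumes the rows are already sorted on `key`.
    """
    out = []
    prev = object()  # sentinel that compares unequal to any real key value
    for row in sorted_rows:
        flag = 1 if row[key] != prev else 0
        out.append({**row, "keyChange": flag})
        prev = row[key]
    return out


rows = [{"id": 1}, {"id": 1}, {"id": 2}, {"id": 3}, {"id": 3}]
flags = [r["keyChange"] for r in add_key_change(rows, "id")]
```

A downstream stage can then detect group boundaries from the flag alone, without comparing key values again.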
C. The Transformer stage is a heavy stage in DataStage and requires a C++
compiler to compile the job. Use the Modify, Column Generator, Row Generator,
and Filter stages instead of a Transformer stage when the job needs only what
those stages provide.
D. Because a Dataset retains partitioning and sort order, take advantage of
this and do not partition and sort the data again when the Dataset is used in
another job.
E. In the Sort stage, keep Stable Sort = False unless rows with equal key
values are required to keep their original relative order.
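
What a stable sort guarantees can be shown with a short Python sketch (Python's `sorted` is documented as stable). The `dept` and `seq` columns are hypothetical sample data, not taken from any real job.

```python
rows = [
    {"dept": "B", "seq": 1},
    {"dept": "A", "seq": 2},
    {"dept": "B", "seq": 3},
    {"dept": "A", "seq": 4},
]

# Sort on the key column only; a stable sort leaves rows that share a key
# in the same relative order they had in the input.
stable = sorted(rows, key=lambda r: r["dept"])

# Collect the arrival order of seq values within each dept after sorting.
seqs_by_dept = {}
for r in stable:
    seqs_by_dept.setdefault(r["dept"], []).append(r["seq"])
```

Within each `dept`, the `seq` values stay in input order. A non-stable sort gives no such guarantee, which is acceptable whenever the order inside a key group does not matter, and it spares the sort the extra work of preserving that order.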
F. Use Datasets to pass data from one job to another: the Dataset is a parallel
stage, is more compatible with DataStage, and retains the partitioning and sort
order of the data for the next job.
G. Use database Connector stages rather than Enterprise stages, as Connector
stages are more efficient and the logs they create are more meaningful.
H. Create a Shared Container when multiple jobs have to perform the same
activity, such as a reject handler or a log handler.
I. Give meaningful names to stages and links, and keep the job design as simple
as possible.