Difference between pipelining and parallel processing in DataStage
Answers
I always reach for an analogy when asked questions like this.
Imagine a room full of people stuffing envelopes for a business. Each person does the entire task: assembling the papers in order, folding them, placing them in the envelope, sealing the envelope, attaching postage, and attaching the address label. That’s parallelism. Notice that it scales very well; you can imagine employing thousands of people to do the job if you had a huge number of things to mail out.
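To make the independent-worker picture concrete, here is a minimal Python sketch. It assumes a pool of four workers and a made-up `prepare_envelope` function standing in for the whole task; neither comes from any particular tool.

```python
from concurrent.futures import ThreadPoolExecutor

def prepare_envelope(address):
    """One worker performs every step for a single envelope (hypothetical task)."""
    steps = ["assembled", "folded", "inserted", "sealed", "stamped", "labeled"]
    return f"{address}: " + ", ".join(steps)

# Eight envelopes to prepare; the addresses are placeholders.
addresses = [f"address-{n}" for n in range(1, 9)]

# Four workers, each doing the entire job for whichever envelope it picks up.
with ThreadPoolExecutor(max_workers=4) as pool:
    finished = list(pool.map(prepare_envelope, addresses))

print(len(finished), "envelopes ready")
```

Adding more workers here only means raising `max_workers`, which is the sense in which this form scales.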
Now imagine that you organize the envelope-stuffers as an assembly line. One person does nothing but assemble the papers in order and hand them to the next person in line, who does nothing but fold the papers and hand them to the next person, and so on. This is efficient because each person becomes highly proficient at the single repetitive task they specialize in… but notice that it does not scale the same way: the number of stages caps the number of people who can work at once. In this case, the task is broken into six pipeline stages, so the pipeline of envelope-stuffers can keep at most six people busy. If the task were something more complicated, like building a car, then obviously you could break the task into many more stages and use more people in the assembly line.
The other gotcha is that it is tricky to make sure each stage of the pipeline takes the same amount of time. Maybe sealing the envelope only takes two seconds, but assembling the papers takes twenty seconds. Then you have a pipeline bottleneck.
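The bottleneck can be put in numbers. This is a rough sketch, not a measurement: only the 20-second and 2-second figures come from the example above, the other stage times are invented, and it ignores the proficiency gain that specialization brings.

```python
# Hypothetical per-stage times in seconds.
# Only "assemble" (20 s) and "seal" (2 s) come from the text; the rest are made up.
STAGE_SECONDS = {
    "assemble": 20,
    "fold": 5,
    "insert": 4,
    "seal": 2,
    "postage": 3,
    "label": 3,
}

def pipeline_seconds_per_envelope(stage_seconds):
    """In steady state, a pipeline finishes one item per slowest-stage interval."""
    return max(stage_seconds.values())

def independent_seconds_per_envelope(stage_seconds, workers):
    """Each worker does every stage alone; the workers run side by side."""
    return sum(stage_seconds.values()) / workers

workers = len(STAGE_SECONDS)  # six people in both arrangements
pipeline = pipeline_seconds_per_envelope(STAGE_SECONDS)
independent = independent_seconds_per_envelope(STAGE_SECONDS, workers)

print(f"pipeline: one envelope every {pipeline} s")
print(f"independent workers: one envelope every {independent:.2f} s")
```

With these numbers the person sealing envelopes waits most of the time, because the whole line moves at the pace of the person assembling papers.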
However, pipelines have a significant advantage over the independent-worker type of parallelism: they manage the flow of material (or in the case of a computer, data). If every worker has to have their own supply of envelopes and address labels, then either they have to go replenish when they run out, or additional workers are needed for the job of distributing materials.
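On a computer, that flow of material is usually a queue between stages. The following is a small sketch using Python threads, assuming three illustrative stages (`fold`, `stuff`, `seal`) and a helper `run_pipeline`; all of these names are invented for the example.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream; real items must not be None

def stage(func, inbox, outbox):
    """Apply func to each item from inbox and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # tell the next stage to stop too
            break
        outbox.put(func(item))

def run_pipeline(items, funcs):
    # One bounded queue in front of each stage, plus one for the final output.
    queues = [queue.Queue(maxsize=2) for _ in range(len(funcs) + 1)]

    def feed():
        for item in items:
            queues[0].put(item)  # blocks when the first stage falls behind
        queues[0].put(SENTINEL)

    threads = [threading.Thread(target=feed)]
    threads += [
        threading.Thread(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for t in threads:
        t.start()

    # Drain the last queue in the calling thread so the bounded queues never stall.
    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

def fold(x):
    return x + " > folded"

def stuff(x):
    return x + " > stuffed"

def seal(x):
    return x + " > sealed"

results = run_pipeline(["letter-1", "letter-2", "letter-3"], [fold, stuff, seal])
print(results)
```

The bounded queues are the point: no stage needs its own stockpile, and a fast stage is automatically held back when the one after it is still busy.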
So pipelining is just one form of parallelism. There are many others. For example, imagine a sort of drill sergeant at the front of the room barking commands for everyone to fold, then everyone stuff, then everyone seal, etc., so each person does the exact same thing synchronously but with different addresses on their set of labels. That’s called SIMD parallelism (Single Instruction, Multiple Data), and it requires that every person be matched for speed. If they are not, then instead of a pipeline bottleneck you have load imbalance.
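Load imbalance in the drill-sergeant arrangement can be estimated the same way. This sketch assumes three workers and three steps with invented timings; it only models the waiting, not real SIMD hardware.

```python
# Rows are workers, columns are steps (fold, stuff, seal).
# All times are hypothetical, in seconds.
TIMES = [
    [5, 4, 2],  # worker A
    [6, 4, 2],  # worker B
    [9, 7, 4],  # worker C, slower at everything
]

def lockstep_seconds(times):
    """Each step ends only when the slowest worker has finished it."""
    return sum(max(step) for step in zip(*times))

def idle_seconds(times):
    """Time each worker spends waiting for the next command."""
    total = lockstep_seconds(times)
    return [total - sum(row) for row in times]

total = lockstep_seconds(TIMES)
idle = idle_seconds(TIMES)
print(f"one round takes {total} s; idle time per worker: {idle}")
```

Here the two faster workers spend close to half of every round waiting for the slowest one.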
I hope that helps you understand how pipelining contrasts with other types of parallelism.