Uneven load balance between hadoop cluster
Answers
Answered by
0
Identifying Hadoop load balancing issues is usually not a problem. Hadoop partititions a job into several tasks and lazily assigns these tasks to available task slots in the cluster. Load balancing issues occur if there are some tasks significantly larger than others such that in the end only a few tasks are running while all others are finished. This situation happens in case of skewed reduce keys and can be easily identified (all tasks finished but a few). Hadoop does also offer a reporter interface to report the progress of a task.
IMO, the real challenge is not to detect load balancing issues but to either avoid data skew in the beginning (by clever partitioning and choice of parallelism) or to have adaptive methods that can mitigate the effect of data skew (e.g., by recursive repartitioning of large reduce tasks
IMO, the real challenge is not to detect load balancing issues but to either avoid data skew in the beginning (by clever partitioning and choice of parallelism) or to have adaptive methods that can mitigate the effect of data skew (e.g., by recursive repartitioning of large reduce tasks
Similar questions
History,
7 months ago
Computer Science,
7 months ago
Art,
7 months ago
Science,
1 year ago
Math,
1 year ago