Science, asked by nishitanagi8559, 1 year ago

Uneven load balance in a Hadoop cluster

Answers

Answered by varuncharaya20
Identifying Hadoop load balancing issues is usually not a problem. Hadoop partitions a job into several tasks and lazily assigns these tasks to available task slots in the cluster. Load balancing issues occur when some tasks are significantly larger than others, so that in the end only a few tasks are still running while all the others have finished. This typically happens with skewed reduce keys and is easy to identify (all tasks finished but a few). Hadoop also offers a Reporter interface for reporting the progress of a task.
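To make the skewed-key case concrete, here is a small simulation sketch in plain Python (not Hadoop code). It assumes a hash partitioner of the form hash(key) mod number of reducers, which is how Hadoop's default HashPartitioner assigns keys; `zlib.crc32` stands in for the key's hash code, and the input data, function names, and the "skew ratio" metric are made up for illustration.

```python
import zlib


def partition_loads(keys, num_reducers):
    """Count how many records each reducer receives under hash partitioning."""
    loads = [0] * num_reducers
    for key in keys:
        # crc32 is a stand-in for key.hashCode() (assumption for this sketch).
        reducer = zlib.crc32(key.encode("utf-8")) % num_reducers
        loads[reducer] += 1
    return loads


def skew_ratio(loads):
    """Largest partition divided by the mean; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


# Hypothetical input: a single hot key accounts for 90% of the records.
keys = ["hot"] * 9000 + [f"user{i}" for i in range(1000)]
loads = partition_loads(keys, num_reducers=4)
ratio = skew_ratio(loads)

print("records per reducer:", loads)
print("skew ratio:", round(ratio, 2))
```

All records with the same key land on the same reducer, so the reducer that receives the hot key ends up with at least 9000 of the 10000 records while the other three share the rest. That reducer is the one still running after all the others have finished.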

IMO, the real challenge is not to detect load balancing issues but either to avoid data skew in the first place (through clever partitioning and choice of parallelism) or to have adaptive methods that can mitigate the effect of data skew (e.g., by recursive repartitioning of large reduce tasks).
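One common way to avoid skew through partitioning is key salting: a known hot key is split into several sub-keys so that its records spread over more than one reducer. The sketch below is plain Python rather than Hadoop code, and it shows salting, not the recursive repartitioning mentioned above. The hot-key set, the number of salts, the `#` separator, and all function names are assumptions for illustration.

```python
import zlib
from collections import Counter


def reducer_for(key, num_reducers):
    # crc32 stands in for the key's hash code (assumption for this sketch).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def salted_key(key, record_index, hot_keys, num_salts):
    """Split a known hot key into num_salts sub-keys; leave other keys alone."""
    if key in hot_keys:
        return f"{key}#{record_index % num_salts}"
    return key


def reducer_loads(keys, num_reducers):
    """Count records per reducer for the given list of keys."""
    counts = Counter(reducer_for(k, num_reducers) for k in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


# Hypothetical input: a single hot key accounts for 90% of the records.
keys = ["hot"] * 9000 + [f"user{i}" for i in range(1000)]

plain = reducer_loads(keys, 4)
salted = reducer_loads(
    [salted_key(k, i, {"hot"}, 16) for i, k in enumerate(keys)], 4
)

print("without salting:", plain)
print("with salting:   ", salted)
```

The trade-off is that each reducer now only sees part of the hot key's records, so the partial results have to be combined in a second pass that strips the salt and aggregates per original key.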
