The purpose of this article is to document my observations of HDFS Balancer behavior during a recent work engagement. It is possible that these observations are valid only for certain versions.
These observations apply to CDH 5.11+, although I noticed that each CDH minor release introduces new parameters to optimize the balancer.
A few general tips I recommend:
Search the web; there are many great articles out there
Make sure to check version compatibility
Test and validate the effectiveness of each new parameter
Introduce one parameter at a time
Do this responsibly and do not disturb your production load
OK, with that said, let's get to it. By default, CDH comes with the HDFS balancer pre-configured to run periodically. However, sometimes we want to run the balancer at a different speed. For instance, after a capacity expansion we want to speed up the balancing process. Below are some observations from running the balancer under different scenarios:
Let's break it down:
# This increases the balancer heap size; when you use a lot of mover threads, you may need to set this parameter
# This is the command to run the balancer in CDH
# This is the total number of mover threads for the balancer across all contributor nodes; the number is distributed among them. In some instances, increasing this number is counterproductive
# I am not sure of the impact of this parameter
# This is the amount of data to move per iteration. Increasing this number normally increases throughput; however, do not increase it so high that an iteration exceeds the 20-minute limit
# This is supposed to limit total network usage
# This is the maximum number of concurrent threads/moves per node, in case the moverThreads value above is too high
# Increasing this will require more memory for the balancer
# This is supposed to limit network usage at the DataNode level
# This is the threshold of allowed utilization difference between contributor and receiver nodes at which the balancer stops balancing. However, we observed that a setting of 10 results in a ~20% difference, and a setting of 5 results in a ~10% difference
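A sketch of the kind of invocation the comments above describe. The property names below are my mapping of those comments onto the standard HDFS balancer knobs, and every value is illustrative; verify both the names and the values against your CDH version's documentation before use.

```shell
# Balancer heap size (in MB); large mover-thread counts may need more.
export HADOOP_HEAPSIZE=8192

# All -D values below are illustrative, not recommendations.
hdfs balancer -threshold 10 \
  -Ddfs.balancer.moverThreads=1000 \
  -Ddfs.balancer.dispatcherThreads=200 \
  -Ddfs.balancer.max-size-to-move=107374182400 \
  -Ddfs.datanode.balance.max.concurrent.moves=50 \
  -Ddfs.datanode.balance.bandwidthPerSec=104857600
```

Here `-threshold 10` is the stop threshold, `moverThreads` is the cluster-wide mover pool, `max-size-to-move` is the amount to move per node per iteration (100 GB here), `max.concurrent.moves` caps concurrent moves per DataNode, and `bandwidthPerSec` caps per-DataNode balancing bandwidth (100 MB/s here).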
Setting the parameters above did not really improve performance. We observed a lot of waiting, and it looked like throttling, until we executed this command:
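For reference, an HDFS command that matches this description (it takes effect on the live cluster without restarting the balancer) is `hdfs dfsadmin -setBalancerBandwidth`. Whether this is the exact command used here is my assumption, and the value is illustrative:

```shell
# Assumption: a command fitting the description above. -setBalancerBandwidth
# dynamically pushes a new per-DataNode balancer bandwidth cap (bytes/second)
# to all live DataNodes, overriding dfs.datanode.balance.bandwidthPerSec
# until the DataNodes restart. 209715200 bytes/s = 200 MB/s (illustrative).
hdfs dfsadmin -setBalancerBandwidth 209715200
```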
After executing this command, and without restarting the balancer, the balancer's performance became more stable.
It is also important to monitor the NameNode and Sentry during balancing. In our case, the NameNode and Sentry became unstable when we increased the speed, so we needed to increase the Sentry heap to 2 GB.
Another thing to keep in mind: during the initial phase of each iteration, the balancer calculates a "transfer map", which maps which nodes will send data to which other nodes. The number of nodes participating depends on the number of receiver nodes available: the more receiver nodes, the faster the balancing process (assuming no throttling).
Current cluster size = 100 nodes. Which of the following scenarios will give faster balancing/higher throughput?
A. Add 50 new nodes
B. Add 100 new nodes
(Answer) B is faster.
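One way to see why B wins: the balancer schedules transfers in contributor-to-receiver pairs, so the number of pairs that can run concurrently is roughly bounded by the number of receiver nodes. The toy model below ignores throttling and assumes all 100 existing nodes act as contributors:

```shell
#!/bin/sh
# Toy model: concurrent transfer pairs are bounded by
# min(contributor nodes, receiver nodes).
contributors=100   # the existing, full nodes
for receivers in 50 100; do
  pairs=$(( receivers < contributors ? receivers : contributors ))
  echo "add $receivers nodes -> up to $pairs concurrent pairs"
done
```

With 100 new nodes, twice as many contributor/receiver pairs can move blocks at once as with 50, which is why scenario B balances faster.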
Below are samples of the different outcomes:
There are two things that we notice above:
Network throughput is not as high as we expected
The gap between iterations is too big
This can only transfer ~300 GB per iteration, with each iteration taking around 7 minutes.
Here we noticed the following:
The gap is narrower
The overall throughput is much better
This can transfer ~870 GB per iteration, with each iteration taking around 6 minutes.
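Putting the two runs side by side, using the per-iteration figures above, the difference in effective throughput works out to roughly:

```shell
#!/bin/sh
# Effective throughput of the two runs described above, in GB per minute.
awk 'BEGIN {
  printf "first run:  %.0f GB/min\n", 300 / 7   # ~300 GB per ~7-minute iteration
  printf "second run: %.0f GB/min\n", 870 / 6   # ~870 GB per ~6-minute iteration
}'
```

That is roughly a 3.4x improvement in sustained throughput, even though the per-iteration time barely changed.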
In summary, you need to test these parameters and find the settings that suit your needs and scenario. Arbitrarily increasing the numbers can sometimes be counterproductive.
If you have other tips or comments, feel free to leave a comment below.