HDFS-17818. Fix serial fsimage transfer during checkpoint with multiple namenodes #7862
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
@Hexiaoqiao @ayushtkn @tomscut Do you think the fsimage upload during checkpoint, in clusters with observer NameNodes, should be changed from serial to parallel?
In our cluster, each namespace has four NameNodes: one active, one standby, and two observers. When the standby NameNode performs a checkpoint, it transfers the fsimage to the other three NameNodes. However, we found that these transfers are performed serially.
The reason is that corePoolSize in the ThreadPoolExecutor is 0 and the transfer tasks never fill the LinkedBlockingQueue. A ThreadPoolExecutor only starts threads beyond corePoolSize when the queue rejects a task, so only one thread transfers the fsimage at a time. This greatly increases the checkpoint time.
ExecutorService executor = new ThreadPoolExecutor(0, activeNNAddresses.size(), 100, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(activeNNAddresses.size()), uploadThreadFactory);
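The behavior can be reproduced outside HDFS with a small standalone sketch. The class name `PoolDemo`, the helper `largestPoolSize`, and the 50 ms sleep that stands in for an upload are all made up for illustration. Setting corePoolSize equal to the number of targets is only one possible way to get parallel uploads and is not necessarily the change made in this PR.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolDemo {
  /**
   * Submits `tasks` short jobs to a pool built like the one above and
   * returns the largest number of threads the pool ever held.
   */
  static int largestPoolSize(int corePoolSize, int tasks) throws InterruptedException {
    ThreadPoolExecutor executor = new ThreadPoolExecutor(
        corePoolSize, tasks, 100, TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<Runnable>(tasks));
    for (int i = 0; i < tasks; i++) {
      executor.execute(() -> {
        try {
          Thread.sleep(50); // hypothetical stand-in for one fsimage upload
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    executor.shutdown();
    executor.awaitTermination(10, TimeUnit.SECONDS);
    return executor.getLargestPoolSize();
  }

  public static void main(String[] args) throws InterruptedException {
    // corePoolSize = 0: tasks are queued, the queue never fills, one thread runs them in turn.
    System.out.println("corePoolSize=0: " + largestPoolSize(0, 3));
    // corePoolSize = 3: each submission starts its own thread until the core size is reached.
    System.out.println("corePoolSize=3: " + largestPoolSize(3, 3));
  }
}
```

With three tasks, the first configuration should report a largest pool size of 1 and the second a largest pool size of 3, which matches the serial transfers we observe during checkpoint.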