prevent deadlocks on run_maintenance #804
base: development
Conversation
Thank you for looking into this to find the less restrictive lock required! However, I think this would be better to be done only in the This will require some more code in the procedure to do a catalog lookup (similar to how it's done in If you'd like to adjust your PR to do this, please feel free. Otherwise I can look at incorporating this code into the procedure in the near future. Thanks again!
Hi Keith, do you agree to add this functionality only if run_maintenance() is called with a parent_table parameter?
As of version 5.x, the background worker has always used the procedure. That was the main reason for the minimum version of PostgreSQL changing to 14. If there's an error in part_config, that would cause run_maintenance() to fail as well, since that function runs the maintenance for all partition sets in a single transaction. With the procedure it should only roll back the most recent partition set to fail.
Would you agree to add a parameter to run_maintenance like lock_parent?
Yes, I'd be ok with that. I was actually thinking of adding another option to the part_config table, but I think this would be better. Would also need to be added to I would prefer that the default value for it be false. Thank you!
I can handle the GUC parameter if it's not something you want to touch.
I have committed the changes and set lock_parent to true for the procedure and false for the function.
On an error in part_config, the background worker stops processing the remaining tables:
Right, but that would be true in a manual call of the function or procedure as well, I believe. There is no code to catch that sort of exception and skip over it. I could potentially add something like that, though.
Since this is an API change, I'll look at getting this in for the next release of 5.4. Going to try and get a patch release out first with some of your other fixes.
run_maintenance acquires locks on the partitions before locking the parent table.
This is the opposite of the usual table access order (parent first, then partitions) and leads to deadlocks.
This patch first explicitly acquires a SHARE UPDATE EXCLUSIVE lock on the parent table only, to prevent these deadlocks.
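
As a rough illustration of the lock ordering described above, the following is a minimal sketch and not the code from this patch. The table name `public.events` is hypothetical and stands in for any partitioned parent table; the maintenance step itself is left as a comment.

```sql
-- Hypothetical partition set: public.events is the parent table.
BEGIN;

-- Lock the parent first, so this session takes locks in the same
-- parent-then-partition order as ordinary table access.
-- SHARE UPDATE EXCLUSIVE conflicts with itself and with stronger modes,
-- but not with ACCESS SHARE, ROW SHARE, or ROW EXCLUSIVE, so regular
-- SELECT, INSERT, UPDATE, and DELETE statements are not blocked by it.
LOCK TABLE public.events IN SHARE UPDATE EXCLUSIVE MODE;

-- ... partition maintenance that locks the child tables would run here ...

COMMIT;
```

Because SHARE UPDATE EXCLUSIVE is self-conflicting, two sessions running this sequence against the same parent would queue on the first statement instead of interleaving their partition locks.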