Skip to content

Conversation

jw1u1
Copy link
Contributor

@jw1u1 jw1u1 commented Sep 4, 2025

run_maintenance acquires locks on the partitions before locking the parent table.
This is in opposite to the common table access and leads to deadlocks.
This patch acquires explicitly a SHARE UPDATE EXCLUSIVE lock on the parent table only first, to prevent these deadlocks.

@keithf4
Copy link
Collaborator

keithf4 commented Sep 4, 2025

Thank you for looking into this to find the lesser restrictive lock required! However, I think this would be better to be done only in the run_maintenance_proc() procedure. Otherwise when this lock is acquired via the normal run_maintenance() call, it will stay in place for the entire run of the plain function which keeps all partition tables locked for the entire transaction run.

This will require some more code in the procedure to do a catalog lookup (similar to how it's done in run_maintenance). But this will prevent those that run maintenance without calling the procedure from potentially taking out too many locks. The procedure should instead release the lock after commit of each partition set's maintenance.

If you'd like to adjust your PR to do this, please feel free. Otherwise I can look at incorporating this code into the procedure in the near future.

Thanks again!

@jw1u1
Copy link
Contributor Author

jw1u1 commented Sep 4, 2025

Hi Keith,
unfortunately, we are not using the procedure.
It is not used by the background worker and it fails completely, if there is a configuration error in part_config with one table.
The latter could probably be fixed but i guess there are plenty of installations out there that don't use the procedure neither.

Do you agree to add this functionality only, if run_maintenance() is called with a parent_table Parameter?

@keithf4
Copy link
Collaborator

keithf4 commented Sep 4, 2025

As of version 5.x, the background worker has always used the procedure. That was the main reason for the minimum version of PostgreSQL changing to 14.

If there's an error in part_config, that would also cause run_maintenance() to fail as well since that function runs the maintenance for all partition sets in a single transaction. With the procedure it should only roll back the most recent partition set to fail.

@jw1u1
Copy link
Contributor Author

jw1u1 commented Sep 5, 2025

Would you agree to add a parameter to run_maintenance like lock_parent?

@keithf4
Copy link
Collaborator

keithf4 commented Sep 5, 2025

Yes, I'd be ok with that. I was actually thinking of adding another option to the part_config table, but I think this would be better. Would also need to be added to run_maintenance_proc() as well and also be a new GUC for the background worker.

I would prefer that the default value for it be false.

Thank you!

@keithf4
Copy link
Collaborator

keithf4 commented Sep 5, 2025

I can handle the GUC parameter if it's not something you want to touch

@jw1u1
Copy link
Contributor Author

jw1u1 commented Sep 5, 2025

I have committed the changes and set lock_parent to true for the procedure and false for the function.

@jw1u1
Copy link
Contributor Author

jw1u1 commented Sep 5, 2025

On an error in part_config, the Background worker stops processing the remaining Tables:
LOG: pg_partman dynamic background worker (dbname=foo) dynamic background worker initialized with role foo_owner on database foo
ERROR: Unable to find configured template table in system catalogs: partman.template_foo_old
CONTEXT: PL/pgSQL function inherit_template_properties(text,text,text) line 71 at RAISE

@keithf4
Copy link
Collaborator

keithf4 commented Sep 5, 2025

On an error in part_config, the Background worker stops processing the remaining Tables: LOG: pg_partman dynamic background worker (dbname=foo) dynamic background worker initialized with role foo_owner on database foo ERROR: Unable to find configured template table in system catalogs: partman.template_foo_old CONTEXT: PL/pgSQL function inherit_template_properties(text,text,text) line 71 at RAISE

Right, but that would be true in a manual call of the function or procedure as well, I believe. There is no code to catch that sort of exception and skip over it. I could potentially add something like that though

@keithf4 keithf4 self-assigned this Sep 23, 2025
@keithf4 keithf4 self-requested a review September 23, 2025 14:15
@keithf4 keithf4 added this to the Next Patch milestone Sep 23, 2025
@keithf4
Copy link
Collaborator

keithf4 commented Oct 14, 2025

Since this an API change, I'll look at getting this in for the next release of 5.4. Going to try and get a patch release out first with some of your other fixes.

@keithf4 keithf4 modified the milestones: Next Patch, 5.4 Oct 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants