Enhance update_status to monitor health checks and collect diagnostic logs #1192
marceloneppel wants to merge 3 commits into main
… logs

Previously, the update_status hook only checked if the PostgreSQL pebble service was active. This could miss situations where the service appeared running but the health check was failing. This change enhances the restart logic to also monitor Pebble health check status and collect diagnostic information when issues are detected.

Changes:
- Monitor Pebble health check status in addition to service status
- Trigger restart when health check is DOWN even if service is ACTIVE
- Add _get_postgresql_startup_logs() to collect diagnostic information from Pebble logs and PostgreSQL logs before and after restart attempts
- Add comprehensive unit tests for health check monitoring and error handling

This improves observability and helps diagnose PostgreSQL restart issues by capturing relevant logs at the time of failure.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1192 +/- ##
==========================================
- Coverage 72.85% 72.82% -0.04%
==========================================
Files 15 15
Lines 4012 4040 +28
Branches 597 599 +2
==========================================
+ Hits 2923 2942 +19
- Misses 863 871 +8
- Partials 226 227 +1
…ption-a-health-check Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Nice PR and interesting findings right out of the box:
Hopefully, this PR is the solution for the issue that we saw when upgrading PG to 14.20. I believe the new PG behaves a little differently and Pebble from Juju 2.9 is not correctly detecting that Patroni is stopped (it seems that Pebble sees PG still running and, because of that, considers the Patroni service, the parent of the PG process, still active). Regarding the finding you commented on, I believe that error may be a side effect of the fix added by this PR. It will need more investigation.
…ption-a-health-check Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Issue
Solution
Previously, the update_status hook only checked if the PostgreSQL pebble service was active. This could miss situations where the service appeared running but the health check was failing. This change enhances the restart logic to also monitor Pebble health check status and collect diagnostic information when issues are detected.
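Changes:
- Monitor Pebble health check status in addition to service status
- Trigger restart when health check is DOWN even if service is ACTIVE
- Add _get_postgresql_startup_logs() to collect diagnostic information from Pebble logs and PostgreSQL logs before and after restart attempts
- Add comprehensive unit tests for health check monitoring and error handling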
This improves observability and helps diagnose PostgreSQL restart issues by capturing relevant logs at the time of failure.
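For reviewers, the restart decision described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name should_restart is hypothetical, and the two enums are local stand-ins modelled on the service and check statuses that Pebble reports, so that the snippet runs without the charm's dependencies. Treating a missing check as "no signal" is also an assumption.

```python
from enum import Enum
from typing import Optional


class ServiceStatus(Enum):
    """Local stand-in for the Pebble service status (assumption)."""

    ACTIVE = "active"
    INACTIVE = "inactive"
    ERROR = "error"


class CheckStatus(Enum):
    """Local stand-in for the Pebble health check status (assumption)."""

    UP = "up"
    DOWN = "down"


def should_restart(
    service_status: ServiceStatus, check_status: Optional[CheckStatus]
) -> bool:
    """Hypothetical helper: decide whether the workload needs a restart.

    Old behaviour: restart only when the service is not ACTIVE.
    New behaviour: also restart when the service is ACTIVE but the
    health check is DOWN.
    """
    if service_status is not ServiceStatus.ACTIVE:
        return True
    # A missing check (None) is treated as "no signal" in this sketch.
    return check_status is CheckStatus.DOWN


if __name__ == "__main__":
    # The case this PR targets: service looks fine, health check is failing.
    print(should_restart(ServiceStatus.ACTIVE, CheckStatus.DOWN))
```

The ACTIVE plus DOWN combination is the one the previous logic missed. In the real hook, diagnostic logs would be collected around the restart attempt.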
Checklist