Fix log monitoring baseline overwrite after Home Assistant restarts #154
ian-morgan99 merged 6 commits into `main`
Co-authored-by: ian-morgan99 <106852423+ian-morgan99@users.noreply.github.com>
ian-morgan99 left a comment:
I don't like using error-log count thresholds to decide whether HA rebooted, a service restarted, or the system is running normally.
I also don't like that the thresholds aren't configurable in the config options.
I suggest adding an advanced section to the config (collapsed when not required).
I suggest we find better logic for telling a service restart apart from an HA restart and from regular running.
On an HA restart, the baseline errors (from the pre-update restart) should be captured. On a service/add-on restart, the baseline errors (from the pre-update component start) should be captured.
After an update, if no restart is required, just compare the current errors against the component-start errors.
After the next restart, compare the current errors against the pre-update restart errors.
You will need to keep track of which checks have been done after each component update.
Think this logic through carefully, and propose a better way if you can think of one.
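One way to model the per-update bookkeeping described above is a small record per component update that reports which comparison is still owed. This is a sketch under assumptions, not the add-on's code: `Reference`, `UpdateRecord`, `next_check`, and `mark_done` are hypothetical names, and it reads the comment as "compare against the component-start errors right away when no restart is needed, and against the pre-update restart errors after the next HA restart in either case".

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Reference(Enum):
    """Which stored error snapshot a post-update check compares against (hypothetical)."""
    COMPONENT_START = "component_start"        # errors at the component's last start before the update
    PRE_UPDATE_RESTART = "pre_update_restart"  # errors at the last HA restart before the update


@dataclass
class UpdateRecord:
    """Hypothetical bookkeeping for one component update; not from ha_sentry."""
    component: str
    restart_required: bool
    completed: set[Reference] = field(default_factory=set)

    def next_check(self, ha_restarted_since_update: bool) -> Reference | None:
        """Return the next comparison still owed for this update, or None."""
        # No restart needed: compare right away against the component-start snapshot.
        if not self.restart_required and Reference.COMPONENT_START not in self.completed:
            return Reference.COMPONENT_START
        # Once HA has restarted, compare against the pre-update restart snapshot.
        if ha_restarted_since_update and Reference.PRE_UPDATE_RESTART not in self.completed:
            return Reference.PRE_UPDATE_RESTART
        return None

    def mark_done(self, ref: Reference) -> None:
        """Record that a comparison has been performed so it is not repeated."""
        self.completed.add(ref)


rec = UpdateRecord("zwave", restart_required=False)
print(rec.next_check(ha_restarted_since_update=False))  # Reference.COMPONENT_START
rec.mark_done(Reference.COMPONENT_START)
print(rec.next_check(ha_restarted_since_update=True))   # Reference.PRE_UPDATE_RESTART
```

Keeping the completed checks on the record, rather than inferring them from error counts, is what lets the monitor know after a restart which comparison is still outstanding.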
I understand your concerns about the threshold approach; you're right that it's too simplistic. Let me propose a better solution that tracks the actual state transitions.
Approach: Track Update Events and Restart Types
This eliminates guessing based on error counts and relies on actual system state instead. Would you like me to implement this approach?
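Detecting restarts from system state rather than error counts could look like the following minimal sketch. It assumes, hypothetically, that the monitor can record a start timestamp for Home Assistant and for the monitored component at each check; how those timestamps are obtained is left open, and `RunState` and `classify_run_state` are illustrative names only.

```python
from __future__ import annotations

from datetime import datetime
from enum import Enum


class RunState(Enum):
    """What happened between two consecutive checks (hypothetical)."""
    HA_RESTART = "ha_restart"
    COMPONENT_RESTART = "component_restart"
    RUNNING = "running"


def classify_run_state(prev_ha_start: datetime, ha_start: datetime,
                       prev_component_start: datetime,
                       component_start: datetime) -> RunState:
    """Classify the interval since the last check from recorded start times.

    Assumes the caller can read these timestamps; no error counts are used.
    """
    # A new HA start time is checked first, so it wins if both changed.
    if ha_start != prev_ha_start:
        return RunState.HA_RESTART
    if component_start != prev_component_start:
        return RunState.COMPONENT_RESTART
    return RunState.RUNNING


t0 = datetime(2026, 1, 12, 0, 0)
t1 = datetime(2026, 1, 13, 3, 40)
print(classify_run_state(t0, t0, t0, t1))  # RunState.COMPONENT_RESTART
```

Giving the HA start time precedence is a design choice of this sketch: if both timestamps changed, the interval is treated as an HA restart.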
Pull request overview
This PR fixes a critical issue where the log monitoring baseline was overwritten after Home Assistant restarts, which produced "no baseline to compare against" messages. The fix preserves the pre-restart baseline when it detects 0-5 errors (the restart signature) and updates it during genuine stable operation (6-20 errors).
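The save-side rule can be sketched as a single predicate. `RESTART_SIGNATURE_MAX_ERRORS` and `should_update_baseline` are hypothetical names (the PR's actual constant names aren't shown here), and the sketch encodes only the rule stated for `save_current_logs()`: skip the update when 0-5 errors are seen and a baseline already exists. The 20-error upper bound is not modelled, since how counts above it are handled isn't described.

```python
# Hypothetical constant: 0-5 errors is described as the restart signature.
RESTART_SIGNATURE_MAX_ERRORS = 5


def should_update_baseline(error_count: int, has_baseline: bool) -> bool:
    """Decide whether the current error set should replace the baseline.

    A low error count right after startup looks like an HA restart (logs
    were cleared), so an existing baseline is kept instead of overwritten.
    """
    if not has_baseline:
        # First run: any error set is better than no baseline at all.
        return True
    # 0-5 errors with an existing baseline: treat as a restart and preserve it.
    return error_count > RESTART_SIGNATURE_MAX_ERRORS


print(should_update_baseline(3, has_baseline=True))   # False: restart signature, keep baseline
print(should_update_baseline(12, has_baseline=True))  # True: stable operation, refresh baseline
print(should_update_baseline(3, has_baseline=False))  # True: first run, no baseline yet
```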
Changes:
- Added baseline management thresholds as class constants to distinguish HA restarts from stable operation
- Modified `save_current_logs()` to skip baseline updates when it detects 0-5 errors and a baseline already exists
- Enhanced `load_previous_logs()` to fall back to the baseline when `previous_logs.json` has fewer than 6 errors
- Added comprehensive test coverage with `test_baseline_persistence.py` and the demonstration script `demo_baseline_persistence.py`
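The load-side fallback might look like this minimal sketch. It assumes each file holds a JSON list of error lines and that the baseline lives in a separate file (`baseline_logs.json` is a hypothetical name); `load_comparison_errors` is illustrative, not the real `load_previous_logs()`.

```python
from __future__ import annotations

import json
import os
import tempfile

# Hypothetical constant: previous_logs.json with fewer errors than this
# is treated as a post-restart snapshot rather than normal operation.
STABLE_MIN_ERRORS = 6


def _read_errors(path: str) -> list[str] | None:
    """Return the error lines stored at `path`, or None if the file is missing."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def load_comparison_errors(previous_path: str, baseline_path: str) -> list[str] | None:
    """Pick the error set that the current check should be compared against."""
    previous = _read_errors(previous_path)
    if previous is not None and len(previous) >= STABLE_MIN_ERRORS:
        return previous  # normal operation: compare against the last check
    baseline = _read_errors(baseline_path)
    if baseline is not None:
        return baseline  # restart scenario: fall back to the preserved baseline
    return previous  # no baseline yet (e.g. first run); may be None


# Usage: 3 errors after a restart, 12 in the preserved baseline.
with tempfile.TemporaryDirectory() as tmp:
    prev_path = os.path.join(tmp, "previous_logs.json")
    base_path = os.path.join(tmp, "baseline_logs.json")  # hypothetical filename
    with open(prev_path, "w", encoding="utf-8") as fh:
        json.dump(["startup error"] * 3, fh)
    with open(base_path, "w", encoding="utf-8") as fh:
        json.dump(["steady-state error"] * 12, fh)
    print(len(load_comparison_errors(prev_path, base_path)))  # 12
```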
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| ha_sentry/rootfs/app/log_monitor.py | Implements baseline preservation logic with three new threshold constants and updated save/load methods |
| tests/test_baseline_persistence.py | Comprehensive test suite covering baseline preservation, first-run scenarios, and fallback logic |
| tests/demo_baseline_persistence.py | Interactive demonstration script showing baseline persistence across HA restart scenarios |
On `can_determine, analysis = simulate_log_check(monitor, stable_errors, "Stable Operation Check")` (after the `stable_errors` list, which ends with `'2026-01-13 03:40:00 ERROR homeassistant.components.zwave: Timeout'`):

Variable `can_determine` is not used.
On `can_determine, analysis = simulate_log_check(monitor, stable_errors, "Stable Operation Check")`:

Variable `analysis` is not used. Suggested change:

`can_determine, _ = simulate_log_check(monitor, stable_errors, "Stable Operation Check")`
On `can_determine, analysis = simulate_log_check(monitor, initial_errors, "Initial Check")` (after the `initial_errors` list, which ends with `'2026-01-12 00:55:00 WARNING homeassistant.components.automation: Long execution'`):

This assignment to `can_determine` is unnecessary because it is redefined before this value is used. Suggested change:

`_, analysis = simulate_log_check(monitor, initial_errors, "Initial Check")`
On `can_determine, analysis = simulate_log_check(monitor, initial_errors, "Initial Check")`:

This assignment to `analysis` is unnecessary because it is redefined before this value is used. Suggested change:

`can_determine, _ = simulate_log_check(monitor, initial_errors, "Initial Check")`
On the imports `import os`, `import tempfile`, `import json`, `from datetime import datetime`:

Import of `datetime` is not used. Suggested change: remove `from datetime import datetime`.
ian-morgan99 left a comment:
Implement all recommendations, please.
After HA restarts, logs are cleared and the baseline was being overwritten with the resulting 0-5 startup errors, losing the meaningful pre-restart baseline needed for comparison.
Changes
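- Baseline update logic (`save_current_logs`): skips the baseline update when 0-5 errors are detected and a baseline already exists
- Baseline fallback logic (`load_previous_logs`):
  - Falls back to the baseline when `previous_logs.json` has <6 errors (restart scenario)
  - Uses `previous_logs.json` when it has ≥6 errors (normal operation)

Behavior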
Example
Result: Users get meaningful comparisons after restarts instead of "no baseline to compare against."