Fix log monitoring baseline overwrite after Home Assistant restarts #154

Merged
ian-morgan99 merged 6 commits into main from copilot/fix-log-monitoring-issue
Jan 16, 2026

Conversation

Contributor

Copilot AI commented Jan 13, 2026

After HA restarts, logs are cleared and the baseline was being overwritten with the resulting 0-5 startup errors, losing the meaningful pre-restart baseline needed for comparison.

Changes

Baseline update logic (save_current_logs)

  • Skip baseline updates when detecting 0-5 errors with an existing baseline (HA restart signature)
  • Update baseline only for 6-20 errors (stable operation) or first run
  • Extracted thresholds as class constants:
    MIN_ERRORS_FOR_BASELINE_UPDATE = 6
    MAX_ERRORS_FOR_BASELINE_UPDATE = 20
    MIN_ERRORS_FOR_PREVIOUS_LOGS = 6

Baseline fallback logic (load_previous_logs)

  • Fall back to baseline when previous_logs.json has <6 errors (restart scenario)
  • Use previous_logs.json when ≥6 errors (normal operation)
  • Handle first-run case where no baseline exists yet
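
The fallback rule above can be sketched as a small pure function. This is only an illustration under stated assumptions, not the code in log_monitor.py: `choose_comparison_logs` is a hypothetical helper, and it assumes both files have already been parsed into lists of error strings (or `None` when a file is missing).

```python
from typing import List, Optional

MIN_ERRORS_FOR_PREVIOUS_LOGS = 6  # threshold named in this PR


def choose_comparison_logs(
    previous: Optional[List[str]],
    baseline: Optional[List[str]],
) -> List[str]:
    """Pick the log set to compare against (hypothetical helper)."""
    previous = previous or []
    # Normal operation: previous_logs.json holds enough entries to be meaningful.
    if len(previous) >= MIN_ERRORS_FOR_PREVIOUS_LOGS:
        return previous
    # Restart scenario: fall back to the preserved baseline if one exists.
    if baseline:
        return baseline
    # First run: no baseline yet, so use whatever previous logs exist.
    return previous
```

Keeping the choice in a function with no file I/O makes each of the three cases easy to test in isolation.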

Behavior

| Error count | Baseline exists | Action |
| --- | --- | --- |
| 0-5 | Yes | Preserve baseline, don't overwrite |
| 0-5 | No | Create baseline |
| 6-20 | Yes/No | Update baseline |
| 21+ | Yes/No | Don't update (system unstable) |
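
The behavior table can be read as a single decision function. The sketch below is an assumption about how the rule might be expressed, not the merged implementation: `should_update_baseline` is a hypothetical name, while the two constants are the ones listed in this PR.

```python
MIN_ERRORS_FOR_BASELINE_UPDATE = 6
MAX_ERRORS_FOR_BASELINE_UPDATE = 20


def should_update_baseline(error_count: int, baseline_exists: bool) -> bool:
    """Return True when the baseline should be written (hypothetical helper)."""
    # 21+ errors: the system looks unstable, so never record this as a baseline.
    if error_count > MAX_ERRORS_FOR_BASELINE_UPDATE:
        return False
    # 6-20 errors: stable operation, always refresh the baseline.
    if error_count >= MIN_ERRORS_FOR_BASELINE_UPDATE:
        return True
    # 0-5 errors: looks like a restart; only write if nothing exists yet.
    return not baseline_exists
```

For example, `should_update_baseline(2, baseline_exists=True)` is `False` (the pre-restart baseline is preserved), while `should_update_baseline(2, baseline_exists=False)` is `True` (first run).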

Example

# Before restart: baseline has 11 errors
monitor.save_current_logs(baseline_errors)  # Creates baseline

# After HA restart: only 2 startup errors visible
monitor.save_current_logs(post_restart_errors)  # Baseline preserved at 11 errors

# Load for comparison
previous = monitor.load_previous_logs()  # Returns 11 errors from baseline
can_determine_changes = len(previous) > 0  # True ✓

Result: Users get meaningful comparisons after restarts instead of "no baseline to compare against."

Original prompt

This section details the original issue you should resolve

<issue_title>The log monitoring appears to function fine when not rebooted, but doesn't give useful output. Doesn't work after a reboot</issue_title>
<issue_description>Example of a log review after a minor update that didn't require a reboot.

Log Monitoring Status: RED

🔴 Changes detected in log entries since last check.

Severity: LOW

Summary:
1 new error/warning messages detected. 1 previous errors have been resolved.

Changes:

New errors/warnings: 1
Resolved errors: 1

Recommendations:

Good news: Some previous errors have been resolved.
Analysis powered by: Heuristics
Check time: 2026-01-13 01:01:12 UTC
Log lookback period: 96 hours

Next Steps:

After a restart, it always says there is no baseline to compare against. The baseline log entries must persist in between updates of any specific component.

</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 5 commits January 13, 2026 02:16
Co-authored-by: ian-morgan99 <106852423+ian-morgan99@users.noreply.github.com>
Co-authored-by: ian-morgan99 <106852423+ian-morgan99@users.noreply.github.com>
Co-authored-by: ian-morgan99 <106852423+ian-morgan99@users.noreply.github.com>
…ering

Co-authored-by: ian-morgan99 <106852423+ian-morgan99@users.noreply.github.com>
Co-authored-by: ian-morgan99 <106852423+ian-morgan99@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix log monitoring output after reboot" to "Fix log monitoring baseline overwrite after Home Assistant restarts" on Jan 13, 2026
Copilot AI requested a review from ian-morgan99 January 13, 2026 02:24
Owner

@ian-morgan99 ian-morgan99 left a comment

I don't like using error-count thresholds to decide what is a reboot vs. a service restart vs. regular running.
Also, I don't like that the thresholds aren't configurable in the config options.

I suggest we add an advanced section to the config (collapsed when not required).

I suggest we get better logic for distinguishing a service restart, an HA restart, and regular running.

On HA restart, the baseline errors (pre-update restart) should be captured. On service/add-on restarts, the baseline errors (pre-update component start) should be captured.
After an update, if a restart is not required, just check current errors vs. component start errors.
After the next restart, check the current errors vs. pre-update restart errors.
You will need to keep track of which checks have been done after each component update.

Think carefully through this logic, and propose a better way if you can think of one.

Contributor Author

Copilot AI commented Jan 13, 2026

I don't like the use thresholds for error logs being used to determine what is a reboot vs a service restart Vs regular running .
Also, I don't like that it's not a configurable parameter in the confi...

I understand your concerns about the threshold approach. You're right that it's too simplistic.

Let me propose a better solution that tracks the actual state transitions:

Approach: Track Update Events and Restart Types

  1. Store metadata with each baseline:

    • Timestamp
    • Update context (component_updated, ha_restarted, addon_restarted, normal_check)
    • Component/addon that was updated (if applicable)
  2. Use Home Assistant's APIs to detect restart types:

    • Check HA's uptime to detect HA restarts
    • Track add-on's own container start time
    • Compare timestamps to determine if logs were cleared
  3. Smart baseline selection:

    • After component update (no restart): Compare to pre-update baseline
    • After HA restart: Compare to pre-restart baseline (preserved across restarts)
    • After add-on restart: Create new baseline but preserve HA-level baseline
  4. Make it configurable:

    • Add advanced config section with:
      • baseline_preservation_strategy (automatic/manual)
      • restart_detection_method (uptime/threshold/hybrid)
      • Ability to override thresholds if needed

This eliminates guessing based on error counts and uses actual system state. Would you like me to implement this approach?
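
As a rough illustration of the proposal above, baseline metadata and restart classification might look like the following. Everything here is hypothetical: the `UpdateContext`, `Baseline`, and `classify_check` names are invented for this sketch, and how the HA and add-on start times are obtained is deliberately left out because the comment does not name a specific API.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import List, Optional


class UpdateContext(Enum):
    # Context values taken from the proposal above.
    COMPONENT_UPDATED = "component_updated"
    HA_RESTARTED = "ha_restarted"
    ADDON_RESTARTED = "addon_restarted"
    NORMAL_CHECK = "normal_check"


@dataclass
class Baseline:
    """A stored baseline plus the metadata the proposal lists."""
    errors: List[str]
    timestamp: datetime
    context: UpdateContext
    component: Optional[str] = None  # component/add-on updated, if any


def classify_check(
    baseline_time: datetime,
    ha_started_at: datetime,
    addon_started_at: datetime,
) -> UpdateContext:
    """Classify what happened since the baseline was saved.

    Assumes the caller supplies start times (for example, derived from
    uptime); an HA restart takes precedence over an add-on restart.
    """
    if ha_started_at > baseline_time:
        return UpdateContext.HA_RESTARTED
    if addon_started_at > baseline_time:
        return UpdateContext.ADDON_RESTARTED
    return UpdateContext.NORMAL_CHECK
```

Comparing start times against the baseline's own timestamp replaces the error-count heuristic with an explicit signal, which is the core of the reviewer's request.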

Copilot AI requested a review from ian-morgan99 January 13, 2026 07:47
@ian-morgan99 ian-morgan99 marked this pull request as ready for review January 16, 2026 02:46
Copilot AI review requested due to automatic review settings January 16, 2026 02:46
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a critical issue where the log monitoring baseline was being overwritten after Home Assistant restarts, resulting in "no baseline to compare against" messages. The fix introduces intelligent baseline management that preserves the pre-restart baseline when detecting 0-5 errors (restart signature) while updating it during genuine stable operation (6-20 errors).

Changes:

  • Added baseline management thresholds as class constants to distinguish HA restarts from stable operation
  • Modified save_current_logs() to skip baseline updates when detecting 0-5 errors with an existing baseline
  • Enhanced load_previous_logs() to fall back to baseline when previous_logs.json has fewer than 6 errors
  • Added comprehensive test coverage with test_baseline_persistence.py and demonstration script demo_baseline_persistence.py

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| ha_sentry/rootfs/app/log_monitor.py | Implements baseline preservation logic with three new threshold constants and updated save/load methods |
| tests/test_baseline_persistence.py | Comprehensive test suite covering baseline preservation, first-run scenarios, and fallback logic |
| tests/demo_baseline_persistence.py | Interactive demonstration script showing baseline persistence across HA restart scenarios |

'2026-01-13 03:40:00 ERROR homeassistant.components.zwave: Timeout',
]

can_determine, analysis = simulate_log_check(monitor, stable_errors, "Stable Operation Check")

Copilot AI Jan 16, 2026

Variable can_determine is not used.

'2026-01-13 03:40:00 ERROR homeassistant.components.zwave: Timeout',
]

can_determine, analysis = simulate_log_check(monitor, stable_errors, "Stable Operation Check")

Copilot AI Jan 16, 2026

Variable analysis is not used.

Suggested change
- can_determine, analysis = simulate_log_check(monitor, stable_errors, "Stable Operation Check")
+ can_determine, _ = simulate_log_check(monitor, stable_errors, "Stable Operation Check")

'2026-01-12 00:55:00 WARNING homeassistant.components.automation: Long execution',
]

can_determine, analysis = simulate_log_check(monitor, initial_errors, "Initial Check")

Copilot AI Jan 16, 2026

This assignment to 'can_determine' is unnecessary as it is redefined before this value is used.

Suggested change
- can_determine, analysis = simulate_log_check(monitor, initial_errors, "Initial Check")
+ _, analysis = simulate_log_check(monitor, initial_errors, "Initial Check")

'2026-01-12 00:55:00 WARNING homeassistant.components.automation: Long execution',
]

can_determine, analysis = simulate_log_check(monitor, initial_errors, "Initial Check")

Copilot AI Jan 16, 2026

This assignment to 'analysis' is unnecessary as it is redefined before this value is used.

Suggested change
- can_determine, analysis = simulate_log_check(monitor, initial_errors, "Initial Check")
+ can_determine, _ = simulate_log_check(monitor, initial_errors, "Initial Check")

import os
import tempfile
import json
from datetime import datetime

Copilot AI Jan 16, 2026

Import of 'datetime' is not used.

Suggested change
- from datetime import datetime

Owner

@ian-morgan99 ian-morgan99 left a comment

Implement all recommendations please

@ian-morgan99 ian-morgan99 merged commit 66acaa2 into main Jan 16, 2026
7 checks passed

Development

Successfully merging this pull request may close these issues.

The log monitoring appears to function fine when not rebooted, but doesn't give useful output. Doesn't work after a reboot

3 participants