Description
Related to #26721
In the QA Wolf Premium test environment, old osquery-perf hosts are torn down and new hosts are spun up daily. These hosts have vulnerable software associated with them. Old hosts get cleared out using the "Host expiry" option in Organization Settings. We've identified that when the Linux hosts disappear, the OVAL vulnerabilities/orphan vulnerabilities aren't getting cleaned up.
We believe the large number of vulnerabilities is contributing to the long run time of the vulnerabilities cron, which in turn results in a vulnerability count of 0 while the cron is running.
Task
This is a spike to determine if this scenario is reproducible, and the output will be reproduction steps.
Condition of satisfaction
Reproducible steps that result in an increase in the total vulnerabilities count when hosts are added and removed on a recurring basis.
UPDATE:
It looks like this is an issue for non-NVD and non-custom vulnerability sources. The cleanup for NVD sources happens here:
if err = ds.DeleteOutOfDateVulnerabilities(ctx, fleet.NVDSource, startTime); err != nil {
	level.Error(logger).Log("msg", "error deleting out of date vulnerabilities", "err", err)
}
if err = ds.DeleteOutOfDateOSVulnerabilities(ctx, fleet.NVDSource, startTime); err != nil {
	level.Error(logger).Log("msg", "error deleting out of date OS vulnerabilities", "err", err)
}
So, we can take a similar approach with the cleanup of other sources, like OVAL, etc.
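A minimal sketch of that approach, assuming the two datastore methods quoted above accept any source value. The `Source` type, the two OVAL constant names, the `cleaner` interface, `cleanupSources`, and the fake `recordingStore` are all hypothetical names introduced for illustration; they are not taken from Fleet's code.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Source is a hypothetical stand-in for Fleet's vulnerability source type.
type Source string

// Only the NVD source appears in the snippet above; the OVAL names are assumed.
const (
	NVDSource        Source = "nvd"
	UbuntuOVALSource Source = "ubuntu-oval"
	RHELOVALSource   Source = "rhel-oval"
)

// cleaner is a hypothetical subset of the datastore, mirroring the two calls above.
type cleaner interface {
	DeleteOutOfDateVulnerabilities(ctx context.Context, src Source, olderThan time.Time) error
	DeleteOutOfDateOSVulnerabilities(ctx context.Context, src Source, olderThan time.Time) error
}

// cleanupSources runs both cleanups for every source. Errors are collected
// rather than returned early, matching the log-and-continue style above.
func cleanupSources(ctx context.Context, ds cleaner, sources []Source, startTime time.Time) []error {
	var errs []error
	for _, src := range sources {
		if err := ds.DeleteOutOfDateVulnerabilities(ctx, src, startTime); err != nil {
			errs = append(errs, fmt.Errorf("deleting out of date vulnerabilities (%s): %w", src, err))
		}
		if err := ds.DeleteOutOfDateOSVulnerabilities(ctx, src, startTime); err != nil {
			errs = append(errs, fmt.Errorf("deleting out of date OS vulnerabilities (%s): %w", src, err))
		}
	}
	return errs
}

// recordingStore is a fake datastore that records which cleanups were requested.
type recordingStore struct{ calls []string }

func (r *recordingStore) DeleteOutOfDateVulnerabilities(_ context.Context, src Source, _ time.Time) error {
	r.calls = append(r.calls, "software:"+string(src))
	return nil
}

func (r *recordingStore) DeleteOutOfDateOSVulnerabilities(_ context.Context, src Source, _ time.Time) error {
	r.calls = append(r.calls, "os:"+string(src))
	return nil
}

// demoCleanup runs the cleanup for the two assumed OVAL sources against the
// fake store and returns the recorded calls in order.
func demoCleanup() ([]string, []error) {
	store := &recordingStore{}
	errs := cleanupSources(context.Background(), store,
		[]Source{UbuntuOVALSource, RHELOVALSource}, time.Now())
	return store.calls, errs
}
```

Whether a shared timestamp cutoff is the right criterion for OVAL entries is an open question for the spike; the sketch only shows the shape of looping over sources.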
QA guide
Background
When hosts are removed (e.g., host expiry), their software/OS vulnerability entries remain in the database even though no host references them anymore. This inflates vulnerability counts over time. The fix adds cleanup of these orphaned entries during the vulnerability cron.
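The orphaning behavior can be illustrated with a toy in-memory model. This is only a sketch: the `store` type, its methods, and the placeholder CVE strings are invented for illustration and do not reflect Fleet's datastore code or real CVE IDs.

```go
package main

import "sort"

// store is a toy model of the host_software and software_cve tables.
type store struct {
	hostSoftware map[int]map[int]bool // host ID -> set of software IDs
	softwareCVE  map[int][]string     // software ID -> CVE identifiers
}

// deleteHost removes only the host's software links, leaving CVE rows behind.
func (s *store) deleteHost(hostID int) {
	delete(s.hostSoftware, hostID)
}

// orphanedSoftware returns, in ascending order, software IDs that still have
// CVE rows but are no longer referenced by any host.
func (s *store) orphanedSoftware() []int {
	referenced := map[int]bool{}
	for _, softwareIDs := range s.hostSoftware {
		for id := range softwareIDs {
			referenced[id] = true
		}
	}
	var orphans []int
	for id := range s.softwareCVE {
		if !referenced[id] {
			orphans = append(orphans, id)
		}
	}
	sort.Ints(orphans)
	return orphans
}

// cleanupOrphans deletes CVE rows for unreferenced software and returns the
// number of rows removed.
func (s *store) cleanupOrphans() int {
	removed := 0
	for _, id := range s.orphanedSoftware() {
		removed += len(s.softwareCVE[id])
		delete(s.softwareCVE, id)
	}
	return removed
}

// newExampleStore builds two hosts: software 10 is shared, software 20 is
// installed only on host 2. The CVE strings are placeholders.
func newExampleStore() *store {
	return &store{
		hostSoftware: map[int]map[int]bool{
			1: {10: true},
			2: {10: true, 20: true},
		},
		softwareCVE: map[int][]string{
			10: {"CVE-PLACEHOLDER-1"},
			20: {"CVE-PLACEHOLDER-2", "CVE-PLACEHOLDER-3"},
		},
	}
}
```

In this model, deleting host 2 leaves the two CVE rows for software 20 in place until `cleanupOrphans` runs, while the row for shared software 10 is never touched.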
Steps
1. Check current orphan counts (baseline)
Run against MySQL:
SELECT
  (SELECT COUNT(*) FROM software_cve sc
   LEFT JOIN host_software hs ON hs.software_id = sc.software_id
   WHERE hs.host_id IS NULL) AS orphaned_sw_vulns,
  (SELECT COUNT(*) FROM operating_system_vulnerabilities osv
   LEFT JOIN host_operating_system hos ON hos.os_id = osv.operating_system_id
   WHERE hos.host_id IS NULL) AS orphaned_os_vulns;
Record these numbers.
2. Find a host with unique software vulnerabilities
Find a host whose software isn't shared with other hosts, so deleting it will create orphans:
SELECT h.id, h.hostname, sc.software_id, sc.cve
FROM hosts h
JOIN host_software hs ON hs.host_id = h.id
JOIN software_cve sc ON sc.software_id = hs.software_id
WHERE hs.software_id IN (
  SELECT software_id FROM host_software GROUP BY software_id HAVING COUNT(*) = 1
)
LIMIT 20;
Pick a host and note its id, software_id, and how many CVEs it has.
3. Delete the host
DELETE https://<fleet-server>/api/latest/fleet/hosts/<host_id>
Authorization: Bearer <api_key>
4. Verify vulns are now orphaned
-- Should return the CVEs from step 2 (still in DB but orphaned)
SELECT COUNT(*) FROM software_cve WHERE software_id = <software_id>;
-- Orphan count should have increased
SELECT COUNT(*) FROM software_cve sc
LEFT JOIN host_software hs ON hs.software_id = sc.software_id
WHERE hs.host_id IS NULL;
5. Trigger the vulnerability cron
POST https://<fleet-server>/api/latest/fleet/trigger
Authorization: Bearer <api_key>
Content-Type: application/json
{"name": "vulnerabilities"}
Wait for the cron to complete. Monitor logs for msg=completed cron=vulnerabilities. This can take several minutes depending on fleet size.
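For repeated runs, the two API calls in steps 3 and 5 can be scripted. The sketch below only builds the requests, using the paths, headers, and JSON body shown in those steps; the helper names (`newFleetRequest`, `newDeleteHostRequest`, `newTriggerRequest`) are hypothetical, and the server URL and API key are supplied by the caller.

```go
package main

import (
	"net/http"
	"strconv"
	"strings"
)

// triggerBody is the JSON payload shown in step 5.
const triggerBody = `{"name": "vulnerabilities"}`

// newFleetRequest builds an authenticated request against the Fleet API.
// An empty body produces a request without a payload or Content-Type header.
func newFleetRequest(method, baseURL, path, apiKey, body string) (*http.Request, error) {
	url := strings.TrimRight(baseURL, "/") + path
	var (
		req *http.Request
		err error
	)
	if body == "" {
		req, err = http.NewRequest(method, url, nil)
	} else {
		req, err = http.NewRequest(method, url, strings.NewReader(body))
	}
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	if body != "" {
		req.Header.Set("Content-Type", "application/json")
	}
	return req, nil
}

// newDeleteHostRequest mirrors step 3: DELETE /api/latest/fleet/hosts/<host_id>.
func newDeleteHostRequest(baseURL, apiKey string, hostID uint) (*http.Request, error) {
	path := "/api/latest/fleet/hosts/" + strconv.FormatUint(uint64(hostID), 10)
	return newFleetRequest(http.MethodDelete, baseURL, path, apiKey, "")
}

// newTriggerRequest mirrors step 5: POST /api/latest/fleet/trigger.
func newTriggerRequest(baseURL, apiKey string) (*http.Request, error) {
	return newFleetRequest(http.MethodPost, baseURL, "/api/latest/fleet/trigger", apiKey, triggerBody)
}
```

Each request can then be sent with `http.DefaultClient.Do(req)`. The trigger call returning does not mean the cron has finished, so the log check described above still applies.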
6. Verify orphans are cleaned up
-- Should be 0
SELECT COUNT(*) FROM software_cve WHERE software_id = <software_id>;
-- Should be 0 (or at least reduced to baseline)
SELECT
  (SELECT COUNT(*) FROM software_cve sc
   LEFT JOIN host_software hs ON hs.software_id = sc.software_id
   WHERE hs.host_id IS NULL) AS orphaned_sw_vulns,
  (SELECT COUNT(*) FROM operating_system_vulnerabilities osv
   LEFT JOIN host_operating_system hos ON hos.os_id = osv.operating_system_id
   WHERE hos.host_id IS NULL) AS orphaned_os_vulns;
Both should be 0.
7. Verify non-orphaned vulns are untouched
Pick a host that still exists and has vulnerabilities. Confirm its vulnerability count in the UI or API hasn't changed.
Expected results
- Orphaned software and OS vulnerability counts drop to 0 after the vuln cron runs
- Vulnerabilities for hosts that still exist are not affected
- No errors in server logs related to orphan cleanup
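If the guide is run repeatedly, the count comparisons can be captured in a small helper. This is a sketch under the assumption that the orphan counts from steps 1, 4, and 6 have been collected by hand or by a script; `orphanCounts` and `checkRun` are hypothetical names, and the check for untouched non-orphaned vulnerabilities is not covered.

```go
package main

import "fmt"

// orphanCounts holds the two values returned by the orphan-count query.
type orphanCounts struct {
	Software int // orphaned_sw_vulns
	OS       int // orphaned_os_vulns
}

// checkRun compares counts taken at baseline (step 1), after the host delete
// (step 4), and after the cron (step 6). It returns one message per failed
// expectation; an empty result means the run matched the expected results.
func checkRun(baseline, afterDelete, afterCron orphanCounts) []string {
	var problems []string
	if afterDelete.Software <= baseline.Software {
		problems = append(problems, "deleting the host did not create orphaned software vulnerabilities")
	}
	if afterCron.Software != 0 {
		problems = append(problems,
			fmt.Sprintf("%d orphaned software vulnerabilities remain after the cron", afterCron.Software))
	}
	if afterCron.OS != 0 {
		problems = append(problems,
			fmt.Sprintf("%d orphaned OS vulnerabilities remain after the cron", afterCron.OS))
	}
	return problems
}
```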