Skip to content

Conversation

ksykulev
Copy link
Contributor

@ksykulev ksykulev commented Oct 10, 2025

Related issue: Resolves #28788

Checklist for submitter

If some of the following don't apply, delete the relevant line.

  • Changes file added for user-visible changes in changes/, orbit/changes/ or ee/fleetd-chrome/changes.
    See Changes files for more information.
  • Input data is properly validated, SELECT * is avoided, SQL injection is prevented (using placeholders for values in statements)

Testing

Database migrations

  • Checked schema for all modified table for columns that will auto-update timestamps during migration.
  • Confirmed that updating the timestamps is acceptable, and will not cause unwanted side effects.
  • Ensured the correct collation is explicitly set for character columns (COLLATE utf8mb4_unicode_ci).

Summary by CodeRabbit

  • Bug Fixes

    • macOS app checksums now include the app name, improving grouping, deduplication, and preventing mis-linking or duplicate entries when multiple names share a bundle ID.
    • More stable title handling when bundle IDs are missing, reducing unintended renames and mismatches.
  • Tests

    • Re-enabled related host-software tests and added a longest-common-prefix test to validate name reconciliation.
  • Chores

    • Database migration added to recalculate checksums for affected macOS app records.


if !minID.Valid || !maxID.Valid {
return nil
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using MAX(id), is ok, even if a new hosts checks in during the migration. For example,
Query runs and returns: MIN = 1, MAX = 1000
NEW host checks in inserts row with id = 1001
Migration finishes at id = 1000
Row 1001 is MISSED by migration. But it should be using the new checksum code, so we don't need to worry about migrating.

// Both have bundle_identifier - match by bundle_identifier instead of name
nameMatches = true
}
if nameMatches && td.Source == title.Source && td.ExtensionFor == title.ExtensionFor && bundleID == titleBundleID {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jahzielv I know this was recently changed for the a #33826, I would love to get your eyes on my merge conflict resolutions here.

Copy link
Member

@jahzielv jahzielv Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ksykulev this looks OK to me; the logic is around bundle IDs, and the new Android ingestion code added in an android-specific identifier called application_id which isn't referenced here.

Copy link

codecov bot commented Oct 10, 2025

Codecov Report

❌ Patch coverage is 87.19512% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.24%. Comparing base (5e7ba53) to head (358fdc1).
⚠️ Report is 46 commits behind head on main.

Files with missing lines Patch % Lines
server/datastore/mysql/software.go 88.07% 11 Missing and 2 partials ⚠️
...51010153829_AddNameToSoftwareCheckumCalculation.go 84.90% 6 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #34097      +/-   ##
==========================================
+ Coverage   64.23%   64.24%   +0.01%     
==========================================
  Files        2058     2059       +1     
  Lines      206902   207072     +170     
  Branches     6900     6900              
==========================================
+ Hits       132900   133037     +137     
- Misses      63575    63594      +19     
- Partials    10427    10441      +14     
Flag Coverage Δ
backend 65.35% <87.19%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ksykulev ksykulev marked this pull request as ready for review October 10, 2025 20:52
@ksykulev ksykulev requested a review from a team as a code owner October 10, 2025 20:52
isKernel: title.IsKernel,
}
uniqueTitlesToInsert[key] = title
if existing, ok := uniqueTitlesToInsert[key]; !ok || (bundleID != "" && len(title.Name) < len(existing.Name)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can try to do something more fancy here, like longest common substring to maybe get better names. But this is a basically chooses the shortest name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest common substring, with fallback to shortest name.

@getvictor
Copy link
Member

@coderabbitai full review

Copy link
Contributor

coderabbitai bot commented Oct 11, 2025

✅ Actions performed

Full review triggered.

Copy link
Contributor

coderabbitai bot commented Oct 11, 2025

Walkthrough

Adds software Name to checksum computation for all sources, introduces a MySQL migration to recompute checksums for macOS (apps) rows with bundle identifiers, updates ingestion/title deduplication and related tests, and records the migration in schema.sql.

Changes

Cohort / File(s) Summary of changes
Checksum computation (core)
server/fleet/software.go
Always include Name in ComputeRawChecksum input (removed special-case excluding apps).
MySQL migration
server/datastore/mysql/migrations/tables/20251010153829_AddNameToSoftwareCheckumCalculation.go
Adds migration that batches UPDATEs to recompute checksum for rows with source='apps' and non-empty bundle_identifier using MD5 over version, source, bundle_identifier, release, arch, vendor, extension_for, extension_id, name. Registers migration.
Migration test
server/datastore/mysql/migrations/tables/20251010153829_AddNameToSoftwareCheckumCalculation_test.go
New test exercising the migration: inserts sample titles/software, runs migration, asserts checksums updated only for applicable macOS app rows and others unchanged.
Schema record
server/datastore/mysql/schema.sql
Bumps migration_status_tables AUTO_INCREMENT and inserts new migration status row for version 20251010153829.
Title dedup and mapping logic
server/datastore/mysql/software.go
Removes bundle-ID-based update pathway; changes getExistingSoftware and updateModifiedHostSoftwareDB signatures; groups pre-insert titles by compound key including bundle_id; chooses representative name via longest-common-prefix or shortest name; adjusts pre-insert batch argument counts and mapping of title IDs to checksums; removes obsolete helper functions (linkExistingBundleIDSoftware, updateTargetedBundleIDs).
Software ingestion tests
server/datastore/mysql/software_test.go
Re-enables previously skipped tests around same bundle ID different names and multiple same bundle IDs; adds testLongestCommonPrefix; expands assertions for title counts, renames, and host-specific behavior.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Host as Osquery Host
  participant API as Fleet API
  participant Ingest as Ingestion Logic
  participant DB as MySQL

  Host->>API: Send software payload (name, version, bundle_identifier, ...)
  API->>Ingest: Normalize payload
  Ingest->>Ingest: Compute checksum = MD5(CONCAT_WS(CHAR(0), version, source, bundle_identifier, release, arch, vendor, extension_for, extension_id, name))
  alt bundle_identifier present and title match exists
    Ingest->>Ingest: Map to existing title (prefer shortest display_name / LCP logic)
  else no match
    Ingest->>DB: Insert title
  end
  Ingest->>DB: Upsert software versions by checksum
  DB-->>Ingest: Insert/Upsert result
  Ingest-->>API: Confirm ingestion
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

#g-software, ~assisting g-software

Suggested reviewers

  • sgress454

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title Check ✅ Passed The title succinctly describes the main change of including the software name in checksum calculations for mac software and directly reflects the core modifications in ComputeRawChecksum and the related migration and tests.
Linked Issues Check ✅ Passed The code changes satisfy both objectives from issue #28788 by grouping titles per bundle_identifier and choosing a representative name via longest common prefix logic, updating checksums to include the name, and modifying ingestion logic to list all distinct macOS helper apps in versions and host endpoints with supporting tests.
Out of Scope Changes Check ✅ Passed All modifications—including the checksum migration, ComputeRawChecksum update, title grouping, ingestion logic adjustments, and test enhancements—are directly related to the linked issue’s requirements and there are no unrelated or extraneous changes.
Description Check ✅ Passed The pull request description follows the repository’s template by including the related issue, the submitter checklist, testing steps, and database migration checks; non‐applicable checklist items and sections have been appropriately removed.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch 28788-dup-apps

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/datastore/mysql/software.go (1)

838-846: Critical: Map reset drops previously collected title mappings; remove reinitialization

incomingChecksumToTitle is rebuilt at Line 842, erasing entries added for titles without bundle_identifier above. This will cause missing title_id associations and “inserting software without title_id” errors.

Fix: drop the reinitialization.

As per coding guidelines

-        // no-op code change
-        incomingChecksumToTitle = make(map[string]fleet.SoftwareTitle, len(newSoftwareChecksums))
+        // keep existing mappings populated above (no reinit here)
🧹 Nitpick comments (2)
server/datastore/mysql/software.go (1)

1013-1021: Constant/placeholder mismatch; align args-per-row counts

The capacity constants don’t match the number of placeholders/arguments:

  • software_titles insert has 6 placeholders per row, but numberOfArgsPerSoftwareTitles = 5.
  • software insert has 12 placeholders per row, but numberOfArgsPerSoftware = 11.

This won’t break at runtime (slices grow) but is misleading and may hide future bugs.

-                const numberOfArgsPerSoftwareTitles = 5
+                const numberOfArgsPerSoftwareTitles = 6
-            const numberOfArgsPerSoftware = 11
+            const numberOfArgsPerSoftware = 12

Also applies to: 1084-1111

server/datastore/mysql/migrations/tables/20251010153829_AddNameToSoftwareCheckumCalculation_test.go (1)

1-113: Good migration test coverage for checksum change

Test accurately covers:

  • apps with bundle_identifier (updated),
  • apps without bundle_identifier and non-mac sources (unchanged),
  • verification of bundle_identifier presence and checksum values.

Consider adding one case where bundle_identifier is empty string vs NULL to ensure both are treated as “no update.”

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e7ba53 and 3f851b1.

📒 Files selected for processing (7)
  • changes/28788-name-software-checksum (1 hunks)
  • server/datastore/mysql/migrations/tables/20251010153829_AddNameToSoftwareCheckumCalculation.go (1 hunks)
  • server/datastore/mysql/migrations/tables/20251010153829_AddNameToSoftwareCheckumCalculation_test.go (1 hunks)
  • server/datastore/mysql/schema.sql (1 hunks)
  • server/datastore/mysql/software.go (3 hunks)
  • server/datastore/mysql/software_test.go (1 hunks)
  • server/fleet/software.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go

⚙️ CodeRabbit configuration file

When reviewing SQL queries that are added or modified, ensure that appropriate filtering criteria are applied—especially when a query is intended to return data for a specific entity (e.g., a single host). Check for missing WHERE clauses or incorrect filtering that could lead to incorrect or non-deterministic results (e.g., returning the first row instead of the correct one). Flag any queries that may return unintended results due to lack of precise scoping.

Files:

  • server/fleet/software.go
  • server/datastore/mysql/software.go
  • server/datastore/mysql/software_test.go
  • server/datastore/mysql/migrations/tables/20251010153829_AddNameToSoftwareCheckumCalculation_test.go
  • server/datastore/mysql/migrations/tables/20251010153829_AddNameToSoftwareCheckumCalculation.go
🔇 Additional comments (7)
changes/28788-name-software-checksum (1)

1-1: LGTM! Clear changelog entry.

The changelog accurately describes the user-visible change: software name is now included in the checksum calculation for macOS apps, which addresses the issue with duplicate bundle identifiers for apps like Electron helpers.

server/datastore/mysql/schema.sql (1)

1563-1565: LGTM! Migration tracking entry.

The schema update correctly records the new migration for checksum recalculation, incrementing the AUTO_INCREMENT value and adding the migration status entry.

server/fleet/software.go (1)

150-163: LGTM! Correct checksum calculation update.

The ComputeRawChecksum() method now includes Name unconditionally in the checksum calculation, which is the correct behavior:

  • For macOS apps (source='apps'): Name is now included in the checksum (was previously excluded), enabling differentiation of apps with the same bundle_identifier but different names (e.g., Electron helper apps).
  • For other software sources: Name was already included in checksums, so behavior remains unchanged.

The field order in the Go code matches the migration SQL exactly (Version → Source → BundleIdentifier → Release → Arch → Vendor → ExtensionFor → ExtensionID → Name), ensuring consistency between new entries and migrated entries. The conditional ApplicationID handling at the end is appropriate since it applies only to Android software.

server/datastore/mysql/migrations/tables/20251010153829_AddNameToSoftwareCheckumCalculation.go (2)

12-66: LGTM! Well-implemented migration with good practices.

The Up migration correctly updates checksums for macOS apps with bundle identifiers:

Strengths:

  1. Targeted scope: The migration only updates records with source='apps' and non-empty bundle_identifier, which is appropriate since these are the apps affected by the duplicate bundle_identifier issue.
  2. Consistent filtering: The WHERE clauses in the range query (lines 17-19) and UPDATE statement (lines 54-56) are identical, ensuring only the intended rows are processed.
  3. Batch processing: The 10,000-row batch size is appropriate, balancing transaction size against the number of round-trips. The logic correctly handles partial final batches (lines 32-34).
  4. Checksum calculation: The SQL CONCAT_WS(CHAR(0), ...) matches the Go code's strings.Join(cols, "\x00") exactly, with fields in the same order.
  5. Error handling: Errors are properly wrapped with descriptive context, including batch ranges for UPDATE errors (line 61).
  6. Edge case handling: Returns early (line 26) when no matching rows exist (minID/maxID are NULL).

68-70: Down migration as no-op is appropriate.

Since this migration recomputes checksums based on existing data without schema changes, rolling back would require re-running the old checksum calculation. However, this would reintroduce the duplicate bundle_identifier issue. Leaving the Down migration as a no-op is the correct approach, as the new checksums are an improvement and shouldn't be reverted.

server/datastore/mysql/software.go (2)

997-1011: LGTM: Title dedup keyed by bundle_id (shortest name wins)

Using empty name in the dedup key when bundle_id exists and preferring the shortest title is a pragmatic tie-breaker for duplicate bundle IDs.


1067-1075: LGTM: Prefer bundle_identifier matching over name for title-ID mapping

Overriding name equality when both sides have bundle_identifier prevents mismatches due to name normalization while still requiring bundle_id equality. Correct for macOS helpers.


for _, currentSoftwareItem := range currentSoftware {
incomingSoftwareItem, ok := incomingChecksumToSoftware[currentSoftwareItem.Checksum]
_, ok := incomingChecksumToSoftware[currentSoftwareItem.Checksum]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks up incoming software by matching checksum.
With checksums now including name, if checksums match the names MUST be the same (since name is in the checksum). So the check name != currentSoftwareItem.Name can never be true (on old line 725)

Copy link
Member

@getvictor getvictor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the change, Konstantin. I like that we're eliminating all the renaming code.

There were 2 other TEMPORARILY SKIPPED tests. Perhaps remove them if they're not needed?

The software title UI and API for Apple will need to show the software names. And different names may have different versions, theoretically. What is the plan for that?

isKernel: title.IsKernel,
}
uniqueTitlesToInsert[key] = title
if existing, ok := uniqueTitlesToInsert[key]; !ok || (bundleID != "" && len(title.Name) < len(existing.Name)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest common substring, with fallback to shortest name.

@ksykulev
Copy link
Contributor Author

@coderabbitai full review

Copy link
Contributor

coderabbitai bot commented Oct 13, 2025

✅ Actions performed

Full review triggered.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
server/datastore/mysql/software_test.go (4)

1869-1958: Make name-list assertion order-insensitive to avoid collation brittleness.

Sorting/ordering can differ with MySQL collations. Prefer comparing as a set.

- require.Len(t, softwareNames, 2)
- require.Equal(t, []string{"GoLand 2.app", "GoLand.app"}, softwareNames)
+ require.Len(t, softwareNames, 2)
+ require.ElementsMatch(t, []string{"GoLand 2.app", "GoLand.app"}, softwareNames)

2027-2091: Nice coverage; consider asserting title dedup too.

Optionally assert there’s a single title for bundle_identifier “com.jetbrains.goland” to reinforce dedup guarantees (complements host-level checks).

var titleCount int
err = ds.writer(ctx).GetContext(ctx, &titleCount,
  `SELECT COUNT(DISTINCT id) FROM software_titles WHERE bundle_identifier = ?`,
  "com.jetbrains.goland")
require.NoError(t, err)
require.Equal(t, 1, titleCount)

2199-2286: Update comment to align with new checksum semantics.

PR adds name to checksum for all sources; the comment saying macOS apps don’t include name is misleading (though the test logic still holds).

- // Test Case 2: macOS apps (name is NOT included in checksum)
- // For macOS apps, the checksum doesn't include the name, so even software with
- // the same truncated name and different versions will have different checksums
+ // Test Case 2: macOS apps
+ // Even with name included in the checksum, different versions result in different
+ // checksums; both entries are distinct but map to the same title.

9274-9294: Fix small typos in comments for clarity.

Minor nits; no behavior change.

- // Verify the host only has 2 pieces of sofware
+ // Verify the host only has 2 pieces of software
@@
- // Verify the IDs are are not the same
+ // Verify the IDs are not the same
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e7ba53 and b700455.

📒 Files selected for processing (7)
  • changes/28788-name-software-checksum (1 hunks)
  • server/datastore/mysql/migrations/tables/20251010153829_AddNameToSoftwareCheckumCalculation.go (1 hunks)
  • server/datastore/mysql/migrations/tables/20251010153829_AddNameToSoftwareCheckumCalculation_test.go (1 hunks)
  • server/datastore/mysql/schema.sql (1 hunks)
  • server/datastore/mysql/software.go (12 hunks)
  • server/datastore/mysql/software_test.go (12 hunks)
  • server/fleet/software.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go

⚙️ CodeRabbit configuration file

When reviewing SQL queries that are added or modified, ensure that appropriate filtering criteria are applied—especially when a query is intended to return data for a specific entity (e.g., a single host). Check for missing WHERE clauses or incorrect filtering that could lead to incorrect or non-deterministic results (e.g., returning the first row instead of the correct one). Flag any queries that may return unintended results due to lack of precise scoping.

Files:

  • server/datastore/mysql/migrations/tables/20251010153829_AddNameToSoftwareCheckumCalculation.go
  • server/datastore/mysql/migrations/tables/20251010153829_AddNameToSoftwareCheckumCalculation_test.go
  • server/fleet/software.go
  • server/datastore/mysql/software.go
  • server/datastore/mysql/software_test.go
🔇 Additional comments (10)
server/fleet/software.go (1)

152-157: LGTM!

The change to include s.Name unconditionally in the checksum calculation is correct and aligns with the PR objective to distinguish macOS apps with duplicate bundle IDs. The field order matches the migration's CONCAT_WS call.

server/datastore/mysql/software_test.go (4)

2102-2194: LGTM: robust stress case for multi-version rename bug.

Covers the regression well and verifies host isolation and per-name/version cardinality.


9318-9377: LGTM: rename + new software mixed path verified correctly.

Validates per-name entries, IDs not reused, and titles count expectations.


9574-9596: LGTM: solid unit tests for longestCommonPrefix.

Good edge coverage (empty, identical, partial, mismatched).

If the helper is newly added, ensure it’s in the mysql package (see the earlier script) to keep this test compiling.


101-101: Proceed with adding LongestCommonPrefix test
The helper longestCommonPrefix(strs []string) string is defined in server/datastore/mysql/software.go:757.

server/datastore/mysql/migrations/tables/20251010153829_AddNameToSoftwareCheckumCalculation_test.go (1)

13-122: LGTM! Comprehensive migration test.

The test thoroughly validates the migration behavior:

  • Old checksum excludes name for "apps" source, includes it for others (lines 16-24)
  • New checksum always includes name at the end (lines 26-31)
  • Test cases cover all scenarios: apps with/without bundle_identifier (NULL vs empty string), Windows, and Linux software
  • Assertions correctly verify that only apps with non-null, non-empty bundle_identifier get updated
server/datastore/mysql/software.go (4)

755-780: longestCommonPrefix implementation is correct.

The function correctly handles all edge cases:

  • Empty slice returns empty string (line 758)
  • Single string returns itself (line 761)
  • Character-by-character comparison with early exit on mismatch (lines 772-776)
  • Correctly returns the entire first string when it's a prefix of all others (line 769)

Note: For large groups of titles with long names, consider caching or optimization if this becomes a performance bottleneck.


857-906: Title grouping and representative name selection looks good.

The logic correctly implements the PR objective to handle duplicate bundle IDs:

  • Groups titles by bundle_identifier (lines 864-875)
  • Selects representative name using longest common prefix after removing trailing non-word chars (lines 886-891)
  • Falls back to shortest name if no meaningful common prefix (lines 893-901)
  • Correctly handles single-title groups (line 903)

The heuristic should work well for common cases like "App Helper (GPU)" and "App Helper (Renderer)" finding "App Helper".


962-976: Matching logic correctly prioritizes bundle_identifier.

The updated logic correctly implements bundle_identifier-based matching:

  • When both title and incoming software have bundle_identifier, matching ignores name differences (lines 965-967)
  • This allows matching even when representative name was chosen differently than the incoming software name
  • Still requires source, extension_for, and bundle_identifier to match (line 969)
  • Correctly handles cases where only one has bundle_identifier (they won't match)

This aligns with the PR objective to use bundle_identifier as the primary identity for macOS apps.


1144-1157: LGTM! Appropriate logging level adjustment.

The change to Info level for non-apps sources is appropriate (line 1148):

  • Apps sources are managed by osquery, so nil last_opened_at transitions are expected
  • For other sources, it's informational rather than a warning
  • Condition correctly filters out apps sources (line 1147)

Comment on lines +14 to +27
err := tx.QueryRow(`
SELECT MIN(id), MAX(id)
FROM software
WHERE source = 'apps'
AND bundle_identifier IS NOT NULL
AND bundle_identifier != ''
`).Scan(&minID, &maxID)
if err != nil {
return fmt.Errorf("getting ID range: %w", err)
}

if !minID.Valid || !maxID.Valid {
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

WHERE clause is too restrictive—should update all apps.

The migration filters to source = 'apps' AND bundle_identifier IS NOT NULL AND bundle_identifier != '', but the Go code change at server/fleet/software.go:152 now includes s.Name in the checksum for all apps, regardless of whether they have a bundle_identifier.

Impact:

  • Apps with empty or NULL bundle_identifier won't be migrated
  • When osquery re-reports these apps, Fleet will compute a new checksum (with name)
  • Checksum mismatch → Fleet creates duplicate software entries

Recommended fix:

Remove the bundle_identifier filter from both queries:

 	err := tx.QueryRow(`
 		SELECT MIN(id), MAX(id)
 		FROM software
 		WHERE source = 'apps'
-		  AND bundle_identifier IS NOT NULL
-		  AND bundle_identifier != ''
 	`).Scan(&minID, &maxID)
 		softwareStmt := `
 		UPDATE software SET
 			checksum = UNHEX(
 			MD5(
 				-- concatenate with separator \x00
 				CONCAT_WS(CHAR(0),
 					version,
 					source,
 					bundle_identifier,
 					` + "`release`" + `,
 					arch,
 					vendor,
 					extension_for,
 					extension_id,
 					name
 				)
 			)
 		)
 		WHERE source = 'apps'
-		  AND bundle_identifier IS NOT NULL
-		  AND bundle_identifier != ''
 		  AND id >= ? AND id <= ?
 		`

As per coding guidelines: SQL queries must have correct filtering to avoid non-deterministic results and data inconsistencies.

Also applies to: 54-56

🤖 Prompt for AI Agents
In
server/datastore/mysql/migrations/tables/20251010153829_AddNameToSoftwareCheckumCalculation.go
around lines 14-27 (and also adjust the similar query at lines 54-56), the WHERE
clause incorrectly restricts rows to "bundle_identifier IS NOT NULL AND
bundle_identifier != ''", but the checksum change now applies to all apps;
remove the bundle_identifier conditions so the SQL only filters by source =
'apps' to include all app rows, leaving the rest of the query, Scan and error
handling unchanged.


softwareStmt := `
UPDATE software SET
checksum = UNHEX(
Copy link
Contributor Author

@ksykulev ksykulev Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we set name_source to basic on all of these? We are getting rid of updateTargetedBundleIDs so there will no longer be any names on the software table set by the "renaming" functionality.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't think we would modify them either way because if they got set to the newer form we can trust those more than just filename based.

Comment on lines 858 to 876
titleGroups := make(map[titleKey][]fleet.SoftwareTitle)
for _, title := range newTitlesNeeded {
bundleID := ""
if title.BundleIdentifier != nil {
bundleID = *title.BundleIdentifier
}
keyName := title.Name
if bundleID != "" {
keyName = ""
}
key := titleKey{
name: title.Name,
name: keyName,
source: title.Source,
extensionFor: title.ExtensionFor,
bundleID: bundleID,
isKernel: title.IsKernel,
}
uniqueTitlesToInsert[key] = title
titleGroups[key] = append(titleGroups[key], title)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I'm looking at this correctly, we're still merging multiple items with the same bundle ID (e.g. helpers) into one title, rather than just showing them all in the list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah great question! I am putting all the software names from the ingestion on a host into a titleGroups slice. Then the next part goes through that titleGroups slice and finds the best representative name for the group. If there is only one item, easy, we add it to uniqueTitlesToInsert. Otherwise, we go through and try to find the best name to represent the group (longest common prefix, or shortest name by length). Why? Because unlike the software table, the software_titles table is still grouped by bundle_identifier for mac os apps. So we need to find a single name to represent them.

Copy link
Contributor

@sgress454 sgress454 Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks for the explainer. So the software table will have all the versions of the all the individual helpers, but they'll all point to the same title? Seems like the worst-case scenario would be if the helpers shared the same bundle ID as the main app, and they didn't have a common prefix to the names, and we ended up picking the shortest name which happened to be one of the helpers. I think that would be extremely sloppy on the part of the app publisher though, so I'm +1 on not trying to defeat an edge case we haven't seen yet as far as we know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will inevitably have to work in some sort of method to override names. Similar to the way we do overrides in vulnerability software matching.

Copy link
Member

@getvictor getvictor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ksykulev ksykulev merged commit bb4b62b into main Oct 14, 2025
39 checks passed
@ksykulev ksykulev deleted the 28788-dup-apps branch October 14, 2025 22:36
ksykulev added a commit that referenced this pull request Oct 15, 2025
**Related issue:** Resolves #28788

If some of the following don't apply, delete the relevant line.

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files)
for more information.
- [x] Input data is properly validated, `SELECT *` is avoided, SQL
injection is prevented (using placeholders for values in statements)

- [x] Added/updated automated tests
- [x] Where appropriate, [automated tests simulate multiple hosts and
test for host
isolation](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/reference/patterns-backend.md#unit-testing)
(updates to one hosts's records do not affect another)
- [x] QA'd all new/changed functionality manually

- [x] Checked schema for all modified table for columns that will
auto-update timestamps during migration.
- [x] Confirmed that updating the timestamps is acceptable, and will not
cause unwanted side effects.
- [x] Ensured the correct collation is explicitly set for character
columns (`COLLATE utf8mb4_unicode_ci`).

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

* **Bug Fixes**
* macOS app checksums now include the app name, improving grouping,
deduplication, and preventing mis-linking or duplicate entries when
multiple names share a bundle ID.
* More stable title handling when bundle IDs are missing, reducing
unintended renames and mismatches.

* **Tests**
* Re-enabled related host-software tests and added a
longest-common-prefix test to validate name reconciliation.

* **Chores**
* Database migration added to recalculate checksums for affected macOS
app records.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
ksykulev added a commit that referenced this pull request Oct 15, 2025
**Related issue:** Resolves #28788

If some of the following don't apply, delete the relevant line.

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files)
for more information.
- [x] Input data is properly validated, `SELECT *` is avoided, SQL
injection is prevented (using placeholders for values in statements)

- [x] Added/updated automated tests
- [x] Where appropriate, [automated tests simulate multiple hosts and
test for host
isolation](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/reference/patterns-backend.md#unit-testing)
(updates to one hosts's records do not affect another)
- [x] QA'd all new/changed functionality manually

- [x] Checked schema for all modified table for columns that will
auto-update timestamps during migration.
- [x] Confirmed that updating the timestamps is acceptable, and will not
cause unwanted side effects.
- [x] Ensured the correct collation is explicitly set for character
columns (`COLLATE utf8mb4_unicode_ci`).

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

* **Bug Fixes**
* macOS app checksums now include the app name, improving grouping,
deduplication, and preventing mis-linking or duplicate entries when
multiple names share a bundle ID.
* More stable title handling when bundle IDs are missing, reducing
unintended renames and mismatches.

* **Tests**
* Re-enabled related host-software tests and added a
longest-common-prefix test to validate name reconciliation.

* **Chores**
* Database migration added to recalculate checksums for affected macOS
app records.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
ksykulev added a commit that referenced this pull request Oct 15, 2025
Back ported: #34097 to be compatible with the 4.75.0 release.
Original issue #28788
ksykulev added a commit that referenced this pull request Oct 15, 2025
Back ported: #34097 to be compatible with the 4.75.0 release.
Original issue #28788
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

🪲macOS apps with duplicate bundle IDs do not appear in Fleet

6 participants