Commit 9bbe9c4

mherrman and argenisf authored
Refine deduplication explanations in documentation (#2205)
Clarified the explanation of query-time deduplication and noted that Mixpanel does not guarantee upsert behavior.

Co-authored-by: Argenis Jesus Ferrer Mora <[email protected]>
1 parent 4adb414 commit 9bbe9c4

1 file changed (+2 -2 lines changed)

openapi/src/docs/ingestion/track-event-deduplication.md

Lines changed: 2 additions & 2 deletions
@@ -84,13 +84,13 @@ Mixpanel uses two main deduplication processes:
 ### Query-Time Deduplication
 
 - When: Happens immediately when you query data in the Mixpanel UI.
-- How: If multiple events share the same event_name, distinct_id, timestamp, and $insert_id, only the most recent version of the event is shown in reports (based on the API ingestion time). This ensures that duplicate events do not affect your analytics in real time.
+- How: If multiple events share the same event_name, distinct_id, timestamp, and $insert_id. In most cases, only the more recent version of the event is shown in reports (based on the API ingestion time). Its important to note that Mixpanel does not guarantee upsert behavior however.
 - Scope: This deduplication is visible in the Mixpanel UI and reports, but not in raw data exports. Raw event export will contain all data as they were ingested, without any deduplication.
 
 ### Compaction-Time Deduplication
 
 - When: Runs periodically in the backend, typically after a few hours and again after about 20 days, once data ingestion for a day is complete.
-- How: During compaction, Mixpanel scans for events with the same event name, distinct_id, and $insert_id (timestamp does not need to match exactly, just the same calendar day). The older event is deleted, and only the latest remains in storage.
+- How: During compaction, Mixpanel scans for events with the same event name, distinct_id, and $insert_id (timestamp does not need to match exactly, just the same calendar day).
 - Scope: This process helps reduce storage of duplicate events and may affect event counts if duplicates were present with different timestamps
 
 <br />
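
The two matching rules described in this hunk can be compared with a small Python sketch. It is an illustration only, not Mixpanel's implementation: the flat event dictionaries, the helper names (`query_time_key`, `compaction_key`, `group_duplicates`), and the use of UTC to decide the "same calendar day" are all assumptions made for the example. Because the updated text says upsert behavior is not guaranteed, the sketch only groups events that would count as duplicates and does not pick which copy survives.

```python
from collections import defaultdict
from datetime import datetime, timezone


def query_time_key(e):
    # Query-time rule from the doc: event name, distinct_id, timestamp,
    # and $insert_id must all match.
    return (e["event"], e["distinct_id"], e["time"], e["$insert_id"])


def compaction_key(e):
    # Compaction rule from the doc: event name, distinct_id, and $insert_id
    # must match, and the timestamps only need to share a calendar day.
    # ASSUMPTION: the calendar day is computed in UTC here.
    day = datetime.fromtimestamp(e["time"], tz=timezone.utc).date()
    return (e["event"], e["distinct_id"], e["$insert_id"], day)


def group_duplicates(events, key):
    # Group events that the given rule would treat as the same event.
    groups = defaultdict(list)
    for e in events:
        groups[key(e)].append(e)
    return dict(groups)


# Hypothetical sample: same $insert_id, timestamps 60 seconds apart
# on the same UTC day.
events = [
    {"event": "Signup", "distinct_id": "u1", "time": 1700000000, "$insert_id": "abc"},
    {"event": "Signup", "distinct_id": "u1", "time": 1700000060, "$insert_id": "abc"},
]

query_groups = group_duplicates(events, query_time_key)
compaction_groups = group_duplicates(events, compaction_key)
print(len(query_groups), len(compaction_groups))  # prints "2 1"
```

In this sample the query-time rule sees two distinct events because the timestamps differ, while the compaction rule treats them as one. That matches the Scope note that compaction may change event counts when duplicates carry different timestamps.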
