|
| 1 | +# Common SQL Queries |
| 2 | + |
| 3 | +### Total daily count with deduplication logic and timezone adjustment |
| 4 | + |
| 5 | +Events exported via pipelines (i.e., raw exports) can contain duplicates. Deduplicate them using four event properties: `event_name`, `time`, `distinct_id`, and `insert_id` (see the [event deduplication docs](https://developer.mixpanel.com/reference/event-deduplication)). The following query returns a deduplicated total daily event count, with dates converted to a specific timezone. |
| 6 | + |
| 7 | +```sql |
| 8 | +SELECT |
| 9 | + DATE(time, 'America/Los_Angeles') AS event_date, |
| 10 | + COUNT(DISTINCT CONCAT(event_name, time, distinct_id, insert_id)) AS event_count |
| 11 | +FROM |
| 12 | + `<your dataset>.mp_master_event` |
| 13 | +WHERE |
| 14 | + DATE(time, 'America/Los_Angeles') >= '2025-08-01' |
| 15 | + AND DATE(time, 'America/Los_Angeles') < '2025-09-16' |
| 16 | +GROUP BY |
| 17 | + 1 |
| 18 | +ORDER BY |
| 19 | + 1 ASC |
| 20 | +``` |
| 21 | + |
| 22 | +### Unique user count with user ID resolution |
| 23 | + |
| 24 | +Raw events may contain the original `distinct_id` associated with the user at the time of the event instead of the final canonical `distinct_id` for the user after authentication. The `mp_identity_mappings_data_view` view maps original `distinct_id`s to resolved (i.e., canonical) `distinct_id`s. Use this mapping so that unique-user calculations account for ID management and are therefore more accurate. |
| 25 | + |
| 26 | +```sql |
| 27 | +SELECT |
| 28 | + DATE(time, 'America/Los_Angeles') AS event_date, |
| 29 | + COUNT(DISTINCT resolved_user_id) AS unique_users |
| 30 | +FROM ( |
| 31 | + SELECT |
| 32 | + time, |
| 33 | + IFNULL(id_mappings.resolved_distinct_id, events.distinct_id) AS resolved_user_id |
| 34 | + FROM |
| 35 | + `<your dataset>.mp_master_event` AS events |
| 36 | + LEFT JOIN |
| 37 | + `<your dataset>.mp_identity_mappings_data_view` AS id_mappings |
| 38 | + ON |
| 39 | + events.distinct_id = id_mappings.distinct_id |
| 40 | + WHERE |
| 41 | + DATE(time, 'America/Los_Angeles') >= '2025-08-01' |
| 42 | + AND DATE(time, 'America/Los_Angeles') < '2025-09-16' ) |
| 43 | +GROUP BY |
| 44 | + 1 |
| 45 | +ORDER BY |
| 46 | + 1 ASC |
| 47 | +``` |
| 48 | + |
| 49 | +### Top 20 events by volume |
| 50 | + |
| 51 | +```sql |
| 52 | +SELECT |
| 53 | + event_name, |
| 54 | + COUNT(*) AS event_count |
| 55 | +FROM |
| 56 | + `<your dataset>.mp_master_event` |
| 57 | +WHERE |
| 58 | + DATE(time, 'America/Los_Angeles') >= '2025-08-01' |
| 59 | + AND DATE(time, 'America/Los_Angeles') < '2025-09-16' |
| 60 | +GROUP BY |
| 61 | + 1 |
| 62 | +ORDER BY |
| 63 | + 2 DESC |
| 64 | +LIMIT |
| 65 | + 20 |
| 66 | +``` |
| 67 | + |
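Because raw exports can contain duplicates, `COUNT(*)` in the query above may overstate event volume. A minimal sketch of a deduplicated variant, assuming the same four-property key (`event_name`, `time`, `distinct_id`, `insert_id`) used in the daily-count query. As in that query, a row with a NULL in any key column is not counted, because `CONCAT` returns NULL when any argument is NULL.

```sql
SELECT
  event_name,
  -- Count each (event_name, time, distinct_id, insert_id) combination once.
  COUNT(DISTINCT CONCAT(event_name, time, distinct_id, insert_id)) AS event_count
FROM
  `<your dataset>.mp_master_event`
WHERE
  DATE(time, 'America/Los_Angeles') >= '2025-08-01'
  AND DATE(time, 'America/Los_Angeles') < '2025-09-16'
GROUP BY
  1
ORDER BY
  2 DESC
LIMIT
  20
```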
| 68 | +### Querying duplicate events |
| 69 | + |
| 70 | +Raw exported events can contain duplicates. Four event properties together identify a duplicate: `event_name`, `time`, `distinct_id`, and `insert_id` (see the [event deduplication docs](https://developer.mixpanel.com/reference/event-deduplication)). The following query returns every event in your raw data that shares all four values with at least one other event. |
| 71 | + |
| 72 | +```sql |
| 73 | +SELECT |
| 74 | + *, |
| 75 | + COUNT(*) OVER (PARTITION BY event_name, time, distinct_id, insert_id ) AS dup_group_size |
| 76 | +FROM |
| 77 | + `<your dataset>.mp_master_event` |
| 78 | +WHERE |
| 79 | + DATE(time, 'America/Los_Angeles') >= '2025-08-01' |
| 80 | + AND DATE(time, 'America/Los_Angeles') < '2025-09-16' |
| 81 | +QUALIFY |
| 82 | + dup_group_size > 1 |
| 83 | +ORDER BY |
| 84 | + DATE(time, 'America/Los_Angeles'), |
| 85 | + event_name, |
| 86 | + time |
| 87 | +``` |
| 88 | + |
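Once duplicates are identified, a common follow-up is to keep a single row per duplicate group. A minimal sketch, assuming BigQuery and that any one row in a group is an acceptable representative: no `ORDER BY` is given inside the window, so which row survives is arbitrary.

```sql
SELECT
  *
FROM
  `<your dataset>.mp_master_event`
WHERE
  DATE(time, 'America/Los_Angeles') >= '2025-08-01'
  AND DATE(time, 'America/Los_Angeles') < '2025-09-16'
QUALIFY
  -- Keep one row per (event_name, time, distinct_id, insert_id) group.
  ROW_NUMBER() OVER (PARTITION BY event_name, time, distinct_id, insert_id) = 1
```

If a specific row should win (for example, the most recently ingested one), add an `ORDER BY` on a suitable column inside the `OVER` clause. Which column to use depends on your export schema and is not specified here.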