Commit de2876d

Xyninja/pipelines 5 (#2182)
* Added a new page on common SQL queries
* Added a link to common SQL queries page
* Updated titles and descriptions for common SQL queries
* Removed expanding `properties` JSON query
1 parent fa9cdb5 commit de2876d

3 files changed (+90, -1 lines)

pages/docs/data-pipelines.mdx

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ Discrepancies between the event counts in Mixpanel and those exported to your de

### What timezone is the data exported in?

-The data is exported in UTC timezone. You’ll need to convert it to your project’s timezone when running queries in your warehouse.
+The data is exported in UTC timezone. You’ll need to convert it to your project’s timezone when running queries in your warehouse. Please refer to [this page](/docs/data-pipelines/common-sql-queries) for some common SQL queries.

### How can I count events exported by Mixpanel in the warehouse?

pages/docs/data-pipelines/_meta.ts

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
export default {
"json-pipelines": "Json Pipelines",
+"common-sql-queries": "Common SQL Queries",
"integrations": "Integrations",
"old-pipelines": "Older Version"
}
Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
# Common SQL Queries

### Total daily count with deduplication logic and timezone adjustment

Events exported via pipelines (i.e. raw exports) can contain duplicates. Deduplicate them using the combination of four event properties: `event_name`, `time`, `distinct_id`, and `insert_id` (see the [event deduplication docs](https://developer.mixpanel.com/reference/event-deduplication)). The following example computes a deduplicated total event count per day, with dates converted from UTC to the `America/Los_Angeles` timezone.

```sql
SELECT
  DATE(time, 'America/Los_Angeles') AS event_date,
  -- Events sharing all four properties count once. TO_JSON_STRING(STRUCT(...)) builds an
  -- unambiguous key and, unlike CONCAT, does not return NULL when insert_id is NULL.
  COUNT(DISTINCT TO_JSON_STRING(STRUCT(event_name, time, distinct_id, insert_id))) AS event_count
FROM
  `<your dataset>.mp_master_event`
WHERE
  DATE(time, 'America/Los_Angeles') >= '2025-08-01'
  AND DATE(time, 'America/Los_Angeles') < '2025-09-16'
GROUP BY
  1
ORDER BY
  1 ASC
```

### Unique user count with user ID resolution

Raw events may contain the original `distinct_id` associated with the user at the time of the event instead of the user's final canonical `distinct_id` after authentication. The `mp_identity_mappings_data_view` maps each original `distinct_id` to its resolved (canonical) `distinct_id`. Joining on this view lets unique-user counts account for ID management, which makes them more accurate. Events whose `distinct_id` has no mapping keep their original value.

```sql
SELECT
  DATE(time, 'America/Los_Angeles') AS event_date,
  COUNT(DISTINCT resolved_user_id) AS unique_users
FROM (
  SELECT
    events.time,
    -- Use the canonical ID when a mapping exists; otherwise keep the original distinct_id
    IFNULL(id_mappings.resolved_distinct_id, events.distinct_id) AS resolved_user_id
  FROM
    `<your dataset>.mp_master_event` AS events
  LEFT JOIN
    `<your dataset>.mp_identity_mappings_data_view` AS id_mappings
  ON
    events.distinct_id = id_mappings.distinct_id
  WHERE
    DATE(events.time, 'America/Los_Angeles') >= '2025-08-01'
    AND DATE(events.time, 'America/Los_Angeles') < '2025-09-16'
)
GROUP BY
  1
ORDER BY
  1 ASC
```
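
### Top 20 events by volume

The following query ranks event names by raw row count. Because it uses `COUNT(*)` without deduplication, any duplicate events are included in the totals.
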
```sql
SELECT
  event_name,
  COUNT(*) AS event_count
FROM
  `<your dataset>.mp_master_event`
WHERE
  DATE(time, 'America/Los_Angeles') >= '2025-08-01'
  AND DATE(time, 'America/Los_Angeles') < '2025-09-16'
GROUP BY
  1
ORDER BY
  2 DESC
LIMIT
  20
```

### Querying duplicate events

Raw exported events can contain duplicates. To identify them, partition events by the same four properties used for deduplication: `event_name`, `time`, `distinct_id`, and `insert_id` (see the [event deduplication docs](https://developer.mixpanel.com/reference/event-deduplication)). The following query returns every event that shares all four values with at least one other event, together with the size of its duplicate group (`dup_group_size`).

```sql
SELECT
  *,
  COUNT(*) OVER (PARTITION BY event_name, time, distinct_id, insert_id) AS dup_group_size
FROM
  `<your dataset>.mp_master_event`
WHERE
  DATE(time, 'America/Los_Angeles') >= '2025-08-01'
  AND DATE(time, 'America/Los_Angeles') < '2025-09-16'
QUALIFY
  dup_group_size > 1
ORDER BY
  DATE(time, 'America/Los_Angeles'),
  event_name,
  time
```
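
If you need a deduplicated set of events rather than a list of the duplicates, one possible approach is to keep a single row per duplicate group with `ROW_NUMBER()`. This is a sketch, not an official Mixpanel query: it assumes the same BigQuery `mp_master_event` table, timezone, and date range as the queries above, and because no ordering is given inside `OVER()`, which row survives within each group is arbitrary.

```sql
-- Sketch: return one row per group of events sharing
-- event_name, time, distinct_id, and insert_id (a deduplicated event set).
SELECT
  *
FROM
  `<your dataset>.mp_master_event`
WHERE
  DATE(time, 'America/Los_Angeles') >= '2025-08-01'
  AND DATE(time, 'America/Los_Angeles') < '2025-09-16'
QUALIFY
  -- No ORDER BY inside OVER(), so the row kept in each group is arbitrary.
  ROW_NUMBER() OVER (PARTITION BY event_name, time, distinct_id, insert_id) = 1
```

You could save this result as a view or table and run plain `COUNT(*)` aggregations against it without repeating the deduplication key in every query.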