@wwyc wwyc commented Aug 19, 2025

Description

This PR is a draft and is not ready for review yet. It initializes the city seen tables by pulling the first and last observed city and subdivision fields from the stable tables. The initialization code will no longer be applicable once the city and subdivision fields are nulled out in the stable tables.

  • An additional sample_id = 0 clause was added to limit the size of the output.
  • The generated SQL files will not be checked in.
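To make the initialization step concrete, here is a minimal Python sketch of the first/last-seen logic under simplifying assumptions. The Ping row type, mode_last helper, and init_city_seen function are hypothetical illustrations, not part of bigquery-etl. mode_last is assumed to approximate the udf.mode_last UDF used in the generated query (most frequent non-NULL value, ties broken by the latest occurrence). The overactive-client filter is omitted, and only the city is tracked; subdivisions would follow the same pattern.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Ping:
    # Hypothetical, simplified stand-in for one row of a baseline_v1 stable table.
    client_id: str
    sample_id: int
    submission_timestamp: datetime
    city: Optional[str]


def mode_last(values: List[Optional[str]]) -> Optional[str]:
    """Most frequent non-None value; ties go to the value that occurs latest.

    Assumed to approximate udf.mode_last; None plays the role of NULL.
    """
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None
    last_pos = {v: i for i, v in enumerate(values) if v is not None}
    return max(counts, key=lambda v: (counts[v], last_pos[v]))


def init_city_seen(pings: List[Ping], sample_id: int = 0) -> Dict[str, dict]:
    # 1. Keep a single sample (mirrors the PR's sample_id = 0 filter) and
    #    group cities by (client, day) in submission order.
    daily: Dict[Tuple[str, date], List[Optional[str]]] = defaultdict(list)
    for p in sorted(pings, key=lambda p: p.submission_timestamp):
        if p.sample_id == sample_id:
            daily[(p.client_id, p.submission_timestamp.date())].append(p.city)

    # 2. Collapse each client-day to one city.
    per_client: Dict[str, List[Tuple[date, Optional[str]]]] = defaultdict(list)
    for (client_id, day), cities in daily.items():
        per_client[client_id].append((day, mode_last(cities)))

    # 3. First-seen is the earliest day's city; last-seen is the latest day's city.
    result: Dict[str, dict] = {}
    for client_id, days in per_client.items():
        days.sort(key=lambda row: row[0])
        (first_date, first_city), (last_date, last_city) = days[0], days[-1]
        result[client_id] = {
            "first_seen_geo_date": first_date,
            "first_seen_geo_city": first_city,
            "last_seen_geo_date": last_date,
            "last_seen_geo_city": last_city,
        }
    return result
```

Because each client-day is collapsed to a single city first, the first-seen city in this sketch is the most common city on the client's earliest day, not necessarily the city of its very first ping.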

Related Tickets & Documents

Reviewer, please follow this checklist

@wwyc wwyc requested a review from a team as a code owner August 19, 2025 23:01
@wwyc wwyc marked this pull request as draft August 19, 2025 23:02
@wwyc wwyc requested a review from BenWu August 19, 2025 23:07

clients_city_first_seen_firefox_desktop AS (
SELECT
client_id,
first_seen_date AS first_seen_geo_date,

(See question above.) Does this capture first_seen_date and the city recorded on that date, or the first city seen and the date it was first seen?
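The two readings in this question diverge whenever a client's earliest day has no city (for example, a NULL geo lookup). The sketch below contrasts them on a hypothetical per-client daily history; the History alias and both function names are illustrative only. The generated init query in the integration report (ROW_NUMBER() ordered by submission_date, keeping row 1, with no NULL filter on city) appears to follow the first reading.

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical per-client daily history: (submission_date, city or None).
History = List[Tuple[date, Optional[str]]]


def city_on_first_seen_date(history: History) -> Tuple[date, Optional[str]]:
    """Reading 1: the client's first date, with whatever city was captured that day."""
    return min(history, key=lambda row: row[0])


def first_city_seen(history: History) -> Optional[Tuple[date, str]]:
    """Reading 2: the earliest non-None city, and the date it was seen."""
    for day, city in sorted(history, key=lambda row: row[0]):
        if city is not None:
            return day, city
    return None
```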

@wwyc wwyc changed the title from "DO-2075 Added fenix and desktop city seen initial table - DO NOT MERGE" to "DO-2075 Added fenix and desktop city seen initial table - DO NOT MERGE (DRAFT)" Aug 27, 2025
@wwyc wwyc force-pushed the do-2075-initialize-client-city-seen branch from 64349bd to dc75f81 August 28, 2025 21:29
@wwyc wwyc requested a review from BenWu August 28, 2025 22:31
@wwyc wwyc force-pushed the do-2075-initialize-client-city-seen branch from ce4fe26 to ff85557 August 29, 2025 19:16
@dataops-ci-bot

Integration report for "Added table name to query"

sql.diff

Only in /tmp/workspace/generated-sql/dags/: bqetl_clients_city_seen.py
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/dags/bqetl_clients_city_seen.py /tmp/workspace/generated-sql/dags/bqetl_clients_city_seen.py
--- /tmp/workspace/main-generated-sql/dags/bqetl_clients_city_seen.py	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/dags/bqetl_clients_city_seen.py	2025-08-30 01:01:39.000000000 +0000
@@ -0,0 +1,96 @@
+# Generated via https://github.com/mozilla/bigquery-etl/blob/main/bigquery_etl/query_scheduling/generate_airflow_dags.py
+
+from airflow import DAG
+from airflow.sensors.external_task import ExternalTaskMarker
+from airflow.sensors.external_task import ExternalTaskSensor
+from airflow.utils.task_group import TaskGroup
+import datetime
+from operators.gcp_container_operator import GKEPodOperator
+from utils.constants import ALLOWED_STATES, FAILED_STATES
+from utils.gcp import bigquery_etl_query, bigquery_dq_check, bigquery_bigeye_check
+
+docs = """
+### bqetl_clients_city_seen
+
+Built from bigquery-etl repo, [`dags/bqetl_clients_city_seen.py`](https://github.com/mozilla/bigquery-etl/blob/generated-sql/dags/bqetl_clients_city_seen.py)
+
+#### Description
+
+Scheduled queries for client city seen tables
+#### Owner
+
+[email protected]
+
+#### Tags
+
+* impact/tier_2
+* repo/bigquery-etl
+"""
+
+
+default_args = {
+    "owner": "[email protected]",
+    "start_date": datetime.datetime(2025, 8, 25, 0, 0),
+    "end_date": None,
+    "email": ["[email protected]", "[email protected]"],
+    "depends_on_past": False,
+    "retry_delay": datetime.timedelta(seconds=1800),
+    "email_on_failure": True,
+    "email_on_retry": True,
+    "retries": 2,
+    "max_active_tis_per_dag": None,
+}
+
+tags = ["impact/tier_2", "repo/bigquery-etl"]
+
+with DAG(
+    "bqetl_clients_city_seen",
+    default_args=default_args,
+    schedule_interval="0 4 * * *",
+    doc_md=docs,
+    tags=tags,
+    catchup=False,
+) as dag:
+
+    wait_for_copy_deduplicate_all = ExternalTaskSensor(
+        task_id="wait_for_copy_deduplicate_all",
+        external_dag_id="copy_deduplicate",
+        external_task_id="copy_deduplicate_all",
+        execution_delta=datetime.timedelta(seconds=3600),
+        check_existence=True,
+        mode="reschedule",
+        poke_interval=datetime.timedelta(minutes=5),
+        allowed_states=ALLOWED_STATES,
+        failed_states=FAILED_STATES,
+        pool="DATA_ENG_EXTERNALTASKSENSOR",
+    )
+
+    fenix_baseline_clients_city_seen_v1 = bigquery_etl_query(
+        task_id="fenix_baseline_clients_city_seen_v1",
+        destination_table="baseline_clients_city_seen_v1",
+        dataset_id="fenix_derived",
+        project_id="moz-fx-data-shared-prod",
+        owner="[email protected]",
+        email=["[email protected]", "[email protected]"],
+        date_partition_parameter=None,
+        depends_on_past=True,
+        parameters=["submission_date:DATE:{{ds}}"],
+    )
+
+    firefox_desktop_baseline_clients_city_seen_v1 = bigquery_etl_query(
+        task_id="firefox_desktop_baseline_clients_city_seen_v1",
+        destination_table="baseline_clients_city_seen_v1",
+        dataset_id="firefox_desktop_derived",
+        project_id="moz-fx-data-shared-prod",
+        owner="[email protected]",
+        email=["[email protected]", "[email protected]"],
+        date_partition_parameter=None,
+        depends_on_past=True,
+        parameters=["submission_date:DATE:{{ds}}"],
+    )
+
+    fenix_baseline_clients_city_seen_v1.set_upstream(wait_for_copy_deduplicate_all)
+
+    firefox_desktop_baseline_clients_city_seen_v1.set_upstream(
+        wait_for_copy_deduplicate_all
+    )
Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/fenix_derived: baseline_clients_city_seen_v1
Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived: baseline_clients_city_seen_v1
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/accounts_backend_derived/event_monitoring_live_v1/materialized_view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/accounts_backend_derived/event_monitoring_live_v1/materialized_view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/accounts_backend_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/accounts_backend_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:37:25.000000000 +0000
@@ -59,7 +59,7 @@
 FROM
   combined
 WHERE
-  DATE(submission_timestamp) >= "2025-08-29"
+  DATE(submission_timestamp) >= "2025-08-30"
 GROUP BY
   submission_date,
   window_start,
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/accounts_cirrus_derived/event_monitoring_live_v1/materialized_view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/accounts_cirrus_derived/event_monitoring_live_v1/materialized_view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/accounts_cirrus_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/accounts_cirrus_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:37:26.000000000 +0000
@@ -59,7 +59,7 @@
 FROM
   combined
 WHERE
-  DATE(submission_timestamp) >= "2025-08-29"
+  DATE(submission_timestamp) >= "2025-08-30"
 GROUP BY
   submission_date,
   window_start,
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/accounts_frontend_derived/event_monitoring_live_v1/materialized_view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/accounts_frontend_derived/event_monitoring_live_v1/materialized_view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/accounts_frontend_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/accounts_frontend_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:37:26.000000000 +0000
@@ -59,7 +59,7 @@
 FROM
   combined
 WHERE
-  DATE(submission_timestamp) >= "2025-08-29"
+  DATE(submission_timestamp) >= "2025-08-30"
 GROUP BY
   submission_date,
   window_start,
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/bedrock_derived/event_monitoring_live_v1/materialized_view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/bedrock_derived/event_monitoring_live_v1/materialized_view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/bedrock_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/bedrock_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:37:26.000000000 +0000
@@ -123,7 +123,7 @@
 FROM
   combined
 WHERE
-  DATE(submission_timestamp) >= "2025-08-29"
+  DATE(submission_timestamp) >= "2025-08-30"
 GROUP BY
   submission_date,
   window_start,
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/burnham_derived/event_monitoring_live_v1/materialized_view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/burnham_derived/event_monitoring_live_v1/materialized_view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/burnham_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/burnham_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:37:26.000000000 +0000
@@ -59,7 +59,7 @@
 FROM
   combined
 WHERE
-  DATE(submission_timestamp) >= "2025-08-29"
+  DATE(submission_timestamp) >= "2025-08-30"
 GROUP BY
   submission_date,
   window_start,
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/contextual_services/event_aggregates/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/contextual_services/event_aggregates/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/contextual_services/event_aggregates/schema.yaml	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/contextual_services/event_aggregates/schema.yaml	2025-08-30 00:44:34.000000000 +0000
@@ -1,49 +1,49 @@
 fields:
-- mode: NULLABLE
-  name: submission_date
+- name: submission_date
   type: DATE
-- mode: NULLABLE
-  name: source
+  mode: NULLABLE
+- name: source
   type: STRING
-- mode: NULLABLE
-  name: event_type
+  mode: NULLABLE
+- name: event_type
   type: STRING
-- mode: NULLABLE
-  name: form_factor
+  mode: NULLABLE
+- name: form_factor
   type: STRING
-- mode: NULLABLE
-  name: country
+  mode: NULLABLE
+- name: country
   type: STRING
-- mode: NULLABLE
-  name: subdivision1
+  mode: NULLABLE
+- name: subdivision1
   type: STRING
-- mode: NULLABLE
-  name: advertiser
+  mode: NULLABLE
+- name: advertiser
   type: STRING
-- mode: NULLABLE
-  name: release_channel
+  mode: NULLABLE
+- name: release_channel
   type: STRING
-- mode: NULLABLE
-  name: position
+  mode: NULLABLE
+- name: position
   type: INTEGER
-- mode: NULLABLE
-  name: provider
+  mode: NULLABLE
+- name: provider
   type: STRING
-- mode: NULLABLE
-  name: match_type
+  mode: NULLABLE
+- name: match_type
   type: STRING
-- mode: NULLABLE
-  name: normalized_os
+  mode: NULLABLE
+- name: normalized_os
   type: STRING
-- mode: NULLABLE
-  name: suggest_data_sharing_enabled
+  mode: NULLABLE
+- name: suggest_data_sharing_enabled
   type: BOOLEAN
-- mode: NULLABLE
-  name: event_count
+  mode: NULLABLE
+- name: event_count
   type: INTEGER
-- mode: NULLABLE
-  name: user_count
+  mode: NULLABLE
+- name: user_count
   type: INTEGER
-- mode: NULLABLE
-  name: query_type
+  mode: NULLABLE
+- name: query_type
   type: STRING
+  mode: NULLABLE
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/contextual_services/event_aggregates_suggest/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/contextual_services/event_aggregates_suggest/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/contextual_services/event_aggregates_suggest/schema.yaml	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/contextual_services/event_aggregates_suggest/schema.yaml	2025-08-30 00:44:26.000000000 +0000
@@ -1,40 +1,40 @@
 fields:
-- mode: NULLABLE
-  name: submission_date
+- name: submission_date
   type: DATE
-- mode: NULLABLE
-  name: form_factor
+  mode: NULLABLE
+- name: form_factor
   type: STRING
-- mode: NULLABLE
-  name: country
+  mode: NULLABLE
+- name: country
   type: STRING
-- mode: NULLABLE
-  name: advertiser
+  mode: NULLABLE
+- name: advertiser
   type: STRING
-- mode: NULLABLE
-  name: normalized_os
+  mode: NULLABLE
+- name: normalized_os
   type: STRING
-- mode: NULLABLE
-  name: release_channel
+  mode: NULLABLE
+- name: release_channel
   type: STRING
-- mode: NULLABLE
-  name: position
+  mode: NULLABLE
+- name: position
   type: INTEGER
-- mode: NULLABLE
-  name: provider
+  mode: NULLABLE
+- name: provider
   type: STRING
-- mode: NULLABLE
-  name: match_type
+  mode: NULLABLE
+- name: match_type
   type: STRING
-- mode: NULLABLE
-  name: suggest_data_sharing_enabled
+  mode: NULLABLE
+- name: suggest_data_sharing_enabled
   type: BOOLEAN
-- mode: NULLABLE
-  name: impression_count
+  mode: NULLABLE
+- name: impression_count
   type: INTEGER
-- mode: NULLABLE
-  name: click_count
+  mode: NULLABLE
+- name: click_count
   type: INTEGER
-- mode: NULLABLE
-  name: query_type
+  mode: NULLABLE
+- name: query_type
   type: STRING
+  mode: NULLABLE
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/debug_ping_view_derived/event_monitoring_live_v1/materialized_view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/debug_ping_view_derived/event_monitoring_live_v1/materialized_view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/debug_ping_view_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/debug_ping_view_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:37:26.000000000 +0000
@@ -59,7 +59,7 @@
 FROM
   combined
 WHERE
-  DATE(submission_timestamp) >= "2025-08-29"
+  DATE(submission_timestamp) >= "2025-08-30"
 GROUP BY
   submission_date,
   window_start,
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/experimenter_cirrus_derived/event_monitoring_live_v1/materialized_view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/experimenter_cirrus_derived/event_monitoring_live_v1/materialized_view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/experimenter_cirrus_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/experimenter_cirrus_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:37:26.000000000 +0000
@@ -59,7 +59,7 @@
 FROM
   combined
 WHERE
-  DATE(submission_timestamp) >= "2025-08-29"
+  DATE(submission_timestamp) >= "2025-08-30"
 GROUP BY
   submission_date,
   window_start,
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/fenix/broken_site_report/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/fenix/broken_site_report/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/fenix/broken_site_report/metadata.yaml	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/fenix/broken_site_report/metadata.yaml	2025-08-30 00:54:47.000000000 +0000
@@ -1,6 +1,10 @@
-friendly_name: Broken Site Report
+friendly_name: App-specific view for Glean ping "broken-site-report"
 description: |-
-  Please provide a description for the query
+  This a view that UNIONs the stable ping tables
+  across all channels of the Glean application "Firefox for Android"
+  (org_mozilla_firefox.broken_site_report, org_mozilla_firefox_beta.broken_site_report, org_mozilla_fenix.broken_site_report, org_mozilla_fenix_nightly.broken_site_report, org_mozilla_fennec_aurora.broken_site_report).
+
+  It is used by Looker.
 owners: []
 labels: {}
 bigquery: null
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/fenix/crash/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/fenix/crash/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/fenix/crash/metadata.yaml	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/fenix/crash/metadata.yaml	2025-08-30 00:54:47.000000000 +0000
@@ -1,6 +1,10 @@
-friendly_name: Crash
+friendly_name: App-specific view for Glean ping "crash"
 description: |-
-  Please provide a description for the query
+  This a view that UNIONs the stable ping tables
+  across all channels of the Glean application "Firefox for Android"
+  (org_mozilla_firefox.crash, org_mozilla_firefox_beta.crash, org_mozilla_fenix.crash, org_mozilla_fenix_nightly.crash, org_mozilla_fennec_aurora.crash).
+
+  It is used by Looker.
 owners: []
 labels: {}
 bigquery: null
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/fenix/funnel_retention_clients/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/fenix/funnel_retention_clients/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/fenix/funnel_retention_clients/schema.yaml	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/fenix/funnel_retention_clients/schema.yaml	2025-08-30 00:43:41.000000000 +0000
@@ -26,6 +26,9 @@
 - name: adjust_network
   type: STRING
   mode: NULLABLE
+- name: install_source
+  type: STRING
+  mode: NULLABLE
 - name: retained_week_2
   type: BOOLEAN
   mode: NULLABLE
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/fenix/funnel_retention_week_4/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/fenix/funnel_retention_week_4/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/fenix/funnel_retention_week_4/schema.yaml	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/fenix/funnel_retention_week_4/schema.yaml	2025-08-30 00:43:52.000000000 +0000
@@ -48,6 +48,9 @@
   description: 'The type of source of a client installation.
 
     '
+- name: install_source
+  type: STRING
+  mode: NULLABLE
 - name: new_profiles
   type: INTEGER
   mode: NULLABLE
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/fenix_derived/baseline_clients_city_seen_v1/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/fenix_derived/baseline_clients_city_seen_v1/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/fenix_derived/baseline_clients_city_seen_v1/metadata.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/fenix_derived/baseline_clients_city_seen_v1/metadata.yaml	2025-08-30 00:54:48.000000000 +0000
@@ -0,0 +1,49 @@
+friendly_name: Baseline Clients City Seen
+description: |-
+  This table stores the first-seen and last-seen geo attributes for each client_id.
+  The table was initialized from stable tables (with ~2 years of retention), so the initial dates reflect the earliest/latest
+  observations within that window.  It then updates daily using live tables, updating last-seen fields for existing
+  clients and adding new clients with first-seen geo attributes that do not yet exist in the table.
+  This table should not be backfilled beyond 30 days which is the maximum retention of live tables.
+
+  Implementation Plan: https://docs.google.com/document/d/1S8yVEwJjtJy3Pd8cn_BHRhgxllVnoxKylGCpuMyqWNQ
+  Data Model Design:  https://docs.google.com/document/d/1i5SkUC5waiZGWEPu7elDIaY3IxD5jNZgJy1moAKn7g0
+owners:
+- [email protected]
+labels:
+  incremental: true
+  schedule: daily
+  table_type: client_level
+  dag: bqetl_clients_city_seen
+  owner1: wichan
+scheduling:
+  dag_name: bqetl_clients_city_seen
+  task_name: fenix_baseline_clients_city_seen_v1
+  depends_on_past: true
+  date_partition_parameter: null
+  parameters:
+  - submission_date:DATE:{{ds}}
+  depends_on:
+  - task_id: copy_deduplicate_all
+    dag_name: copy_deduplicate
+    execution_delta: 1h
+bigquery:
+  time_partitioning:
+    type: day
+    field: last_seen_geo_date
+    require_partition_filter: false
+    expiration_days: null
+  range_partitioning: null
+  clustering:
+    fields:
+    - sample_id
+workgroup_access:
+- role: roles/bigquery.dataViewer
+  members:
+  - workgroup:mozilla-confidential
+references:
+  query.sql:
+  - moz-fx-data-shared-prod.fenix_derived.baseline_clients_city_seen_v1
+  - moz-fx-data-shared-prod.org_mozilla_fenix_nightly_live.baseline_v1
+  - moz-fx-data-shared-prod.org_mozilla_firefox_live.baseline_v1
+require_column_descriptions: false
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/fenix_derived/baseline_clients_city_seen_v1/query.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/fenix_derived/baseline_clients_city_seen_v1/query.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/fenix_derived/baseline_clients_city_seen_v1/query.sql	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/fenix_derived/baseline_clients_city_seen_v1/query.sql	2025-08-30 00:43:08.000000000 +0000
@@ -0,0 +1,513 @@
+ -- Query generated via sql_generators.clients_city_seen.
+ -- this mimics the logic used in baseline_clients_daily_v1.
+{% if is_init() %}
+  WITH base_org_mozilla_firefox AS (
+    SELECT
+      submission_timestamp,
+      DATE(submission_timestamp) AS submission_date,
+      LOWER(client_info.client_id) AS client_id,
+      sample_id,
+      mozfun.glean.parse_datetime(ping_info.end_time) AS parsed_end_time,
+      `moz-fx-data-shared-prod.udf.glean_timespan_seconds`(
+        metrics.timespan.glean_baseline_duration
+      ) AS duration,
+      metadata.geo.city,
+      metadata.geo.subdivision1 AS geo_subdivision1,
+      metadata.geo.subdivision2 AS geo_subdivision2,
+    FROM
+      `moz-fx-data-shared-prod.org_mozilla_firefox_stable.baseline_v1`
+    WHERE
+      client_info.client_id IS NOT NULL
+      AND sample_id = 0
+      AND DATE(submission_timestamp) <= "2025-08-25"
+  ),
+  with_dates_org_mozilla_firefox AS (
+    SELECT
+      *,
+      DATE(SAFE.TIMESTAMP_SUB(parsed_end_time, INTERVAL duration SECOND)) AS session_start_date,
+      DATE(parsed_end_time) AS session_end_date,
+    FROM
+      base_org_mozilla_firefox
+  ),
+  with_date_offsets_org_mozilla_firefox AS (
+    SELECT
+      *,
+      DATE_DIFF(submission_date, session_start_date, DAY) AS session_start_date_offset,
+      DATE_DIFF(submission_date, session_end_date, DAY) AS session_end_date_offset,
+    FROM
+      with_dates_org_mozilla_firefox
+  ),
+  overactive_org_mozilla_firefox AS (
+    SELECT
+      submission_date,
+      client_id
+    FROM
+      with_date_offsets_org_mozilla_firefox
+    WHERE
+      submission_date >= '2018-01-01'
+    GROUP BY
+      submission_date,
+      client_id
+    HAVING
+      COUNT(*) > 150000
+  ),
+  windowed_org_mozilla_firefox AS (
+    SELECT
+      submission_date,
+      client_id,
+      sample_id,
+      ROW_NUMBER() OVER w1_unframed AS _n,
+      `moz-fx-data-shared-prod.udf.mode_last`(ARRAY_AGG(city) OVER w1) AS city,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(geo_subdivision1) OVER w1
+      ) AS geo_subdivision1,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(geo_subdivision2) OVER w1
+      ) AS geo_subdivision2,
+    FROM
+      with_date_offsets_org_mozilla_firefox
+    LEFT JOIN
+      overactive_org_mozilla_firefox
+      USING (submission_date, client_id)
+    WHERE
+      overactive_org_mozilla_firefox.client_id IS NULL
+      AND submission_date >= '2018-01-01'
+    WINDOW
+      w1 AS (
+        PARTITION BY
+          sample_id,
+          client_id,
+          submission_date
+        ORDER BY
+          submission_timestamp
+        ROWS BETWEEN
+          UNBOUNDED PRECEDING
+          AND UNBOUNDED FOLLOWING
+      ),
+      w1_unframed AS (
+        PARTITION BY
+          sample_id,
+          client_id,
+          submission_date
+        ORDER BY
+          submission_timestamp
+      )
+  ),
+  clients_daily_org_mozilla_firefox AS (
+    SELECT
+      cd.* EXCEPT (_n),
+    FROM
+      windowed_org_mozilla_firefox AS cd
+    WHERE
+      _n = 1
+  ),
+  clients_city_first_seen_org_mozilla_firefox AS (
+    SELECT
+      client_id,
+      sample_id,
+      submission_date AS first_seen_geo_date,
+      city AS first_seen_geo_city,
+      geo_subdivision1 AS first_seen_geo_subdivision1,
+      geo_subdivision2 AS first_seen_geo_subdivision2,
+    FROM
+      clients_daily_org_mozilla_firefox
+    QUALIFY
+      ROW_NUMBER() OVER (PARTITION BY client_id, sample_id ORDER BY submission_date) = 1
+  ),
+  clients_city_last_seen_org_mozilla_firefox AS (
+    SELECT
+      client_id,
+      sample_id,
+      submission_date AS last_seen_geo_date,
+      city AS last_seen_geo_city,
+      geo_subdivision1 AS last_seen_geo_subdivision1,
+      geo_subdivision2 AS last_seen_geo_subdivision2,
+    FROM
+      clients_daily_org_mozilla_firefox
+    QUALIFY
+      ROW_NUMBER() OVER (PARTITION BY client_id, sample_id ORDER BY submission_date DESC) = 1
+  ),
+  base_org_mozilla_fenix_nightly AS (
+    SELECT
+      submission_timestamp,
+      DATE(submission_timestamp) AS submission_date,
+      LOWER(client_info.client_id) AS client_id,
+      sample_id,
+      mozfun.glean.parse_datetime(ping_info.end_time) AS parsed_end_time,
+      `moz-fx-data-shared-prod.udf.glean_timespan_seconds`(
+        metrics.timespan.glean_baseline_duration
+      ) AS duration,
+      metadata.geo.city,
+      metadata.geo.subdivision1 AS geo_subdivision1,
+      metadata.geo.subdivision2 AS geo_subdivision2,
+    FROM
+      `moz-fx-data-shared-prod.org_mozilla_fenix_nightly_stable.baseline_v1`
+    WHERE
+      client_info.client_id IS NOT NULL
+      AND sample_id = 0
+      AND DATE(submission_timestamp) <= "2025-08-25"
+  ),
+  with_dates_org_mozilla_fenix_nightly AS (
+    SELECT
+      *,
+      DATE(SAFE.TIMESTAMP_SUB(parsed_end_time, INTERVAL duration SECOND)) AS session_start_date,
+      DATE(parsed_end_time) AS session_end_date,
+    FROM
+      base_org_mozilla_fenix_nightly
+  ),
+  with_date_offsets_org_mozilla_fenix_nightly AS (
+    SELECT
+      *,
+      DATE_DIFF(submission_date, session_start_date, DAY) AS session_start_date_offset,
+      DATE_DIFF(submission_date, session_end_date, DAY) AS session_end_date_offset,
+    FROM
+      with_dates_org_mozilla_fenix_nightly
+  ),
+  overactive_org_mozilla_fenix_nightly AS (
+    SELECT
+      submission_date,
+      client_id
+    FROM
+      with_date_offsets_org_mozilla_fenix_nightly
+    WHERE
+      submission_date >= '2018-01-01'
+    GROUP BY
+      submission_date,
+      client_id
+    HAVING
+      COUNT(*) > 150000
+  ),
+  windowed_org_mozilla_fenix_nightly AS (
+    SELECT
+      submission_date,
+      client_id,
+      sample_id,
+      ROW_NUMBER() OVER w1_unframed AS _n,
+      `moz-fx-data-shared-prod.udf.mode_last`(ARRAY_AGG(city) OVER w1) AS city,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(geo_subdivision1) OVER w1
+      ) AS geo_subdivision1,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(geo_subdivision2) OVER w1
+      ) AS geo_subdivision2,
+    FROM
+      with_date_offsets_org_mozilla_fenix_nightly
+    LEFT JOIN
+      overactive_org_mozilla_fenix_nightly
+      USING (submission_date, client_id)
+    WHERE
+      overactive_org_mozilla_fenix_nightly.client_id IS NULL
+      AND submission_date >= '2018-01-01'
+    WINDOW
+      w1 AS (
+        PARTITION BY
+          sample_id,
+          client_id,
+          submission_date
+        ORDER BY
+          submission_timestamp
+        ROWS BETWEEN
+          UNBOUNDED PRECEDING
+          AND UNBOUNDED FOLLOWING
+      ),
+      w1_unframed AS (
+        PARTITION BY
+          sample_id,
+          client_id,
+          submission_date
+        ORDER BY
+          submission_timestamp
+      )
+  ),
+  clients_daily_org_mozilla_fenix_nightly AS (
+    SELECT
+      cd.* EXCEPT (_n),
+    FROM
+      windowed_org_mozilla_fenix_nightly AS cd
+    WHERE
+      _n = 1
+  ),
+  clients_city_first_seen_org_mozilla_fenix_nightly AS (
+    SELECT
+      client_id,
+      sample_id,
+      submission_date AS first_seen_geo_date,
+      city AS first_seen_geo_city,
+      geo_subdivision1 AS first_seen_geo_subdivision1,
+      geo_subdivision2 AS first_seen_geo_subdivision2,
+    FROM
+      clients_daily_org_mozilla_fenix_nightly
+    QUALIFY
+      ROW_NUMBER() OVER (PARTITION BY client_id, sample_id ORDER BY submission_date) = 1
+  ),
+  clients_city_last_seen_org_mozilla_fenix_nightly AS (
+    SELECT
+      client_id,
+      sample_id,
+      submission_date AS last_seen_geo_date,
+      city AS last_seen_geo_city,
+      geo_subdivision1 AS last_seen_geo_subdivision1,
+      geo_subdivision2 AS last_seen_geo_subdivision2,
+    FROM
+      clients_daily_org_mozilla_fenix_nightly
+    QUALIFY
+      ROW_NUMBER() OVER (PARTITION BY client_id, sample_id ORDER BY submission_date DESC) = 1
+  )
+  SELECT
+    "org_mozilla_firefox" AS app_id,
+    COALESCE(cfs.client_id, cls.client_id) AS client_id,
+    COALESCE(cfs.sample_id, cls.sample_id) AS sample_id,
+    first_seen_geo_date,
+    first_seen_geo_city,
+    first_seen_geo_subdivision1,
+    first_seen_geo_subdivision2,
+    last_seen_geo_date,
+    last_seen_geo_city,
+    last_seen_geo_subdivision1,
+    last_seen_geo_subdivision2,
+  FROM
+    clients_city_first_seen_org_mozilla_firefox cfs
+  FULL OUTER JOIN
+    clients_city_last_seen_org_mozilla_firefox cls
+    ON cfs.client_id = cls.client_id
+    AND cfs.sample_id = cls.sample_id
+  UNION ALL
+  SELECT
+    "org_mozilla_fenix_nightly" AS app_id,
+    COALESCE(cfs.client_id, cls.client_id) AS client_id,
+    COALESCE(cfs.sample_id, cls.sample_id) AS sample_id,
+    first_seen_geo_date,
+    first_seen_geo_city,
+    first_seen_geo_subdivision1,
+    first_seen_geo_subdivision2,
+    last_seen_geo_date,
+    last_seen_geo_city,
+    last_seen_geo_subdivision1,
+    last_seen_geo_subdivision2,
+  FROM
+    clients_city_first_seen_org_mozilla_fenix_nightly cfs
+  FULL OUTER JOIN
+    clients_city_last_seen_org_mozilla_fenix_nightly cls
+    ON cfs.client_id = cls.client_id
+    AND cfs.sample_id = cls.sample_id
+{% else %}
+  WITH _previous_org_mozilla_firefox AS (
+    SELECT
+      *
+    FROM
+      `moz-fx-data-shared-prod.fenix_derived.baseline_clients_city_seen_v1`
+    WHERE
+      app_id = "org_mozilla_firefox"
+  ),
+  _current_windowed_org_mozilla_firefox AS (
+    SELECT
+      "org_mozilla_firefox" AS app_id,
+      client_info.client_id AS client_id,
+      sample_id,
+      ROW_NUMBER() OVER w1_unframed AS _n,
+      @submission_date AS first_seen_geo_date,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(metadata.geo.city) OVER w1
+      ) AS first_seen_geo_city,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(metadata.geo.subdivision1) OVER w1
+      ) AS first_seen_geo_subdivision1,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(metadata.geo.subdivision2) OVER w1
+      ) AS first_seen_geo_subdivision2,
+      @submission_date AS last_seen_geo_date,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(metadata.geo.city) OVER w1
+      ) AS last_seen_geo_city,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(metadata.geo.subdivision1) OVER w1
+      ) AS last_seen_geo_subdivision1,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(metadata.geo.subdivision2) OVER w1
+      ) AS last_seen_geo_subdivision2,
+    FROM
+      `moz-fx-data-shared-prod.org_mozilla_firefox_live.baseline_v1`
+    WHERE
+      DATE(submission_timestamp) = @submission_date
+      AND sample_id = 0
+    WINDOW
+      w1 AS (
+        PARTITION BY
+          sample_id,
+          client_info.client_id,
+          DATE(submission_timestamp)
+        ORDER BY
+          submission_timestamp
+        ROWS BETWEEN
+          UNBOUNDED PRECEDING
+          AND UNBOUNDED FOLLOWING
+      ),
+      w1_unframed AS (
+        PARTITION BY
+          sample_id,
+          client_info.client_id,
+          DATE(submission_timestamp)
+        ORDER BY
+          submission_timestamp
+      )
+  ),
+  _current_org_mozilla_firefox AS (
+    SELECT
+      cw.* EXCEPT (_n),
+    FROM
+      _current_windowed_org_mozilla_firefox AS cw
+    WHERE
+      _n = 1
+  ),
+  _previous_org_mozilla_fenix_nightly AS (
+    SELECT
+      *
+    FROM
+      `moz-fx-data-shared-prod.fenix_derived.baseline_clients_city_seen_v1`
+    WHERE
+      app_id = "org_mozilla_fenix_nightly"
+  ),
+  _current_windowed_org_mozilla_fenix_nightly AS (
+    SELECT
+      "org_mozilla_fenix_nightly" AS app_id,
+      client_info.client_id AS client_id,
+      sample_id,
+      ROW_NUMBER() OVER w1_unframed AS _n,
+      @submission_date AS first_seen_geo_date,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(metadata.geo.city) OVER w1
+      ) AS first_seen_geo_city,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(metadata.geo.subdivision1) OVER w1
+      ) AS first_seen_geo_subdivision1,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(metadata.geo.subdivision2) OVER w1
+      ) AS first_seen_geo_subdivision2,
+      @submission_date AS last_seen_geo_date,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(metadata.geo.city) OVER w1
+      ) AS last_seen_geo_city,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(metadata.geo.subdivision1) OVER w1
+      ) AS last_seen_geo_subdivision1,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(metadata.geo.subdivision2) OVER w1
+      ) AS last_seen_geo_subdivision2,
+    FROM
+      `moz-fx-data-shared-prod.org_mozilla_fenix_nightly_live.baseline_v1`
+    WHERE
+      DATE(submission_timestamp) = @submission_date
+      AND sample_id = 0
+    WINDOW
+      w1 AS (
+        PARTITION BY
+          sample_id,
+          client_info.client_id,
+          DATE(submission_timestamp)
+        ORDER BY
+          submission_timestamp
+        ROWS BETWEEN
+          UNBOUNDED PRECEDING
+          AND UNBOUNDED FOLLOWING
+      ),
+      w1_unframed AS (
+        PARTITION BY
+          sample_id,
+          client_info.client_id,
+          DATE(submission_timestamp)
+        ORDER BY
+          submission_timestamp
+      )
+  ),
+  _current_org_mozilla_fenix_nightly AS (
+    SELECT
+      cw.* EXCEPT (_n),
+    FROM
+      _current_windowed_org_mozilla_fenix_nightly AS cw
+    WHERE
+      _n = 1
+  )
+  SELECT
+    app_id,
+    client_id,
+    sample_id,
+    IF(_p.client_id IS NULL, _c.first_seen_geo_date, _p.first_seen_geo_date) AS first_seen_geo_date,
+    IF(_p.client_id IS NULL, _c.first_seen_geo_city, _p.first_seen_geo_city) AS first_seen_geo_city,
+    IF(
+      _p.client_id IS NULL,
+      _c.first_seen_geo_subdivision1,
+      _p.first_seen_geo_subdivision1
+    ) AS first_seen_geo_subdivision1,
+    IF(
+      _p.client_id IS NULL,
+      _c.first_seen_geo_subdivision2,
+      _p.first_seen_geo_subdivision2
+    ) AS first_seen_geo_subdivision2,
+    IF(
+      _p.client_id IS NULL OR _p.last_seen_geo_date < _c.last_seen_geo_date,
+      _c.last_seen_geo_date,
+      _p.last_seen_geo_date
+    ) AS last_seen_geo_date,
+    IF(
+      _p.client_id IS NULL OR _p.last_seen_geo_date < _c.last_seen_geo_date,
+      _c.last_seen_geo_city,
+      _p.last_seen_geo_city
+    ) AS last_seen_geo_city,
+    IF(
+      _p.client_id IS NULL OR _p.last_seen_geo_date < _c.last_seen_geo_date,
+      _c.last_seen_geo_subdivision1,
+      _p.last_seen_geo_subdivision1
+    ) AS last_seen_geo_subdivision1,
+    IF(
+      _p.client_id IS NULL OR _p.last_seen_geo_date < _c.last_seen_geo_date,
+      _c.last_seen_geo_subdivision2,
+      _p.last_seen_geo_subdivision2
+    ) AS last_seen_geo_subdivision2,
+  FROM
+    _current_org_mozilla_firefox AS _c
+  FULL JOIN
+    _previous_org_mozilla_firefox AS _p
+    USING (client_id, sample_id, app_id)
+  UNION ALL
+  SELECT
+    app_id,
+    client_id,
+    sample_id,
+    IF(_p.client_id IS NULL, _c.first_seen_geo_date, _p.first_seen_geo_date) AS first_seen_geo_date,
+    IF(_p.client_id IS NULL, _c.first_seen_geo_city, _p.first_seen_geo_city) AS first_seen_geo_city,
+    IF(
+      _p.client_id IS NULL,
+      _c.first_seen_geo_subdivision1,
+      _p.first_seen_geo_subdivision1
+    ) AS first_seen_geo_subdivision1,
+    IF(
+      _p.client_id IS NULL,
+      _c.first_seen_geo_subdivision2,
+      _p.first_seen_geo_subdivision2
+    ) AS first_seen_geo_subdivision2,
+    IF(
+      _p.client_id IS NULL OR _p.last_seen_geo_date < _c.last_seen_geo_date,
+      _c.last_seen_geo_date,
+      _p.last_seen_geo_date
+    ) AS last_seen_geo_date,
+    IF(
+      _p.client_id IS NULL OR _p.last_seen_geo_date < _c.last_seen_geo_date,
+      _c.last_seen_geo_city,
+      _p.last_seen_geo_city
+    ) AS last_seen_geo_city,
+    IF(
+      _p.client_id IS NULL OR _p.last_seen_geo_date < _c.last_seen_geo_date,
+      _c.last_seen_geo_subdivision1,
+      _p.last_seen_geo_subdivision1
+    ) AS last_seen_geo_subdivision1,
+    IF(
+      _p.client_id IS NULL OR _p.last_seen_geo_date < _c.last_seen_geo_date,
+      _c.last_seen_geo_subdivision2,
+      _p.last_seen_geo_subdivision2
+    ) AS last_seen_geo_subdivision2,
+  FROM
+    _current_org_mozilla_fenix_nightly AS _c
+  FULL JOIN
+    _previous_org_mozilla_fenix_nightly AS _p
+    USING (client_id, sample_id, app_id)
+{% endif %}
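The incremental branch merges today's row (`_c`) with the stored row (`_p`) through a FULL JOIN: first-seen values are kept from `_p` whenever it exists, and last-seen values come from whichever side has the later `last_seen_geo_date`. One subtlety: BigQuery's `IF(cond, a, b)` returns `b` when `cond` is NULL, so for a brand-new client (no `_p` row) the comparison `_p.last_seen_geo_date < _c.last_seen_geo_date` needs an explicit `_p.client_id IS NULL` check to pick up today's values. The Python sketch below models the intended per-client merge in memory. `GeoSeen` and `merge` are hypothetical names, and only the city column is shown (the subdivision columns would follow the same rule).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class GeoSeen:
    """Hypothetical in-memory model of one baseline_clients_city_seen_v1 row (city only)."""
    first_seen_geo_date: date
    first_seen_geo_city: Optional[str]
    last_seen_geo_date: date
    last_seen_geo_city: Optional[str]


def merge(previous: Optional[GeoSeen], current: Optional[GeoSeen]) -> GeoSeen:
    """Combine the stored row (_p) with today's row (_c), like the FULL JOIN in query.sql."""
    if previous is None and current is None:
        raise ValueError("a FULL JOIN never yields a row with both sides missing")
    if previous is None:
        # New client: first-seen and last-seen both come from today's pings.
        return current
    if current is None:
        # Client not seen today: keep the stored row unchanged.
        return previous
    # Existing client seen today: first-seen never changes, and last-seen only
    # moves forward when today's date is strictly newer than the stored one.
    newer = current if previous.last_seen_geo_date < current.last_seen_geo_date else previous
    return GeoSeen(
        first_seen_geo_date=previous.first_seen_geo_date,
        first_seen_geo_city=previous.first_seen_geo_city,
        last_seen_geo_date=newer.last_seen_geo_date,
        last_seen_geo_city=newer.last_seen_geo_city,
    )
```

For example, a stored row with first city `"Berlin"` and last city `"Hamburg"` merged with a newer row reporting `"Munich"` keeps `"Berlin"` as first-seen and moves last-seen to `"Munich"`; re-running an older date leaves the stored last-seen values in place.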
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/fenix_derived/baseline_clients_city_seen_v1/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/fenix_derived/baseline_clients_city_seen_v1/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/fenix_derived/baseline_clients_city_seen_v1/schema.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/fenix_derived/baseline_clients_city_seen_v1/schema.yaml	2025-08-30 00:43:08.000000000 +0000
@@ -0,0 +1,45 @@
+fields:
+- name: app_id
+  type: STRING
+  mode: NULLABLE
+  description: ID of the browser application.
+- name: client_id
+  type: STRING
+  mode: NULLABLE
+  description: A UUID uniquely identifying the client.
+- name: sample_id
+  type: INTEGER
+  mode: NULLABLE
+  description: A number, 0-99, that samples by client_id and allows filtering data for analysis.
+- name: first_seen_geo_date
+  type: DATE
+  mode: NULLABLE
+  description: Date when the first seen geo fields were captured.
+- name: first_seen_geo_city
+  type: STRING
+  mode: NULLABLE
+  description: City captured on first_seen_geo_date.
+- name: first_seen_geo_subdivision1
+  type: STRING
+  mode: NULLABLE
+  description: Major country subdivision (typically a state, province, or county) captured on first_seen_geo_date.
+- name: first_seen_geo_subdivision2
+  type: STRING
+  mode: NULLABLE
+  description: Second major country subdivision captured on first_seen_geo_date; not applicable for most countries.
+- name: last_seen_geo_date
+  type: DATE
+  mode: NULLABLE
+  description: Date when the last seen geo fields were captured.
+- name: last_seen_geo_city
+  type: STRING
+  mode: NULLABLE
+  description: City captured on last_seen_geo_date.
+- name: last_seen_geo_subdivision1
+  type: STRING
+  mode: NULLABLE
+  description: Major country subdivision (typically a state, province, or county) captured on last_seen_geo_date.
+- name: last_seen_geo_subdivision2
+  type: STRING
+  mode: NULLABLE
+  description: Second major country subdivision captured on last_seen_geo_date; not applicable for most countries.
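Each geo field in this schema holds a single value per date, even though a client can send many baseline pings on that date. The queries collapse those pings with `ARRAY_AGG(...) OVER w1` (ordered by `submission_timestamp`) followed by `udf.mode_last`. Below is a minimal Python sketch of that reduction, assuming `mode_last` returns the most frequent element and breaks ties in favour of the value that appeared latest in the array. Skipping NULLs (`None`) is an additional assumption made here, not something this PR confirms.

```python
from collections import Counter
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def mode_last(values: Sequence[Optional[T]]) -> Optional[T]:
    """Most frequent non-null value; ties go to the value whose last occurrence is latest."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        # Assumption: an all-null (or empty) input yields null.
        return None
    # Position of each value's final occurrence in submission_timestamp order.
    last_index = {v: i for i, v in enumerate(values) if v is not None}
    # Rank by count first, then by how late the value last appeared.
    return max(counts, key=lambda v: (counts[v], last_index[v]))
```

With pings ordered as `["Toronto", "Ottawa", "Toronto", "Ottawa"]` both cities appear twice, so the tie goes to `"Ottawa"`, which appeared last.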
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_crashreporter/crash/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_crashreporter/crash/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_crashreporter/crash/metadata.yaml	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_crashreporter/crash/metadata.yaml	2025-08-30 00:54:38.000000000 +0000
@@ -1,6 +1,14 @@
-friendly_name: Crash
+friendly_name: Historical Pings for `firefox-crashreporter/crash`
 description: |-
-  Please provide a description for the query
+  A historical view of pings sent for the
+  `firefox-crashreporter/crash`
+  document type.
+
+  This view is guaranteed to contain only complete days
+  (per `submission_timestamp`)
+  and to contain only one row per distinct `document_id` within a given date.
+
+  Clustering fields: `normalized_channel`, `sample_id`
 owners: []
 labels:
   authorized: true
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_crashreporter_derived/event_monitoring_live_v1/materialized_view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_crashreporter_derived/event_monitoring_live_v1/materialized_view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_crashreporter_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_crashreporter_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:37:26.000000000 +0000
@@ -59,7 +59,7 @@
 FROM
   combined
 WHERE
-  DATE(submission_timestamp) >= "2025-08-29"
+  DATE(submission_timestamp) >= "2025-08-30"
 GROUP BY
   submission_date,
   window_start,
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/broken_site_report/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/broken_site_report/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/broken_site_report/metadata.yaml	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/broken_site_report/metadata.yaml	2025-08-30 00:54:53.000000000 +0000
@@ -1,6 +1,14 @@
-friendly_name: Broken Site Report
+friendly_name: Historical Pings for `firefox-desktop/broken-site-report`
 description: |-
-  Please provide a description for the query
+  A historical view of pings sent for the
+  `firefox-desktop/broken-site-report`
+  document type.
+
+  This view is guaranteed to contain only complete days
+  (per `submission_timestamp`)
+  and to contain only one row per distinct `document_id` within a given date.
+
+  Clustering fields: `normalized_channel`, `sample_id`
 owners: []
 labels:
   authorized: true
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/crash/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/crash/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/crash/metadata.yaml	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/crash/metadata.yaml	2025-08-30 00:54:53.000000000 +0000
@@ -1,6 +1,14 @@
-friendly_name: Crash
+friendly_name: Historical Pings for `firefox-desktop/crash`
 description: |-
-  Please provide a description for the query
+  A historical view of pings sent for the
+  `firefox-desktop/crash`
+  document type.
+
+  This view is guaranteed to contain only complete days
+  (per `submission_timestamp`)
+  and to contain only one row per distinct `document_id` within a given date.
+
+  Clustering fields: `normalized_channel`, `sample_id`
 owners: []
 labels:
   authorized: true
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/ltv_states/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/ltv_states/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/ltv_states/schema.yaml	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/ltv_states/schema.yaml	2025-08-30 00:44:50.000000000 +0000
@@ -1,64 +1,66 @@
 fields:
-- description: Unique ID for the client installation.
-  mode: NULLABLE
-  name: client_id
+- name: client_id
   type: STRING
-- description: Sample ID - A number ranging from 0 - 99 based on client ID; used to pull a small sample of data related to a subset of clients over time
   mode: NULLABLE
-  name: sample_id
-  type: INT64
-- description: Submission Date
+  description: Unique ID for the client installation.
+- name: sample_id
+  type: INTEGER
   mode: NULLABLE
-  name: submission_date
+  description: Sample ID - A number ranging from 0 - 99 based on client ID; used to
+    pull a small sample of data related to a subset of clients over time
+- name: submission_date
   type: DATE
-- description: First Seen Date - The date this client was first seen
   mode: NULLABLE
-  name: first_seen_date
+  description: Submission Date
+- name: first_seen_date
   type: DATE
-- description: Days Since First Seen - The number of days since the client was first seen
   mode: NULLABLE
-  name: days_since_first_seen
-  type: INT64
-- description: Days Since Active
+  description: First Seen Date - The date this client was first seen
+- name: days_since_first_seen
+  type: INTEGER
   mode: NULLABLE
-  name: days_since_active
-  type: INT64
-- description: First Reported Country - The country this client ID was first reported from
+  description: Days Since First Seen - The number of days since the client was first
+    seen
+- name: days_since_active
+  type: INTEGER
   mode: NULLABLE
-  name: first_reported_country
+  description: Days Since Active
+- name: first_reported_country
   type: STRING
-- description: Attribution
   mode: NULLABLE
-  name: attribution
+  description: First Reported Country - The country this client ID was first reported
+    from
+- name: attribution
   type: RECORD
+  mode: NULLABLE
   fields:
-  - mode: NULLABLE
-    name: source
+  - name: source
     type: STRING
+    mode: NULLABLE
     description: Attribution Source
-  - mode: NULLABLE
-    name: medium
+  - name: medium
     type: STRING
+    mode: NULLABLE
     description: Attribution Medium
-  - mode: NULLABLE
-    name: campaign
+  - name: campaign
     type: STRING
+    mode: NULLABLE
     description: Attribution Campaign
-  - mode: NULLABLE
-    name: content
+  - name: content
     type: STRING
+    mode: NULLABLE
     description: Attribution Content
-  - mode: NULLABLE
-    name: experiment
+  - name: experiment
     type: STRING
+    mode: NULLABLE
     description: Attribution Experiment
-  - mode: NULLABLE
-    name: variation
+  - name: variation
     type: STRING
+    mode: NULLABLE
     description: Attribution Variation
-  - mode: NULLABLE
-    name: dltoken
+  - name: dltoken
     type: STRING
+    mode: NULLABLE
     description: Attribution Download Token
   - name: dlsource
     type: STRING
@@ -68,40 +70,43 @@
     type: STRING
     mode: NULLABLE
     description: Attribution UA
-- description: Active
-  mode: NULLABLE
-  name: active
-  type: INT64
-- description: Ad Clicks - The number of ad clicks from this client on the submission date
+  description: Attribution
+- name: active
+  type: INTEGER
   mode: NULLABLE
-  name: ad_clicks
-  type: INT64
-- description: Total Historic Ad Clicks - The number of ad clicks from this client on or before the submission date
+  description: Active
+- name: ad_clicks
+  type: INTEGER
   mode: NULLABLE
-  name: total_historic_ad_clicks
-  type: INT64
-- description: Days Seen Bytes
+  description: Ad Clicks - The number of ad clicks from this client on the submission
+    date
+- name: total_historic_ad_clicks
+  type: INTEGER
   mode: NULLABLE
-  name: days_seen_bytes
+  description: Total Historic Ad Clicks - The number of ad clicks from this client
+    on or before the submission date
+- name: days_seen_bytes
   type: BYTES
-- description: Pattern
   mode: NULLABLE
-  name: pattern
+  description: Days Seen Bytes
+- name: pattern
   type: INTEGER
-- description: Death Time
   mode: NULLABLE
-  name: death_time
+  description: Pattern
+- name: death_time
   type: INTEGER
-- description: Max Days
   mode: NULLABLE
-  name: max_days
+  description: Death Time
+- name: max_days
   type: INTEGER
-- description: Markov States
   mode: NULLABLE
-  name: markov_states
+  description: Max Days
+- name: markov_states
   type: RECORD
-  fields:
-  - description: Desktop States V1
     mode: NULLABLE
-    name: desktop_states_v1
+  fields:
+  - name: desktop_states_v1
     type: STRING
+    mode: NULLABLE
+    description: Desktop States V1
+  description: Markov States
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/newtab/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/newtab/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/newtab/metadata.yaml	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/newtab/metadata.yaml	2025-08-30 00:54:53.000000000 +0000
@@ -1,6 +1,14 @@
-friendly_name: Newtab
+friendly_name: Historical Pings for `firefox-desktop/newtab`
 description: |-
-  Please provide a description for the query
+  A historical view of pings sent for the
+  `firefox-desktop/newtab`
+  document type.
+
+  This view is guaranteed to contain only complete days
+  (per `submission_timestamp`)
+  and to contain only one row per distinct `document_id` within a given date.
+
+  Clustering fields: `normalized_channel`, `sample_id`
 owners: []
 labels: {}
 bigquery: null
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/newtab_live/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/newtab_live/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/newtab_live/schema.yaml	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop/newtab_live/schema.yaml	2025-08-30 00:44:16.000000000 +0000
@@ -1,48 +1,47 @@
 fields:
-- description: Submission Timestamp
-  mode: NULLABLE
-  name: submission_timestamp
+- name: submission_timestamp
   type: TIMESTAMP
-- description: Normalized Country Code, Examples - US, AR, BR, etc.
   mode: NULLABLE
-  name: normalized_country_code
+  description: Submission Timestamp
+- name: normalized_country_code
   type: STRING
-- description: Normalized Channel, Examples - release, nightly, aurora, esr, beta
   mode: NULLABLE
-  name: normalized_channel
+  description: Normalized Country Code, Examples - US, AR, BR, etc.
+- name: normalized_channel
   type: STRING
-- description: Document ID
   mode: NULLABLE
-  name: document_id
+  description: Normalized Channel, Examples - release, nightly, aurora, esr, beta
+- name: document_id
   type: STRING
-- description: Pocket Enabled
   mode: NULLABLE
-  name: pocket_enabled
+  description: Document ID
+- name: pocket_enabled
   type: BOOLEAN
-- description: Pocket Sponsored Stories Enabled
   mode: NULLABLE
-  name: pocket_sponsored_stories_enabled
+  description: Pocket Enabled
+- name: pocket_sponsored_stories_enabled
   type: BOOLEAN
-- description: Newtab Locale
   mode: NULLABLE
-  name: newtab_locale
+  description: Pocket Sponsored Stories Enabled
+- name: newtab_locale
   type: STRING
-- description: App Build
   mode: NULLABLE
-  name: app_build
+  description: Newtab Locale
+- name: app_build
   type: STRING
-- description: App Display Version
   mode: NULLABLE
-  name: app_display_version
+  description: App Build
+- name: app_display_version
   type: STRING
-- description: Client ID
   mode: NULLABLE
-  name: client_id
+  description: App Display Version
+- name: client_id
   type: STRING
+  mode: NULLABLE
+  description: Client ID
 - name: events
   type: RECORD
   mode: REPEATED
-  description: Events
   fields:
   - name: category
     type: STRING
@@ -51,7 +50,6 @@
   - name: extra
     type: RECORD
     mode: REPEATED
-    description: Extras
     fields:
     - name: key
       type: STRING
@@ -61,6 +59,7 @@
       type: STRING
       mode: NULLABLE
       description: Value
+    description: Extras
   - name: name
     type: STRING
     mode: NULLABLE
@@ -69,3 +68,4 @@
     type: INTEGER
     mode: NULLABLE
     description: Event Timestamp
+  description: Events
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_background_defaultagent_derived/event_monitoring_live_v1/materialized_view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_background_defaultagent_derived/event_monitoring_live_v1/materialized_view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_background_defaultagent_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_background_defaultagent_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:37:26.000000000 +0000
@@ -59,7 +59,7 @@
 FROM
   combined
 WHERE
-  DATE(submission_timestamp) >= "2025-08-29"
+  DATE(submission_timestamp) >= "2025-08-30"
 GROUP BY
   submission_date,
   window_start,
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_background_tasks_derived/event_monitoring_live_v1/materialized_view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_background_tasks_derived/event_monitoring_live_v1/materialized_view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_background_tasks_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_background_tasks_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:37:26.000000000 +0000
@@ -91,7 +91,7 @@
 FROM
   combined
 WHERE
-  DATE(submission_timestamp) >= "2025-08-29"
+  DATE(submission_timestamp) >= "2025-08-30"
 GROUP BY
   submission_date,
   window_start,
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_background_update_derived/event_monitoring_live_v1/materialized_view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_background_update_derived/event_monitoring_live_v1/materialized_view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_background_update_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:36:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_background_update_derived/event_monitoring_live_v1/materialized_view.sql	2025-08-30 00:37:26.000000000 +0000
@@ -59,7 +59,7 @@
 FROM
   combined
 WHERE
-  DATE(submission_timestamp) >= "2025-08-29"
+  DATE(submission_timestamp) >= "2025-08-30"
 GROUP BY
   submission_date,
   window_start,
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/baseline_clients_city_seen_v1/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/baseline_clients_city_seen_v1/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/baseline_clients_city_seen_v1/metadata.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/baseline_clients_city_seen_v1/metadata.yaml	2025-08-30 00:54:54.000000000 +0000
@@ -0,0 +1,48 @@
+friendly_name: Baseline Clients City Seen
+description: |-
+  This table stores the first-seen and last-seen geo attributes (city and subdivisions) for each client_id.
+  The table was initialized from stable tables (~2 years of retention), so the initial dates reflect the earliest
+  and latest observations within that window. It is then updated daily from live tables: last-seen fields are
+  refreshed for existing clients, and clients not yet in the table are added with their first-seen geo attributes.
+  Do not backfill this table more than 30 days back, because live tables retain at most 30 days of data.
+
+  Implementation Plan: https://docs.google.com/document/d/1S8yVEwJjtJy3Pd8cn_BHRhgxllVnoxKylGCpuMyqWNQ
+  Data Model Design: https://docs.google.com/document/d/1i5SkUC5waiZGWEPu7elDIaY3IxD5jNZgJy1moAKn7g0
+owners:
+- [email protected]
+labels:
+  incremental: true
+  schedule: daily
+  table_type: client_level
+  dag: bqetl_clients_city_seen
+  owner1: wichan
+scheduling:
+  dag_name: bqetl_clients_city_seen
+  task_name: firefox_desktop_baseline_clients_city_seen_v1
+  depends_on_past: true
+  date_partition_parameter: null
+  parameters:
+  - submission_date:DATE:{{ds}}
+  depends_on:
+  - task_id: copy_deduplicate_all
+    dag_name: copy_deduplicate
+    execution_delta: 1h
+bigquery:
+  time_partitioning:
+    type: day
+    field: last_seen_geo_date
+    require_partition_filter: false
+    expiration_days: null
+  range_partitioning: null
+  clustering:
+    fields:
+    - sample_id
+workgroup_access:
+- role: roles/bigquery.dataViewer
+  members:
+  - workgroup:mozilla-confidential
+references:
+  query.sql:
+  - moz-fx-data-shared-prod.firefox_desktop_derived.baseline_clients_city_seen_v1
+  - moz-fx-data-shared-prod.firefox_desktop_live.baseline_v1
+require_column_descriptions: false
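As the description above says, the initialization run reads the stable table and keeps, per client, the earliest and latest daily observation. In `query.sql` this is done with two `ROW_NUMBER() ... QUALIFY` steps (ascending and descending `submission_date`) joined on `client_id` and `sample_id`. The sketch below models that in Python, assuming the input has already been collapsed to one row per client per day (as the `clients_daily_*` CTEs do). `DailyRow` and `initialize` are hypothetical names, and only the city column is shown.

```python
from datetime import date
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical daily row after the per-day collapse: (client_id, submission_date, city).
DailyRow = Tuple[str, date, Optional[str]]


def initialize(rows: Iterable[DailyRow]) -> Dict[str, dict]:
    """Earliest and latest daily observation per client, like the two QUALIFY steps."""
    first: Dict[str, Tuple[date, Optional[str]]] = {}
    last: Dict[str, Tuple[date, Optional[str]]] = {}
    for client_id, day, city in rows:
        # Keep the row with the smallest submission_date (ORDER BY submission_date).
        if client_id not in first or day < first[client_id][0]:
            first[client_id] = (day, city)
        # Keep the row with the largest submission_date (ORDER BY submission_date DESC).
        if client_id not in last or day > last[client_id][0]:
            last[client_id] = (day, city)
    # Both dicts contain exactly the same clients, so this "join" never drops a side.
    return {
        client_id: {
            "first_seen_geo_date": first[client_id][0],
            "first_seen_geo_city": first[client_id][1],
            "last_seen_geo_date": last[client_id][0],
            "last_seen_geo_city": last[client_id][1],
        }
        for client_id in first
    }
```

Because the first-seen and last-seen CTEs are derived from the same set of daily rows, every client appears on both sides, so the FULL OUTER JOIN in the SQL behaves like an inner join during initialization.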
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/baseline_clients_city_seen_v1/query.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/baseline_clients_city_seen_v1/query.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/baseline_clients_city_seen_v1/query.sql	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/baseline_clients_city_seen_v1/query.sql	2025-08-30 00:43:08.000000000 +0000
@@ -0,0 +1,258 @@
+ -- Query generated via sql_generators.clients_city_seen.
+ -- This mimics the logic used in baseline_clients_daily_v1.
+{% if is_init() %}
+  WITH base_firefox_desktop AS (
+    SELECT
+      submission_timestamp,
+      DATE(submission_timestamp) AS submission_date,
+      LOWER(client_info.client_id) AS client_id,
+      sample_id,
+      mozfun.glean.parse_datetime(ping_info.end_time) AS parsed_end_time,
+      `moz-fx-data-shared-prod.udf.glean_timespan_seconds`(
+        metrics.timespan.glean_baseline_duration
+      ) AS duration,
+      metadata.geo.city,
+      metadata.geo.subdivision1 AS geo_subdivision1,
+      metadata.geo.subdivision2 AS geo_subdivision2,
+    FROM
+      `moz-fx-data-shared-prod.firefox_desktop_stable.baseline_v1`
+    WHERE
+      client_info.client_id IS NOT NULL
+      AND sample_id = 0
+      AND DATE(submission_timestamp) <= "2025-08-25"
+  ),
+  with_dates_firefox_desktop AS (
+    SELECT
+      *,
+      DATE(SAFE.TIMESTAMP_SUB(parsed_end_time, INTERVAL duration SECOND)) AS session_start_date,
+      DATE(parsed_end_time) AS session_end_date,
+    FROM
+      base_firefox_desktop
+  ),
+  with_date_offsets_firefox_desktop AS (
+    SELECT
+      *,
+      DATE_DIFF(submission_date, session_start_date, DAY) AS session_start_date_offset,
+      DATE_DIFF(submission_date, session_end_date, DAY) AS session_end_date_offset,
+    FROM
+      with_dates_firefox_desktop
+  ),
+  overactive_firefox_desktop AS (
+    SELECT
+      submission_date,
+      client_id
+    FROM
+      with_date_offsets_firefox_desktop
+    WHERE
+      submission_date >= '2018-01-01'
+    GROUP BY
+      submission_date,
+      client_id
+    HAVING
+      COUNT(*) > 150000
+  ),
+  windowed_firefox_desktop AS (
+    SELECT
+      submission_date,
+      client_id,
+      sample_id,
+      ROW_NUMBER() OVER w1_unframed AS _n,
+      `moz-fx-data-shared-prod.udf.mode_last`(ARRAY_AGG(city) OVER w1) AS city,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(geo_subdivision1) OVER w1
+      ) AS geo_subdivision1,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(geo_subdivision2) OVER w1
+      ) AS geo_subdivision2,
+    FROM
+      with_date_offsets_firefox_desktop
+    LEFT JOIN
+      overactive_firefox_desktop
+      USING (submission_date, client_id)
+    WHERE
+      overactive_firefox_desktop.client_id IS NULL
+      AND submission_date >= '2018-01-01'
+    WINDOW
+      w1 AS (
+        PARTITION BY
+          sample_id,
+          client_id,
+          submission_date
+        ORDER BY
+          submission_timestamp
+        ROWS BETWEEN
+          UNBOUNDED PRECEDING
+          AND UNBOUNDED FOLLOWING
+      ),
+      w1_unframed AS (
+        PARTITION BY
+          sample_id,
+          client_id,
+          submission_date
+        ORDER BY
+          submission_timestamp
+      )
+  ),
+  clients_daily_firefox_desktop AS (
+    SELECT
+      cd.* EXCEPT (_n),
+    FROM
+      windowed_firefox_desktop AS cd
+    WHERE
+      _n = 1
+  ),
+  clients_city_first_seen_firefox_desktop AS (
+    SELECT
+      client_id,
+      sample_id,
+      submission_date AS first_seen_geo_date,
+      city AS first_seen_geo_city,
+      geo_subdivision1 AS first_seen_geo_subdivision1,
+      geo_subdivision2 AS first_seen_geo_subdivision2,
+    FROM
+      clients_daily_firefox_desktop
+    QUALIFY
+      ROW_NUMBER() OVER (PARTITION BY client_id, sample_id ORDER BY submission_date) = 1
+  ),
+  clients_city_last_seen_firefox_desktop AS (
+    SELECT
+      client_id,
+      sample_id,
+      submission_date AS last_seen_geo_date,
+      city AS last_seen_geo_city,
+      geo_subdivision1 AS last_seen_geo_subdivision1,
+      geo_subdivision2 AS last_seen_geo_subdivision2,
+    FROM
+      clients_daily_firefox_desktop
+    QUALIFY
+      ROW_NUMBER() OVER (PARTITION BY client_id, sample_id ORDER BY submission_date DESC) = 1
+  )
+  SELECT
+    "firefox_desktop" AS app_id,
+    COALESCE(cfs.client_id, cls.client_id) AS client_id,
+    COALESCE(cfs.sample_id, cls.sample_id) AS sample_id,
+    first_seen_geo_date,
+    first_seen_geo_city,
+    first_seen_geo_subdivision1,
+    first_seen_geo_subdivision2,
+    last_seen_geo_date,
+    last_seen_geo_city,
+    last_seen_geo_subdivision1,
+    last_seen_geo_subdivision2,
+  FROM
+    clients_city_first_seen_firefox_desktop cfs
+  FULL OUTER JOIN
+    clients_city_last_seen_firefox_desktop cls
+    ON cfs.client_id = cls.client_id
+    AND cfs.sample_id = cls.sample_id
+{% else %}
+  WITH _previous_firefox_desktop AS (
+    SELECT
+      *
+    FROM
+      `moz-fx-data-shared-prod.firefox_desktop_derived.baseline_clients_city_seen_v1`
+    WHERE
+      app_id = "firefox_desktop"
+  ),
+  _current_windowed_firefox_desktop AS (
+    SELECT
+      "firefox_desktop" AS app_id,
+      client_info.client_id AS client_id,
+      sample_id,
+      ROW_NUMBER() OVER w1_unframed AS _n,
+      @submission_date AS first_seen_geo_date,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(metadata.geo.city) OVER w1
+      ) AS first_seen_geo_city,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(metadata.geo.subdivision1) OVER w1
+      ) AS first_seen_geo_subdivision1,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(metadata.geo.subdivision2) OVER w1
+      ) AS first_seen_geo_subdivision2,
+      @submission_date AS last_seen_geo_date,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(metadata.geo.city) OVER w1
+      ) AS last_seen_geo_city,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(metadata.geo.subdivision1) OVER w1
+      ) AS last_seen_geo_subdivision1,
+      `moz-fx-data-shared-prod.udf.mode_last`(
+        ARRAY_AGG(metadata.geo.subdivision2) OVER w1
+      ) AS last_seen_geo_subdivision2,
+    FROM
+      `moz-fx-data-shared-prod.firefox_desktop_live.baseline_v1`
+    WHERE
+      DATE(submission_timestamp) = @submission_date
+      AND sample_id = 0
+    WINDOW
+      w1 AS (
+        PARTITION BY
+          sample_id,
+          client_info.client_id,
+          DATE(submission_timestamp)
+        ORDER BY
+          submission_timestamp
+        ROWS BETWEEN
+          UNBOUNDED PRECEDING
+          AND UNBOUNDED FOLLOWING
+      ),
+      w1_unframed AS (
+        PARTITION BY
+          sample_id,
+          client_info.client_id,
+          DATE(submission_timestamp)
+        ORDER BY
+          submission_timestamp
+      )
+  ),
+  _current_firefox_desktop AS (
+    SELECT
+      cw.* EXCEPT (_n),
+    FROM
+      _current_windowed_firefox_desktop AS cw
+    WHERE
+      _n = 1
+  )
+  SELECT
+    app_id,
+    client_id,
+    sample_id,
+    IF(_p.client_id IS NULL, _c.first_seen_geo_date, _p.first_seen_geo_date) AS first_seen_geo_date,
+    IF(_p.client_id IS NULL, _c.first_seen_geo_city, _p.first_seen_geo_city) AS first_seen_geo_city,
+    IF(
+      _p.client_id IS NULL,
+      _c.first_seen_geo_subdivision1,
+      _p.first_seen_geo_subdivision1
+    ) AS first_seen_geo_subdivision1,
+    IF(
+      _p.client_id IS NULL,
+      _c.first_seen_geo_subdivision2,
+      _p.first_seen_geo_subdivision2
+    ) AS first_seen_geo_subdivision2,
+    IF(
+      _p.last_seen_geo_date < _c.last_seen_geo_date,
+      _c.last_seen_geo_date,
+      _p.last_seen_geo_date
+    ) AS last_seen_geo

⚠️ Only part of the diff is displayed.
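The windowed CTEs in the diff keep one row per client and submission date. In `_current_windowed_firefox_desktop`, every row in a `(sample_id, client_id, date)` partition gets the same `udf.mode_last` result over the full-day frame `w1`, and `_n = 1` (from `w1_unframed`) keeps only the first row. `windowed_firefox_desktop` appears to follow the same `w1`/`w1_unframed` pattern. Below is a minimal Python sketch of that logic. It assumes `mode_last` returns the most frequent non-NULL value, with ties going to the value seen latest in the array; that is my reading of the UDF and is worth checking against its definition. Function and field names are illustrative, and `sample_id` is left out of the partition key for brevity.

```python
from collections import defaultdict
from typing import Optional

GEO_FIELDS = ("city", "subdivision1", "subdivision2")


def mode_last(values: list) -> Optional[str]:
    """Most frequent non-None value; ties go to the value seen latest.

    Assumed to match udf.mode_last: NULLs are ignored, ties are broken by
    the highest position in the array, and an all-NULL array yields None.
    """
    counts: dict = defaultdict(int)
    last_pos: dict = {}
    for pos, value in enumerate(values):
        if value is None:
            continue
        counts[value] += 1
        last_pos[value] = pos
    if not counts:
        return None
    # Rank by frequency first, then by the latest position the value appeared at.
    return max(counts, key=lambda v: (counts[v], last_pos[v]))


def daily_geo(pings: list) -> dict:
    """Collapse pings to one geo row per (client_id, submission_date).

    Pings are ordered by submission_timestamp, like the ORDER BY in w1. Every
    row of a partition would get the same mode_last result, so computing it
    once per partition is equivalent to keeping the _n = 1 row.
    """
    partitions: dict = defaultdict(list)
    for ping in sorted(pings, key=lambda p: p["submission_timestamp"]):
        partitions[(ping["client_id"], ping["submission_date"])].append(ping)
    return {
        key: {field: mode_last([p[field] for p in rows]) for field in GEO_FIELDS}
        for key, rows in partitions.items()
    }
```

Because the full-day frame (`ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`) gives every row the same aggregate, the choice of which row survives `_n = 1` does not affect the geo values. It only matters for any other, non-windowed columns.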

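In the init branch, `clients_city_first_seen_firefox_desktop` and `clients_city_last_seen_firefox_desktop` each keep one row per client via `ROW_NUMBER()` ordered by `submission_date` (ascending and descending), and the final `SELECT` joins the two. Both CTEs read the same `clients_daily_firefox_desktop` rows with no extra filter, so every client appears on both sides and the `FULL OUTER JOIN` behaves like an inner join here. Below is a rough Python equivalent. It assumes the input already holds one deduplicated row per client per day, and the function name is hypothetical. Note that it records whatever geo was reported on the earliest and latest day, even when that value is NULL. Picking the earliest non-NULL city instead would be a different definition.

```python
from itertools import groupby
from operator import itemgetter

# Source column in the daily rows -> suffix of the output column.
GEO_COLUMNS = {
    "city": "geo_city",
    "geo_subdivision1": "geo_subdivision1",
    "geo_subdivision2": "geo_subdivision2",
}


def city_seen_init(daily_rows: list) -> dict:
    """Build first/last-seen geo per client from one-row-per-day input.

    Mirrors ROW_NUMBER() ... ORDER BY submission_date (ASC for first seen,
    DESC for last seen) = 1, then combines the two results per client_id.
    """
    result = {}
    ordered = sorted(daily_rows, key=itemgetter("client_id", "submission_date"))
    for client_id, group in groupby(ordered, key=itemgetter("client_id")):
        days = list(group)
        first, last = days[0], days[-1]  # earliest and latest submission_date
        record = {
            "first_seen_geo_date": first["submission_date"],
            "last_seen_geo_date": last["submission_date"],
        }
        for source, suffix in GEO_COLUMNS.items():
            # The value reported on that day is taken as-is, even if None.
            record[f"first_seen_{suffix}"] = first[source]
            record[f"last_seen_{suffix}"] = last[source]
        result[client_id] = record
    return result
```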

@wwyc requested a review from soGaussian on September 2, 2025
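The incremental branch (truncated above) keeps the previous first-seen values whenever `_p.client_id` is present, and it picks last-seen values by comparing dates. One thing to watch: BigQuery's `IF()` takes the else branch when its condition is NULL. For a client with no previous row, `IF(_p.last_seen_geo_date < _c.last_seen_geo_date, ...)` therefore returns `_p.last_seen_geo_date`, which is NULL, unless the part of the query not shown handles that case. Below is a hypothetical Python version of the per-client merge with both missing-side cases spelled out. The field names follow the diff; the function name is made up.

```python
from typing import Optional

FIRST = (
    "first_seen_geo_date",
    "first_seen_geo_city",
    "first_seen_geo_subdivision1",
    "first_seen_geo_subdivision2",
)
LAST = tuple(name.replace("first_", "last_", 1) for name in FIRST)


def merge_city_seen(previous: Optional[dict], current: Optional[dict]) -> dict:
    """Merge one client's stored row (previous) with today's aggregate (current).

    Either side may be None, as it would be after an outer join on client_id.
    """
    if previous is None and current is None:
        raise ValueError("at least one of previous/current is required")
    # First-seen values never change once a client has a stored row.
    first_src = current if previous is None else previous
    # Last-seen values come from whichever side is present and newer; ties
    # keep the stored row, matching the strict `<` comparison in the diff.
    if previous is None:
        last_src = current
    elif current is None:
        last_src = previous
    elif previous["last_seen_geo_date"] < current["last_seen_geo_date"]:
        last_src = current
    else:
        last_src = previous
    merged = {name: first_src[name] for name in FIRST}
    merged.update({name: last_src[name] for name in LAST})
    return merged
```

In SQL, the same intent could be expressed by testing `_p.client_id IS NULL` first (as the first-seen columns already do) before comparing dates.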
@@ -0,0 +1,230 @@
-- Query generated via sql_generators.clients_city_seen.
Contributor Author


TODO: update to sql_generators.baseline_clients_city_seen.

friendly_name: Baseline Clients City Seen
description: |-
  This table stores the first-seen and last-seen geo attributes for each client_id.
  The table was initialized from stable tables (with ~2 years of retention), so the initial dates reflect the earliest/latest
Contributor Author


TODO: add a note that the initialization code will no longer be relevant once the city and subdivision fields in the stable tables are nulled out.

Contributor

@BenWu left a comment


You should remove all the generated files in sql/ before merging because generated sql doesn't need to be checked in.

The is_init query with a 100% sample might be too expensive to run in a single query and could time out. This is fine for the POC, but it's something to prepare for later.

WHERE
  client_info.client_id IS NOT NULL
  AND sample_id = 0
  AND DATE(submission_timestamp) <= "2025-08-25"
Contributor


CURRENT_DATE() is probably more appropriate for the init so that it picks up all available data.

Suggested change
- AND DATE(submission_timestamp) <= "2025-08-25"
+ AND DATE(submission_timestamp) <= CURRENT_DATE()

Comment on lines +25 to +28
depends_on:
- task_id: copy_deduplicate_all
  dag_name: copy_deduplicate
  execution_delta: 1h
Contributor


This isn't needed because copy_deduplicate populates the stable tables, and this query reads from the live tables. Also, execution_delta would be 3h in this case.

Contributor Author

@wwyc, Sep 2, 2025


Thanks for the review!

> You should remove all the generated files in sql/ before merging because generated sql doesn't need to be checked in.

I plan to remove all the generated files in sql/ before merging. I've added them here only as a reference to share with DS for now.

> The is_init query with a 100% sample might be too expensive to run in a single query and could time out. This is fine for the POC, but it's something to prepare for later.

How would you recommend running this so that it doesn't time out?

> This isn't needed because copy_deduplicate populates the stable tables, and this query reads from the live tables. Also, execution_delta would be 3h in this case.

Can you explain why execution_delta would be 3h? If not copy_deduplicate, which DAG should this depend on to ensure the data is available in the live tables?

Contributor


  • One way to initialize the tables would be per sample_id: you could write a script that simply runs the query once for each sample_id.
  • copy_deduplicate is scheduled at 1am and bqetl_clients_city_seen is scheduled at 4am, so the execution delta is 3 hours.
  • There's no DAG for the live tables, so this wouldn't depend on anything. It would just assume the data is complete some time after midnight UTC, which is what copy_deduplicate does.
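The 3h figure is simply the gap between the two schedules. Airflow's `ExternalTaskSensor` looks for the upstream task at the current run's logical date minus `execution_delta`. Because both DAGs run daily, their logical dates differ by the same amount as their schedule times. The small sketch below uses the times from the comment above (1am and 4am, UTC assumed). The helper name is hypothetical, and I am assuming bqetl's `depends_on` entry is rendered as such a sensor.

```python
from datetime import datetime, timedelta


def upstream_logical_date(logical_date: datetime, execution_delta: timedelta) -> datetime:
    """Logical date an ExternalTaskSensor waits for: logical_date - execution_delta."""
    return logical_date - execution_delta


# Daily schedules from the review comment (UTC assumed): 01:00 and 04:00.
copy_deduplicate_run = datetime(2025, 9, 1, 1, 0)
city_seen_run = datetime(2025, 9, 1, 4, 0)

execution_delta = city_seen_run - copy_deduplicate_run
print(execution_delta)  # 3:00:00
# With the 3h delta, the sensor targets the 01:00 copy_deduplicate run...
print(upstream_logical_date(city_seen_run, execution_delta))  # 2025-09-01 01:00:00
# ...whereas 1h would point at 03:00, where no copy_deduplicate run exists.
print(upstream_logical_date(city_seen_run, timedelta(hours=1)))  # 2025-09-01 03:00:00
```

As noted in the comment, a query that reads only from live tables may not need this sensor at all. The arithmetic applies only if a dependency on copy_deduplicate is kept.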

Contributor Author


It looks like adding a @sample_id parameter would let the table be initialized in parallel, one run per sample_id:

> To run in parallel per sample_id, include a @sample_id parameter in the query.
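A hand-rolled version of the per-sample_id initialization suggested in review could look like the sketch below: run the init query once per sample_id, with a few runs in flight at a time. Several things are assumptions. sample_id is assumed to range from 0 to 99. The query is assumed to filter on `sample_id = @sample_id` instead of the hardcoded `sample_id = 0`. `run_query` is a hypothetical callable; with google-cloud-bigquery it could pass the value via `QueryJobConfig(query_parameters=[ScalarQueryParameter("sample_id", "INT64", sid)])`. How each slice is written to the destination table is left to that runner, and bqetl's own `@sample_id` support may make a script like this unnecessary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict

# sample_id is assumed to take the values 0-99.
SAMPLE_IDS = range(100)


def init_per_sample(
    run_query: Callable[[str, Dict[str, Any]], Any],
    sql: str,
    max_workers: int = 8,
) -> Dict[int, Any]:
    """Run the init query once per sample_id with a bounded number in flight.

    run_query is a hypothetical callable that executes `sql` with the given
    named parameters and returns whatever the caller finds useful (for example
    a job ID or row count). The SQL is assumed to contain
    `sample_id = @sample_id` in place of the hardcoded `sample_id = 0`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            sample_id: pool.submit(run_query, sql, {"sample_id": sample_id})
            for sample_id in SAMPLE_IDS
        }
        # result() re-raises a failed slice's exception instead of dropping it.
        return {sample_id: future.result() for sample_id, future in futures.items()}
```

Keeping `max_workers` small bounds how many slices run at once. Each slice scans roughly 1% of the data, so a failed slice can be retried on its own rather than rerunning the whole init.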

@@ -2490,3 +2490,20 @@ bqetl_bigeye_derived:
retry_delay: 30m
tags:
- impact/tier_3

bqetl_clients_city_seen:
Contributor Author


TODO: change to bqetl_baseline_clients_city_seen

table_type: client_level
dag: bqetl_clients_city_seen
scheduling:
  dag_name: bqetl_clients_city_seen
Contributor Author


TODO: change to bqetl_baseline_clients_city_seen

@@ -573,3 +573,5 @@ retention_exclusion_list:
- sql/moz-fx-data-shared-prod/firefox_ios_derived/client_adclicks_history_v1
- sql/moz-fx-data-shared-prod/acoustic_external/suppression_list_v1
- sql/moz-fx-data-shared-prod/telemetry_derived/fx_accounts_linked_clients_v1
- sql/moz-fx-data-shared-prod/firefox_desktop_derived/clients_city_seen_v1
- sql/moz-fx-data-shared-prod/fenix_derived/clients_city_seen_v1
Contributor Author


TODO: change to baseline_clients_city_seen_v1

Labels
None yet
Projects
None yet
5 participants