
feat: Add data for enrollment alerts #9003

Merged
yashikakhurana merged 15 commits into main from enrollment_alert_data on Mar 20, 2026

Conversation

@yashikakhurana (Contributor) commented Mar 13, 2026

Description

This PR adds a new daily data export for the Experimenter alerting system. It creates the experiment_enrollment_alert_data_v1 table in the moz-fx-data-experiments.monitoring dataset to export per-experiment enrollment/unenrollment counts by branch and unenrollment reason breakdown.

The data is exported as JSON to GCS (gs://mozanalysis/enrollment_alerts/) for the Experimenter alerting system to consume. Raw counts only; Experimenter handles threshold computations.

Example:

{
  "v1": {
    "experiment_slug": {
      "total_enrollments": 1000,
      "total_unenrollments": 50,
      "branches": {
        "control": {
          "enrollments": 500,
          "unenrollments": 25
        },
        "treatment": {
          "enrollments": 500,
          "unenrollments": 25
        }
      },
      "unenrollment_reasons": {
        "not_interested": 15,
        "technical_issue": 8,
        "other": 27
      }
    },
    "another_experiment": {
      "total_enrollments": 2000,
      "total_unenrollments": 100,
      "branches": {...},
      "unenrollment_reasons": {...}
    }
  }
}
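For illustration, a consumer of this payload could derive per-branch unenrollment rates from the raw counts (a minimal sketch; the `unenrollment_rates` helper is hypothetical and not part of this PR — threshold logic stays in Experimenter):

```python
import json

def unenrollment_rates(payload: dict) -> dict:
    """Compute unenrollments/enrollments per branch from the versioned payload.

    Hypothetical consumer-side helper; alert thresholds remain in Experimenter.
    """
    rates = {}
    for slug, exp in payload["v1"].items():
        rates[slug] = {
            branch: counts["unenrollments"] / counts["enrollments"]
            for branch, counts in exp["branches"].items()
            if counts["enrollments"] > 0  # avoid division by zero
        }
    return rates

# Trimmed-down version of the example payload above.
example = json.loads(
    '{"v1": {"experiment_slug": {"total_enrollments": 1000,'
    ' "total_unenrollments": 50, "branches":'
    ' {"control": {"enrollments": 500, "unenrollments": 25},'
    ' "treatment": {"enrollments": 500, "unenrollments": 25}}}}}'
)

print(unenrollment_rates(example))  # {'experiment_slug': {'control': 0.05, 'treatment': 0.05}}
```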



@mikewilli (Contributor) left a comment:

Overall this seems like the right shape, but the unenrollment reasons query doesn't seem to work, and I left a couple of other smaller suggestions.


@yashikakhurana (Contributor, Author) commented:
Updated

{
  "v1": {
    "test-experiment-001": {
      "total_enrollments": 15000,
      "total_unenrollments": 1200,
      "branches": {
        "control": {
          "enrollments": 7500,
          "unenrollments": 580
        },
        "treatment-a": {
          "enrollments": 7500,
          "unenrollments": 620
        }
      },
      "unenrollment_reasons": {},
      "reasons_by_branch": {
        "control": {
          "targeting-mismatch": {
            "1pct_count": 85
          },
          "studies-opt-out": {
            "1pct_count": 42
          },
          "unknown": {
            "1pct_count": 15
          }
        },
        "treatment-a": {
          "studies-opt-out": {
            "1pct_count": 98
          },
          "targeting-mismatch": {
            "1pct_count": 72
          },
          "user-request": {
            "1pct_count": 28
          }
        }
      }
    },
    "onboarding-experiment-v2": {
      "total_enrollments": 25000,
      "total_unenrollments": 3500,
      "branches": {
        "control": {
          "enrollments": 12500,
          "unenrollments": 1750
        },
        "treatment": {
          "enrollments": 12500,
          "unenrollments": 1750
        }
      },
      "unenrollment_reasons": {},
      "reasons_by_branch": {
        "control": {
          "targeting-mismatch": {
            "1pct_count": 245
          },
          "studies-opt-out": {
            "1pct_count": 180
          },
          "user-request": {
            "1pct_count": 92
          }
        },
        "treatment": {
          "studies-opt-out": {
            "1pct_count": 210
          },
          "targeting-mismatch": {
            "1pct_count": 265
          },
          "user-request": {
            "1pct_count": 85
          }
        }
      }
    }
  }
}
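Since the `1pct_count` values come from a 1% sample (sample_id = 0) and are intended for relative ranking only, a consumer might surface the top reasons per branch like this (a sketch; `top_reasons` is a hypothetical helper, not part of this PR):

```python
def top_reasons(reasons_by_branch: dict, n: int = 2) -> dict:
    """Rank unenrollment reasons per branch by 1% sample count, descending.

    The sampled counts indicate relative ordering only, not absolute totals.
    """
    return {
        branch: sorted(
            reasons, key=lambda r: reasons[r]["1pct_count"], reverse=True
        )[:n]
        for branch, reasons in reasons_by_branch.items()
    }

# Shape mirrors the "reasons_by_branch" field in the example above.
sample = {
    "control": {
        "targeting-mismatch": {"1pct_count": 85},
        "studies-opt-out": {"1pct_count": 42},
        "unknown": {"1pct_count": 15},
    },
    "treatment-a": {
        "studies-opt-out": {"1pct_count": 98},
        "targeting-mismatch": {"1pct_count": 72},
        "user-request": {"1pct_count": 28},
    },
}

print(top_reasons(sample))
# {'control': ['targeting-mismatch', 'studies-opt-out'],
#  'treatment-a': ['studies-opt-out', 'targeting-mismatch']}
```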


@mikewilli (Contributor) left a comment:
A couple of comments, but this looks good. I think it makes sense to get this out so downstream work is unblocked; we can always iterate if needed.

# Upload to GCS
storage_client = storage.Client(args.project)
bucket = storage_client.bucket("mozanalysis")
json_str = json.dumps(versioned_data, indent=2)
@mikewilli (Contributor):
Do we need to format this with indentation? I assume this is meant to be read by a machine.

@yashikakhurana (Contributor, Author):
That's correct; my brain is just in the habit of indenting.
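For a machine-read payload, compact serialization also shaves bytes without affecting parsing (a quick illustration, not code from this PR):

```python
import json

payload = {"v1": {"experiment_slug": {"total_enrollments": 1000}}}

# indent=2 adds newlines and padding; separators=(",", ":") drops even the
# default spaces after delimiters. Both forms parse back to the same object.
pretty = json.dumps(payload, indent=2)
compact = json.dumps(payload, separators=(",", ":"))

print(len(pretty) > len(compact))  # True
print(json.loads(pretty) == json.loads(compact))  # True
```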

Comment on lines +159 to +161
bucket.blob(latest_path).upload_from_string(
    json_str, content_type="application/json"
)
@mikewilli (Contributor):
Small suggestion: I think you can copy instead of uploading again, which I'd expect to be more efficient:

bucket.copy_blob(bucket.blob(dated_path), bucket, latest_path)


@github-actions

Integration report for "feat: Add data for enrollment alerts"

sql.diff

diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/dags/bqetl_experiments_daily.py /tmp/workspace/generated-sql/dags/bqetl_experiments_daily.py
--- /tmp/workspace/main-generated-sql/dags/bqetl_experiments_daily.py	2026-03-20 16:01:52.814265115 +0000
+++ /tmp/workspace/generated-sql/dags/bqetl_experiments_daily.py	2026-03-20 16:01:49.107253084 +0000
@@ -399,6 +399,22 @@
         depends_on_past=False,
     )
 
+    monitoring__experiment_enrollment_alert_data__v1 = GKEPodOperator(
+        task_id="monitoring__experiment_enrollment_alert_data__v1",
+        arguments=[
+            "python",
+            "sql/moz-fx-data-experiments/monitoring/experiment_enrollment_alert_data_v1/query.py",
+        ]
+        + [],
+        image="us-docker.pkg.dev/moz-fx-data-artifacts-prod/bigquery-etl/bigquery-etl:latest",
+        owner="ykhurana@mozilla.com",
+        email=[
+            "ascholtz@mozilla.com",
+            "telemetry-alerts@mozilla.com",
+            "ykhurana@mozilla.com",
+        ],
+    )
+
     monitoring__query_cost__v1 = bigquery_etl_query(
         task_id="monitoring__query_cost__v1",
         destination_table="query_cost_v1",
Only in /tmp/workspace/generated-sql/sql/moz-fx-data-experiments/monitoring: experiment_enrollment_alert_data_v1
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-experiments/monitoring/experiment_enrollment_alert_data_v1/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-experiments/monitoring/experiment_enrollment_alert_data_v1/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-experiments/monitoring/experiment_enrollment_alert_data_v1/metadata.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-experiments/monitoring/experiment_enrollment_alert_data_v1/metadata.yaml	2026-03-20 16:01:49.747255142 +0000
@@ -0,0 +1,12 @@
+---
+friendly_name: Experiment Enrollment Alert Data
+description: |-
+  Exports per-experiment enrollment/unenrollment counts by branch and unenrollment reasons to GCS
+  for use by Experimenter alerting system. Raw data only; Experimenter handles threshold computations.
+  Does not create any BigQuery artifacts.
+owners:
+  - ykhurana@mozilla.com
+labels:
+  incremental: false
+scheduling:
+  dag_name: bqetl_experiments_daily
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-experiments/monitoring/experiment_enrollment_alert_data_v1/query.py /tmp/workspace/generated-sql/sql/moz-fx-data-experiments/monitoring/experiment_enrollment_alert_data_v1/query.py
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-experiments/monitoring/experiment_enrollment_alert_data_v1/query.py	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-experiments/monitoring/experiment_enrollment_alert_data_v1/query.py	2026-03-20 16:01:49.780255248 +0000
@@ -0,0 +1,163 @@
+#!/usr/bin/env python3
+
+"""Export enrollment/unenrollment counts and reasons to GCS for Experimenter alerting.
+
+Queries cumulative enrollment/unenrollment counts by experiment and branch,
+plus unenrollment reason breakdown. Exports as JSON to GCS for Experimenter
+alerting system to consume. Raw counts only; Experimenter handles computations.
+"""
+
+import json
+from argparse import ArgumentParser
+
+from google.cloud import bigquery, storage
+
+# BigQuery queries to extract enrollment/unenrollment data
+ENROLLMENT_QUERY = """
+WITH active_experiments AS (
+  SELECT DISTINCT
+    normandy_slug as experiment
+  FROM `moz-fx-data-experiments.monitoring.experimenter_experiments_v1`
+  WHERE start_date IS NOT NULL
+),
+enrollment_totals AS (
+  SELECT
+    experiment,
+    branch,
+    SUM(value) as total_enrollments
+  FROM `moz-fx-data-shared-prod.telemetry_derived.experiment_enrollment_overall_v1`
+  WHERE experiment IS NOT NULL AND branch IS NOT NULL
+  GROUP BY 1, 2
+),
+unenrollment_totals AS (
+  SELECT
+    experiment,
+    branch,
+    SUM(value) as total_unenrollments
+  FROM `moz-fx-data-shared-prod.telemetry_derived.experiment_unenrollment_overall_v1`
+  WHERE experiment IS NOT NULL AND branch IS NOT NULL
+  GROUP BY 1, 2
+),
+combined_by_experiment_branch AS (
+  SELECT
+    COALESCE(e.experiment, u.experiment) as experiment,
+    COALESCE(e.branch, u.branch) as branch,
+    COALESCE(e.total_enrollments, 0) as enrollments,
+    COALESCE(u.total_unenrollments, 0) as unenrollments
+  FROM enrollment_totals e
+  FULL OUTER JOIN unenrollment_totals u
+    ON e.experiment = u.experiment AND e.branch = u.branch
+)
+SELECT
+  experiment,
+  branch,
+  enrollments,
+  unenrollments,
+  SUM(enrollments) OVER (PARTITION BY experiment) as experiment_total_enrollments,
+  SUM(unenrollments) OVER (PARTITION BY experiment) as experiment_total_unenrollments
+FROM combined_by_experiment_branch
+WHERE experiment IN (SELECT experiment FROM active_experiments)
+ORDER BY 1, 2
+"""
+
+UNENROLLMENT_REASONS_QUERY = """
+-- Query unenrollment reasons from the 1% sample (sample_id = 0)
+-- This is sufficient for determining relative ranking of reasons for alerts.
+-- Alerts will link to Looker dashboard for detailed analysis of actual counts.
+WITH active_experiments AS (
+  SELECT DISTINCT
+    normandy_slug as experiment
+  FROM `moz-fx-data-experiments.monitoring.experimenter_experiments_v1`
+  WHERE start_date IS NOT NULL
+)
+SELECT
+  active_experiments.experiment,
+  mozfun.map.get_key(events.event_map_values, 'branch') as branch,
+  mozfun.map.get_key(events.event_map_values, 'reason') as reason,
+  COUNT(*) as count
+FROM active_experiments
+LEFT JOIN `mozdata.telemetry.events` events
+  ON active_experiments.experiment = events.event_string_value
+  AND events.event_category = 'normandy'
+  AND events.event_method LIKE 'unenroll%'
+  AND events.sample_id = 0
+GROUP BY 1, 2, 3
+HAVING reason IS NOT NULL
+ORDER BY 1, 4 DESC
+"""
+
+parser = ArgumentParser(description=__doc__)
+parser.add_argument("--date", required=True, help="Execution date (YYYY-MM-DD)")
+parser.add_argument("--project", default="moz-fx-data-experiments")
+parser.add_argument(
+    "--gcs_folder",
+    default="enrollment_counts",
+    help="GCS folder name for storing exported data (default: enrollment_counts)",
+)
+
+
+def main():
+    """Export enrollment data to GCS for Experimenter alerting."""
+    args = parser.parse_args()
+
+    bq_client = bigquery.Client(args.project)
+
+    enrollment_rows = [dict(row) for row in bq_client.query(ENROLLMENT_QUERY).result()]
+    reason_rows = [
+        dict(row) for row in bq_client.query(UNENROLLMENT_REASONS_QUERY).result()
+    ]
+
+    # Aggregate into per-experiment structure
+    data = {}
+    for row in enrollment_rows:
+        exp = row["experiment"]
+        if exp not in data:
+            data[exp] = {
+                "total_enrollments": int(row["experiment_total_enrollments"]),
+                "total_unenrollments": int(row["experiment_total_unenrollments"]),
+                "branches": {},
+                "unenrollment_reasons": {},
+            }
+        branch = row["branch"]
+        data[exp]["branches"][branch] = {
+            "enrollments": int(row["enrollments"]),
+            "unenrollments": int(row["unenrollments"]),
+        }
+
+    # Add unenrollment reasons by branch
+    # Note: counts are from 1% sample (sample_id = 0) for performance
+    # These are used for relative ranking only; alerts link to Looker for detailed analysis
+    for row in reason_rows:
+        exp = row["experiment"]
+        if exp in data:
+            branch = row["branch"]
+            reason = row["reason"] or "unknown"
+            if "reasons_by_branch" not in data[exp]:
+                data[exp]["reasons_by_branch"] = {}
+            if branch not in data[exp]["reasons_by_branch"]:
+                data[exp]["reasons_by_branch"][branch] = {}
+            data[exp]["reasons_by_branch"][branch][reason] = {
+                "1pct_count": int(row["count"])
+            }
+
+    # Wrap in versioning schema
+    versioned_data = {"v1": data}
+
+    # Upload to GCS
+    storage_client = storage.Client(args.project)
+    bucket = storage_client.bucket("mozanalysis")
+    json_str = json.dumps(versioned_data)
+
+    # Dated version
+    dated_path = f"{args.gcs_folder}/enrollment_counts_{args.date}.json"
+    bucket.blob(dated_path).upload_from_string(
+        json_str, content_type="application/json"
+    )
+
+    # Latest version (copy dated version)
+    latest_path = f"{args.gcs_folder}/enrollment_counts_latest.json"
+    bucket.copy_blob(bucket.blob(dated_path), bucket, latest_path)
+
+
+if __name__ == "__main__":
+    main()

