[CI] Use GraphQL API instead of BigQuery to get review data #525

Merged: 3 commits, Jul 30, 2025

jriv01
Contributor

@jriv01 jriv01 commented Jul 28, 2025

Since we already call the GitHub GraphQL API for data validation, we can drop the added complexity of using GitHub Archive on BigQuery as a data source and query the API directly. BigQuery has the advantage of not being rate-limited, but we often have to query 50-70 commits via the API anyway because GitHub Archive is missing records of events. With more than half of the BigQuery data points needing to be amended, it makes more sense to use the API as the primary data source.
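For illustration, a minimal sketch of how review data for a batch of commits might be requested from the GraphQL API in a single call. This is not the code from this PR: the helper names (`build_review_query`, `batches`, `run_query`), the batch size, the alias scheme, and the selected fields are assumptions chosen for the example, and the token is expected to come from the caller.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"


def build_review_query(owner, name, shas):
    """Build one GraphQL query that looks up review data for several commits.

    Each commit gets its own alias (commit_<sha>) so that a single request
    can return data for the whole batch.
    """
    commit_fields = "\n".join(
        f'    commit_{sha}: object(oid: "{sha}") {{\n'
        "      ... on Commit {\n"
        "        associatedPullRequests(first: 1) {\n"
        "          nodes { number reviewDecision }\n"
        "        }\n"
        "      }\n"
        "    }"
        for sha in shas
    )
    return (
        f'query {{\n  repository(owner: "{owner}", name: "{name}") {{\n'
        f"{commit_fields}\n  }}\n}}"
    )


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_query(query, token):
    """POST the query to the GitHub GraphQL endpoint and return parsed JSON."""
    request = urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Batching several commits into one aliased query keeps the number of requests low, which matters here because the API, unlike BigQuery, is rate-limited.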

@jriv01
Contributor Author

jriv01 commented Jul 30, 2025

Contributor

@boomanaiden154 boomanaiden154 left a comment

This seems reasonable enough to me.

Can you also put up a PR to remove the service account from the terraform?

@jriv01
Contributor Author

jriv01 commented Jul 30, 2025

> Can you also put up a PR to remove the service account from the terraform?

We'll still want the service account around: I have an upcoming PR that will export the processed data to our own BigQuery dataset, so we'll still need some IAM infrastructure. I could at least remove the binding for the current BigQuery role, though, since those permissions will be unused.

@boomanaiden154 boomanaiden154 merged commit 772b264 into llvm:main Jul 30, 2025
5 checks passed
boomanaiden154 pushed a commit that referenced this pull request Aug 5, 2025
…cs (#535)

This change reintroduces a BigQuery role binding that was removed in
#525. Now that our CronJob also queries past data to determine the
number of unique LLVM contributors over time, we must grant the
associated service account `roles/bigquery.jobUser` so that the BigQuery
client can create query jobs.

This is the error without this binding:

```
google.api_core.exceptions.Forbidden: 403 POST: Access Denied: User does not have bigquery.jobs.create 
permission in project llvm-premerge-checks.
```