fix(error) commit table split into two(file and pull request) fixes issue#3682#3727
fix(error) commit table split into two(file and pull request) fixes issue#3682#3727yashisthebatman wants to merge 1 commit intochaoss:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request refactors the commits table to properly separate commit-level metadata from file-level statistics. Previously, the commits table stored one row per file per commit, mixing commit data with file-level data. This refactoring addresses Issue #3682 by splitting the table into two: a new commits table for commit-level data (one row per commit) and a commit_files table for file-level data (one row per file per commit).
Changes:
- Separated commits table into commits (commit-level) and commit_files (file-level) tables with proper foreign key relationships
- Updated database models with new CommitFile model and refactored Commit model
- Modified facade_bulk_insert_commits() to split records and upsert appropriately
- Updated 8 SQL cache-building queries in rebuildcache.py to LEFT JOIN commit_files
- Added GraphQL CommitFileType and files relationship
- Created Alembic migration (revision 39) with upgrade and downgrade paths
- Added 13 unit tests for model validation and helper functions
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| augur/application/schema/alembic/versions/39_split_commits_into_commits_and_commit_files.py | Alembic migration that renames commits to commit_files, creates new commits table, backfills data, and remaps foreign keys |
| augur/application/db/models/augur_data.py | Refactored Commit model to remove file-level columns, added new CommitFile model with foreign key to commits |
| augur/application/db/models/init.py | Exported CommitFile model for use throughout the application |
| augur/application/db/lib.py | Rewrote facade_bulk_insert_commits() to split records into commit and file-level data with separate upserts, added helper functions |
| augur/tasks/git/util/facade_worker/facade_worker/rebuildcache.py | Updated 8 cache-building SQL queries to LEFT JOIN commit_files for file-level aggregation |
| augur/api/server.py | Added CommitFileType GraphQL type and files field on CommitType |
| tests/test_classes/test_commit_file_model.py | Added comprehensive unit tests for model structure, relationships, constraints, and helper functions |
augur/application/schema/alembic/versions/39_split_commits_into_commits_and_commit_files.py
Outdated
Show resolved
Hide resolved
augur/application/schema/alembic/versions/39_split_commits_into_commits_and_commit_files.py
Outdated
Show resolved
Hide resolved
augur/application/schema/alembic/versions/39_split_commits_into_commits_and_commit_files.py
Outdated
Show resolved
Hide resolved
augur/application/schema/alembic/versions/39_split_commits_into_commits_and_commit_files.py
Show resolved
Hide resolved
2f29b5d to
376fc1c
Compare
Signed-off-by: yashisthebatman <yvchaudhary2005@gmail.com>
376fc1c to
091b767
Compare
|
@sgoggins Docker/Podman gives containers exactly 10 seconds to terminate gracefully on a docker-compose down. If they don't, it forcefully kills them with a SIGKILL, which registers as an error in Podman's CI tests. I looked at augur/application/service_manager.py which controls shutting down the main container (augur-1). The shutdown signal handler was written serially: It stopped Gunicorn and waited up to 5s. |
|
@yashisthebatman ideally finding a new problem like this would be reported as an issue so a solution can be discussed and planned and a PR can be linked to it. can you copy your comment there into a new issue? Can you also update the title of this PR to better describe what change it makes? |
|
@sgoggins alright i'll just post the issue but also passing the podman test for this issue would require me to generate a solution for that .Docker is getting checked and also i have given the solution to this issue in the services_manager.py so could you suggest me what to do ? |
|
if the timeouts are too low, we can solve that in the other issue, merge the fix and then this PR can be rebased onto that new main branch that incorporates the fix By the way, I am not Sean and you do not need to repeatedly ping maintainers to get your message seen, we already have our notifications on for this repo. |
|
alright sorry for that. I have created the issue |
|
Speaking of this PR: Splitting the commits table is a large, long term task that touches very core items in augur and needs to be done carefully and thoughtfully (with an eye towards how existing data in peoples existing databases will get migrated). Ideally by someone who is a long-time maintainer and has built up a lot of trust with the rest of the core team. This isnt the kind of task that is a great fit for a good first issue |
Description
Refactored the commits table to properly separate commit-level metadata from file-level statistics. The existing commits table stored one row per file per commit, mixing commit data (author, committer, hash, timestamps) with file-level data (filename, lines added/removed/whitespace). This caused data redundancy and inflated counts in metric queries.
Changes:
UniqueConstrainton (repo_id, cmt_commit_hash).commit_filestable) holding file-level columns (cmt_filename,cmt_added,cmt_removed,cmt_whitespace) with a foreign key tocommits.cmt_id.LEFT JOIN commit_filesfor file-level aggregation.SELECT DISTINCT ON, FK remapping forcommit_parentsandcommit_comment_ref, and includes a full downgrade path.This PR fixes #3682
Notes for Reviewers
DISTINCT ON (repo_id, cmt_commit_hash)to deduplicate.Signed commits