-
Notifications
You must be signed in to change notification settings - Fork 964
Synchronize Database models with migrations #3435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
8192aeb to
a2b479d
Compare
|
OK so it seems like the smoke tests revealed an interesting bug Stack Trace``` augur-1 | Traceback (most recent call last): augur-1 | File "/augur/.venv/bin/augur", line 10, in augur-1 | sys.exit(run()) augur-1 | ^^^^^ augur-1 | File "/augur/.venv/lib/python3.11/site-packages/click/core.py", line 1462, in __call__ augur-1 | return self.main(*args, **kwargs) augur-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^ augur-1 | File "/augur/.venv/lib/python3.11/site-packages/click/core.py", line 1383, in main augur-1 | rv = self.invoke(ctx) augur-1 | ^^^^^^^^^^^^^^^^ augur-1 | File "/augur/.venv/lib/python3.11/site-packages/click/core.py", line 1850, in invoke augur-1 | return _process_result(sub_ctx.command.invoke(sub_ctx)) augur-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ augur-1 | File "/augur/.venv/lib/python3.11/site-packages/click/core.py", line 1850, in invoke augur-1 | return _process_result(sub_ctx.command.invoke(sub_ctx)) augur-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ augur-1 | File "/augur/.venv/lib/python3.11/site-packages/click/core.py", line 1246, in invoke augur-1 | return ctx.invoke(self.callback, **ctx.params) augur-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ augur-1 | File "/augur/.venv/lib/python3.11/site-packages/click/core.py", line 814, in invoke augur-1 | return callback(*args, **kwargs) augur-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^ augur-1 | File "/augur/.venv/lib/python3.11/site-packages/click/decorators.py", line 34, in new_func augur-1 | return f(get_current_context(), *args, **kwargs) augur-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ augur-1 | File "/augur/augur/application/cli/__init__.py", line 45, in new_func augur-1 | return ctx.invoke(function_internet_connection, *args, **kwargs) augur-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ augur-1 | File "/augur/.venv/lib/python3.11/site-packages/click/core.py", line 814, in invoke augur-1 | return callback(*args, **kwargs) augur-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^ augur-1 | File "/augur/.venv/lib/python3.11/site-packages/click/decorators.py", line 34, in new_func augur-1 | return f(get_current_context(), *args, **kwargs) augur-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ augur-1 | File "/augur/augur/application/cli/__init__.py", line 57, in new_func augur-1 | return ctx.invoke(function_db_connection, *args, **kwargs) augur-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ augur-1 | File "/augur/.venv/lib/python3.11/site-packages/click/core.py", line 814, in invoke augur-1 | return callback(*args, **kwargs) augur-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^ augur-1 | File "/augur/.venv/lib/python3.11/site-packages/click/decorators.py", line 34, in new_func augur-1 | return f(get_current_context(), *args, **kwargs) augur-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ augur-1 | File "/augur/augur/application/cli/__init__.py", line 105, in new_func augur-1 | return ctx.invoke(f, *args, **kwargs) augur-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ augur-1 | File "/augur/.venv/lib/python3.11/site-packages/click/core.py", line 814, in invoke augur-1 | return callback(*args, **kwargs) augur-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^ augur-1 | File "/augur/.venv/lib/python3.11/site-packages/click/decorators.py", line 34, in new_func augur-1 | return f(get_current_context(), *args, **kwargs) augur-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ augur-1 | File "/augur/augur/application/cli/config.py", line 105, in init_config augur-1 | config.load_config_from_dict(augmented_config) augur-1 | File "/augur/augur/application/config.py", line 316, in load_config_from_dict augur-1 | self.add_section_from_json(section_name=section_name, json_data=value) augur-1 | File "/augur/augur/application/config.py", line 282, in add_section_from_json augur-1 | writeable_config.create_section(section_name, json_data, ignore_existing=True) augur-1 | File "/augur/augur/application/config.py", line 708, in create_section augur-1 | self.add_value(section_name, key, value, ignore_existing=ignore_existing) augur-1 | File "/augur/augur/application/config.py", line 741, in add_value augur-1 | if not self.has_value(section_name, value_key): augur-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ augur-1 | File "/augur/augur/application/config.py", line 731, in has_value augur-1 | query = self.session.query(Config).filter(and_(Config.section_name == section_name,Config.setting_name == value_key) ) augur-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^ augur-1 | File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2910, in query augur-1 | return self._query_cls(entities, self, **kwargs) augur-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ augur-1 | File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/query.py", line 276, in __init__ augur-1 | self._set_entities(entities) augur-1 | File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/query.py", line 288, in _set_entities augur-1 | self._raw_columns = [ augur-1 | ^ augur-1 | File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/query.py", line 289, in augur-1 | coercions.expect( augur-1 | File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/sql/coercions.py", line 406, in expect augur-1 | insp._post_inspect augur-1 | File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/util/langhelpers.py", line 1253, in __get__ augur-1 | obj.__dict__[self.__name__] = result = self.fget(obj) augur-1 | ^^^^^^^^^^^^^^ augur-1 | File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/mapper.py", line 2707, in _post_inspect augur-1 | self._check_configure() augur-1 | File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/mapper.py", line 2386, in _check_configure augur-1 | _configure_registries({self.registry}, cascade=True) augur-1 | File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/mapper.py", line 4199, in _configure_registries augur-1 | _do_configure_registries(registries, cascade) augur-1 | File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/mapper.py", line 4240, in _do_configure_registries augur-1 | mapper._post_configure_properties() augur-1 | File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/mapper.py", line 2403, in _post_configure_properties augur-1 | prop.init() augur-1 | File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/interfaces.py", line 579, in init augur-1 | self.do_init() augur-1 | File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/relationships.py", line 1637, in do_init augur-1 | self._setup_join_conditions() augur-1 | File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/relationships.py", line 1882, in _setup_join_conditions augur-1 | self._join_condition = jc = JoinCondition( augur-1 | ^^^^^^^^^^^^^^ augur-1 | File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/relationships.py", line 2315, in __init__ augur-1 | self._check_foreign_cols(self.primaryjoin, True) augur-1 | File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/relationships.py", line 2958, in _check_foreign_cols augur-1 | raise sa_exc.ArgumentError(err) augur-1 | sqlalchemy.exc.ArgumentError: Could not locate any relevant foreign key columns for primary join condition 'augur_data.commits.cmt_author_platform_username = augur_data.contributors.cntrb_login' on relationship Contributor.commits. Ensure that referencing columns are associated with a ForeignKey or ForeignKeyConstraint, or are annotated in the join condition with the foreign() annotation. ```Seems like the removal of the foreign key from the DB model is causing issues with the SQLAlchemy model given that the FK's removed from the model (which are not present in any DB i can see as far as I can tell) were in fact being relied on by other parts of the model, specifically the |
9e70fbe to
25ffb5d
Compare
augur/application/schema/alembic/versions/38_sync_model_with_migrations.py
Outdated
Show resolved
Hide resolved
96498bd to
dabf098
Compare
dabf098 to
d0ea92b
Compare
6e496a2 to
d17e0d2
Compare
|
|
||
| cms_id = Column( | ||
| BigInteger, | ||
| Sequence('chaoss_metric_status_cms_id_seq', start=1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs schema='augur_data' - sequences don't inherit the table's schema automatically. Same applies to all other Sequence definitions in this file. See: https://docs.sqlalchemy.org/en/20/core/defaults.html#defining-sequences
Sequence('chaoss_metric_status_cms_id_seq', start=1, schema='augur_data')|
Hey @MoralCode, thanks for tackling this schema drift cleanup. Found one issue that will likely cause problems:
The old Fix: Add # Instead of:
Sequence('chaoss_metric_status_cms_id_seq', start=1)
# Use:
Sequence('chaoss_metric_status_cms_id_seq', start=1, schema='augur_data')This affects all sequence definitions in the PR.
Everything else looks good. |
Does alembic have a mechanism to run pre-migration checks like this? (or even just a mechanism to exit early from a migration with an error if the precondition isnt met?) Would be nice to be able to have migrations be either applied or not applied, rather than potentially partially applied |
|
Yeah, you can just run validation queries at the start of def upgrade():
conn = op.get_bind()
result = conn.execute(text("""
SELECT gh_login, COUNT(*) as cnt FROM augur_data.contributors
WHERE gh_login IS NOT NULL
GROUP BY gh_login HAVING COUNT(*) > 1
LIMIT 5
"""))
dupes = result.fetchall()
if dupes:
raise Exception(f"Cannot apply migration: found duplicate gh_login values: {dupes}")
result = conn.execute(text("""
SELECT cntrb_login, COUNT(*) as cnt FROM augur_data.contributors
WHERE cntrb_login IS NOT NULL
GROUP BY cntrb_login HAVING COUNT(*) > 1
LIMIT 5
"""))
dupes = result.fetchall()
if dupes:
raise Exception(f"Cannot apply migration: found duplicate cntrb_login values: {dupes}")
# Actual migration operations...The The exception aborts the transaction before any schema changes are committed. |
d17e0d2 to
897343f
Compare
|
Added schemas to sequences. Also migrated the operations and spdx schemas to use these newer sequence calls too. gave the duplication check queries a run on our instance and.... we have about 2300 duplicates for probably historical instance reasons. |
|
@shlokgilda problem: looking into the gh-login table duplication in our instance, im noticing that, at least for this random user i picked, it seems to actually be two separate users (one github, one gitlab) with the same username. It seems like its not possible for these colums to have a unique constraint applied to them. also: im seeing lots of gh_* prefixed columns being used for gitlab data too... |
|
Ah, that's a data model issue then, not dirty data. If the same username can legitimately exist on both GitHub and GitLab as different people, then Options:
UniqueConstraint('gh_login', 'data_source', name='GH-UNIQUE-C', ...)
# Though this assumes data_source reliably distinguishes GitHub vs GitLab contributors.
Re: GitLab data in Worth checking: does the code actually rely on |
already documented in #3469
Ah yeah thats pretty decent evidence that I resolved the conflict in this PR potentially the wrong way. Will update |
|
Another problem: the I dont see an easy way to remove this FK cleanly, but we clearly cannot keep it because it depends on the unique constraint above that was removed Edit: looks like this FK was also dropped in migration 22 The underlying type that exists in-database is a nullable string so thats what im using instead |
|
ok next issue: foreign key loop in the SPDX schema between package.package_id and package_files seems like that may be just a warning though. seems interesting that alembic is generating an empty migration rather than saying the database is up to date... |
897343f to
902b6d7
Compare
Signed-off-by: Adrian Edwards <[email protected]>
…ng them, or adding them Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
902b6d7 to
a60ddea
Compare
|
ok so that field needs to be an FK apparrently Stack Trace |
|
bumping to next release - i really want to get this right so we can start doing better database migrations |
Description
This PR resolves and reconciles the differences between the current, in-database data schema and the schema defined by the SQLAlchemy models. This is needed because, for the past few years, both the models and the migrations have had to be made manually, meaning a lack of perfect consistency between updates to the two sources can cause drift between them.
The recent changes in #3416 configured alembic to automatically compare the data models and the deployed schema and generate migrations that represent the differences. This PR reconciles the various differences that have accumulated over the years.
This PR is one step towards, but not yet a complete fix for #3388.
Notes for Reviewers
Most of the schema conflicts were resolved with relatively minor adjustments to the data models (such as specifying names for Foreign keys, adding fields to unique constraints, etc) to bring them in line with what actual deployed databases have.
There were a few places where I thought it was better to update the database instead:
contributors.gh_loginthat I dropped, since I believe gh_login has been superseded by another method of looking up contributors for a while nowpull_request_reviewstable was updated to addtool_sourceto its uniqueness check.augur_operations.users.email_verified column was droppedsince a recent maintainers call confirmed that email verification was a feature that was previously being explored as part of a password reset feature, but was ultimately dropped due to having limited utility (wasnt worth the the additional overhead of configuring sendgrid for email processing compared to a hypothetical future implementation that uses the Augur CLI to enable admins to perform password resets)Please look over all the changes and let me know if you think any of these changes needs to be reconciled in a different way (i.e. a model change is incorrect and the database needs to change instead)
I have yet to run this on an actual augur install but plan to soon.
This change is a pretty severe blocker for other database changes I would like to make since I want to start from a relatively clean/consistent state when proposing additional migrations (including some that are important for the next release)
Signed commits
Gen AI Disclosure
Generative AI was only used for the planning stages of this changes - for example, looking at the schemas or alembic-generated migrations and helping summarize what the differences were between the models and the database. All of these changes were understood/verified and implemented by me.