Skip to content

SQL Integrity Error #131

@meteorologist15

Description

@meteorologist15

I have a pretty large (>4 GB) esgpull database (esgpull.db) that contains data pulled in from an old synda database long ago (using the synda db migration tools) as well as data/queries unique to esgpull itself.

When running a query for CMIP6 data, I reached the esgpull update step and then encountered a failure that I've never seen before.

Traceback (most recent call last):
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1936, in _exec_single_context
    self.dialect.do_executemany(
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 921, in do_executemany
    cursor.executemany(statement, parameters)
sqlite3.IntegrityError: UNIQUE constraint failed: file.file_id

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/esgpull/tui.py", line 180, in logging
    yield
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/esgpull/cli/update.py", line 184, in update
    with esg.db.commit_context():
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/esgpull/database.py", line 96, in commit_context
    self.session.commit()
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2017, in commit
    trans.commit(_to_root=True)
  File "<string>", line 2, in commit
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go
    ret_value = fn(self, *arg, **kw)
                ^^^^^^^^^^^^^^^^^^^^
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 1302, in commit
    self._prepare_impl()
  File "<string>", line 2, in _prepare_impl
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go
    ret_value = fn(self, *arg, **kw)
                ^^^^^^^^^^^^^^^^^^^^
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 1277, in _prepare_impl
    self.session.flush()
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 4341, in flush
    self._flush(objects)
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 4476, in _flush
    with util.safe_reraise():
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 4437, in _flush
    flush_context.execute()
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py", line 466, in execute
    rec.execute(self)
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py", line 642, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py", line 93, in save_obj
    _emit_insert_statements(
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py", line 1048, in _emit_insert_statements
    result = connection.execute(
             ^^^^^^^^^^^^^^^^^^^
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1418, in execute
    return meth(
           ^^^^^
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/sql/elements.py", line 515, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1640, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1846, in _execute_context
    return self._exec_single_context(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1986, in _exec_single_context
    self._handle_dbapi_exception(
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2353, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1936, in _exec_single_context
    self.dialect.do_executemany(
  File "/path/to/miniconda3/envs/esgdownload/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 921, in do_executemany
    cursor.executemany(statement, parameters)
sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: file.file_id

In my investigation, I looked in update.py, and discovered that the unregistered_files variable, a list of File objects set inside with esg.db.commit_context()contained a small number of File objects with a duplicated file_id field. Since the code sets the file_id field as UNIQUE for the database insertion, this triggers the above traceback regarding database integrity. The insertion step within the database session does not contain an SQL parameter to ignore duplicated entries, which is probably set by design, yet at the same time got the error in question to trigger. The tricky part is, the File objects themselves may indeed be unique, even though the same File.file_id was shared across the object in limited quantity.

I crudely hacked the script to remove the File objects in the list with the same File.file_id

raw_unregistered_files = [f for f in qf.files if f not in esg.db]
ruf_ids = []
dup_ids = []
for ruf in raw_unregistered_files:
    ruf_ids.append(ruf.file_id)
for ruf_id in ruf_ids:
    if ruf_ids.count(ruf_id) > 1:
        dup_ids.append(ruf_id)
unregistered_files = [f for f in raw_unregistered_files if f.file_id not in dup_ids]

though I'm sure there's a better way to do it.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions