
Add script to delete unused tokens from TYR database#4518

Closed
devin-ai-integration[bot] wants to merge 2 commits into dev from devin/1771931306-delete-unused-tokens

@devin-ai-integration devin-ai-integration bot commented Feb 24, 2026

Add script to delete unused tokens from TYR database

Summary

Adds a Python script (source/tyr/delete_unused_tokens.py) that reads a CSV export of unused tokens (no API calls for 365+ days) and generates SQL DELETE statements for the key table.

How it works:

  • Parses the semicolon-delimited CSV, extracting the "Début du token" (token prefix) column
  • For 1755 entries with valid 8-char hex prefixes: generates DELETE FROM key WHERE token LIKE '<prefix>%' using a temporary table for efficient matching
  • For 14 entries where Excel corrupted the prefix (scientific notation like 8,85E+08, or truncated to 7 digits): falls back to DELETE FROM key WHERE id IN (...) using the tyr_id column
  • Generated SQL is wrapped in BEGIN; with COMMIT commented out so the operator can review row counts before committing
  • Also supports --execute mode for direct DB execution via SQLAlchemy
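The classification step described in the list above can be sketched as follows. This is a minimal illustration, not the script from the PR: the function name `split_entries`, the regular expression, and the sample rows (including the `login` column) are hypothetical. It only assumes what the description states, namely a semicolon-delimited file, a "Début du token" column, and the key ID in the first column.

```python
import csv
import io
import re

# Assumption: a valid prefix is exactly 8 hexadecimal characters.
HEX_PREFIX = re.compile(r"[0-9a-fA-F]{8}")


def split_entries(csv_text, prefix_column="Début du token"):
    """Return (valid_prefixes, corrupted_ids) from a semicolon-delimited CSV."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=";")
    header = next(reader)
    prefix_idx = header.index(prefix_column)
    valid, corrupted = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        tyr_id = row[0].strip()
        prefix = row[prefix_idx].strip()
        if HEX_PREFIX.fullmatch(prefix):
            valid.append(prefix)
        else:
            # Excel damage such as "8,85E+08" or a 7-digit value lands here.
            corrupted.append(tyr_id)
    return valid, corrupted


# Made-up sample data for illustration only.
sample = (
    "tyr_id;login;Début du token\n"
    "12;alice;a1b2c3d4\n"
    "34;bob;8,85E+08\n"
    "56;carol;1234567\n"
)
valid, corrupted = split_entries(sample)
```

With this sample, one row is kept as a prefix and two rows fall back to their IDs, which mirrors the 1755 / 14 split reported in the summary.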

Review & Testing Checklist for Human

  • Verify that tyr_id in the CSV corresponds to key.id — this assumption was inferred (same login appears with different tyr_id values), but has not been confirmed against the actual production database. If tyr_id is actually user.id, the 14 fallback deletions by ID would be wrong.
  • Review the 14 corrupted entries (lines 48–61 of the generated SQL) — these will be deleted by ID rather than token prefix. Confirm these are the correct keys to remove.
  • Run the generated SQL with SELECT COUNT(*) first (already included in the output) to verify the number of matched tokens before committing the DELETE. Check that the count matches expectations (~1769).
  • Consider LIKE prefix collision risk — deletions match token LIKE '<8-char-hex>%'. If two different tokens share the same first 8 characters, both would be deleted. This is statistically unlikely but worth a quick sanity check.
  • Note on --execute mode: it calls conn.commit() at the end, which will auto-commit the transaction even though COMMIT is commented out in the SQL. If you plan to use --execute, be aware there is no review step.
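The prefix-collision check suggested in the checklist could be run as a query that counts matching rows per prefix before any DELETE. The sketch below uses an in-memory SQLite database as a stand-in for the TYR database; the helper name `prefixes_matching_multiple_keys` and the table contents are made up, and on PostgreSQL `substr(token, 1, 8)` would probably be written `left(token, 8)` (an assumption about the production engine).

```python
import sqlite3


def prefixes_matching_multiple_keys(conn, prefixes, length=8):
    """Return {prefix: row_count} for prefixes that match more than one key."""
    if not prefixes:
        return {}
    placeholders = ", ".join("?" for _ in prefixes)
    query = (
        f'SELECT substr(token, 1, {int(length)}) AS prefix, COUNT(*) '
        f'FROM "key" WHERE substr(token, 1, {int(length)}) IN ({placeholders}) '
        "GROUP BY prefix HAVING COUNT(*) > 1"
    )
    return dict(conn.execute(query, list(prefixes)).fetchall())


# Hypothetical data: two tokens share the prefix a1b2c3d4.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "key" (id INTEGER PRIMARY KEY, token TEXT NOT NULL)')
conn.executemany(
    'INSERT INTO "key" (token) VALUES (?)',
    [("a1b2c3d4-0001",), ("a1b2c3d4-0002",), ("deadbeef-0003",)],
)
collisions = prefixes_matching_multiple_keys(conn, ["a1b2c3d4", "deadbeef"])
```

An empty result would indicate that every prefix matches at most one key, so the LIKE-based deletion removes no unintended rows.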


Python script that reads a CSV file of unused tokens (no API calls for 365+ days)
and generates SQL DELETE statements for the 'key' table.

- Handles 1755 valid hex token prefixes via LIKE matching
- Handles 14 corrupted entries (Excel scientific notation) via key ID
- Uses a transaction with manual COMMIT for safety
- Supports both SQL file generation and direct DB execution

Co-Authored-By: unknown <>
@devin-ai-integration

Original prompt from louis.pautasso
From this file, write me a script that deletes from the database all tokens that start with one of the values listed in the "debut de token" column
ATTACHMENT:"https://app.devin.ai/attachments/450b708c-acd9-46c1-8e05-8ea10a4aea1d/tokens-unsage-365%28result%29+%281%29.csv"

You only need to look in the following repos: hove-io/loki, hove-io/datahub-api-aws-infra, hove-io/navitia

@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR that start with 'DevinAI' or '@devin'.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.


Comment on lines +145 to +147
    for statement in sql.split(";"):
        statement = statement.strip()
        if statement and not statement.startswith("--"):

🔴 Naive SQL splitting on ; causes BEGIN, CREATE TABLE, and DELETE statements to be skipped in --execute mode

In --execute mode, execute_sql splits the entire generated SQL on ; and then skips any chunk that starts with --. Because generate_sql places comment lines immediately before SQL statements (with no intervening ;), the split merges comments with the following SQL statement into a single chunk. Since the chunk starts with --, the actual SQL statement is silently skipped.

Detailed explanation of which statements are skipped and why

The generated SQL looks like:

-- =============================================================
-- Script de suppression...
-- =============================================================

BEGIN;

-- Partie 1: Suppression par préfixe...

CREATE TEMPORARY TABLE _token_prefixes_to_delete ...;

When split on ;, the first chunk is:

-- =============================================================\n...\n\nBEGIN

This starts with --, so BEGIN is never executed — all operations run without a transaction.

The second chunk is:

\n\n-- Partie 1: ...\n\nCREATE TEMPORARY TABLE _token_prefixes_to_delete (prefix TEXT NOT NULL)

This also starts with -- (after stripping), so CREATE TEMPORARY TABLE is never executed. The subsequent INSERT and DELETE statements referencing _token_prefixes_to_delete will then fail with a "relation does not exist" error.

Similarly, the DELETE FROM key WHERE id IN (...) for corrupted entries is in a chunk starting with comments and is also skipped.

Impact: In --execute mode, the script either crashes (table not found) or silently skips critical DELETE statements, and runs without transaction safety.
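The skipping behaviour described above can be reproduced in isolation. The sketch below mimics the split-and-filter logic quoted from lines +145 to +147; the function name `naive_statements` is hypothetical, and the sample SQL is an abbreviated stand-in for the generated script (the exact shape of the INSERT statement is an assumption).

```python
# Abbreviated stand-in for the generated SQL; the INSERT line is assumed.
generated_sql = """\
-- =============
-- Header comment
-- =============

BEGIN;

-- Part 1: delete by prefix

CREATE TEMPORARY TABLE _token_prefixes_to_delete (prefix TEXT NOT NULL);
INSERT INTO _token_prefixes_to_delete (prefix) VALUES ('a1b2c3d4');
"""


def naive_statements(sql):
    """Mimic the split-and-filter logic quoted in the review comment."""
    kept = []
    for statement in sql.split(";"):
        statement = statement.strip()
        # A chunk that begins with a comment is dropped entirely,
        # including any SQL that follows the comment lines.
        if statement and not statement.startswith("--"):
            kept.append(statement)
    return kept


kept = naive_statements(generated_sql)
```

Only the INSERT survives the filter here: both BEGIN and CREATE TEMPORARY TABLE are preceded by comment lines, so their chunks start with `--` and are discarded.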

Prompt for agents
In source/tyr/delete_unused_tokens.py, the execute_sql function (lines 134-154) splits SQL on semicolons and then checks if each chunk starts with '--' to skip comments. This is fundamentally broken because comments and SQL statements get merged into the same chunk after splitting.

The fix should replace the naive split-on-semicolon approach with proper statement-by-statement execution. Options:

1. Instead of generating a single SQL string and splitting it, refactor generate_sql to return a list of individual SQL statements (without comments), and have execute_sql iterate over that list directly.

2. Alternatively, use sqlalchemy's text() to execute the entire SQL script at once if the driver supports it, or use a proper SQL parser.

3. At minimum, filter out comment-only lines from each chunk before checking if it starts with '--'. For example, after splitting on ';', strip each chunk, split it into lines, remove lines that start with '--' or are empty, and then rejoin to get the actual SQL statement.
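Option 3 above might look like the following. This is a sketch under stated assumptions, not the fix adopted in the PR: `executable_statements` is a hypothetical helper, and it still splits on every `;`, so a semicolon inside a string literal or inside a comment line would break it.

```python
def executable_statements(sql):
    """Split on ';' and drop comment-only and blank lines from each chunk."""
    statements = []
    for chunk in sql.split(";"):
        code_lines = [
            line
            for line in chunk.splitlines()
            if line.strip() and not line.strip().startswith("--")
        ]
        # Chunks made only of comments (e.g. "-- COMMIT") yield nothing.
        if code_lines:
            statements.append("\n".join(code_lines).strip())
    return statements


# Abbreviated, made-up stand-in for the generated SQL.
sample = """\
-- header

BEGIN;

-- Part 1

CREATE TEMPORARY TABLE _token_prefixes_to_delete (prefix TEXT NOT NULL);
DELETE FROM key WHERE id IN (1, 2);

-- COMMIT;
"""
statements = executable_statements(sample)
```

Here BEGIN, CREATE TEMPORARY TABLE, and DELETE are all retained, while the commented-out COMMIT stays excluded. Option 1 (returning a list of statements from generate_sql) avoids the splitting problem altogether and is likely the more robust choice.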

Comment on lines +113 to +114
key_ids = ", ".join(entry[0] for entry in corrupted_entries)
lines.append(f"DELETE FROM key WHERE id IN ({key_ids});")

🔴 SQL injection via unsanitized tyr_id from CSV in corrupted entries DELETE statement

The tyr_id values read from the CSV are directly interpolated into a SQL DELETE FROM key WHERE id IN (...) statement at source/tyr/delete_unused_tokens.py:114 without any validation that they are integers. A malicious or malformed CSV could contain arbitrary SQL in the tyr_id column.

Root cause and exploitation path

At source/tyr/delete_unused_tokens.py:43, tyr_id = row[0].strip() reads the raw string from the CSV. At lines 113-114:

key_ids = ", ".join(entry[0] for entry in corrupted_entries)
lines.append(f"DELETE FROM key WHERE id IN ({key_ids});")

If a CSV row has tyr_id = 1); DROP TABLE key; --, the generated SQL becomes:

DELETE FROM key WHERE id IN (1); DROP TABLE key; --);

This is exploitable both in the generated SQL file (if executed by a DBA) and in --execute mode. Even for the --output mode (generating a .sql file), the injected SQL would be present in the output file.

Impact: Potential for arbitrary SQL execution including data destruction.

Suggested change
- key_ids = ", ".join(entry[0] for entry in corrupted_entries)
- lines.append(f"DELETE FROM key WHERE id IN ({key_ids});")
+ key_ids = ", ".join(str(int(entry[0])) for entry in corrupted_entries)
+ lines.append(f"DELETE FROM key WHERE id IN ({key_ids});")
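To illustrate why the integer conversion in the suggested change closes the injection path, here is a slightly stricter sketch. The helper names `validated_key_ids` and `delete_by_id_sql` are hypothetical and not part of the PR; the check accepts only ASCII digits, so values such as `+5` or `1_000`, which plain `int()` would accept, are rejected as well.

```python
def validated_key_ids(raw_ids):
    """Convert raw CSV values to ints, raising ValueError on anything else."""
    ids = []
    for raw in raw_ids:
        text = raw.strip()
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"not an integer key id: {raw!r}")
        ids.append(int(text))
    return ids


def delete_by_id_sql(raw_ids):
    """Build the fallback DELETE from validated integer IDs only."""
    key_ids = ", ".join(str(i) for i in validated_key_ids(raw_ids))
    return f"DELETE FROM key WHERE id IN ({key_ids});"


statement = delete_by_id_sql(["12", " 34 "])

# The payload from the review comment is rejected before any SQL is built.
try:
    delete_by_id_sql(["1); DROP TABLE key; --"])
    rejected = False
except ValueError:
    rejected = True
```

Failing loudly on a malformed ID also surfaces CSV corruption early instead of silently producing a statement that deletes the wrong rows.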

@devin-ai-integration

Closing due to inactivity for more than 7 days.
