Add script to delete unused tokens from TYR database #4518
devin-ai-integration[bot] wants to merge 2 commits into dev
Python script that reads a CSV file of unused tokens (no API calls for 365+ days) and generates SQL DELETE statements for the 'key' table.
- Handles 1755 valid hex token prefixes via LIKE matching
- Handles 14 corrupted entries (Excel scientific notation) via key ID
- Uses a transaction with manual COMMIT for safety
- Supports both SQL file generation and direct DB execution
Co-Authored-By: unknown <>
Original prompt from louis.pautasso
for statement in sql.split(";"):
    statement = statement.strip()
    if statement and not statement.startswith("--"):
🔴 Naive SQL splitting on ; causes BEGIN, CREATE TABLE, and DELETE statements to be skipped in --execute mode
In --execute mode, execute_sql splits the entire generated SQL on ; and then skips any chunk that starts with --. Because generate_sql places comment lines immediately before SQL statements (with no intervening ;), the split merges comments with the following SQL statement into a single chunk. Since the chunk starts with --, the actual SQL statement is silently skipped.
Detailed explanation of which statements are skipped and why
The generated SQL looks like:
-- =============================================================
-- Script de suppression...
-- =============================================================
BEGIN;
-- Partie 1: Suppression par préfixe...
CREATE TEMPORARY TABLE _token_prefixes_to_delete ...;
When split on ;, the first chunk is:
-- =============================================================\n...\n\nBEGIN
This starts with --, so BEGIN is never executed — all operations run without a transaction.
The second chunk is:
\n\n-- Partie 1: ...\n\nCREATE TEMPORARY TABLE _token_prefixes_to_delete (prefix TEXT NOT NULL)
This also starts with -- (after stripping), so CREATE TEMPORARY TABLE is never executed. The subsequent INSERT and DELETE statements referencing _token_prefixes_to_delete will then fail with a "relation does not exist" error.
Similarly, the DELETE FROM key WHERE id IN (...) for corrupted entries is in a chunk starting with comments and is also skipped.
Impact: In --execute mode, the script either crashes (table not found) or silently skips critical DELETE statements, and runs without transaction safety.
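The skipping behaviour described above can be reproduced in isolation. The snippet below is a minimal sketch, not the script itself: the `sql` string is a hypothetical, shortened stand-in for the output of generate_sql, and the loop applies the same split-and-check logic as the reviewed code.

```python
# Hypothetical, shortened stand-in for the SQL produced by generate_sql.
sql = """-- header comment
BEGIN;

-- Part 1: delete by prefix
CREATE TEMPORARY TABLE _token_prefixes_to_delete (prefix TEXT NOT NULL);
INSERT INTO _token_prefixes_to_delete VALUES ('deadbeef');
"""

executed, skipped = [], []
for statement in sql.split(";"):
    statement = statement.strip()
    if not statement:
        continue
    # Same test as the reviewed code: any chunk starting with "--" is dropped,
    # even when a real statement follows the comment lines.
    if statement.startswith("--"):
        skipped.append(statement)
    else:
        executed.append(statement)

for chunk in skipped:
    print("skipped: ", chunk.splitlines()[-1])
for chunk in executed:
    print("executed:", chunk)
```

In this stand-in, both BEGIN and CREATE TEMPORARY TABLE end up in `skipped`, and only the INSERT would be sent to the database.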
Prompt for agents
In source/tyr/delete_unused_tokens.py, the execute_sql function (lines 134-154) splits SQL on semicolons and then checks if each chunk starts with '--' to skip comments. This is fundamentally broken because comments and SQL statements get merged into the same chunk after splitting.
The fix should replace the naive split-on-semicolon approach with proper statement-by-statement execution. Options:
1. Instead of generating a single SQL string and splitting it, refactor generate_sql to return a list of individual SQL statements (without comments), and have execute_sql iterate over that list directly.
2. Alternatively, use sqlalchemy's text() to execute the entire SQL script at once if the driver supports it, or use a proper SQL parser.
3. At minimum, filter out comment-only lines from each chunk before checking if it starts with '--'. For example, after splitting on ';', strip each chunk, split it into lines, remove lines that start with '--' or are empty, and then rejoin to get the actual SQL statement.
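As an illustration of option 3, here is a minimal sketch of a splitter that removes comment-only lines before deciding whether a chunk holds a statement. The function name `split_statements` is hypothetical and not part of the PR. It assumes the generated SQL never contains ; or -- inside a string literal or a comment, which is plausible for this script's output but not for arbitrary SQL.

```python
def split_statements(sql: str) -> list[str]:
    """Split a SQL script on ';', ignoring blank and comment-only lines.

    Assumption: no ';' or '--' appears inside string literals or comments.
    """
    statements = []
    for chunk in sql.split(";"):
        code_lines = [
            line
            for line in chunk.splitlines()
            if line.strip() and not line.strip().startswith("--")
        ]
        if code_lines:
            statements.append("\n".join(code_lines).strip())
    return statements


# Hypothetical, shortened stand-in for the generated script.
example = (
    "-- header\n"
    "BEGIN;\n"
    "\n"
    "-- Part 1\n"
    "CREATE TEMPORARY TABLE t (prefix TEXT NOT NULL);\n"
    "DELETE FROM key WHERE id IN (1, 2);\n"
    "-- COMMIT;\n"
)
for stmt in split_statements(example):
    print(stmt)
```

A commented-out COMMIT stays unexecuted with this approach, because its line is filtered out along with the other comments. Option 1 (returning a list of statements from generate_sql) avoids the parsing question entirely and is likely the more robust choice.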
key_ids = ", ".join(entry[0] for entry in corrupted_entries)
lines.append(f"DELETE FROM key WHERE id IN ({key_ids});")
🔴 SQL injection via unsanitized tyr_id from CSV in corrupted entries DELETE statement
The tyr_id values read from the CSV are directly interpolated into a SQL DELETE FROM key WHERE id IN (...) statement at source/tyr/delete_unused_tokens.py:114 without any validation that they are integers. A malicious or malformed CSV could contain arbitrary SQL in the tyr_id column.
Root cause and exploitation path
At source/tyr/delete_unused_tokens.py:43, tyr_id = row[0].strip() reads the raw string from the CSV. At lines 113-114:
key_ids = ", ".join(entry[0] for entry in corrupted_entries)
lines.append(f"DELETE FROM key WHERE id IN ({key_ids});")
If a CSV row has tyr_id = 1); DROP TABLE key; --, the generated SQL becomes:
DELETE FROM key WHERE id IN (1); DROP TABLE key; --);
This is exploitable both in the generated SQL file (if executed by a DBA) and in --execute mode. Even in --output mode (generating a .sql file), the injected SQL would be present in the output file.
Impact: Potential for arbitrary SQL execution including data destruction.
Suggested change
- key_ids = ", ".join(entry[0] for entry in corrupted_entries)
- lines.append(f"DELETE FROM key WHERE id IN ({key_ids});")
+ key_ids = ", ".join(str(int(entry[0])) for entry in corrupted_entries)
+ lines.append(f"DELETE FROM key WHERE id IN ({key_ids});")
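The str(int(...)) conversion is enough to block the injection, since int() raises ValueError on anything that is not an integer literal. It does accept forms such as "-3" or "1_000", though, so a stricter variant may be preferable for database IDs. The following is a sketch only: `build_delete_by_id` is a hypothetical helper, and the assumption that key IDs are non-negative decimal integers is not confirmed by the PR.

```python
import re

# Assumption: key ids are plain, non-negative decimal integers.
_KEY_ID_RE = re.compile(r"[0-9]+")


def build_delete_by_id(raw_ids):
    """Build the fallback DELETE, rejecting any id that is not purely numeric."""
    key_ids = []
    for raw in raw_ids:
        value = raw.strip()
        if not _KEY_ID_RE.fullmatch(value):
            raise ValueError(f"Refusing non-integer key id from CSV: {value!r}")
        key_ids.append(str(int(value)))
    if not key_ids:
        raise ValueError("No key ids to delete")
    return f"DELETE FROM key WHERE id IN ({', '.join(key_ids)});"


print(build_delete_by_id(["12", " 7 "]))
try:
    build_delete_by_id(["1); DROP TABLE key; --"])
except ValueError as exc:
    print("rejected:", exc)
```

Failing loudly on a bad row, rather than skipping it, keeps a malformed CSV from silently producing a partial deletion.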
Closing due to inactivity for more than 7 days.
Add script to delete unused tokens from TYR database
Summary
Adds a Python script (source/tyr/delete_unused_tokens.py) that reads a CSV export of unused tokens (no API calls for 365+ days) and generates SQL DELETE statements for the key table.
How it works:
- Valid hex token prefixes: DELETE FROM key WHERE token LIKE '<prefix>%' using a temporary table for efficient matching
- Corrupted entries (converted by Excel to scientific notation, e.g. 8,85E+08, or truncated to 7 digits): falls back to DELETE FROM key WHERE id IN (...) using the tyr_id column
- Transaction: wrapped in BEGIN; with COMMIT commented out so the operator can review row counts before committing
- --execute mode for direct DB execution via SQLAlchemy
Review & Testing Checklist for Human
- Confirm that tyr_id in the CSV corresponds to key.id. This assumption was inferred (same login appears with different tyr_id values), but has not been confirmed against the actual production database. If tyr_id is actually user.id, the 14 fallback deletions by ID would be wrong.
- Run the SELECT COUNT(*) first (already included in the output) to verify the number of matched tokens before committing the DELETE. Check that the count matches expectations (~1769).
- Prefix matching uses token LIKE '<8-char-hex>%'. If two different tokens share the same first 8 characters, both would be deleted. This is statistically unlikely but worth a quick sanity check.
- --execute mode: it calls conn.commit() at the end, which commits the transaction even though COMMIT is commented out in the SQL. If you plan to use --execute, be aware there is no review step.
Notes
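The split between prefix matching and the ID fallback described under "How it works" could look roughly like the sketch below. It is illustrative only: `classify_rows` is a hypothetical name, the (tyr_id, token) column order is an assumption, and the rule "exactly 8 hex characters means valid" is inferred from the '<8-char-hex>%' pattern rather than taken from the script.

```python
import re

# Assumption: a usable token prefix is exactly 8 hexadecimal characters.
HEX_PREFIX_RE = re.compile(r"[0-9a-fA-F]{8}")


def classify_rows(rows):
    """Split (tyr_id, token) rows into usable prefixes and corrupted rows."""
    prefixes, corrupted = [], []
    for tyr_id, token in rows:
        token = token.strip()
        if HEX_PREFIX_RE.fullmatch(token):
            prefixes.append(token)
        else:
            # Excel damage such as "8,85E+08" or a 7-digit truncation lands here
            # and must be deleted by id instead of by prefix.
            corrupted.append((tyr_id, token))
    return prefixes, corrupted


rows = [("1", "deadbeef"), ("2", "8,85E+08"), ("3", "1234567")]
prefixes, corrupted = classify_rows(rows)
print(prefixes)
print(corrupted)
```

Note that "8,85E+08" is itself 8 characters long, so a length check alone would not catch it. The comma and plus sign are what fail the hex test.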