This reverts commit 2e71414.
…ix-duckdb_engine Made-with: Cursor
…hemas_dal
- extract_data.py: add missing `queries` variable to tuple unpack from create_dataframe (returned 6 values, expected 5)
- schemas_dal.py: add missing commas between all fields in WITH clause and collect({}) map literal in get_schema_columns query
Made-with: Cursor
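The unpacking mismatch fixed in extract_data.py can be reproduced in isolation. This is a minimal sketch, assuming a stand-in function named `create_dataframe_stub` that returns six values. Every name other than `queries` is hypothetical and not taken from the repository.

```python
def create_dataframe_stub():
    """Hypothetical stand-in for create_dataframe: returns six values."""
    return "tables", "columns", "schemas", "databases", "foreign_keys", "queries"


def unpack_five():
    # Pre-fix shape: five targets for six returned values, so Python raises ValueError.
    tables, columns, schemas, databases, foreign_keys = create_dataframe_stub()
    return tables


def unpack_six():
    # Post-fix shape: `queries` is added so the target count matches the return count.
    tables, columns, schemas, databases, foreign_keys, queries = create_dataframe_stub()
    return queries


try:
    unpack_five()
except ValueError as exc:
    # The message reports the number of targets the unpack expected (5).
    error_message = str(exc)
else:
    error_message = None

result = unpack_six()
```

The five-target version fails at the call site rather than inside the function, which is why the fix belongs in the caller's tuple unpack.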
…on and ExtractParams defaults
- Rename EmbedParams.embedding_api_key -> api_key and update all call sites (batch.py, inprocess.py)
- Add strict modality validation to EmbedParams (raise on invalid, replace silent image_text remap)
- Add VALID_EMBED_MODALITIES constant; narrow IMAGE_MODALITIES to exclude image_text
- Update ExtractParams defaults: method="pdfium", image_format="jpeg", jpeg_quality=100, render_mode="fit_to_model"
- Raise TextChunkParams.max_tokens default from 512 to 1024
- Simplify executor.run_mode_ingest: remove embed_params/vdb_params args (callers set these on ingestor directly)
- Remove unused metrics_parser.py
- Add client dependency tweak (client/pyproject.toml)
Made-with: Cursor
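The strict modality validation could look roughly like the following. This is a sketch only: the class name `EmbedParamsSketch`, the use of a plain dataclass, and the concrete modality values are assumptions for illustration, and the real EmbedParams may be built differently.

```python
from dataclasses import dataclass

# Assumed values; the real constants may list different modalities.
VALID_EMBED_MODALITIES = ("text", "image", "image_text")
IMAGE_MODALITIES = ("image",)  # narrowed: no longer includes "image_text"


@dataclass
class EmbedParamsSketch:
    """Hypothetical stand-in for EmbedParams with strict modality checking."""

    api_key: str = ""  # renamed from embedding_api_key
    modality: str = "text"

    def __post_init__(self):
        # Raise on an unknown modality instead of silently remapping it.
        if self.modality not in VALID_EMBED_MODALITIES:
            raise ValueError(
                f"Invalid modality {self.modality!r}; "
                f"expected one of {VALID_EMBED_MODALITIES}"
            )


def try_build(modality):
    """Return the validated modality, or the error text if validation fails."""
    try:
        return EmbedParamsSketch(modality=modality).modality
    except ValueError as exc:
        return f"error: {exc}"
```

Failing at construction time surfaces a typo in the modality name immediately, where a silent remap would let it pass through to embedding.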
…dule Made-with: Cursor
…in ingestor.py Made-with: Cursor
… comment spacing) Made-with: Cursor
Renames all public API methods, params classes, internal attributes, constants, and helper functions that used 'structured' to 'tabular' to better reflect that the pipeline operates on relational/tabular data.

Key changes:
- Params: Structured*Params → Tabular*Params (and remove unused TabularPIIParams)
- Ingestor methods: pull/store/populate/generate/get_*structured* → *tabular*
- BatchIngestor: ingest_structured → ingest_tabular; _structured_* attrs → _tabular_*
- Executor/runners: run_mode_ingest_structured → run_mode_ingest_tabular; run_batch_structured → run_batch_tabular (file renamed accordingly)
- LanceDB table constant: _STRUCTURED_TABLE/"nv-ingest-structured" → _TABULAR_TABLE/"nv-ingest-tabular"
- Helper: data_for_populate_structured → data_for_populate_tabular
Made-with: Cursor
Made-with: Cursor
…nectors/ingestion/retrieval/neo4j
Renames the top-level folder from relational_db to tabular_data and restructures its contents into four clear sub-packages:
- connectors/: DB connectors (DuckDB, Spider2)
- ingestion/: extract_data, population/graph, prepare_for_embedding
- retrieval/: generate_sql (merged from generate_sql/ facade + sql_tool/)
- neo4j/: Neo4j connection management (was neo4j_connection/)

Updates all import paths across the codebase accordingly.
Made-with: Cursor
…Cypher queries
- Add Edges class to reserved_words.py with CONTAINS, CONNECTING, FOREIGN_KEY constants
- Make RelTypes inherit from Edges for backward compatibility
- Replace all hardcoded node labels and relationship type strings in db_dal.py with Labels/Edges constants
- Remove is_temp guard and its dependency on Labels.TEMP_SCHEMA in update_diff_from_existing_schema
Made-with: Cursor
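The constants-plus-inheritance pattern described above can be sketched as follows. The string values, the `contains_query` helper, and the `Table`/`Column` labels are assumptions for illustration and are not copied from reserved_words.py or db_dal.py.

```python
class Edges:
    """Relationship type names (string values are assumed here)."""

    CONTAINS = "CONTAINS"
    CONNECTING = "CONNECTING"
    FOREIGN_KEY = "FOREIGN_KEY"


class RelTypes(Edges):
    """Legacy name kept so existing RelTypes.* references keep resolving."""


def contains_query(parent_label: str, child_label: str) -> str:
    # Hypothetical helper: build a Cypher pattern from constants
    # instead of hardcoding the relationship type string.
    return (
        f"MATCH (p:{parent_label})-[:{Edges.CONTAINS}]->(c:{child_label}) "
        "RETURN c"
    )


query = contains_query("Table", "Column")
```

Because RelTypes subclasses Edges without overriding anything, both names refer to the same string objects, so old and new call sites cannot drift apart.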
…are_embedding_text sibling
- Remove the population/ wrapper directory; graph/ and populate_data.py now live directly under ingestion/
- Move prepare_for_embedding/prepare_embedding_text.py to ingestion/prepare_embedding_text.py and remove the now-empty subdirectory
- Update all ingestion.population.* imports to ingestion.* across the codebase
Made-with: Cursor
- Move duckdb and spider2 imports to top-level; remove lazy try/except blocks
- Rename prepare_embedding_text.py to embeddings.py; update import in batch.py
- Remove is_temp support: drop TEMP_SCHEMA/TABLE/COLUMN labels, params, and DataFrame assignments
- Remove label_to_type function and all call sites across schema, node, and utils_dal
- Remove include_deleted parameter; hardcode deleted-record filter in all DAL queries
- Delete dead tables_dal.py and unused entity_exists_in_graph_insensitive function
- Drop Connection constraint from indexes.py (label no longer exists)
- Remove NULL AS "created" column from duckdb get_tables query
Made-with: Cursor
Relocate setup_spider2.py, spider2_loader.py, and SPIDER2_SETUP.md from connectors/ into a new nemo_retriever/tabular-dev-tools/ folder alongside tests/. Update spider2_loader import to a local sibling import and fix docstring run paths accordingly. Made-with: Cursor
Made-with: Cursor
…ame and remove debug file
- Merge fetch_relational_db_for_embedding + neo4j_tables_result_to_embedding_dataframe into a single fetch_tabular_embedding_dataframe in embeddings.py
- Move the import to the top of batch.py; simplify call site to check df.empty directly
- Delete debug_run_mode_ingest.py (unreferenced debug script)
Made-with: Cursor
…n up schema ingestion
- Drop account_id from Neo4j uniqueness constraint and index
- Delete unused docker-compose.neo4j.yaml
- Remove table_type and ordinal_position from DuckDB schema queries
- Remove table property diffing and single-node update helper from db_dal
- Simplify column diff tracking to data_type and is_nullable only
- Add comment to update_properties_in_graph_batch
Made-with: Cursor
Drop ordinal_position, default, length, comment, and scale from the column fetch query, keeping only data_type and is_nullable. Made-with: Cursor
…eries Made-with: Cursor
…_time, last_altered, default, length, scale from table/column models
Simplifies the schema by keeping only essential fields (created, description for tables; data_type, is_nullable, ordinal_position, description for columns) and renames comment -> description throughout.
Made-with: Cursor
The fulltext index was never queried anywhere in the codebase and was also being redundantly re-created on every loop iteration. Made-with: Cursor
…tional_db_data to extract_tabular_db_data
Functions actively normalize and coerce DataFrame types rather than just loading, so the new names better reflect their behaviour.
Made-with: Cursor
The new name better describes the file's responsibility: writing parsed tabular data as nodes and edges into Neo4j. Made-with: Cursor
Description
Checklist