-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Fix schema_adapter
integration tests not running
#16835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ma adaptation - Updated SchemaAdapterFactory create method signature to accept projected and table schema refs. - Implemented map_column_index and map_schema methods in UppercaseAdapter to support case-insensitive column name mapping and schema projection. - Added UppercaseSchemaMapper to handle the mapping of RecordBatch columns and column statistics according to the projection. - Refactored adapt and output_schema methods accordingly. - This enables correct schema and data mapping for adapters that change column names (e.g., to uppercase) in integration tests.
…erFactory, TestSchemaAdapter, and TestSchemaMapping in schema adapter integration tests.
…_tests.rs file and consolidating struct and implementation blocks for TestSchemaAdapterFactory, TestSchemaAdapter, and TestSchemaMapping. Update imports and adjust test configurations for ParquetSource and CsvSource.
…o the directory instead of a specific file
relocate schema adapter tests into the parquet suite reference new location in schema.rs remove old schema_adaptation tests
Deleted the outdated end-to-end schema test file `schema.rs` from core tests, as schema adaptation tests have been moved to `parquet/schema_adapter.rs`.
…ry in parquet integration tests
…for Arrow, Parquet, Csv, and Json
…SchemaAdapterFactory for equality comparison
This reverts commit 414de48.
be97092
to
37f75e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @kosiew
#[derive(Debug, PartialEq)] | ||
struct UppercaseAdapterFactory {} | ||
|
||
impl SchemaAdapterFactory for UppercaseAdapterFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving the tests here sort of implies they are only related to parquet -- don't we apply schema adapter to other formats too?
However, since all the tests use parquet this seems like a good place to put them
Update: they don't all use parquet
} | ||
|
||
#[tokio::test] | ||
async fn test_multi_source_schema_adapter_reuse() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, I missed this one before -- given this is testing formats other than parquet, I think we should move it back into core_integration.
Here is a suggestion how: #16801 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved them to
datafusion/core/tests/integration_tests/schema_adapter/schema_adapter_integration_tests.rs
move integration tests from parquet/schema_adapter.rs add new integration_tests/schema_adapter module add root driver schema_adapter_integration.rs
- Moved existing schema adapter integration tests from `schema_adaptation/schema_adapter_integration_tests.rs` to a new module in `datafusion/core/tests/integration_tests/schema_adapter/schema_adapter_integration_tests.rs`. - Created a new file `schema_adapter.rs` in the integration tests folder to run and organize the tests under the schema adapter directory. - The tests validate the functionality of a schema adapter that transforms column names to uppercase, ensuring compatibility across different file sources. - Ensured proper organization of tests for future maintainability and clearer directory structure.
@@ -0,0 +1,21 @@ | |||
// Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this effectively means we will have a new integration test binary that gets run like
cargo test --test schema_adapter_integration
each test binary takes up significant space, and in the past we had problems with the runners disk space filling up
IN this case, the new binary takes 188MB on my machine, so it probably would add the same to most CI runs:
(venv) andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ du -s -h target/debug/deps/schema_adapter_integration-2b9fa3c8791a7c77
188M target/debug/deps/schema_adapter_integration-2b9fa3c8791a7c77
Here is a proposed PR to add it to the existing core_integration binary, so it would get run like this:
cargo test --test core_integration -- schema_adapter
And not add a new binary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
schema_adapter
integration tests not running
Thank you for sticking with this @kosiew |
- Removed `schema_adapter_integration_tests.rs` from the `integration_tests` directory. - Created a new module `schema_adapter` and moved the tests there. - Added `mod schema_adapter;` to `core_integration.rs` to include the new module. - Enhanced the schema adapter test suite to: - Write and read test data using `InMemory` object store. - Validate consistent behavior of the `UppercaseAdapterFactory` across `ParquetSource`, `ArrowSource`, `CsvSource`, and `JsonSource`. - Confirm schema mapping behavior and adapter output schemas. - Added missing `use` imports and corrected adapter error handling in existing test files.
- Removed `schema_adapter_integration_tests.rs` from the `integration_tests` directory. - Created a new module `schema_adapter` and moved the tests there. - Added `mod schema_adapter;` to `core_integration.rs` to include the new module. - Enhanced the schema adapter test suite to: - Write and read test data using `InMemory` object store. - Validate consistent behavior of the `UppercaseAdapterFactory` across `ParquetSource`, `ArrowSource`, `CsvSource`, and `JsonSource`. - Confirm schema mapping behavior and adapter output schemas. - Added missing `use` imports and corrected adapter error handling in existing test files.
Which issue does this PR close?
Rationale for this change
This PR restructures and refactors the schema adapter integration tests to improve maintainability, clarity, and test isolation. It separates the test logic into a dedicated
schema_adapter
module under theintegration_tests
directory, aligning with other modular test patterns in the codebase.What changes are included in this PR?
schema_adapter_integration_tests.rs
from theintegration_tests
directory.schema_adapter
and moved the tests there.mod schema_adapter;
tocore_integration.rs
to include the new module.InMemory
object store.UppercaseAdapterFactory
acrossParquetSource
,ArrowSource
,CsvSource
, andJsonSource
.use
imports and corrected adapter error handling in existing test files.Are these changes tested?
✅ Yes, this PR includes comprehensive unit and integration tests for:
SchemaAdapterFactory
across file sources.Are there any user-facing changes?
No, these changes are internal to the testing framework. There are no user-facing changes or breaking API changes introduced in this PR.