
Fix schema_adapter integration tests not running #16835



Merged

kosiew merged 47 commits into apache:main on Jul 27, 2025


kosiew
Contributor

@kosiew kosiew commented Jul 21, 2025

Which issue does this PR close?

Closes the issue "Schema adapter Integration tests are not being run".

Rationale for this change

The schema adapter integration tests were not being run. This PR restructures and refactors them to improve maintainability, clarity, and test isolation. It moves the test logic into a dedicated `schema_adapter` module under the `integration_tests` directory, aligning with the modular test layout used elsewhere in the codebase.

What changes are included in this PR?

  • Removed `schema_adapter_integration_tests.rs` from the `integration_tests` directory.
  • Created a new module `schema_adapter` and moved the tests there.
  • Added `mod schema_adapter;` to `core_integration.rs` to include the new module.
  • Enhanced the schema adapter test suite to:
    • Write and read test data using an `InMemory` object store.
    • Validate consistent behavior of the `UppercaseAdapterFactory` across `ParquetSource`, `ArrowSource`, `CsvSource`, and `JsonSource`.
    • Confirm schema mapping behavior and adapter output schemas.
  • Added missing `use` imports and corrected adapter error handling in existing test files.
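A minimal sketch of the reuse pattern the enhanced suite exercises: one `UppercaseAdapterFactory` instance shared by several file sources, each producing the same uppercase output schema. This is not DataFusion's API. `Schema`, the trimmed-down `SchemaAdapter`/`SchemaAdapterFactory` traits, and `MockSource` are hypothetical stand-ins (plain column names instead of Arrow schemas), assuming only that a factory builds an adapter from the projected and full table schemas.

```rust
use std::sync::Arc;

// Hypothetical stand-in for an Arrow schema: just the column names.
type Schema = Vec<String>;

// Hypothetical, trimmed-down analogue of a schema adapter.
trait SchemaAdapter {
    // Schema of the batches this adapter produces.
    fn output_schema(&self) -> Schema;
}

// Hypothetical analogue of a factory that builds an adapter from the
// projected table schema and the full table schema.
trait SchemaAdapterFactory {
    fn create(&self, projected_table_schema: &Schema, table_schema: &Schema)
        -> Box<dyn SchemaAdapter>;
}

struct UppercaseAdapter {
    projected: Schema,
}

impl SchemaAdapter for UppercaseAdapter {
    fn output_schema(&self) -> Schema {
        // Rename every projected column to upper case.
        self.projected.iter().map(|name| name.to_uppercase()).collect()
    }
}

struct UppercaseAdapterFactory;

impl SchemaAdapterFactory for UppercaseAdapterFactory {
    fn create(&self, projected_table_schema: &Schema, _table_schema: &Schema)
        -> Box<dyn SchemaAdapter> {
        Box::new(UppercaseAdapter { projected: projected_table_schema.clone() })
    }
}

// Hypothetical file source that only holds a handle to the shared factory.
struct MockSource {
    format: &'static str,
    factory: Arc<dyn SchemaAdapterFactory>,
}

impl MockSource {
    // Ask the shared factory for an adapter and report its output schema.
    fn output_schema(&self, projected: &Schema, table: &Schema) -> Schema {
        self.factory.create(projected, table).output_schema()
    }
}

// Build one source per format, all sharing a single factory instance.
fn sources_sharing_factory() -> Vec<MockSource> {
    let factory: Arc<dyn SchemaAdapterFactory> = Arc::new(UppercaseAdapterFactory);
    ["parquet", "arrow", "csv", "json"]
        .iter()
        .map(|&format| MockSource { format, factory: Arc::clone(&factory) })
        .collect()
}
```

Sharing an `Arc` rather than constructing a factory per source is what lets a single test assert that every format sees identical adapter behavior.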

Are these changes tested?

✅ Yes, this PR includes comprehensive unit and integration tests for:

  • Adapter correctness and schema transformation behavior.
  • Reusability of SchemaAdapterFactory across file sources.
  • Compatibility with object stores and batch collection.

Are there any user-facing changes?

No, these changes are internal to the testing framework. There are no user-facing changes or breaking API changes introduced in this PR.

kosiew added 22 commits July 21, 2025 11:49
…ma adaptation

- Updated SchemaAdapterFactory create method signature to accept projected and table schema refs.
- Implemented map_column_index and map_schema methods in UppercaseAdapter to support case-insensitive column name mapping and schema projection.
- Added UppercaseSchemaMapper to handle the mapping of RecordBatch columns and column statistics according to the projection.
- Refactored adapt and output_schema methods accordingly.
- This enables correct schema and data mapping for adapters that change column names (e.g., to uppercase) in integration tests.
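The mapping logic described in this commit can be sketched without Arrow as follows. This is a hypothetical simplification, not the code in the PR: schemas are plain column-name slices, matching assumes an ASCII case-insensitive comparison, and `map_batch` stands in for what `UppercaseSchemaMapper` does to `RecordBatch` columns, with each column modeled as a `Vec<Option<i64>>`.

```rust
// Hypothetical sketch: find the file column matching table column `index`,
// ignoring ASCII case (table "ID" matches file "id").
fn map_column_index(table_fields: &[&str], file_fields: &[&str], index: usize) -> Option<usize> {
    let wanted = table_fields.get(index)?;
    file_fields.iter().position(|f| f.eq_ignore_ascii_case(wanted))
}

// Hypothetical sketch of `map_schema`: returns the file columns to read
// (the projection) and, for each table column, its position within the
// projected batch (`None` when the file lacks that column).
fn map_schema(table_fields: &[&str], file_fields: &[&str]) -> (Vec<usize>, Vec<Option<usize>>) {
    let mut projection = Vec::new();
    let mut field_mappings = vec![None; table_fields.len()];
    for (file_idx, file_name) in file_fields.iter().enumerate() {
        if let Some(table_idx) = table_fields
            .iter()
            .position(|t| t.eq_ignore_ascii_case(file_name))
        {
            field_mappings[table_idx] = Some(projection.len());
            projection.push(file_idx);
        }
    }
    (projection, field_mappings)
}

// Hypothetical sketch of the mapper step: reorder projected columns into
// table order and fill columns missing from the file with nulls (`None`).
fn map_batch(
    field_mappings: &[Option<usize>],
    projected: &[Vec<Option<i64>>],
    num_rows: usize,
) -> Vec<Vec<Option<i64>>> {
    field_mappings
        .iter()
        .map(|m| match m {
            Some(i) => projected[*i].clone(),
            None => vec![None; num_rows],
        })
        .collect()
}
```

For a table schema `["ID", "NAME", "EXTRA"]` and a file with columns `["name", "id"]`, the projection reads both file columns and `EXTRA` comes back as an all-null column.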
…erFactory, TestSchemaAdapter, and TestSchemaMapping in schema adapter integration tests.
…_tests.rs file and consolidating struct and implementation blocks for TestSchemaAdapterFactory, TestSchemaAdapter, and TestSchemaMapping. Update imports and adjust test configurations for ParquetSource and CsvSource.
relocate schema adapter tests into the parquet suite
reference new location in schema.rs
remove old schema_adaptation tests
Deleted the outdated end-to-end schema test file `schema.rs` from core tests, as schema adaptation tests have been moved to `parquet/schema_adapter.rs`.
…SchemaAdapterFactory for equality comparison
@github-actions github-actions bot added core Core DataFusion crate datasource Changes to the datasource crate labels Jul 21, 2025
@github-actions github-actions bot removed the datasource Changes to the datasource crate label Jul 25, 2025
@kosiew kosiew force-pushed the integration-16801 branch from be97092 to 37f75e9 July 25, 2025 02:47
alamb
alamb previously approved these changes Jul 25, 2025
Contributor

@alamb alamb left a comment


Thank you @kosiew

#[derive(Debug, PartialEq)]
struct UppercaseAdapterFactory {}

impl SchemaAdapterFactory for UppercaseAdapterFactory {
Contributor

@alamb alamb Jul 25, 2025


Moving the tests here sort of implies they are only related to parquet -- don't we apply the schema adapter to other formats too?

However, since all the tests use parquet, this seems like a good place to put them.

Update: they don't all use parquet

}

#[tokio::test]
async fn test_multi_source_schema_adapter_reuse() -> Result<()> {
Contributor


Actually, I missed this one before -- given that this tests formats other than parquet, I think we should move it back into core_integration.

Here is a suggestion how: #16801 (comment)

Contributor Author


Moved them to
datafusion/core/tests/integration_tests/schema_adapter/schema_adapter_integration_tests.rs

@alamb alamb dismissed their stale review July 25, 2025 19:39

I think we should have a different approach

kosiew and others added 9 commits July 27, 2025 15:50
move integration tests from parquet/schema_adapter.rs
add new integration_tests/schema_adapter module
add root driver schema_adapter_integration.rs
- Moved existing schema adapter integration tests from `schema_adaptation/schema_adapter_integration_tests.rs` to a new module in `datafusion/core/tests/integration_tests/schema_adapter/schema_adapter_integration_tests.rs`.
- Created a new file `schema_adapter.rs` in the integration tests folder to run and organize the tests under the schema adapter directory.
- The tests validate the functionality of a schema adapter that transforms column names to uppercase, ensuring compatibility across different file sources.
- Ensured proper organization of tests for future maintainability and clearer directory structure.
@@ -0,0 +1,21 @@
// Licensed to the Apache Software Foundation (ASF) under one
Contributor


I think this effectively means we will have a new integration test binary that gets run like

cargo test --test schema_adapter_integration

Each test binary takes up significant space, and in the past we had problems with the runners' disk space filling up.

In this case, the new binary takes 188MB on my machine, so it would probably add about the same to most CI runs:

(venv) andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ du -s -h target/debug/deps/schema_adapter_integration-2b9fa3c8791a7c77
188M	target/debug/deps/schema_adapter_integration-2b9fa3c8791a7c77

Here is a proposed PR to add it to the existing core_integration binary, so it would get run like this:

cargo test --test core_integration -- schema_adapter

That way, no new binary is added.
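A minimal sketch of why `cargo test --test core_integration -- schema_adapter` selects only the schema adapter tests: arguments after `--` go to the test harness, and libtest's default filter keeps every test whose full module path contains the filter string as a substring (`--exact` switches to exact matching). The function below is a simplified model of that filter, not libtest's code, and the test paths in the usage are hypothetical examples.

```rust
// Simplified model of libtest's default name filter: a test is selected
// when its full path (e.g. "module::submodule::test_name") contains `filter`.
fn selected<'a>(test_paths: &[&'a str], filter: &str) -> Vec<&'a str> {
    test_paths
        .iter()
        .copied()
        .filter(|path| path.contains(filter))
        .collect()
}
```

Because the tests live under the `schema_adapter` module inside the existing binary, the module name itself acts as the filter, so they can be run in isolation without a separate test target.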

Contributor

@alamb alamb left a comment


Thank you @kosiew

I think this PR does run the tests now, so it could be merged as is.

However, I think it is worth considering using an existing test binary rather than making a new one, namely this PR:

@alamb alamb changed the title Fix integration tests not running Fix schema_adapter integration tests not running Jul 27, 2025
@alamb alamb added the development-process Related to development process of DataFusion label Jul 27, 2025
@github-actions github-actions bot removed the development-process Related to development process of DataFusion label Jul 27, 2025
@kosiew kosiew merged commit ff777ea into apache:main Jul 27, 2025
28 checks passed
@kosiew
Contributor Author

kosiew commented Jul 27, 2025

Thanks @alamb and @findepi for your reviews.

@alamb
Contributor

alamb commented Jul 27, 2025

Thank you for sticking with this @kosiew

adriangb pushed a commit to pydantic/datafusion that referenced this pull request Jul 28, 2025
- Removed `schema_adapter_integration_tests.rs` from the `integration_tests` directory.
- Created a new module `schema_adapter` and moved the tests there.
- Added `mod schema_adapter;` to `core_integration.rs` to include the new module.
- Enhanced the schema adapter test suite to:
  - Write and read test data using `InMemory` object store.
  - Validate consistent behavior of the `UppercaseAdapterFactory` across `ParquetSource`, `ArrowSource`, `CsvSource`, and `JsonSource`.
  - Confirm schema mapping behavior and adapter output schemas.
- Added missing `use` imports and corrected adapter error handling in existing test files.
Standing-Man pushed a commit to Standing-Man/datafusion that referenced this pull request Aug 4, 2025
Labels
core Core DataFusion crate
Development

Successfully merging this pull request may close these issues.

Schema adapter Integration tests are not being run
3 participants