Skip to content

Conversation

xiedeyantu
Copy link
Member

Which issue does this PR close?

Rationale for this change

It would be automatically add a / to the path if the first one was not found and try again.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the datasource Changes to the datasource crate label Aug 31, 2025
@xiedeyantu
Copy link
Member Author

@alamb My apologies, I have resubmitted this fix. Could you please review it again? Thank you.

@xiedeyantu
Copy link
Member Author

Do you have time to help me take a look? @alamb

@alamb
Copy link
Contributor

alamb commented Sep 5, 2025

Sorry @xiedeyantu -- I have been on vacation. I am looking at this PR now

@alamb alamb mentioned this pull request Sep 5, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @xiedeyantu -- this looks like a great start.

I left some comments about how to improve this PR's code, but in the process of reviewing this PR I ended up making many of the changes locally for your consideration here:

Since this PR makes changes to the core datafusion crate, I thought we should add some additional tests in the datafusion crate. I was writing up how this might look but ended up writing them directly. Please take a look:

Simplify implementation, add tests
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @xiedeyantu , this is a nice improvement I think

@alamb
Copy link
Contributor

alamb commented Sep 6, 2025

I also tested the reproducer from #17364 locally with this PR and it works great:

DataFusion CLI v49.0.2
> CREATE EXTERNAL TABLE nyc_taxi_rides
STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet';
0 row(s) fetched.
Elapsed 2.268 seconds.

> select * from nyc_taxi_rides limit 1;
+-------------+----+-----------+----------------------+----------------------+-----------------+---------------+------------------+-----------------+--------------+--------------------+-------------------+------------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+--------------------+---------------------+-------+-------+
| pickup_date | id | vendor_id | pickup_datetime      | dropoff_datetime     | passenger_count | trip_distance | pickup_longitude | pickup_latitude | rate_code_id | store_and_fwd_flag | dropoff_longitude | dropoff_latitude | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | pickup_location_id | dropoff_location_id | junk1 | junk2 |
+-------------+----+-----------+----------------------+----------------------+-----------------+---------------+------------------+-----------------+--------------+--------------------+-------------------+------------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+--------------------+---------------------+-------+-------+
| 2009-01-31  | 0  | VTS       | 2009-01-31T14:25:00Z | 2009-01-31T14:42:00Z | 4               | 6.12          | 0.0              | 0.0             |              |                    | 0.0               | 0.0              | CASH         | 16.5        | 0     | 0.0     | 0.0        | 0.0          | 0.0                   | 16.5         | 0                  | 0                   |       |       |
+-------------+----+-----------+----------------------+----------------------+-----------------+---------------+------------------+-----------------+--------------+--------------------+-------------------+------------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+--------------------+---------------------+-------+-------+
1 row(s) fetched.
Elapsed 0.579 seconds.

@alamb alamb merged commit 528ed60 into apache:main Sep 6, 2025
28 checks passed
@xiedeyantu
Copy link
Member Author

Thank you for your help and guidance! @alamb

@xiedeyantu xiedeyantu deleted the fix-url branch September 6, 2025 10:49
@alamb
Copy link
Contributor

alamb commented Sep 6, 2025

Thank you for sticking with it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasource Changes to the datasource crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improved experience when remote object store URL does not end in / (retry as paths)
2 participants