feat: Make supported hadoop filesystem schemes configurable #2272
Conversation
Codecov Report
Additional details and impacted files

@@             Coverage Diff              @@
##               main    #2272      +/-   ##
============================================
+ Coverage     56.12%   58.00%     +1.88%
- Complexity      976     1294       +318
============================================
  Files           119      147        +28
  Lines         11743    13388      +1645
  Branches       2251     2377       +126
============================================
+ Hits           6591     7766      +1175
- Misses         4012     4360       +348
- Partials       1140     1262       +122
}
    .map_err(|e| ExecutionError::GeneralError(e.to_string()))?;
let (object_store, object_store_path): (Box<dyn ObjectStore>, Path) =
    if is_hdfs_scheme(&url, object_store_configs) {
There is a little gotcha here when the scheme is s3a. In s3a's case, we replace s3a with s3 so that we can use the native object store implementation:

if scheme == "s3a" {
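For illustration, here is a minimal sketch of the ordering concern; route_scheme and the returned string labels are hypothetical names, not the actual Comet code. The point is that the configured libhdfs schemes must be checked against the original scheme before the s3a-to-s3 rewrite, otherwise a user-configured s3a entry could never match.

// Hypothetical sketch; names are illustrative, not the real implementation.
fn route_scheme(scheme: &str, libhdfs_schemes: &[&str]) -> String {
    if libhdfs_schemes.contains(&scheme) {
        // Route through libhdfs using the original, un-rewritten scheme.
        return format!("libhdfs:{scheme}");
    }
    // Otherwise fall back to the native object store; s3a is treated as s3.
    let native = if scheme == "s3a" { "s3" } else { scheme };
    format!("native:{native}")
}

fn main() {
    // If "s3a" is configured for libhdfs, the check must see the original
    // scheme; after the rewrite it would only ever see "s3".
    assert_eq!(route_scheme("s3a", &["hdfs", "s3a"]), "libhdfs:s3a");
    assert_eq!(route_scheme("s3a", &["hdfs"]), "native:s3");
}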
Thanks, I made some adjustments, could you take another look?
@@ -40,6 +41,8 @@ object NativeConfig {
    // Azure Data Lake Storage Gen2 secure configurations (can use both prefixes)
    "abfss" -> Seq("fs.abfss.", "fs.abfs."))

  val COMET_LIBHDFS_SCHEMES_KEY = "fs.comet.libhdfs.schemes"
Should we make this a Comet conf (i.e. add it in CometConf so it is automatically documented)?
Thanks, moved to CometConf.
conf(s"spark.hadoop.$COMET_LIBHDFS_SCHEMES_KEY") | ||
.doc( | ||
"Defines filesystem schemes (e.g., hdfs, webhdfs) that the native side accesses " + | ||
"via libhdfs, separated by commas.") |
nit: perhaps we can mention that this configuration is valid only if comet has been built with the hdfs feature flag enabled.
Thanks, added this description.
lgtm
Which issue does this PR close?
Closes #2271.
Rationale for this change
Currently we prefer to use the JVM-based libhdfs to implement the native HDFS reader, which means we can support more Hadoop file systems. However, only the hdfs scheme is currently hardcoded as supported; this PR makes the supported Hadoop filesystem schemes configurable.
What changes are included in this PR?
Make the supported Hadoop filesystem schemes configurable via the new fs.comet.libhdfs.schemes configuration.
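For example (assuming Comet was built with the hdfs feature flag enabled, as noted in the review above), a user could set spark.hadoop.fs.comet.libhdfs.schemes=hdfs,webhdfs to route both schemes through libhdfs.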
How are these changes tested?
With patch #2244 applied, the newly added test cases run successfully.