@comphead
Contributor

Which issue does this PR close?

Closes #2552.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

@comphead
Contributor Author

Depends on #2586

}
}

test("test concat function - strings") {
Member

Concat supports other types as well:

private def allowedTypes: Seq[AbstractDataType] = Seq(StringType, BinaryType, ArrayType)

Contributor Author

comphead commented Oct 21, 2025

Thanks @andygrove, this PR covers strings only.
ArrayType support is waiting on apache/datafusion#18020.

I don't see binary type support mentioned in the Spark docs, though: https://spark.apache.org/docs/latest/api/sql/#concat

UPD: Binary is probably a special case of array concat:

scala> spark.sql("select concat(to_binary('abc'), to_binary('def'))").show(false)
+--------------------------------------+
|concat(to_binary(abc), to_binary(def))|
+--------------------------------------+
|[0A BC 0D EF]                         |
+--------------------------------------+

I'll check this
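
The output above is consistent with plain byte-wise concatenation, assuming `to_binary` decoded each argument with its default hex format (so `'abc'` becomes the two bytes `0A BC`). A minimal standalone sketch of that reading, in plain Scala with no Spark dependency; `unhex`, `concat`, and `render` are hypothetical helpers written for illustration, not Spark or Comet APIs:

```scala
object BinaryConcatSketch {
  // Decode a hex string into bytes; odd-length input is left-padded with '0'
  // (assumption: this mirrors how 'abc' became 0A BC in the output above).
  def unhex(s: String): Array[Byte] = {
    val padded = if (s.length % 2 == 1) "0" + s else s
    padded.grouped(2).map(hex => Integer.parseInt(hex, 16).toByte).toArray
  }

  // Concatenating binary values is just appending the byte arrays in order.
  def concat(parts: Array[Byte]*): Array[Byte] =
    parts.foldLeft(Array.empty[Byte])(_ ++ _)

  // Render bytes in the bracketed, space-separated hex style of show(false).
  def render(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02X").mkString("[", " ", "]")
}
```

With these helpers, `render(concat(unhex("abc"), unhex("def")))` gives `[0A BC 0D EF]`, matching the `spark.sql` result shown above.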

@comphead changed the title from "feat: support concat" to "feat: support concat for strings" Oct 26, 2025
@codecov-commenter

codecov-commenter commented Oct 26, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.17%. Comparing base (f09f8af) to head (5859d59).
⚠️ Report is 643 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2604      +/-   ##
============================================
+ Coverage     56.12%   59.17%   +3.04%     
- Complexity      976     1447     +471     
============================================
  Files           119      147      +28     
  Lines         11743    13744    +2001     
  Branches       2251     2360     +109     
============================================
+ Hits           6591     8133    +1542     
- Misses         4012     4388     +376     
- Partials       1140     1223      +83     

@comphead requested a review from andygrove October 27, 2025 19:31
@comphead
Contributor Author

comphead commented Oct 27, 2025

@andygrove, please take another look.
concat now works with strings; support for other data types is being fixed in DataFusion (apache/datafusion#18020).

@comphead marked this pull request as ready for review October 27, 2025 20:51
classOf[BitLength] -> CometScalarFunction("bit_length"),
classOf[Chr] -> CometScalarFunction("char"),
classOf[ConcatWs] -> CometScalarFunction("concat_ws"),
classOf[Concat] -> CometScalarFunction("concat"),
Member

I think we need type checks so that we fall back to Spark for unsupported argument types?

Perhaps something like this?

object CometConcat extends CometScalarFunction[Concat]("concat") {
  override def getSupportLevel(expr: Concat): SupportLevel = {
    if (expr.children.forall(_.dataType == DataTypes.StringType)) {
      Compatible()
    } else {
      Incompatible(Some("Only string arguments are supported"))
    }
  }
}
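
To make the intended fallback behaviour concrete, here is a standalone sketch of the same check that runs without Spark or Comet on the classpath. Every type in it (`ArgType`, `Support`, `ConcatSupportSketch`) is a hypothetical stand-in, not the real `DataType` or `SupportLevel` API; only the `forall` test on the argument types mirrors the suggestion above.

```scala
// Hypothetical stand-ins for Spark data types (not the real Spark classes).
sealed trait ArgType
case object StrArg extends ArgType
case object BinArg extends ArgType
final case class ArrArg(element: ArgType) extends ArgType

// Hypothetical stand-ins for the outcome of a support check.
sealed trait Support
case object Native extends Support
final case class FallBack(reason: String) extends Support

object ConcatSupportSketch {
  // Run natively only when every argument is a string; otherwise report
  // a reason so the caller can fall back to Spark.
  def supportLevel(argTypes: Seq[ArgType]): Support =
    if (argTypes.forall(_ == StrArg)) Native
    else FallBack("Only string arguments are supported")
}
```

A single non-string argument is enough to trigger the fallback, so `supportLevel(Seq(StrArg, BinArg))` and `supportLevel(Seq(ArrArg(StrArg)))` both return the `FallBack` case.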

Comment on lines +140 to +143
createFunctionWithInputTypes(
"concat",
Seq(SparkStringType, SparkStringType)
), // TODO: variadic
Member

I know that this PR only adds support for string inputs in Comet concat, but the fuzz tester should ideally cover all the types that Spark supports.
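
One way to widen the coverage is to generate the argument-type lists instead of hard-coding a single two-string signature. The sketch below is standalone Scala under stated assumptions: `FuzzType` and `concatSignatures` are hypothetical names, not part of the Comet fuzz tester, and the three type families are taken from the `allowedTypes` list quoted earlier in this review. It emits homogeneous signatures for arities 1 through `maxArity`, which also approximates the variadic case flagged in the TODO.

```scala
// Hypothetical type tags for illustration (not the fuzz tester's own types).
sealed trait FuzzType
case object FuzzString extends FuzzType
case object FuzzBinary extends FuzzType
final case class FuzzArray(element: FuzzType) extends FuzzType

object ConcatSignatureSketch {
  // One representative per type family that Spark's concat accepts.
  val families: Seq[FuzzType] = Seq(FuzzString, FuzzBinary, FuzzArray(FuzzString))

  // For each family, build argument lists of length 1..maxArity
  // where every argument has that same type.
  def concatSignatures(maxArity: Int): Seq[Seq[FuzzType]] =
    for {
      family <- families
      arity <- 1 to maxArity
    } yield Seq.fill(arity)(family)
}
```

Each generated list could then be fed to whatever function registers input types for `concat`; mixed-type signatures are deliberately left out of this sketch.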

+- CometHashAggregate (67)
+- CometExpand (66)
+- CometUnion (65)
:- CometHashAggregate (22)
Member

Why is this hash aggregate now supported in Comet? I don't see concat used in the query.

https://github.com/apache/datafusion-benchmarks/blob/main/tpcds/queries-spark/q5.sql

Comment on lines +3160 to +3161
// https://github.com/apache/datafusion-comet/issues/2647
ignore("test concat function - arrays") {
Member

Could you enable these tests and use the recently added checkSparkAnswerAndFallbackReason method to make sure we are correctly falling back to Spark?

Development

Successfully merging this pull request may close these issues.

Comet cannot accelerate Concat because: concat is not supported