feat: support concat for strings #2604
Conversation
Depends on #2586

test("test concat function - strings") {
Concat supports other types as well:

private def allowedTypes: Seq[AbstractDataType] = Seq(StringType, BinaryType, ArrayType)
Thanks @andygrove. This PR covers strings only.

ArrayType is waiting on apache/datafusion#18020.

I don't see binary type support in the Spark docs though: https://spark.apache.org/docs/latest/api/sql/#concat

UPD: Binary is probably a specific case of array concat:
scala> spark.sql("select concat(to_binary('abc'), to_binary('def'))").show(false)
+--------------------------------------+
|concat(to_binary(abc), to_binary(def))|
+--------------------------------------+
|[0A BC 0D EF] |
+--------------------------------------+
I'll check this
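The bytes in the output above make sense once you know that Spark's to_binary defaults to the hex format: 'abc' parses as 0x0ABC and 'def' as 0x0DEF, and concat on BINARY is plain byte-array concatenation. A minimal plain-Scala sketch of that behavior (illustration only, not Comet or Spark code):

```scala
// Illustration only: to_binary('abc') with the default 'hex' format yields bytes 0x0A 0xBC.
val abcBytes: Array[Byte] = Array(0x0A.toByte, 0xBC.toByte)
val defBytes: Array[Byte] = Array(0x0D.toByte, 0xEF.toByte)

// Binary concat is byte-array concatenation, i.e. a special case of array concat.
val result: Array[Byte] = abcBytes ++ defBytes

// Render the way Spark's show() does for binary columns.
val rendered = result.map(b => f"${b & 0xff}%02X").mkString(" ")
println(rendered) // prints 0A BC 0D EF
```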
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## main #2604 +/- ##
============================================
+ Coverage 56.12% 59.17% +3.04%
- Complexity 976 1447 +471
============================================
Files 119 147 +28
Lines 11743 13744 +2001
Branches 2251 2360 +109
============================================
+ Hits 6591 8133 +1542
- Misses 4012 4388 +376
- Partials 1140 1223 +83
============================================

☔ View full report in Codecov by Sentry.
@andygrove please take another look.
classOf[BitLength] -> CometScalarFunction("bit_length"),
classOf[Chr] -> CometScalarFunction("char"),
classOf[ConcatWs] -> CometScalarFunction("concat_ws"),
classOf[Concat] -> CometScalarFunction("concat"),
I think we need type checks so that we fall back to Spark for unsupported argument types?
Perhaps something like this?
object CometConcat extends CometScalarFunction[Concat]("concat") {
override def getSupportLevel(expr: Concat): SupportLevel = {
if (expr.children.forall(_.dataType == DataTypes.StringType)) {
Compatible()
} else {
Incompatible(Some("Only string arguments are supported"))
}
}
}

createFunctionWithInputTypes(
  "concat",
  Seq(SparkStringType, SparkStringType)
), // TODO: variadic
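The type gate in the CometConcat suggestion above boils down to a forall over the child expressions' data types. A standalone sketch of that check, using hypothetical stand-in type objects rather than Spark's real DataType hierarchy:

```scala
// Hypothetical stand-ins for Spark's DataType hierarchy (illustration only).
sealed trait DataType
case object StringType extends DataType
case object BinaryType extends DataType

// Mirrors expr.children.forall(_.dataType == DataTypes.StringType):
// the expression is only marked Compatible when every argument is a string.
def supportLevel(childTypes: Seq[DataType]): String =
  if (childTypes.forall(_ == StringType)) "Compatible"
  else "Incompatible: Only string arguments are supported"

println(supportLevel(Seq(StringType, StringType)))
println(supportLevel(Seq(StringType, BinaryType)))
```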
I know that this PR only adds string inputs to Comet's concat, but the fuzz tester should ideally cover all types that Spark supports.
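One cheap way to widen that coverage is to enumerate concat calls over the cross product of Spark's allowed argument types. A hedged sketch (the type names and SQL shape are illustrative; the fuzz tester's actual API may differ):

```scala
// Illustrative only: not the fuzz tester's real API.
// Types Spark's concat accepts, per the allowedTypes list quoted earlier.
val concatArgTypes = Seq("STRING", "BINARY", "ARRAY<INT>")

// Cross product of two-argument type combinations for concat.
val cases = for {
  a <- concatArgTypes
  b <- concatArgTypes
} yield s"SELECT concat(CAST(c0 AS $a), CAST(c1 AS $b)) FROM t"

cases.foreach(println) // 9 queries covering all type pairs
```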
+- CometHashAggregate (67)
   +- CometExpand (66)
      +- CometUnion (65)
         :- CometHashAggregate (22)
Why is this hash aggregate now supported in Comet? I don't see concat used in the query.
https://github.com/apache/datafusion-benchmarks/blob/main/tpcds/queries-spark/q5.sql
// https://github.com/apache/datafusion-comet/issues/2647
ignore("test concat function - arrays") {
Could you enable these tests and use the recently added checkSparkAnswerAndFallbackReason method to make sure we are correctly falling back to Spark?
Which issue does this PR close?
Closes #2552.
Rationale for this change
What changes are included in this PR?
How are these changes tested?