Hi!
I just opened a PR to add Comet to ClickBench, one of the popular benchmarks for analytical workloads. I decided to create an issue similar to #391. Feel free to close it if you find it irrelevant.
I'd appreciate feedback on whether my configuration and setup are correct. I consider this important because Comet failed on one query and showed a few curious behaviors, which I outline below. Perhaps these (and other hidden issues) could be fixed with proper configuration.
My notes:
- Predictably, Comet doesn't support some operators and expressions. Here is what I got from the logs:
```
>>> grep -P "\[COMET:" log.txt | sed -e 's/^[ \t]*//' | sort | uniq -c
78 +- GlobalLimit [COMET: GlobalLimit is not supported]
18 +- HashAggregate [COMET: Unsupported aggregation mode PartialMerge]
123 +- HashAggregate [COMET: distinct aggregates are not supported]
51 +- Project [COMET: Unsupported cast from LongType to TimestampType with timezone Some(...) and evalMode LEGACY]
126 +- SortAggregate [COMET: SortAggregate is not supported]
43 Execute CreateViewCommand [COMET: Execute CreateViewCommand is not supported]
135 TakeOrderedAndProject [COMET: ]
```
The `Unsupported cast from LongType to TimestampType...` message is similar to #44, but here a different column is involved (`EventTime` instead of `EventDate`). See that issue for additional info.
- I used Spark's local mode. I saw that the docs suggest standalone mode for EC2, but I didn't want to spend extra resources on a separate driver. I checked the Spark UI, and Comet seems to work fine.
- Comet's cold runs are significantly slower than its hot runs, even when compared to Spark.
- As I already mentioned, Comet failed on one query:
```sql
SELECT TraficSourceID, SearchEngineID, AdvEngineID,
       CASE WHEN (SearchEngineID = 0 AND AdvEngineID = 0) THEN Referer ELSE '' END AS Src,
       URL AS Dst,
       COUNT(*) AS PageViews
FROM hits
WHERE CounterID = 62
  AND EventDate >= '2013-07-01'
  AND EventDate <= '2013-07-31'
  AND IsRefresh = 0
GROUP BY TraficSourceID, SearchEngineID, AdvEngineID, Src, Dst
ORDER BY PageViews DESC
LIMIT 10 OFFSET 1000;
```
with the following error:
```
QueryPlanSerde: Comet native execution is disabled due to: unsupported Spark partitioning: ArrayBuffer(PageViews#1143L DESC NULLS LAST)
Caused by: org.apache.comet.CometNativeException: InternalError: Native cast invoked for unsupported cast from Utf8 to Dictionary(Int32, Utf8).
```
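For anyone reproducing the fallback counts from the first note without the shell pipeline, here is a rough Python equivalent of `grep -P "\[COMET:" | sed -e 's/^[ \t]*//' | sort | uniq -c`. It keeps lines containing `[COMET:`, strips leading spaces and tabs, and counts identical lines. The function names and the extra `fallback_reason` helper are hypothetical, and the sketch assumes a plain-text log with one plan node per line.

```python
import re
from collections import Counter

# Literal "[COMET:" marker, the same pattern grep -P matches
COMET_MARKER = re.compile(r"\[COMET:")
# Hypothetical: captures the text inside a trailing "[COMET: ...]"
REASON = re.compile(r"\[COMET:\s*(.*?)\]\s*$")


def count_fallbacks(lines):
    """Count identical fallback lines, like grep | sed | sort | uniq -c."""
    counts = Counter()
    for line in lines:
        if COMET_MARKER.search(line):
            # sed strips leading spaces/tabs; also drop the trailing newline
            counts[line.lstrip(" \t").rstrip("\n")] += 1
    return counts


def fallback_reason(line):
    """Return the fallback reason inside [COMET: ...], or None if absent."""
    match = REASON.search(line)
    return match.group(1) if match else None
```

Usage would be along the lines of `count_fallbacks(open("log.txt"))`; grouping by `fallback_reason` instead of by the whole line would merge entries that differ only in the plan node prefix.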
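On the `LongType` to `TimestampType` fallback in the first note: my understanding is that Spark interprets a long cast to a timestamp as seconds since the Unix epoch and stores `TimestampType` internally as microseconds since the epoch (UTC), which is the conversion Comet would need to perform natively. The sketch below models only that arithmetic, assuming `EventTime` holds epoch seconds. It ignores overflow handling and any `LEGACY` vs ANSI differences, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MICROS_PER_SECOND = 1_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def long_to_timestamp_micros(seconds: int) -> int:
    """Sketch of a long -> timestamp cast: epoch seconds to epoch microseconds."""
    return seconds * MICROS_PER_SECOND


def micros_to_utc(micros: int) -> datetime:
    """Render microseconds since the Unix epoch as a UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)
```

For example, `micros_to_utc(long_to_timestamp_micros(1373932800))` is midnight UTC on 2013-07-16, inside the date range the benchmark queries filter on.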
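To make the cold-vs-hot observation in the third note easier to compare across engines, here is a minimal timing sketch. It assumes a ClickBench-style scheme in which each query runs a few times, the first run counts as "cold", and the fastest of the remaining runs counts as "hot". `run_query` is a hypothetical stand-in for whatever actually submits the query (for example, a call that runs the SQL and collects the result).

```python
import time
from typing import Callable, List, Tuple


def cold_and_hot(run_query: Callable[[], object], tries: int = 3) -> Tuple[float, float]:
    """Time run_query `tries` times; return (first run, fastest later run) in seconds."""
    if tries < 2:
        raise ValueError("need at least two runs to separate cold from hot")
    timings: List[float] = []
    for _ in range(tries):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings[0], min(timings[1:])
```

Reporting the ratio `cold / hot` per query for both Spark and Comet would show whether Comet's gap is actually larger, rather than both engines just paying a similar warm-up cost.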
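For context on the failing query: `Dictionary(Int32, Utf8)` in the error is Apache Arrow's dictionary-encoded string type, where each value is an Int32 index into a separate array of distinct Utf8 strings. The error says Comet was asked to convert a plain Utf8 column into that representation and has no native implementation for it; which expression in the query triggers the cast isn't visible from the message. The pure-Python sketch below is not Arrow's or Comet's code. It only illustrates what the target encoding looks like, and it ignores nulls.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[int], List[str]]:
    """Map each string to an index into a list of distinct values (first-seen order)."""
    dictionary: List[str] = []
    positions: Dict[str, int] = {}
    indices: List[int] = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return indices, dictionary


def dictionary_decode(indices: List[int], dictionary: List[str]) -> List[str]:
    """Inverse of dictionary_encode: look each index up in the dictionary."""
    return [dictionary[i] for i in indices]
```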