Improve performance on ClickBench #2035

@Iskander14yo

Hi!

I've just opened a PR to add Comet to ClickBench, one of the popular benchmarks for analytical workloads. I decided to create an issue similar to #391. Feel free to close it if you find it irrelevant.

I'd appreciate feedback on whether my configuration and setup are correct. I consider this important because Comet failed on one query and showed a few curious behaviors that I outline below. Perhaps these (and other, less visible issues) could be fixed with proper configuration.

My notes:

  • As expected, Comet doesn't support some operators and expressions. Here is what I got from the logs:
>>> grep -P "\[COMET:" log.txt | sed -e 's/^[ \t]*//' | sort | uniq -c

     78 +-  GlobalLimit [COMET: GlobalLimit is not supported]
     18 +-  HashAggregate [COMET: Unsupported aggregation mode PartialMerge]
    123 +-  HashAggregate [COMET: distinct aggregates are not supported]
     51 +-  Project [COMET: Unsupported cast from LongType to TimestampType with timezone Some(...) and evalMode LEGACY]
    126 +-  SortAggregate [COMET: SortAggregate is not supported]
     43 Execute CreateViewCommand [COMET: Execute CreateViewCommand is not supported]
    135 TakeOrderedAndProject [COMET: ]

The "Unsupported cast from LongType to TimestampType..." fallback is similar to #44, but in this case a different column is involved (EventTime instead of EventDate). See that issue for additional info.
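For anyone who wants to reproduce the tally above without grep/sed, here is a minimal Python sketch. It assumes the log contains plan lines annotated with "[COMET: reason]" exactly as in the output above. The function name tally_fallbacks and the "<no reason given>" placeholder are my own illustrative choices, not part of Comet.

```python
import re
from collections import Counter

# Matches an operator name followed by its "[COMET: reason]" annotation,
# skipping plan-tree prefixes such as "+- " in front of the operator.
COMET_NOTE = re.compile(r"([A-Za-z][\w ]*?)\s*\[COMET:\s*(.*?)\]")


def tally_fallbacks(lines):
    """Count (operator, reason) pairs found in plan lines.

    Hypothetical helper: the input format is assumed from the log
    excerpt in this issue, not taken from Comet's documentation.
    """
    counts = Counter()
    for line in lines:
        match = COMET_NOTE.search(line)
        if match:
            operator, reason = match.group(1), match.group(2)
            # Some annotations are empty, e.g. "TakeOrderedAndProject [COMET: ]".
            counts[(operator, reason or "<no reason given>")] += 1
    return counts


# Sample lines copied from the log excerpt above.
SAMPLE = [
    "+-  GlobalLimit [COMET: GlobalLimit is not supported]",
    "+-  SortAggregate [COMET: SortAggregate is not supported]",
    "+-  SortAggregate [COMET: SortAggregate is not supported]",
    "Execute CreateViewCommand [COMET: Execute CreateViewCommand is not supported]",
    "TakeOrderedAndProject [COMET: ]",
]

for (operator, reason), n in tally_fallbacks(SAMPLE).most_common():
    print(f"{n:7d} {operator}: {reason}")
```

To run it on a real log, pass an open file instead of SAMPLE, for example tally_fallbacks(open("log.txt")). Grouping by operator and reason separately makes it easier to see which unsupported operators account for most of the fallbacks.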

  • Spark's local mode was used. The docs suggest using standalone mode on EC2, but I didn't want to spend extra resources on a separate driver. I checked the Spark UI, and Comet seems to work fine.
  • Comet's cold runs are significantly slower than its hot runs, and slower even compared to Spark.
  • As already mentioned, Comet failed on one query:
SELECT TraficSourceID, SearchEngineID, AdvEngineID, CASE WHEN (SearchEngineID = 0 AND AdvEngineID = 0) THEN Referer ELSE '' END AS Src, URL AS Dst, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 GROUP BY TraficSourceID, SearchEngineID, AdvEngineID, Src, Dst ORDER BY PageViews DESC LIMIT 10 OFFSET 1000;

with the error:

QueryPlanSerde: Comet native execution is disabled due to: unsupported Spark partitioning: ArrayBuffer(PageViews#1143L DESC NULLS LAST)

Caused by: org.apache.comet.CometNativeException: InternalError: Native cast invoked for unsupported cast from Utf8 to Dictionary(Int32, Utf8).
