
Add coral-benchmark module for cross-dialect translation testing #599

Open

wmoustafa wants to merge 5 commits into linkedin:master from wmoustafa:coral-benachmark

Conversation

@wmoustafa (Contributor)

Add coral-benchmark module for cross-dialect translation testing

Summary

Adds a new coral-benchmark module that provides a framework for testing Coral translations end-to-end between any pair of supported dialects (Hive, Spark, Trino). This is the API design — interfaces, data types, and the orchestration layer — ready for implementation to be filled in.

Motivation

Coral's existing tests verify individual dialect converters in isolation (e.g., Hive-to-Trino, Hive-to-Spark). There is no unified way to test arbitrary dialect-to-dialect translations, validate that translated queries are syntactically valid on the target engine, or compare query results across engines for semantic equivalence. This module closes that gap.

Design

The framework is built around two SPIs and three verification levels:

SPIs:

  • DialectPlugin — wraps existing Coral converters (HiveToRelConverter, TrinoToRelConverter, RelToTrinoConverter, CoralSpark) behind a uniform toRelNode() / toDialectSql() interface.
  • EnginePlugin — provides execution capabilities (createTable, loadData, explain, execute) for embedded engines (Spark, Trino, Hive).

Verification levels (escalating):

  1. TRANSLATION — source SQL -> Coral IR -> target SQL completes without error.
  2. EXPLAIN — the translated SQL passes the target engine's EXPLAIN (syntax + planning validation).
  3. RESULT_SET — query results from source and target engines are compared for semantic equivalence, with configurable tolerances for floating-point precision, row ordering, NULL handling, and type widening.
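The escalation between the three levels could be modeled as an ordered enum. A minimal sketch, assuming the name VerificationLevel and a covers() helper (the PR's actual representation, if any, may differ):

```java
public enum VerificationLevel {
    TRANSLATION, // source SQL -> Coral IR -> target SQL completes without error
    EXPLAIN,     // translated SQL also passes the target engine's EXPLAIN
    RESULT_SET;  // source and target results also compare as semantically equal

    /** A stronger level implies every weaker one: RESULT_SET also exercises EXPLAIN and TRANSLATION. */
    public boolean covers(VerificationLevel other) {
        return this.ordinal() >= other.ordinal();
    }
}
```

Declaration order doubles as strength order, so the orchestrator can run every level up to the configured one with a single comparison.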

Supporting components:

  • InMemoryCatalog — an in-memory CoralCatalog implementation using the Coral type system (StructType, PrimitiveType, etc.) so tests have no external metastore dependency.
  • RowSet / ResultSet — typed tabular data containers for loading test data and capturing query results.
  • ResultSetComparator with ComparisonConfig — configurable comparison logic for cross-engine result equivalence.
  • TranslationTestSuite — builder-configured orchestrator that reads .sql files from a query directory and runs each through the configured pipeline.
  • TestReport / QueryTestResult — structured reporting with per-query outcomes and aggregate failure categorization (translation error, explain failure, result mismatch).
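To make the "builder-configured orchestrator" concrete, here is a self-contained toy version of the pattern; every field and method name below is illustrative, not the PR's actual TranslationTestSuite API:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Toy builder-configured suite in the spirit of TranslationTestSuite;
// the real class reads .sql files from queryDir and runs the pipeline.
public final class SuiteSketch {
    private final Path queryDir;
    private final String sourceDialect;
    private final String targetDialect;

    private SuiteSketch(Builder b) {
        this.queryDir = b.queryDir;
        this.sourceDialect = b.sourceDialect;
        this.targetDialect = b.targetDialect;
    }

    public static Builder builder() { return new Builder(); }

    public String describe() {
        return sourceDialect + " -> " + targetDialect + " over " + queryDir;
    }

    public static final class Builder {
        private Path queryDir = Paths.get("queries");
        private String sourceDialect = "hive";
        private String targetDialect = "trino";

        public Builder queryDir(Path p) { this.queryDir = p; return this; }
        public Builder sourceDialect(String d) { this.sourceDialect = d; return this; }
        public Builder targetDialect(String d) { this.targetDialect = d; return this; }
        public SuiteSketch build() { return new SuiteSketch(this); }
    }
}
```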

What's included

  • 15 Java source files across 5 packages (spi, catalog, data, comparison, suite)
  • build.gradle with api dependency on :coral-common
  • The testing spec document (coral-benchmark-spec.md)
  • Module registered in settings.gradle
  • Compiles cleanly via ./gradlew :coral-benchmark:compileJava

What's not included (yet)

  • Concrete DialectPlugin implementations (Hive, Spark, Trino)
  • Concrete EnginePlugin implementations (embedded Spark, Trino)
  • ResultSetComparator.compare() implementation
  • TranslationTestSuite.run() implementation
  • Query corpus (.sql test files)

Commits:

  • Introduces the API design for a benchmark framework that tests Coral translations end-to-end between any supported dialect pair (Hive, Spark, Trino). Includes SPIs for dialect translation and engine execution, an in-memory catalog, typed test data, result-set comparison, and a test suite orchestrator with three verification levels.
  • Remove SKIP and ERROR statuses that had no backing mechanism in the API. FailureCategory already captures the reason for failure (translation error, explain failure, result mismatch).
  • Remove unused imports and apply eclipse code style formatting. Arrays.copyOf is used in RowSet.Builder.addRow, but spotless incorrectly flagged the import as unused due to a JDK version mismatch.
  • Replace bare '>' characters in Javadoc comments (from '->' arrows and type mappings) that JDK 8 javadoc rejects as malformed HTML.
@ljfgem (Collaborator) left a comment


Thanks for the PR!

* @return the Calcite {@link RelNode} representing the Coral intermediate representation
* @throws IllegalArgumentException if the SQL cannot be parsed in this dialect
*/
RelNode toRelNode(String sql, CoralCatalog catalog);

What about passing the catalog at construction time or via init(CoralCatalog)? The SPI methods would then simplify to toRelNode(String sql) / toDialectSql(RelNode relNode).
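The suggested shape might look like the sketch below; CoralCatalog and RelNode are local stand-ins here (in Coral they come from coral-common and Calcite), and EchoPlugin is a made-up implementation to show the contract:

```java
interface CoralCatalog {}  // placeholder for the real Coral catalog
interface RelNode {}       // placeholder for org.apache.calcite.rel.RelNode

interface DialectPlugin {
    void init(CoralCatalog catalog);       // catalog bound once, up front
    RelNode toRelNode(String sql);         // no longer threaded per call
    String toDialectSql(RelNode relNode);
}

// Trivial no-op implementation demonstrating the narrowed signatures.
final class EchoPlugin implements DialectPlugin {
    private CoralCatalog catalog;
    public void init(CoralCatalog c) { this.catalog = c; }
    public RelNode toRelNode(String sql) { return new RelNode() {}; }
    public String toDialectSql(RelNode relNode) { return "SELECT 1"; }
}
```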


Values are Java objects matching the Coral type mapping (INT -> Integer, STRING -> String, ARRAY -> List, MAP -> Map, STRUCT -> Object[], etc.).

## 4. Verification Levels

What about adding lightweight ASCII diagrams (that Claude Code can generate) to make things clearer?
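The type mapping quoted above can be made concrete with one sample row; the column schema in this sketch is invented for illustration:

```java
import java.util.Arrays;
import java.util.Collections;

public final class TypeMappingDemo {
    // One row for a hypothetical schema (id INT, name STRING,
    // tags ARRAY<STRING>, props MAP<STRING, STRING>, point STRUCT<x INT, y INT>):
    public static Object[] sampleRow() {
        return new Object[] {
            1,                                   // INT    -> java.lang.Integer
            "coral",                             // STRING -> java.lang.String
            Arrays.asList("a", "b"),             // ARRAY  -> java.util.List
            Collections.singletonMap("k", "v"),  // MAP    -> java.util.Map
            new Object[] { 3, 4 }                // STRUCT -> Object[]
        };
    }
}
```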

* @param sql a SELECT statement in this dialect's syntax
* @param catalog the catalog providing table metadata for query resolution
* @return the Calcite {@link RelNode} representing the Coral intermediate representation
* @throws IllegalArgumentException if the SQL cannot be parsed in this dialect

The orchestrator in TranslationTestSuite.run() will need to catch different exception types at different stages and map them to FailureCategory. What about defining a BenchmarkException hierarchy (TranslationException, EngineException) that plugins are contracted to throw?
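A minimal version of the suggested hierarchy; the class names follow the comment, the constructors and javadoc are assumptions:

```java
// Common supertype so the orchestrator can catch one exception family.
class BenchmarkException extends RuntimeException {
    BenchmarkException(String message, Throwable cause) { super(message, cause); }
}

/** Thrown by a DialectPlugin when SQL -> IR -> SQL translation fails. */
class TranslationException extends BenchmarkException {
    TranslationException(String message, Throwable cause) { super(message, cause); }
}

/** Thrown by an EnginePlugin when EXPLAIN or execution fails on the engine. */
class EngineException extends BenchmarkException {
    EngineException(String message, Throwable cause) { super(message, cause); }
}
```

With this contract, run() can catch TranslationException and EngineException at their respective stages and map each directly to a FailureCategory, instead of guessing from arbitrary runtime exceptions.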

@@ -0,0 +1,281 @@
# Coral Benchmark: Cross-Dialect Integration Testing Framework

Let's also add this new module (WIP) to the README.md file?

@1fanwang (Member) left a comment

The overall direction looks good, thanks for continuing this effort!

In-memory catalog, SPI-based dialects, and escalating verification levels (especially RESULT_SET for semantic equivalence against real engines) — I think this is the shape that's been missing on the OSS side for exactly the kinds of subtle cross-engine differences that bite in practice: NULL handling, UNION coercion, timestamp precision. Skeleton-first with a spec doc is a nice way to surface the API before anyone invests in plugin plumbing.

One forward-looking thought: the long-term value of this framework probably lives in the query corpus more than the framework itself — curious if you have a mental model for how the corpus grows over time (in-repo seed vs. per-consumer contributions, which translation-divergence categories to prioritize first). Happy to pick that up in a follow-up.

* @return the Calcite {@link RelNode} representing the Coral intermediate representation
* @throws IllegalArgumentException if the SQL cannot be parsed in this dialect
*/
RelNode toRelNode(String sql, CoralCatalog catalog);

Curious how you're thinking about config/context flowing through the SPI. Looking at the existing converters:

  • RelToTrinoConverter consumes a Map<String, Boolean> of CoralTrinoConfigKeys (e.g. SUPPORT_LEGACY_UNNEST_ARRAY_OF_STRUCT) — exactly the kind of semantic knob a correctness suite would want to toggle per test.
  • CoralSpark.create threads an HMS client through.

Plugins could own those internally, but then the benchmark caller can't exercise a given converter across its config surface. Wondering if something like DialectPlugin<Ctx> or a small per-plugin config object earns its keep once the first real impls land — or whether you'd rather keep the core SPI narrow and handle the variance via plugin-specific sub-interfaces. No strong view, just flagging before it ossifies.
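One way the DialectPlugin&lt;Ctx&gt; idea could look, sketched with a Trino-flavored plugin whose context is the converter's boolean config map; everything here except the Map&lt;String, Boolean&gt; config shape mentioned above is an assumption:

```java
import java.util.Collections;
import java.util.Map;

// Core SPI stays uniform; the context type is plugin-specific.
interface DialectPlugin<Ctx> {
    void configure(Ctx context);
    String name();
}

// A Trino plugin parameterized by the converter's config-key map, so the
// benchmark caller can toggle semantic knobs per test.
final class TrinoPluginSketch implements DialectPlugin<Map<String, Boolean>> {
    private Map<String, Boolean> config = Collections.emptyMap();
    public void configure(Map<String, Boolean> context) { this.config = context; }
    public String name() { return "trino"; }
    boolean isEnabled(String key) { return config.getOrDefault(key, Boolean.FALSE); }
}
```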

apply plugin: 'java-library'

dependencies {
api project(path: ':coral-common')

Small topology thought — today the module only depends on :coral-common, which keeps it dialect-agnostic. §9 of the spec suggests it'll grow deps on coral-hive, coral-trino, coral-spark once the concrete plugins land.

Have you considered hosting each DialectPlugin in its native module (e.g. coral-hive shipping HiveDialectPlugin with a META-INF/services entry, picked up by the ServiceLoader path you've already sketched)? Keeps this module thin, and a future contributor adding a new dialect wouldn't need to touch coral-benchmark. Defer if you've already weighed it — just wanted to surface while the topology is still fluid.
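The discovery side of that proposal is standard java.util.ServiceLoader; the interface and service-file name below are local stand-ins for the real SPI:

```java
import java.util.ServiceLoader;

// Stand-in for the real SPI; a dialect module would ship a
// META-INF/services file naming its implementation class.
interface DialectPlugin { String name(); }

public final class PluginDiscovery {
    public static int countDiscovered() {
        int n = 0;
        // With no META-INF/services entry on the classpath this finds nothing,
        // which is the point: coral-benchmark itself stays dialect-agnostic,
        // and plugins appear only when their native module is on the classpath.
        for (DialectPlugin p : ServiceLoader.load(DialectPlugin.class)) {
            n++;
        }
        return n;
    }
}
```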

3 participants