Skip to content

Conversation

killme2008
Copy link
Contributor

@killme2008 killme2008 commented Sep 12, 2025

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

This PR migrates additional aggregation tests from DuckDB to GreptimeDB’s suite to expand coverage of aggregation functions.

Remaining issues:

  1. APPROX_DISTINCT does not accept null, date, or timestamp types. feat: support for null, date, and timestamp types in approx_distinct apache/datafusion#17618
  2. corr(2, 2) incorrectly returns 0. fix(agg/corr): return NULL when variance is zero or samples < 2 apache/datafusion#17621
  3. STRING_AGG(x, '|') fails with: “Invalid argument error: column types must match schema types, expected LargeUtf8 but found Utf8 at column index 0.” We convert LargeStringArray to StringVector, but build the final record batch with the original Arrow schema, causing schema validation to fail.

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

@killme2008 killme2008 requested a review from a team as a code owner September 12, 2025 09:34
@killme2008 killme2008 marked this pull request as draft September 12, 2025 09:34
@github-actions github-actions bot added size/XXL docs-not-required This change does not impact docs. labels Sep 12, 2025
@killme2008 killme2008 requested a review from Copilot September 12, 2025 10:23
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR migrates additional aggregation test cases from DuckDB to GreptimeDB's test suite as part of the ongoing effort to improve test coverage for aggregation functions.

  • Migrates 19 new test files covering various aggregate functions (string_agg, stddev, regression, min_max, median, etc.)
  • Updates one existing result file with an error case for copy database operations
  • Adds minor formatting improvement to one existing test file

Reviewed Changes

Copilot reviewed 34 out of 34 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/cases/standalone/common/copy/copy_database_from_fs_parquet.result Updates test results with table not found errors for split_test table
tests/cases/standalone/common/aggregate/uddsketch.sql Adds blank line for formatting consistency
tests/cases/standalone/common/aggregate/string_agg.sql Migrates STRING_AGG function tests from DuckDB
tests/cases/standalone/common/aggregate/string_agg.result Test results showing column type mismatch errors for STRING_AGG
tests/cases/standalone/common/aggregate/stddev.sql Migrates standard deviation function tests
tests/cases/standalone/common/aggregate/stddev.result Results for stddev_samp, stddev_pop, var_samp, var_pop functions
tests/cases/standalone/common/aggregate/regression.sql Migrates regression analysis function tests
tests/cases/standalone/common/aggregate/regression.result Results for regression slope, intercept, R2, and related functions
tests/cases/standalone/common/aggregate/min_max.sql Migrates MIN/MAX aggregate function tests
tests/cases/standalone/common/aggregate/min_max.result Results for MIN/MAX with various data types
tests/cases/standalone/common/aggregate/median.sql Migrates MEDIAN aggregate function tests
tests/cases/standalone/common/aggregate/median.result Results for median calculations with different data types
tests/cases/standalone/common/aggregate/first_last.sql Migrates FIRST_VALUE and LAST_VALUE function tests
tests/cases/standalone/common/aggregate/first_last.result Results for first/last value functions with ORDER BY
tests/cases/standalone/common/aggregate/covar.sql Migrates covariance function tests
tests/cases/standalone/common/aggregate/covar.result Results for COVAR_POP and COVAR_SAMP functions
tests/cases/standalone/common/aggregate/corr.sql Migrates correlation coefficient function tests
tests/cases/standalone/common/aggregate/corr.result Results for CORR function with various data patterns
tests/cases/standalone/common/aggregate/bool_agg.sql Migrates boolean aggregation function tests
tests/cases/standalone/common/aggregate/bool_agg.result Results for BOOL_AND and BOOL_OR functions
tests/cases/standalone/common/aggregate/bit_operations.sql Migrates bitwise aggregation function tests
tests/cases/standalone/common/aggregate/bit_operations.result Results for BIT_AND, BIT_OR, and BIT_XOR functions
tests/cases/standalone/common/aggregate/avg.sql Migrates AVG aggregate function tests
tests/cases/standalone/common/aggregate/avg.result Results for average calculations with type coercion errors
tests/cases/standalone/common/aggregate/array_agg.sql Migrates array aggregation function tests
tests/cases/standalone/common/aggregate/array_agg.result Results for ARRAY_AGG with ORDER BY and DISTINCT
tests/cases/standalone/common/aggregate/approx_percentile_cont_with_weight.sql Migrates weighted approximate percentile tests
tests/cases/standalone/common/aggregate/approx_percentile_cont_with_weight.result Results for weighted percentile calculations
tests/cases/standalone/common/aggregate/approx_percentile_cont.sql Migrates approximate percentile function tests
tests/cases/standalone/common/aggregate/approx_percentile_cont.result Results for APPROX_PERCENTILE_CONT function
tests/cases/standalone/common/aggregate/approx_median.sql Migrates approximate median function tests
tests/cases/standalone/common/aggregate/approx_median.result Results for APPROX_MEDIAN function
tests/cases/standalone/common/aggregate/approx_distinct.sql Migrates approximate distinct count tests
tests/cases/standalone/common/aggregate/approx_distinct.result Results for APPROX_DISTINCT function

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Signed-off-by: Dennis Zhuang <[email protected]>
Signed-off-by: Dennis Zhuang <[email protected]>
Signed-off-by: Dennis Zhuang <[email protected]>
@killme2008 killme2008 force-pushed the feature/duckdb-tests-part4 branch from acd30df to a8f1fc0 Compare September 17, 2025 14:13
@github-actions github-actions bot added size/XXL and removed size/L labels Sep 17, 2025
Signed-off-by: Dennis Zhuang <[email protected]>
@killme2008 killme2008 force-pushed the feature/duckdb-tests-part4 branch from f5b6487 to 5b6494a Compare September 18, 2025 03:28
@killme2008 killme2008 requested a review from Copilot September 18, 2025 03:29
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 33 out of 33 changed files in this pull request and generated 1 comment.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Signed-off-by: Dennis Zhuang <[email protected]>
@killme2008 killme2008 marked this pull request as ready for review September 18, 2025 03:47
@killme2008
Copy link
Contributor Author

killme2008 commented Sep 18, 2025

Ready to review.

Remaining issues:

  1. APPROX_DISTINCT does not accept null, date, or timestamp types. feat: support for null, date, and timestamp types in approx_distinct apache/datafusion#17618
  2. corr(2, 2) incorrectly returns 0. fix(agg/corr): return NULL when variance is zero or samples < 2 apache/datafusion#17621
  3. STRING_AGG(x, '|') fails with: Invalid argument error: column types must match schema types, expected LargeUtf8 but found Utf8 at column index 0. We convert LargeStringArray to StringVector, but build the final record batch with the original Arrow schema, causing schema validation to fail. @evenyag We should remove all vector and data type conversions in the read path.

@discord9 @waynexia

Copy link
Collaborator

@fengjiachun fengjiachun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fengjiachun fengjiachun added this pull request to the merge queue Sep 25, 2025
Merged via the queue into GreptimeTeam:main with commit c6e5552 Sep 25, 2025
54 of 81 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-not-required This change does not impact docs. documentation size/XXL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants