@github-actions

Cherry-picked from #58770

```sql
select
  count(*)
from
(
    select
      brand_id,
      class_id,
      category_id
    from
    (
        SELECT
          iss.i_brand_id brand_id,
          iss.i_class_id class_id,
          iss.i_category_id category_id
        FROM
          store_sales,
          item iss,
          date_dim d1
        WHERE
          (ss_item_sk = iss.i_item_sk)
          AND (ss_sold_date_sk = d1.d_date_sk)
          AND (
            d1.d_year BETWEEN 1999
            AND (1999 + 2)
          )
      )tmp
    group by
      brand_id,
      class_id,
      category_id
  )tmp2;
```

before:
<img width="904" height="262" alt="QQ_1764934227381"
src="https://github.com/user-attachments/assets/771a51d7-049d-49a0-a4af-eab318047c2d"
/>
after:
<img width="808" height="250" alt="QQ_1764934235361"
src="https://github.com/user-attachments/assets/56ea2e41-04d4-4cd7-a3a9-3c1f8eab596c"
/>


This pull request adds two fixed-width hash key types, `UInt96` (12
bytes) and `UInt104` (13 bytes), to the aggregation, join, set,
partition sort, and dictionary hash map utilities. The new types are
wired into the relevant data structures and hash functions and are
covered by tests. The goal is better performance for keys of these
widths, as in the query above.

**Support for new hash key types (`UInt96` and `UInt104`):**

* Added new struct definitions for `UInt96` and `UInt104` in
`uint128.h`, including equality operators.
* Updated the `HashKeyType` enum and `get_hash_key_type_with_fixed`
function to include `fixed96` and `fixed104` options.
[[1]](diffhunk://#diff-4f1fb8a89cd0e13a719c3427b1ae7581b42cb7325755a3ceac4c44bdc64bd144R41-R42)
[[2]](diffhunk://#diff-4f1fb8a89cd0e13a719c3427b1ae7581b42cb7325755a3ceac4c44bdc64bd144R67-R70)
* Implemented `HashCRC32` specializations for `UInt96` and `UInt104` to
enable CRC32 hashing for these types.
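
As a rough illustration of the bullets above, the sketch below shows what a packed 12-byte and 13-byte key with equality operators and a CRC32-style hash could look like. It is not the code from this PR: the member names, the `#pragma pack` layout, the `FixedKeyHash` template, and the `pack3` helper are all hypothetical, and a portable bitwise CRC32C stands in for whatever `HashCRC32` does internally (which is likely hardware-accelerated).

```cpp
#include <cassert>   // used by the usage checks shown after this block
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layouts: the PR only states that the structs exist and have
// equality operators. Member names and packing are assumptions.
#pragma pack(push, 1)
struct UInt96 {
    uint64_t lo = 0;
    uint32_t hi = 0;
    bool operator==(const UInt96& rhs) const { return lo == rhs.lo && hi == rhs.hi; }
    bool operator!=(const UInt96& rhs) const { return !(*this == rhs); }
};

struct UInt104 {
    uint64_t a = 0;
    uint32_t b = 0;
    uint8_t c = 0;
    bool operator==(const UInt104& rhs) const { return a == rhs.a && b == rhs.b && c == rhs.c; }
    bool operator!=(const UInt104& rhs) const { return !(*this == rhs); }
};
#pragma pack(pop)

static_assert(sizeof(UInt96) == 12, "UInt96 must occupy exactly 96 bits");
static_assert(sizeof(UInt104) == 13, "UInt104 must occupy exactly 104 bits");

// Portable bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
// A stand-in for a hardware CRC32 instruction; slow but dependency-free.
inline uint32_t crc32c_bytes(const void* data, size_t len) {
    const auto* p = static_cast<const unsigned char*>(data);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Hypothetical hasher: hashes the raw bytes of any packed fixed-width key.
template <typename Key>
struct FixedKeyHash {
    size_t operator()(const Key& key) const { return crc32c_bytes(&key, sizeof(Key)); }
};

// Hypothetical helper: packs three 4-byte columns into one 12-byte key.
inline UInt96 pack3(int32_t a, int32_t b, int32_t c) {
    const int32_t cols[3] = {a, b, c};
    static_assert(sizeof(cols) == sizeof(UInt96), "three int32 values fill a UInt96");
    UInt96 key;
    std::memcpy(&key, cols, sizeof(key));
    return key;
}
```

With these definitions, `pack3(1, 2, 3) == pack3(1, 2, 3)` holds and both keys hash to the same value under `FixedKeyHash<UInt96>`, which is the minimum a hash table needs from a key type.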

**Integration into aggregation, set, join, partition, and dictionary
utilities:**

* Extended the variant types and initialization logic in aggregation
(`agg_utils.h`), distinct aggregation (`distinct_agg_utils.h`), set
(`set_utils.h`), join (`join_utils.h`), partition sort
(`partition_sort_utils.h`), and dictionary hash map
(`complex_dict_hash_map.h`) utilities to support the new key types.
[[1]](diffhunk://#diff-50d8f62236d4e1f81d52e945edee5377b7b22d52e04128eea2c8b7f679b37254R85-R86)
[[2]](diffhunk://#diff-50d8f62236d4e1f81d52e945edee5377b7b22d52e04128eea2c8b7f679b37254R147-R154)
[[3]](diffhunk://#diff-62ad0a1cb1b62de5393935298725cfd2e9766215bdd7653d84cd1fd5e7f59fe3R109-R110)
[[4]](diffhunk://#diff-62ad0a1cb1b62de5393935298725cfd2e9766215bdd7653d84cd1fd5e7f59fe3R166-R173)
[[5]](diffhunk://#diff-8b095a1e764b3856129d9fd06fb9122a7e9eb16bc5c293d8dcaa4ff841a587edR71-R72)
[[6]](diffhunk://#diff-8b095a1e764b3856129d9fd06fb9122a7e9eb16bc5c293d8dcaa4ff841a587edR115-R122)
[[7]](diffhunk://#diff-66cf4052118abf5abbef2e0d9193df3c35a46f70db35853c5884d56d4118a963R70)
[[8]](diffhunk://#diff-66cf4052118abf5abbef2e0d9193df3c35a46f70db35853c5884d56d4118a963R112-R119)
[[9]](diffhunk://#diff-c557434b23ebbb39ef2851b7926d61af5be4bf8f56b83a92b98f9a574f805a90R144-R145)
[[10]](diffhunk://#diff-c557434b23ebbb39ef2851b7926d61af5be4bf8f56b83a92b98f9a574f805a90R209-R216)
[[11]](diffhunk://#diff-60243aa7720001b0983bd282c74f77c8a8542a9a6fed08d80061c4f25847b650R51)
[[12]](diffhunk://#diff-60243aa7720001b0983bd282c74f77c8a8542a9a6fed08d80061c4f25847b650R91-R97)
* Updated template instantiations and type extraction logic to handle
the new key types in the join probe implementation.
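
The integration described above boils down to two steps: pick a key type from the total key width, then initialize the matching alternative of a variant. The sketch below shows that pattern under stated assumptions. The reduced `HashKeyType` enum, `pick_fixed_key_type`, `FixedKeySet`, `init_set`, and `active_key_bytes` are hypothetical names, not the PR's API; only the `fixed96` and `fixed104` enumerators come from the description. As an example of where these widths arise, three 4-byte keys (as `brand_id`, `class_id`, and `category_id` in the query above would be, assuming they are 4-byte integers) total 12 bytes, and 13 bytes if one extra byte is assumed for null flags.

```cpp
#include <cassert>   // used by the usage checks shown after this block
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <unordered_set>
#include <variant>

// Reduced, hypothetical enum; the real one has more members.
enum class HashKeyType { fixed64, fixed96, fixed104, fixed128, serialized };

// Hypothetical selection rule: choose the narrowest fixed-width type that
// can hold all key bytes, falling back to a serialized key otherwise.
inline HashKeyType pick_fixed_key_type(size_t key_bytes) {
    if (key_bytes <= 8) return HashKeyType::fixed64;
    if (key_bytes <= 12) return HashKeyType::fixed96;
    if (key_bytes <= 13) return HashKeyType::fixed104;
    if (key_bytes <= 16) return HashKeyType::fixed128;
    return HashKeyType::serialized;
}

// Stand-in for a per-width hash set; keys are stored as raw byte strings
// here purely to keep the sketch short.
template <size_t N>
struct FixedKeySet {
    static constexpr size_t key_bytes = N;
    std::unordered_set<std::string> keys;
};

using SetVariant = std::variant<std::monostate, FixedKeySet<8>, FixedKeySet<12>,
                                FixedKeySet<13>, FixedKeySet<16>>;

// Activates the variant alternative that matches the chosen key type.
inline void init_set(SetVariant& set, HashKeyType type) {
    switch (type) {
    case HashKeyType::fixed64:  set.emplace<FixedKeySet<8>>();  break;
    case HashKeyType::fixed96:  set.emplace<FixedKeySet<12>>(); break;
    case HashKeyType::fixed104: set.emplace<FixedKeySet<13>>(); break;
    case HashKeyType::fixed128: set.emplace<FixedKeySet<16>>(); break;
    case HashKeyType::serialized: set.emplace<std::monostate>(); break;
    }
}

// Reports the key width of the active alternative (0 if none is active).
inline size_t active_key_bytes(const SetVariant& set) {
    return std::visit(
            [](const auto& alt) -> size_t {
                using T = std::decay_t<decltype(alt)>;
                if constexpr (std::is_same_v<T, std::monostate>) {
                    return 0;
                } else {
                    return T::key_bytes;
                }
            },
            set);
}
```

Under this rule a 12-byte key selects `fixed96` and a 13-byte key selects `fixed104`. Without those two entries, the same sketch would route both widths to the 16-byte alternative.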

**Test coverage:**

* Added test cases to verify initialization and type handling for the
new key types in set and distinct aggregation utilities.
[[1]](diffhunk://#diff-9e0e850ab93037077da8e96f7d72b1d45c40835221ccca205cc20ef571115603R167-R176)
[[2]](diffhunk://#diff-96eb9173d84e4c838fcef6dcef716e5e4519ea678f842b4a512c66bbd2f275b1R100-R107)
@github-actions github-actions bot requested a review from yiguolei as a code owner December 10, 2025 08:04
Thearas commented Dec 10, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Dec 10, 2025
Thearas commented Dec 10, 2025

run buildall

@hello-stephen

BE UT Coverage Report

Increment line coverage 25.76% (17/66) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 53.18% (18352/34510) |
| Line Coverage | 38.77% (168788/435379) |
| Region Coverage | 33.52% (130455/389231) |
| Branch Coverage | 34.44% (56317/163520) |
