Skip to content

Conversation

NaderAlAwar
Copy link
Contributor

Description

closes #5496

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@NaderAlAwar NaderAlAwar requested a review from a team as a code owner October 1, 2025 14:56
@github-project-automation github-project-automation bot moved this to Todo in CCCL Oct 1, 2025
@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Oct 1, 2025
@NaderAlAwar
Copy link
Contributor Author

@tpn could you verify that this builds on windows? Thanks!

@tpn
Copy link
Contributor

tpn commented Oct 1, 2025

@tpn could you verify that this builds on windows? Thanks!

Verified it builds on Windows!

size_t num_lto_opts)
{
cccl_op_t selector_op{};
selector_state_t* selector_op_state = new selector_state_t{};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably best to wrap this allocation in unique_ptr to ensure it is deallocated on exception, and release the ownership just before return selector_op;:

    auto selector_op_state = std::make_unique<selectro_state_t>();
    <....>
    
    selector_op_state->initialize(offset, begin_offset_iterator, end_offset_iterator);
    selector_op.state = selector_op_state.release();
    
    return selector_op;

Comment on lines +205 to +206
const {2} begin = static_cast<const {2}*>(st->begin_offsets)[st->base_segment_offset + sid];
const {3} end = static_cast<const {3}*>(st->end_offsets)[st->base_segment_offset + sid];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should st->base_segment_offset + sid be saved to a variable to avoid re-computation or to simplify compiler's life.

indirect_arg_t SmallSegmentsSelector(
OffsetT offset, indirect_iterator_t begin_offset_iterator, indirect_iterator_t end_offset_iterator) const
{
static_cast<selector_state_t*>(build.small_segments_selector_op.state)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: static_cast<selector_state_t*>(ptr) is used here, but reinterpret_cast<selector_state_t*>(selector.ptr); in the body of SetSegmentOffset method.

Should it be reinterpret_cast everywhere, or static_cast everywhere?

Comment on lines +286 to +288
constexpr std::string_view scan_tile_state_t = "cub::detail::three_way_partition::ScanTileStateT";

constexpr std::string_view num_selected_it_t = "cub::detail::segmented_sort::local_segment_index_t*";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
constexpr std::string_view scan_tile_state_t = "cub::detail::three_way_partition::ScanTileStateT";
constexpr std::string_view num_selected_it_t = "cub::detail::segmented_sort::local_segment_index_t*";
static constexpr std::string_view scan_tile_state_t = "cub::detail::three_way_partition::ScanTileStateT";
static constexpr std::string_view num_selected_it_t = "cub::detail::segmented_sort::local_segment_index_t*";

Comment on lines +300 to +313
constexpr std::string_view input_it_t =
"thrust::counting_iterator<cub::detail::segmented_sort::local_segment_index_t>";
constexpr std::string_view first_out_it_t = "cub::detail::segmented_sort::local_segment_index_t*";
constexpr std::string_view second_out_it_t = "cub::detail::segmented_sort::local_segment_index_t*";
constexpr std::string_view unselected_out_it_t =
"thrust::reverse_iterator<cub::detail::segmented_sort::local_segment_index_t*>";
constexpr std::string_view num_selected_it_t = "cub::detail::segmented_sort::local_segment_index_t*";
constexpr std::string_view scan_tile_state_t = "cub::detail::three_way_partition::ScanTileStateT";
std::string offset_t;
check(nvrtcGetTypeName<OffsetT>(&offset_t));

constexpr std::string_view per_partition_offset_t = "cub::detail::three_way_partition::per_partition_offset_t";
constexpr std::string_view streaming_context_t =
"cub::detail::three_way_partition::streaming_context_t<cub::detail::segmented_sort::global_segment_offset_t>";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use static constexpr std::string_view.

const std::string& three_way_partition_policy_str, const std::string& delay_constructor_type)
{
// Insert before the final closing of the struct (right before the sequence "};")
const std::string needle = "};";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const std::string needle = "};";
static constexpr std::string_view needle = "};";

segmented_sort::inject_delay_constructor_into_three_way_policy(
three_way_partition_policy_str, three_way_partition_policy_delay_constructor);

constexpr std::string_view program_preamble_template = R"XXX(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
constexpr std::string_view program_preamble_template = R"XXX(
static constexpr std::string_view program_preamble_template = R"XXX(

Comment on lines +1013 to +1021
delete static_cast<segmented_sort::selector_state_t*>(build_ptr->large_segments_selector_op.state);
delete static_cast<segmented_sort::selector_state_t*>(build_ptr->small_segments_selector_op.state);

delete[] const_cast<char*>(build_ptr->large_segments_selector_op.code);
delete[] const_cast<char*>(build_ptr->small_segments_selector_op.code);

// Clean up the runtime policies
delete static_cast<segmented_sort::segmented_sort_runtime_tuning_policy*>(build_ptr->runtime_policy);
delete static_cast<segmented_sort::partition_runtime_tuning_policy*>(build_ptr->partition_runtime_policy);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these deletion steps be also deferred until end scope by using unique_ptr the same way as it is handled for cubin?


indirect_arg_t(cccl_op_t& op)
: ptr(op.type == cccl_op_kind_t::CCCL_STATEFUL ? op.state : this)
: ptr(op.type == cccl_op_kind_t::CCCL_STATEFUL ? op.state : &op)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: was this change necessary because you ran into a problem with current implementation? The trouble with using &op is that that address may become invalid one cccl_op_t struct passed by reference is destroyed. Using this the address stored in ptr is always valid, but not useful.

Copy link
Contributor

github-actions bot commented Oct 1, 2025

😬 CI Workflow Results

🟥 Finished in 6h 09m: Pass: 87%/32 | Total: 1d 04h | Max: 6h 00m | Hits: 98%/352

See results here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

Implement cccl.c.parallel version of segmented_sort

5 participants