-
Notifications
You must be signed in to change notification settings - Fork 283
Add segmented sort implementation for c.parallel #6095
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ert change to continuation kernel
…ed in dispatch segmented sort
… policy hub and kernel source
…cl into segmented-sort-c-parallel
@tpn could you verify that this builds on windows? Thanks! |
Verified it builds on Windows! |
size_t num_lto_opts) | ||
{ | ||
cccl_op_t selector_op{}; | ||
selector_state_t* selector_op_state = new selector_state_t{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably best to wrap this allocation in unique_ptr
to ensure it is deallocated on exception, and release the ownership just before return selector_op;
:
auto selector_op_state = std::make_unique<selectro_state_t>();
<....>
selector_op_state->initialize(offset, begin_offset_iterator, end_offset_iterator);
selector_op.state = selector_op_state.release();
return selector_op;
const {2} begin = static_cast<const {2}*>(st->begin_offsets)[st->base_segment_offset + sid]; | ||
const {3} end = static_cast<const {3}*>(st->end_offsets)[st->base_segment_offset + sid]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should st->base_segment_offset + sid
be saved to a variable to avoid re-computation or to simplify compiler's life.
indirect_arg_t SmallSegmentsSelector( | ||
OffsetT offset, indirect_iterator_t begin_offset_iterator, indirect_iterator_t end_offset_iterator) const | ||
{ | ||
static_cast<selector_state_t*>(build.small_segments_selector_op.state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: static_cast<selector_state_t*>(ptr)
is used here, but reinterpret_cast<selector_state_t*>(selector.ptr);
in the body of SetSegmentOffset
method.
Should it be reinterpret_cast
everywhere, or static_cast
everywhere?
constexpr std::string_view scan_tile_state_t = "cub::detail::three_way_partition::ScanTileStateT"; | ||
|
||
constexpr std::string_view num_selected_it_t = "cub::detail::segmented_sort::local_segment_index_t*"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
constexpr std::string_view scan_tile_state_t = "cub::detail::three_way_partition::ScanTileStateT"; | |
constexpr std::string_view num_selected_it_t = "cub::detail::segmented_sort::local_segment_index_t*"; | |
static constexpr std::string_view scan_tile_state_t = "cub::detail::three_way_partition::ScanTileStateT"; | |
static constexpr std::string_view num_selected_it_t = "cub::detail::segmented_sort::local_segment_index_t*"; |
constexpr std::string_view input_it_t = | ||
"thrust::counting_iterator<cub::detail::segmented_sort::local_segment_index_t>"; | ||
constexpr std::string_view first_out_it_t = "cub::detail::segmented_sort::local_segment_index_t*"; | ||
constexpr std::string_view second_out_it_t = "cub::detail::segmented_sort::local_segment_index_t*"; | ||
constexpr std::string_view unselected_out_it_t = | ||
"thrust::reverse_iterator<cub::detail::segmented_sort::local_segment_index_t*>"; | ||
constexpr std::string_view num_selected_it_t = "cub::detail::segmented_sort::local_segment_index_t*"; | ||
constexpr std::string_view scan_tile_state_t = "cub::detail::three_way_partition::ScanTileStateT"; | ||
std::string offset_t; | ||
check(nvrtcGetTypeName<OffsetT>(&offset_t)); | ||
|
||
constexpr std::string_view per_partition_offset_t = "cub::detail::three_way_partition::per_partition_offset_t"; | ||
constexpr std::string_view streaming_context_t = | ||
"cub::detail::three_way_partition::streaming_context_t<cub::detail::segmented_sort::global_segment_offset_t>"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use static constexpr std::string_view
.
const std::string& three_way_partition_policy_str, const std::string& delay_constructor_type) | ||
{ | ||
// Insert before the final closing of the struct (right before the sequence "};") | ||
const std::string needle = "};"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const std::string needle = "};"; | |
static constexpr std::string_view needle = "};"; |
segmented_sort::inject_delay_constructor_into_three_way_policy( | ||
three_way_partition_policy_str, three_way_partition_policy_delay_constructor); | ||
|
||
constexpr std::string_view program_preamble_template = R"XXX( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
constexpr std::string_view program_preamble_template = R"XXX( | |
static constexpr std::string_view program_preamble_template = R"XXX( |
delete static_cast<segmented_sort::selector_state_t*>(build_ptr->large_segments_selector_op.state); | ||
delete static_cast<segmented_sort::selector_state_t*>(build_ptr->small_segments_selector_op.state); | ||
|
||
delete[] const_cast<char*>(build_ptr->large_segments_selector_op.code); | ||
delete[] const_cast<char*>(build_ptr->small_segments_selector_op.code); | ||
|
||
// Clean up the runtime policies | ||
delete static_cast<segmented_sort::segmented_sort_runtime_tuning_policy*>(build_ptr->runtime_policy); | ||
delete static_cast<segmented_sort::partition_runtime_tuning_policy*>(build_ptr->partition_runtime_policy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these deletion steps be also deferred until end scope by using unique_ptr
the same way as it is handled for cubin
?
|
||
indirect_arg_t(cccl_op_t& op) | ||
: ptr(op.type == cccl_op_kind_t::CCCL_STATEFUL ? op.state : this) | ||
: ptr(op.type == cccl_op_kind_t::CCCL_STATEFUL ? op.state : &op) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious: was this change necessary because you ran into a problem with current implementation? The trouble with using &op
is that that address may become invalid one cccl_op_t
struct passed by reference is destroyed. Using this
the address stored in ptr
is always valid, but not useful.
😬 CI Workflow Results🟥 Finished in 6h 09m: Pass: 87%/32 | Total: 1d 04h | Max: 6h 00m | Hits: 98%/352See results here. |
Description
closes #5496
Checklist