-
Notifications
You must be signed in to change notification settings - Fork 22
threaded unique #274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
threaded unique #274
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a bit wordy, Claude can really ramble on
src/threading.jl
Outdated
Determine if threading should be used for a given DiskArray. | ||
Returns true only if both global threading is enabled AND the backend is thread-safe. | ||
""" | ||
should_use_threading(x) = Val(threading_enabled() && is_thread_safe(x)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not return the trait singleton here? Val{true}
isn't readable there is no indication what true means
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the traits are currently ThreadSafe
and NotThreadSafe
. these describe a DiskArray, which is related but distinctly separate from whether a threaded algorithm should be used. because you can opt to not do so even if the DiskArray is thread safe using enable_threading(false)
.
agreed that it the call signature to the 3-arg method of unique
is not readable given the boolean. what would you suggest specifically instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, those are made up by an LLM? ;)
What are some words that describe both traits?
Off the top of my head, Threaded and NotThreaded are more general. And maybe threadsafe is too specific anyway: some backends may be threadsafe but still not efficient to use threads with. As this is proposed as a default behavior the question is really "should threading be used", rather than "is it safe to use threading"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a new commit now adds a new set of traits to specify whether methods use threads or not, separately from whether backends support them.
|
||
Currently supported threaded algorithms: | ||
|
||
### unique |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really need more than one application to be worth adding this concept
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more would be nice, but unique
is all i need for now. i was hoping to just put the infrastructure in place so that others could chip in going forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who are these others you speak of ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most likely outcome is no one adds anything for years, and we have all this documentation and code just for unique
, that hardly anyone uses. Eventually @meggart and I will have to maintain it, and to me its doubtful if thats worth having.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a second commit adds a threaded version of count
. 4-5x faster in my hands.
i use these daily. would these or other threaded methods really not speed up your workflow?
I am a bit unsure, whether this belongs into DiskArrays. There is also DiskArrayEngine which aims to speed up computations on DiskArrays. |
happy to move this to a separate package. how would you suggest i architect the dispatch if i do so? right now this PR defines:
where the SingleThreaded methods is just a rename of what's in DiskArrays.jl now. if i put the MultiThreaded method in, say, ThreadedDiskArrays.jl, i don't see a way to use a trait unless DiskArrays.jl is also somehow modified. if instead i create a new type, say ThreadedDiskArray <: AbstractDiskArray, wouldn't Zarr.jl need to be modified to subtype this? the dumb way claude suggested is to define a new function (unique_threaded) and not use dispatch at all. would be nice to be more julian. |
fixes #273
first pass was using claude, and is contained in the first commit. second commit has changes i made myself. transcript from my session with claude follows:
please generate a PR to fix this issue as suggested: http://github.com/JuliaIO/DiskArrays.jl/issues/273
I'll help you create a PR to fix the issue. Let me first fetch the GitHub issue to understand what needs to be fixed.I notice the GitHub URL seems to be truncated or not loading the full issue content. Let me try a different approach by searching for this specific issue.Let me try to access the specific GitHub issue URL directly:Let me search more specifically for issue 273:I'm having trouble accessing the specific issue #273. Could you please provide more details about what the issue says and what fix is suggested? This would help me create the appropriate PR.
Alternatively, if you have access to the issue content, could you share the key details such as:
Once I have these details, I can help you create a comprehensive PR with the necessary code changes, tests, and documentation updates.
here is the content of that issue. <SNIP>
Based on the GitHub issue content, I can see that bjarthur has created a threaded version of a
unique
method for DiskArrays and rafaqz has suggested implementing a threading trait system. Let me create a comprehensive PR that addresses the suggested solution.I've created a comprehensive PR that addresses the threading support request from issue #273. Here's what this PR includes:Key Features:
Threading Trait System: A type-safe way for backends to declare thread-safety via
ThreadSafe()
andNotThreadSafe()
traitsGlobal Threading Control: Similar to
allowscalar
, providesenable_threading()
/disable_threading()
functions to control threading globallyThreaded
unique
Algorithm: Implements bjarthur's suggested threaded unique method with proper fallbacksSafety First Approach: Defaults to
NotThreadSafe()
for all backends unless explicitly overriddenComprehensive Testing: Includes mock thread-safe arrays and tests for all functionality
Documentation: Complete guide for both users and backend developers
How it works:
threading_trait(::Type{MyArray}) = ThreadSafe()
should_use_threading(array)
which verifies both global setting and backend supportThreads.@threads :greedy
for optimal performance as suggested in the original issueThis implementation directly addresses rafaqz's suggestions while incorporating bjarthur's threaded unique algorithm. The trait system makes it safe and extensible for future threaded algorithms.
Would you like me to modify any part of this PR or add additional features?