Skip to content

Conversation

bjarthur
Copy link
Member

fixes #273

first pass was using claude, and is contained in the first commit. second commit has changes i made myself. transcript from my session with claude follows:

please generate a PR to fix this issue as suggested: http://github.com/JuliaIO/DiskArrays.jl/issues/273

I'll help you create a PR to fix the issue. Let me first fetch the GitHub issue to understand what needs to be fixed.I notice the GitHub URL seems to be truncated or not loading the full issue content. Let me try a different approach by searching for this specific issue.Let me try to access the specific GitHub issue URL directly:Let me search more specifically for issue 273:I'm having trouble accessing the specific issue #273. Could you please provide more details about what the issue says and what fix is suggested? This would help me create the appropriate PR.

Alternatively, if you have access to the issue content, could you share the key details such as:

  1. What is the current problem with threaded algorithms in DiskArrays.jl?
  2. What specific fix is being suggested?
  3. Which files or functions need to be modified?

Once I have these details, I can help you create a comprehensive PR with the necessary code changes, tests, and documentation updates.

here is the content of that issue. <SNIP>

Based on the GitHub issue content, I can see that bjarthur has created a threaded version of a unique method for DiskArrays and rafaqz has suggested implementing a threading trait system. Let me create a comprehensive PR that addresses the suggested solution.I've created a comprehensive PR that addresses the threading support request from issue #273. Here's what this PR includes:

Key Features:

  1. Threading Trait System: A type-safe way for backends to declare thread-safety via ThreadSafe() and NotThreadSafe() traits

  2. Global Threading Control: Similar to allowscalar, provides enable_threading()/disable_threading() functions to control threading globally

  3. Threaded unique Algorithm: Implements bjarthur's suggested threaded unique method with proper fallbacks

  4. Safety First Approach: Defaults to NotThreadSafe() for all backends unless explicitly overridden

  5. Comprehensive Testing: Includes mock thread-safe arrays and tests for all functionality

  6. Documentation: Complete guide for both users and backend developers

How it works:

  • Backends opt into threading by overriding threading_trait(::Type{MyArray}) = ThreadSafe()
  • Algorithms check should_use_threading(array) which verifies both global setting and backend support
  • Falls back gracefully to single-threaded versions when needed
  • Uses Threads.@threads :greedy for optimal performance as suggested in the original issue

This implementation directly addresses rafaqz's suggestions while incorporating bjarthur's threaded unique algorithm. The trait system makes it safe and extensible for future threaded algorithms.

Would you like me to modify any part of this PR or add additional features?

@bjarthur bjarthur requested a review from rafaqz August 28, 2025 15:14
Copy link
Collaborator

@rafaqz rafaqz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a bit wordy, Claude can really ramble on

src/threading.jl Outdated
Determine if threading should be used for a given DiskArray.
Returns true only if both global threading is enabled AND the backend is thread-safe.
"""
should_use_threading(x) = Val(threading_enabled() && is_thread_safe(x))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not return the trait singleton here? Val{true} isn't readable there is no indication what true means

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the traits are currently ThreadSafe and NotThreadSafe. these describe a DiskArray, which is related but distinctly separate from whether a threaded algorithm should be used. because you can opt to not do so even if the DiskArray is thread safe using enable_threading(false).

agreed that it the call signature to the 3-arg method of unique is not readable given the boolean. what would you suggest specifically instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, those are made up by an LLM? ;)

What are some words that describe both traits?

Off the top of my head, Threaded and NotThreaded are more general. And maybe threadsafe is too specific anyway: some backends may be threadsafe but still not efficient to use threads with. As this is proposed as a default behavior the question is really "should threading be used", rather than "is it safe to use threading"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a new commit now adds a new set of traits to specify whether methods use threads or not, separately from whether backends support them.


Currently supported threaded algorithms:

### unique
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need more than one application to be worth adding this concept

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more would be nice, but unique is all i need for now. i was hoping to just put the infrastructure in place so that others could chip in going forward.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who are these others you speak of ;)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most likely outcome is no one adds anything for years, and we have all this documentation and code just for unique, that hardly anyone uses. Eventually @meggart and I will have to maintain it, and to me its doubtful if thats worth having.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a second commit adds a threaded version of count. 4-5x faster in my hands.

i use these daily. would these or other threaded methods really not speed up your workflow?

@felixcremer
Copy link
Contributor

I am a bit unsure, whether this belongs into DiskArrays. There is also DiskArrayEngine which aims to speed up computations on DiskArrays.
For me DiskArrays is an interface package which provides the interface to access the chunking structure and the readblock! and writeblock! functions but then the heavy lifting would be done in other packages.

@bjarthur
Copy link
Member Author

bjarthur commented Sep 3, 2025

happy to move this to a separate package. how would you suggest i architect the dispatch if i do so?

right now this PR defines:

Base.unique(v::AbstractDiskArray) = unique(should_use_threading(v), identity, v)
Base.unique(::Type{SingleThreaded}, f, v::AbstractDiskArray) = ...
Base.unique(::Type{MultiThreaded}, f, v::AbstractDiskArray) = ...

where the SingleThreaded methods is just a rename of what's in DiskArrays.jl now.

if i put the MultiThreaded method in, say, ThreadedDiskArrays.jl, i don't see a way to use a trait unless DiskArrays.jl is also somehow modified.

if instead i create a new type, say ThreadedDiskArray <: AbstractDiskArray, wouldn't Zarr.jl need to be modified to subtype this?

the dumb way claude suggested is to define a new function (unique_threaded) and not use dispatch at all. would be nice to be more julian.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

threaded algorithms?
3 participants