Release the GIL when building indexes #526
Interesting. Could you add a Python test that writes to the same tantivy index from 10 threads in a thread pool? You can use
Oh, and thanks for the submission btw 👍🏼
Ok, I'm looking into this in more detail. It turns out that allow_threads (https://docs.rs/pyo3/0.26.0/pyo3/marker/struct.Python.html#method.allow_threads) is deprecated as of PyO3 0.26. Instead, they offer detach. You can likely show Cursor the docs and ask it to update the code for the new detach method.
This looks like a great addition, thanks. I think we just need to use the new detach() method rather than the now-deprecated allow_threads method.
Hey, thanks for the review. I’ll make these changes. I added the defensive clones manually because I wasn’t sure which guarantees we lose when releasing the GIL. I’ll add that test, update to the new methods, and remove the clones. Will probably get to it late this evening.
Hi @cjrh, I believe I've made all the changes you requested. As part of the PyO3 upgrade there were a few other deprecations that I fixed. Additionally, I wrote a concurrency test here: https://gist.github.com/petehunt/eeaabbcf6844c47d601eecfb74e86314. However, I couldn't get it to pass with the current release, and it doesn't pass in my fork either.
I poked around a bit to try to solve it, but I'm afraid I'm a little out of my depth on it. Let me know if you need anything else!
Ok no worries, I'll have a look at the concurrency test.
@petehunt You might need
writer = index.writer()
for doc_num in range(num_docs_per_thread):
    doc = Document()
    doc.add_text("id", f"thread_{thread_id}_doc_{doc_num}")
    doc.add_text("content", f"Content from thread {thread_id}, document {doc_num}")
    doc.add_integer("thread_id", thread_id)
    doc.add_integer("doc_number", doc_num)
    writer.add_document(doc)
# Commit the changes from this thread
writer.commit()
writer.wait_merging_threads()  # <-------------- HERE
Well, you definitely need
Actually, I suspect this
I can take a look at that as well but I suspected it was intended behavior. One thing I tried was releasing the GIL and then acquiring a new per-Index lock, but that didn't resolve the issue either since there were still multiple writers open concurrently.
I'm going to go ahead and merge. I'll do further testing separately. Thanks again for the contribution 🚀
Thank you for maintaining the project!
We're using tantivy-py in an app that creates a lot of Indexes on the fly. During profiling we noticed that it holds the GIL for a few hundred ms each time, making it difficult to use in an asyncio project. This PR releases the GIL for all methods in IndexWriter and Index.
Code was mostly written by Cursor, but I manually reviewed and tested it and made some manual edits.
Test plan
nox tests pass
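To illustrate why holding the GIL for a few hundred ms hurts an asyncio app, here is a minimal sketch that uses only the standard library. It does not call tantivy-py. Both worker functions are hypothetical stand-ins: `holds_gil` assumes a standard GIL-enabled CPython build, where `sum()` over a `range` runs as one C-level call that does not hand the GIL to other threads, and `releases_gil` relies on `time.sleep()` dropping the GIL while it waits. Each worker runs in a thread via `asyncio.to_thread` while a heartbeat task measures how long the event loop goes without running.

```python
import asyncio
import contextlib
import time


def holds_gil(n: int = 20_000_000) -> float:
    # Hypothetical stand-in for native code that keeps the GIL. Assumption:
    # on a GIL-enabled CPython build, sum() over a range is a single C-level
    # call that does not yield the GIL until it returns.
    start = time.monotonic()
    sum(range(n))
    return time.monotonic() - start


def releases_gil(duration: float) -> float:
    # Hypothetical stand-in for native code that releases the GIL:
    # time.sleep() drops the GIL while it waits.
    time.sleep(duration)
    return duration


async def longest_stall(worker, *args) -> float:
    """Run worker in a thread; return the longest gap between heartbeats."""
    ticks = [time.monotonic()]

    async def heartbeat():
        # Record a timestamp roughly every 10 ms while the loop is free to run.
        while True:
            await asyncio.sleep(0.01)
            ticks.append(time.monotonic())

    hb = asyncio.create_task(heartbeat())
    await asyncio.to_thread(worker, *args)
    hb.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await hb
    ticks.append(time.monotonic())
    return max(b - a for a, b in zip(ticks, ticks[1:]))


async def main():
    # Time the GIL-holding call once so both workers run for a similar duration.
    held_for = holds_gil()
    stall_held = await longest_stall(holds_gil)
    stall_released = await longest_stall(releases_gil, held_for)
    return stall_held, stall_released


if __name__ == "__main__":
    held, released = asyncio.run(main())
    print(f"longest event-loop stall, GIL held:     {held:.3f}s")
    print(f"longest event-loop stall, GIL released: {released:.3f}s")
```

Under these assumptions, the stall for the GIL-holding worker should be close to the full duration of the call even though it runs in another thread, while the GIL-releasing worker should leave the heartbeat ticking at roughly its normal interval.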