Move crawl and QA logs to new mongo collection #2791
902a347 to 2ebc399
qa_run_id=None,
)

while behavior_logs:
I think it might be possible to do this entirely in mongo, might be faster using $function and calling JSON.parse?
I think that would be more complicated and less efficient than it seems at first glance. As far as I can tell, we wouldn't be able to write to both mongo collections in a single query, so we'd need to hold the crawl document's log lines in memory between queries anyway. At that point, we may as well keep a single codepath for parsing and writing log lines so we know everything's consistent.
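For reference, here is a rough sketch of what a single in-memory codepath for parsing and batching log lines could look like. It is not the code in this PR: parse_log_line, batched_log_docs, the batch size, and the crawlId/qaRunId field names are all hypothetical, and the actual write to the new collection is left out so the sketch runs without a database.

```python
import json
from typing import Any, Dict, Iterator, List, Optional


def parse_log_line(line: str) -> Optional[Dict[str, Any]]:
    """Parse one JSON log line; return None if it is not a JSON object."""
    try:
        parsed = json.loads(line)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def batched_log_docs(
    crawl_id: str,
    log_lines: List[str],
    batch_size: int = 1000,
    qa_run_id: Optional[str] = None,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of documents ready to insert into a separate collection.

    Field names here (crawlId, qaRunId) are assumptions for illustration.
    """
    for start in range(0, len(log_lines), batch_size):
        docs = []
        for line in log_lines[start : start + batch_size]:
            parsed = parse_log_line(line)
            if parsed is None:
                # Skip lines that cannot be parsed rather than failing the batch.
                continue
            docs.append({**parsed, "crawlId": crawl_id, "qaRunId": qa_run_id})
        if docs:
            yield docs
```

Each yielded batch could then be passed to one bulk insert, so both the migration and the live log-writing path would go through the same parsing function.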
Co-authored-by: Ilya Kreymer <[email protected]>
Fixes #2765
(This is a necessary part of ensuring mongo documents don't exceed 16MB. Hopefully it's also sufficient but we'll need to see in practice if there are other fields that need to be separated out.)
This PR moves crawl and QA run logs into a separate crawl_logs mongo collection.

It adds a new backend module (without a distinct API router, but the crawls module is getting quite large and it seemed to make sense to add a separate module for the new mongo collection), as well as a migration to move crawl logs from Crawl objects into the new collection. The existing nightly test for crawl error logs is fleshed out, and a new nightly test added for behavior logs.

The migration has been tested locally. I've also verified that the new collection's indices are used by the existing crawl error and behavior log endpoints.
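One way to check index usage is to inspect the output of MongoDB's explain for each endpoint's query. The sketch below is an illustration, not the verification actually performed for this PR: plan_stages and uses_index are hypothetical helpers that walk an explain result's queryPlanner.winningPlan and report whether the plan relies on an index scan (IXSCAN) rather than a collection scan (COLLSCAN).

```python
from typing import Any, Dict, List


def plan_stages(plan: Dict[str, Any]) -> List[str]:
    """Collect stage names from a winning plan, walking nested input stages."""
    stages = [plan.get("stage", "")]
    if "inputStage" in plan:
        stages += plan_stages(plan["inputStage"])
    for child in plan.get("inputStages", []):
        stages += plan_stages(child)
    return stages


def uses_index(explain_output: Dict[str, Any]) -> bool:
    """Return True if the winning plan has an IXSCAN and no COLLSCAN."""
    winning = explain_output.get("queryPlanner", {}).get("winningPlan", {})
    # Some server versions nest the classic plan tree under "queryPlan".
    winning = winning.get("queryPlan", winning)
    stages = plan_stages(winning)
    return "IXSCAN" in stages and "COLLSCAN" not in stages
```

The input would be the dictionary returned by running explain on the same filter and sort that the endpoint uses.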