Skip to content

Conversation

@michalkrzem
Copy link
Contributor

@michalkrzem michalkrzem commented Dec 31, 2024

Your checklist for this pull request

  • I've read the contributing guideline.
  • I've tested my changes by building and running mquery, and testing changed functionality (if applicable)
  • I've added automated tests for my change (if applicable, optional)
  • I've updated documentation to reflect my change (if applicable)

What is the current behaviour?
We manually add files to the queue.

What is the new behaviour?
We have an API endpoint for adding files to QueuedFile in the database

Test plan
Endpoint api "api/queue/ursadb_id"

Closing issues

fixes #437

@michalkrzem michalkrzem linked an issue Dec 31, 2024 that may be closed by this pull request
src/db.py Outdated
Comment on lines 483 to 491
for file in file_paths:
obj = QueuedFile(
ursadb_id=ursadb_id,
path=file.path,
index_types=file.index_types,
tags=file.tags,
)
session.add(obj)
session.commit()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ostatnio, gdy próbowałem robić inne zadanie i miałem do stworzenia wiele elementów, opłacało się skorzystać z bulk_insert_mappings zamiast dodawać iteracyjnie obiekt po obiekcie: https://stackoverflow.com/questions/34057006/using-bulk-insert-mappings, ewentualnie jakakolwiek inna operacja do wstawiania wielu modeli na raz.

Plus wydaje mi się, że obecnie po prostu iteracyjnie nadpisujesz tylko zmienną obj, po czym dodajesz i commitujesz tylko ostatnią instancję.

Copy link
Contributor Author

@michalkrzem michalkrzem Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

o, faktycznie - jeszcze jedna tabulacja

@app.delete(
"/api/query/{job_id}",
response_model=StatusSchema,
dependencies=[Depends(can_manage_queries)],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accidental change?

src/db.py Outdated
alembic_cfg = Config(str(config_file))
command.upgrade(alembic_cfg, "head")

def set_queued_file(self, ursadb_id, file_paths):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add_queued_file maybe, or enqueue_file

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also ursadb: str, file_paths: list[str] guessing by variable names, but I didn't check the types)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or List[FilePathInputSchema] actually I guess

src/schema.py Outdated

class FilePathInputSchema(BaseModel):
path: str
index_types: List[str]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should validate the index_types somehow. I thought there's an enum for that, but apparently not. I think the easiest solution is to add a validator checking that every element is in ["gram3", "text4", "hash4", "wide8"]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can i use List[Literal["gram3", "text4", "hash4", "wide8"]] from typing?

Copy link
Contributor

@msm-code msm-code left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I'll just finish testing it and merge

src/app.py Outdated
tags=["queue"],
dependencies=[Depends(can_manage_queues)],
)
def delete_queued_by_ursadb_id(ursadb_id: str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def delete_queued_by_ursadb_id(ursadb_id: str):
def delete_queue_by_id(ursadb_id: str):

src/app.py Outdated
Comment on lines 651 to 659
ursadb_exist = db.exist_ursadb(ursadb_id)
if ursadb_exist:
db.delete_queued_file(ursadb_id)
return StatusSchema(status="ok")

raise HTTPException(
status_code=400,
detail="Ursadb_id not found.",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this method can just do nothing if there are no files in a given queue (i.e. no error)

src/app.py Outdated
Comment on lines 638 to 641
raise HTTPException(
status_code=400,
detail="Ursadb_id not found.",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should raise an exception if there are no files in the queue. Unfortunately, this means that oldest_file and newest_file must be nullable (they won't have a value if a queue is empty)

src/app.py Outdated
Comment on lines 623 to 624
oldest_file = min(queue_files, key=lambda x: x.created_at)
newest_file = max(queue_files, key=lambda x: x.created_at)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't use all the files, this should be done in the database (if there are many files this will be unnecessarily slow).

But we can discuss what's needed here

@msm-cert
Copy link
Member

Thanks for fixing. I finally got around to re-testing it, sorry for the wait. Looks good (no need to mark it as a draft, in the future).

@msm-cert msm-cert marked this pull request as ready for review February 25, 2025 03:37
Copy link
Member

@msm-cert msm-cert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really nice now, thanks

@msm-cert msm-cert merged commit 4c8c07f into master Feb 25, 2025
10 checks passed
@msm-cert msm-cert deleted the 437-implement-api-to-manage-indexing-queues branch February 25, 2025 03:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement API to manage indexing queues

5 participants