[Question]: Gaining better control over the insert process? #1580

@WilliamDiakite

Do you need to ask a question?

  • I have searched the existing questions and discussions, and this question has not already been answered.
  • I believe this is a legitimate question, not just a bug or feature request.

Your Question

The rag.insert(...) method takes a text document (or a collection of documents) as input. From there, LightRAG uses an LLM to extract various information (entities, relationships, etc.). This extraction is driven by a rather complex prompt that outputs formatted data, which LightRAG then parses.

My first question concerns the output format: why this particular default format (the one described in prompt.py)? Why not ask the LLM to output something like JSON or even XML, which can be easily handled by machines and humans alike? Not to mention that LLMs are trained on such formats (or languages, in the case of XML). I couldn't find any reason for this design choice in the paper.
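To make the comparison concrete, here is a minimal sketch of the two styles side by side. The delimiter strings and field order are assumptions for illustration and may not match prompt.py exactly; `parse_delimited` and `parse_json` are hypothetical helpers, not LightRAG functions.

```python
import json

# Assumed delimiters, for illustration only; check prompt.py for the real values.
TUPLE_DELIM = "<|>"
RECORD_DELIM = "##"


def parse_delimited(output: str) -> list[dict]:
    """Parse records shaped like ("entity"<|>name<|>type<|>description), separated by ##."""
    entities = []
    for record in output.split(RECORD_DELIM):
        record = record.strip()
        if not (record.startswith("(") and record.endswith(")")):
            continue  # skip malformed or empty records
        fields = [f.strip().strip('"') for f in record[1:-1].split(TUPLE_DELIM)]
        if len(fields) == 4 and fields[0] == "entity":
            entities.append(
                {"name": fields[1], "type": fields[2], "description": fields[3]}
            )
    return entities


def parse_json(output: str) -> list[dict]:
    """Parse the same information from a JSON object with an "entities" array."""
    data = json.loads(output)
    return [
        {"name": e["name"], "type": e["type"], "description": e["description"]}
        for e in data["entities"]
    ]


delimited = (
    '("entity"<|>"Alice"<|>"person"<|>"A researcher")##'
    '("entity"<|>"Acme"<|>"organization"<|>"A company")'
)
as_json = json.dumps({"entities": parse_delimited(delimited)})

# Both representations carry the same records.
assert parse_json(as_json) == parse_delimited(delimited)
```

The delimited variant needs hand-written splitting and quote stripping, while the JSON variant relies on a standard parser and fails loudly on malformed output.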

The follow-up question is whether the insert task can accept formatted data rather than plain text, that is, whether rag.insert(...) can be given data that follows a schema specified by LightRAG. This way, data preparation could be handled outside LightRAG, allowing easier testing and better control. Moreover, an explicit schema for the extracted data would make it easier to modify the LLM prompt. Is there an approach that already allows this kind of interaction?
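As an illustration of the idea, here is a rough sketch of what an explicit schema for pre-extracted data could look like, with a check that can run outside LightRAG. Everything here (`Entity`, `Relationship`, `ExtractedGraph`, `validate_graph`, and the field names) is hypothetical and not part of LightRAG.

```python
from dataclasses import dataclass, field


# Hypothetical schema for data prepared outside LightRAG.
@dataclass
class Entity:
    name: str
    type: str
    description: str


@dataclass
class Relationship:
    source: str
    target: str
    description: str


@dataclass
class ExtractedGraph:
    entities: list[Entity] = field(default_factory=list)
    relationships: list[Relationship] = field(default_factory=list)


def validate_graph(graph: ExtractedGraph) -> list[str]:
    """Return a list of problems; an empty list means the payload is consistent."""
    names = {e.name for e in graph.entities}
    problems = []
    for rel in graph.relationships:
        for endpoint in (rel.source, rel.target):
            if endpoint not in names:
                problems.append(f"unknown entity in relationship: {endpoint}")
    return problems


graph = ExtractedGraph(
    entities=[
        Entity("Alice", "person", "A researcher"),
        Entity("Acme", "organization", "A company"),
    ],
    relationships=[Relationship("Alice", "Acme", "Alice works at Acme")],
)

# The payload can be tested on its own before it is handed to the insert step.
assert validate_graph(graph) == []
```

With a schema like this, the extraction step can be unit-tested without calling an LLM, and the prompt can be changed as long as its output still maps onto the same structure.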

Additional Context

No response
