Description
Do you need to ask a question?
- I have searched the existing questions and discussions, and this question is not already answered.
- I believe this is a legitimate question, not just a bug or feature request.
Your Question
rag.insert(...) takes a text document (or a collection of documents) as input. From there, LightRAG uses an LLM to extract information such as entities and relationships. This extraction is driven by a rather complex prompt that instructs the LLM to output formatted data, which is later parsed by LightRAG.
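For context, parsing that formatted output boils down to splitting delimiter-separated records. A minimal sketch of such a parser follows; the "<|>" tuple delimiter and the ("entity"...) record shape mirror the style of LightRAG's default prompt but are assumptions here, not guaranteed verbatim:

```python
def parse_record(raw: str) -> list[str]:
    """Split one extraction record on the tuple delimiter.

    The '<|>' delimiter and the '("entity"...)' record shape are
    illustrative of the style in prompt.py, not exact LightRAG internals.
    """
    return [field.strip() for field in raw.strip().strip("()").split("<|>")]

record = '("entity"<|>Alice<|>person<|>Engineer at Acme Corp)'
fields = parse_record(record)
# fields[0] identifies the record kind; the rest are payload columns
```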
My first question concerns the output format: why this particular default format (the one described in prompt.py)? Why not ask the LLM to output something like JSON or even XML, which can be handled easily by machines and humans alike? Not to mention that LLMs are trained on such formats (or languages, in the case of XML). I couldn't find any rationale for this design choice when reading through the paper.
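To illustrate the alternative, a JSON-shaped extraction output of the kind suggested above could look like the following; the field names here are illustrative only, not a description of LightRAG's actual schema:

```python
import json

# Hypothetical JSON output an LLM could emit instead of the
# delimiter-based format; all field names are illustrative.
llm_output = """
{
  "entities": [
    {"name": "Alice", "type": "person", "description": "Engineer at Acme Corp"},
    {"name": "Acme Corp", "type": "organization", "description": "Alice's employer"}
  ],
  "relationships": [
    {"source": "Alice", "target": "Acme Corp", "description": "works at", "strength": 8}
  ]
}
"""

# Parsing is a one-liner and fails loudly on malformed output,
# unlike a hand-rolled delimiter parser.
data = json.loads(llm_output)
entity_names = [e["name"] for e in data["entities"]]
```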
The follow-up question addresses the possibility of interacting with the insert task by providing rag.insert(...) with already-formatted data rather than plain text (using a schema specified by LightRAG). This way, data preparation could be handled outside LightRAG, allowing easier testing and better control (moreover, an explicit schema for the extracted data would make it more comfortable to modify the LLM prompt). Is there an existing approach that allows this kind of interaction?
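If such a schema were exposed, pre-extracted data could be validated outside the library before insertion. A minimal sketch of what that might look like, under the assumption of an explicit entity/relationship schema (Entity, Relationship, and validate_extraction are hypothetical names, not LightRAG API):

```python
from dataclasses import dataclass

# Hypothetical schema types; LightRAG does not necessarily expose these.
@dataclass
class Entity:
    name: str
    type: str
    description: str

@dataclass
class Relationship:
    source: str
    target: str
    description: str

def validate_extraction(entities: list[Entity],
                        relationships: list[Relationship]) -> list[Relationship]:
    """Return relationships whose endpoints are not declared entities."""
    names = {e.name for e in entities}
    return [r for r in relationships
            if r.source not in names or r.target not in names]

entities = [Entity("Alice", "person", "Engineer"),
            Entity("Acme Corp", "organization", "Alice's employer")]
rels = [Relationship("Alice", "Acme Corp", "works at")]
dangling = validate_extraction(entities, rels)  # empty list if consistent
```

This kind of check (and any other preprocessing or testing) could then run entirely outside the library before the data is handed to the insert step.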
Additional Context
No response