@letterbeezps commented Sep 8, 2025

What problem does this PR solve?

Support using Baidu Vector Database (Baidu VDB) as the doc engine.

Type of change

  • New Feature (non-breaking change which adds functionality)

dosubot (bot) added the labels size:XL ("This PR changes 500-999 lines, ignoring generated files.") and 💞 feature ("Feature request, pull request that fulfills a new feature.") on Sep 8, 2025
@letterbeezps (Author)

@KevinHuSh Hello, do you have time to review this code? Baidu VDB now supports assigning field weights.

@yingfeng (Member)

@letterbeezps Thanks for your contribution.
There are some guidelines for integrating a new doc engine into RAGFlow:

  1. Make sure the new doc engine can fully support the requirements of RAGFlow.
    From your implementation, it seems the term weight has not been applied yet. Can you please further integrate these features from Baidu VDB?
  2. Can you continuously maintain Baidu VDB as a doc engine as RAGFlow evolves? We don't have the resources to fully test every doc engine, and without continuous updates, a doc engine would have to be abandoned because it lacks new features.

After these two issues are resolved, we will merge it. Thank you so much!

raise Exception(f"Mapping file not found at {fp_mapping}")
self.mapping = json.load(open(fp_mapping))
healthy = self.health()
self.query_fields_boosts = {
@letterbeezps (Author)

@yingfeng The term weight is applied here. The syntax is a little different from ES: it acts directly on the query value rather than on the field.

Collaborator

Is this a field weight rather than a term weight?
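To make the distinction in this question concrete, here is a minimal sketch in Lucene/Elasticsearch `query_string` style, where `field^boost` weights a whole field and `(term)^boost` weights an individual query term. The helper names, field names, and numbers are hypothetical; this is not the code from the PR, and it does not show Baidu VDB's own query syntax.

```python
from __future__ import annotations

# Hypothetical illustration (not code from this PR): weights can be attached
# to fields or to individual terms, and the two are not interchangeable.


def boosted_fields(field_boosts: dict[str, float]) -> list[str]:
    """Field weights: boost every match in a given field, e.g. "title_tks^10"."""
    return [f"{field}^{boost}" for field, boost in field_boosts.items()]


def boosted_terms(term_weights: list[tuple[str, float]]) -> str:
    """Term weights: each query term carries its own boost in the query text."""
    return " ".join(f"({term})^{weight:.4f}" for term, weight in term_weights)


# Field names and weights below are made up for the example.
fields = boosted_fields({"title_tks": 10, "content_ltks": 2})
query = boosted_terms([("baidu", 0.7), ("vector", 0.2)])
print(fields)  # ['title_tks^10', 'content_ltks^2']
print(query)   # (baidu)^0.7000 (vector)^0.2000
```

A field boost scales every hit in that field equally, so it cannot express that one query term matters more than another; only the per-term form carries that information.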

@letterbeezps (Author)

Sorry, I'm a little confused. Can you point out where the term weight is used in "es_conn.py" so I can check and compare? @KevinHuSh

@letterbeezps (Author)

To promote Baidu VDB, our team will keep maintaining Baidu VDB as a doc engine as RAGFlow adds new features.

if syns and len(keywords) < 32:
keywords.extend(syns)
logging.debug(json.dumps(twts, ensure_ascii=False))
for tk, w in sorted(twts, key=lambda x: x[1] * -1):
Collaborator

`w` is the weight of the term, but it is dropped here and never used.
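A toy scorer shows what is lost when `w` is ignored. This is a hypothetical sketch, not RAGFlow's or Baidu VDB's actual scoring: it sums the weights of the matched query terms, and with `use_weights=False` every term counts as 1.0. With weights, a document matching a high-weight term outranks one matching only low-weight terms; without them, the two tie.

```python
from __future__ import annotations


def score(doc_tokens: set[str], twts: list[tuple[str, float]], use_weights: bool = True) -> float:
    """Toy lexical score: sum the weights of the query terms found in the document.

    With use_weights=False every matched term counts as 1.0, which mimics
    dropping the term weight w.
    """
    total = 0.0
    # Same iteration order as the reviewed loop: descending weight.
    for tk, w in sorted(twts, key=lambda x: x[1] * -1):
        if tk in doc_tokens:
            total += w if use_weights else 1.0
    return total


# Hypothetical (term, weight) pairs, e.g. as produced by a term-weighting step.
twts = [("baidu", 0.7), ("vector", 0.2), ("the", 0.1)]
doc_a = {"baidu", "the"}
doc_b = {"vector", "the"}

print(score(doc_a, twts) > score(doc_b, twts))  # True: doc_a ranks higher
print(score(doc_a, twts, use_weights=False), score(doc_b, twts, use_weights=False))  # 2.0 2.0
```

The descending sort does not change a plain sum, but it matters once the query is capped to a fixed number of terms, because the highest-weight terms are kept first.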

@letterbeezps (Author)

OK, got it. I'll check it out, thanks.



3 participants