(EAI-1230): Create MDB collection and indexes for Atlas Search benchmark #865

Merged
merged 6 commits into from
Aug 12, 2025

Collaborator

@mongodben mongodben commented Aug 6, 2025

Jira: https://jira.mongodb.org/browse/EAI-1230

Changes

  • Script to load dataset from HF and push to MongoDB Atlas
  • Add Atlas Search indexes to collection
  • Add README to datasets package
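
For readers unfamiliar with the flow, the changes above could be sketched roughly as follows. This is a minimal illustration, not the script from this PR: the function names `batched` and `load_into_atlas`, the `train` split, the batch size, and the dynamic-mapping index named `default` are all assumptions chosen for the example.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List


def batched(rows: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield successive lists of at most `size` items from `rows`."""
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


def load_into_atlas(
    dataset_name: str,
    mongodb_uri: str,
    db_name: str,
    collection_name: str,
    batch_size: int = 1000,
) -> int:
    """Hypothetical helper: copy a Hugging Face dataset into a collection
    and request an Atlas Search index on it. Returns the number of rows inserted."""
    # Third-party imports are kept local so the batching helper can be
    # used (and tested) without `datasets` or `pymongo` installed.
    from datasets import load_dataset
    from pymongo import MongoClient
    from pymongo.operations import SearchIndexModel

    # Assumption: the dataset exposes a "train" split.
    dataset = load_dataset(dataset_name, split="train")
    collection = MongoClient(mongodb_uri)[db_name][collection_name]

    inserted = 0
    for batch in batched(dataset, batch_size):
        collection.insert_many(batch)  # each row is a plain dict
        inserted += len(batch)

    # Assumption: a dynamically mapped index; a real benchmark would
    # likely define explicit field mappings instead.
    index = SearchIndexModel(definition={"mappings": {"dynamic": True}}, name="default")
    collection.create_search_index(model=index)
    return inserted
```

Inserting in batches keeps memory use bounded for large datasets. Atlas builds the search index asynchronously, so it may not be queryable immediately after the call returns.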

Notes

  • Wrote this in Python because there's no clean way to work with HF datasets from Node.js (which is quite bizarre and annoying)

@@ -0,0 +1,3 @@
"""MongoDB Datasets - Utilities for importing datasets into MongoDB."""

__version__ = "0.1.0"
Collaborator

nit: can we add newlines to the end of each file (and ideally track down whatever setting is stripping them out)?

Comment on lines 7 to 9
# Structure: packages/datasets/mongodb_datasets/config.py -> ../../../.env
PROJECT_ROOT = Path(__file__).parent.parent.parent.parent.parent
ENV_PATH = PROJECT_ROOT / ".env"
Collaborator

@nlarew nlarew Aug 11, 2025

I assume this works but it's a bit confusing to me:

  • The example (i.e. three levels of .. from config.py) would resolve to packages/.env which doesn't exist
  • The actual code calls parent five times, which doesn't match the example and seems to me like it would resolve to the directory above the root chatbot repo.
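
To make the mismatch concrete, here is a small sketch of how `pathlib` resolves each level. It assumes a hypothetical checkout at `/work/chatbot` with `packages/` directly under the repo root; the real layout may differ, in which case the counts shift accordingly.

```python
from pathlib import PurePosixPath

# Hypothetical absolute location of config.py (assumed checkout path).
config_path = PurePosixPath(
    "/work/chatbot/packages/datasets/mongodb_datasets/config.py"
)

# parents[n] is equivalent to calling .parent n + 1 times.
three_up = config_path.parents[2]  # what "../../.." in the comment describes
four_up = config_path.parents[3]   # the repo root under this assumed layout
five_up = config_path.parents[4]   # what the five .parent calls produce

for label, path in [("3x", three_up), ("4x", four_up), ("5x", five_up)]:
    print(f"{label} parent -> {path / '.env'}")
```

Under these assumptions, three levels land on `packages/`, four on the repo root, and five on the directory containing the repo, so the comment and the code would each point at a different `.env`.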

@mongodben mongodben merged commit 80f0e46 into main Aug 12, 2025
2 checks passed
@mongodben mongodben deleted the EAI-1230 branch August 12, 2025 17:37
2 participants