
INTPYTHON-617 - Improve DictField to_python performance #2888

Draft: wants to merge 3 commits into base: master

NoahStapp commented Jun 24, 2025

Improves the performance of loading large documents with nested DictFields by roughly 25x in the benchmark below (4.98s down to 0.20s), addressing #1230.

Modified benchmark script pulled from https://stackoverflow.com/questions/35257305/mongoengine-is-very-slow-on-large-documents-compared-to-native-pymongo-usage/:

import datetime
import itertools
import random
import timeit
from collections import defaultdict

import mongoengine as db

db.connect("test-dicts")

class MyModel(db.Document):
    date = db.DateTimeField(required=True, default=datetime.date.today)
    data_dict_1 = db.DictField(required=False)

MyModel.drop_collection()

data_1 = ['foo', 'bar']
data_2 = ['spam', 'eggs', 'ham']
data_3 = ["subf{}".format(f) for f in range(5)]

m = MyModel()
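# Auto-vivifying nested dict: data[a][b][c] creates any missing levels.
def tree():
    return defaultdict(tree)

data = tree()
for _d1, _d2, _d3 in itertools.product(data_1, data_2, data_3):
    # 2 x 3 x 5 = 30 leaf lists of 20,000 ints each
    data[_d1][_d2][_d3] = random.sample(range(50000), 20000)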
m.data_dict_1 = data
m.save()

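def pymongo_doc():
    # Raw BSON-decoded dict straight from the driver; no field conversion.
    return db.connection.get_connection()["test-dicts"]['my_model'].find_one()

def mongoengine_doc():
    # Builds a full Document, running each field's to_python() on the data.
    model = MyModel.objects.first()
    return model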

if __name__ == '__main__':
    print("pymongo took {:2.2f}s".format(timeit.timeit(pymongo_doc, number=10)))
    print("mongoengine took {:2.2f}s".format(timeit.timeit(mongoengine_doc, number=10)))

Before:

pymongo took 0.21s
mongoengine took 4.98s

After:

pymongo took 0.20s
mongoengine took 0.20s
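The benchmark above needs a running MongoDB. As a database-free illustration of why skipping per-element work can matter at this scale, here is a minimal sketch that compares a naive recursive conversion with one that hands back primitive-only lists untouched. It is not the change in this PR: the function names (`convert_each`, `convert_fast`, `make_data`) and the set of types treated as "primitive" are assumptions made for illustration only.

```python
import random
import timeit

# Assumption for this sketch: values of these types need no conversion.
PRIMITIVES = (str, int, float, bool, type(None))


def convert_each(value):
    """Naive approach: rebuild every container and visit every leaf."""
    if isinstance(value, dict):
        return {k: convert_each(v) for k, v in value.items()}
    if isinstance(value, list):
        return [convert_each(v) for v in value]
    return value  # stand-in for a real per-value to_python() call


def convert_fast(value):
    """Fast path: return lists of primitives without a call per item."""
    if isinstance(value, dict):
        return {k: convert_fast(v) for k, v in value.items()}
    if isinstance(value, list):
        if all(isinstance(v, PRIMITIVES) for v in value):
            return value  # nothing to convert, reuse the list as-is
        return [convert_fast(v) for v in value]
    return value


def make_data():
    """Same shape as the benchmark: 2 x 3 x 5 leaf lists of 20,000 ints."""
    return {
        d1: {
            d2: {f"subf{i}": random.sample(range(50000), 20000) for i in range(5)}
            for d2 in ("spam", "eggs", "ham")
        }
        for d1 in ("foo", "bar")
    }


if __name__ == "__main__":
    data = make_data()
    t_each = timeit.timeit(lambda: convert_each(data), number=3)
    t_fast = timeit.timeit(lambda: convert_fast(data), number=3)
    print(f"convert_each took {t_each:2.2f}s")
    print(f"convert_fast took {t_fast:2.2f}s")
```

One design trade-off in this sketch: the fast path returns the original list object rather than a copy, so the converted value shares storage with the raw data it came from.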

terencehonles (Contributor) commented:

Does your benchmark still apply? You seem to have changed the code substantially, and I would not expect the to_python caching you do at the top level to be that impactful (but maybe it is). If it really is, then it may make sense to move your change down into the base class so that it applies to lists too.
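The base-class suggestion could look roughly like the following sketch, in which a primitive-list fast path lives once in a shared parent so that a dict-like and a list-like field both inherit it. Every name here (`ContainerFieldSketch`, `DictFieldSketch`, `ListFieldSketch`, `convert_leaf`, `PRIMITIVES`) is hypothetical and is not mongoengine's API or the code in this PR.

```python
# Assumption for this sketch: values of these types need no conversion.
PRIMITIVES = (str, int, float, bool, type(None))


class ContainerFieldSketch:
    """Hypothetical shared base class that holds the fast path once."""

    def convert_leaf(self, value):
        """Per-value conversion hook; identity here, subclasses may override."""
        return value

    def to_python(self, value):
        if isinstance(value, dict):
            return {k: self.to_python(v) for k, v in value.items()}
        if isinstance(value, list):
            # Fast path inherited by every subclass: skip primitive-only lists.
            if all(isinstance(v, PRIMITIVES) for v in value):
                return value
            return [self.to_python(v) for v in value]
        if isinstance(value, PRIMITIVES):
            return value
        return self.convert_leaf(value)


class DictFieldSketch(ContainerFieldSketch):
    """Hypothetical dict-like field; gets the fast path without extra code."""


class ListFieldSketch(ContainerFieldSketch):
    """Hypothetical list-like field; gets the same fast path."""
```

Keeping the check in the parent means a single place to tune it, for example if more types turn out to be safe to pass through unchanged.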

2 participants