INTPYTHON-527 Add Queryable Encryption support #329

Open: wants to merge 3 commits into main

@aclark4life (Collaborator) commented Jun 27, 2025

Previous attempts and additional context here:

16 earlier comments (11 by @aclark4life, 5 by @timgraham) were marked as resolved and are hidden.

@aclark4life requested review from a team, Copilot, timgraham, and WaVEV on July 25, 2025 20:08.

A review comment from Copilot was marked as resolved.

@Jibola (Contributor) left a comment

I've added my first round of comments. I haven't looked over the tests yet but plan to do so next week. One big area I'll dig into further is the discussion around KMS_PROVIDERS and KMS_CREDENTIALS.

# Avoid using PyMongo to check the database version or require
# pymongocrypt>=1.14.2 which will contain a fix for the `buildInfo`
# command. https://jira.mongodb.org/browse/PYTHON-5429
return tuple(self.connection.admin.command("buildInfo")["versionArray"])
Contributor

One caveat: depending on auth status, the admin database and its commands may not be available.

Collaborator Author

That would apply to the PyMongo API as well. This change only addresses the libmongocrypt issue that we have not worked around in PyMongo.
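To illustrate the version check being discussed: Django's check_database_version_supported() (visible in the traceback later in this thread) compares get_database_version() against features.minimum_database_version as tuples. The sketch below covers only the parsing and comparison step on a buildInfo reply that has already been fetched, so it sidesteps the auth caveat above. The sample reply is trimmed from the mongosh output quoted later in the thread, and HYPOTHETICAL_MINIMUM is an illustrative value, not the backend's real minimum.

```python
# Sketch: turn a buildInfo reply into a version tuple and compare it with a
# minimum, the way Django's check_database_version_supported() compares
# tuples. HYPOTHETICAL_MINIMUM is illustrative, not the backend's real value.


def database_version(build_info: dict) -> tuple:
    """Return the server version as a tuple, e.g. (8, 1, 1, 0)."""
    return tuple(build_info["versionArray"])


def is_supported(version: tuple, minimum: tuple) -> bool:
    # Tuples compare element by element; a longer tuple with an equal prefix
    # sorts after the shorter one, so (6, 0, 0, 0) >= (6, 0) is True.
    return version >= minimum


# Sample reply trimmed from the mongosh buildInfo output quoted in this thread.
sample = {"version": "8.1.1", "versionArray": [8, 1, 1, 0]}
HYPOTHETICAL_MINIMUM = (6, 0)

print(database_version(sample))  # (8, 1, 1, 0)
print(is_supported(database_version(sample), HYPOTHETICAL_MINIMUM))  # True
```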

Comment on lines 26 to 34
def kms_provider(self, model, *args, **kwargs):
    for router in self.routers:
        func = getattr(router, "kms_provider", None)
        if func and callable(func):
            result = func(model, *args, **kwargs)
            if result is not None:
                return result
    if getattr(model, "encrypted", False):
        raise ImproperlyConfigured("No kms_provider found in database router.")
Contributor

So this allows a KMS_PROVIDER to be defined at the database router level? Why not only allow it in settings.py?

Collaborator

create_encrypted_collection() takes the kms_provider argument, so this allows the provider to be selected per model, if needed. If that level of granularity isn't required, we can re-think the API.

Contributor

Considering we're already enforcing a separate database for encrypted collections, I think it's best to keep this at database-level granularity. A model would then defer to the custom routing for the KMS provider. Having multiple KMS providers just for individual collections seems excessive.

Also, since there's no way to compare encrypted keys across collections, having multiple databases in the configuration still feels like the right way to go.

If customers end up needing this, I can accept that, but even then they could just use database-level configurations.

@aclark4life (Collaborator Author) Aug 1, 2025

No, I wouldn't say that this is per database router, but it is per database. It could be a Django setting, but why set a KMS_PROVIDER in Django when you only need it for the encrypted database? The per-model flexibility is a function of database routers, not of the KMS provider configuration, and folks could take advantage of it for this feature or for any feature that uses a custom router.
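A self-contained sketch of how the kms_provider() dispatch quoted above resolves a provider: routers are asked in order, the first non-None answer wins, and an encrypted model that no router claims raises. ConnectionRouterSketch, NoOpinionRouter, LocalRouter, and EncryptedModel are hypothetical names, and ImproperlyConfigured is a local stand-in for Django's exception so the example runs without Django.

```python
# Sketch of the router-chain dispatch quoted above. ImproperlyConfigured is a
# local stand-in for django.core.exceptions.ImproperlyConfigured so that this
# runs without Django; all class names here are hypothetical.


class ImproperlyConfigured(Exception):
    pass


class ConnectionRouterSketch:
    def __init__(self, routers):
        self.routers = routers

    def kms_provider(self, model, *args, **kwargs):
        # Ask each router in order; the first non-None answer wins.
        for router in self.routers:
            func = getattr(router, "kms_provider", None)
            if func and callable(func):
                result = func(model, *args, **kwargs)
                if result is not None:
                    return result
        # Nobody answered: only an error for models flagged as encrypted.
        if getattr(model, "encrypted", False):
            raise ImproperlyConfigured("No kms_provider found in database router.")
        return None


class NoOpinionRouter:
    def kms_provider(self, model):
        return None  # defer to the next router


class LocalRouter:
    def kms_provider(self, model):
        return "local"


class EncryptedModel:
    encrypted = True  # mirrors getattr(model, "encrypted", False) above


router = ConnectionRouterSketch([NoOpinionRouter(), LocalRouter()])
print(router.kms_provider(EncryptedModel))  # local
```

Because a router can return None to defer, a database-level router and a more specific per-model router can coexist in the same chain; whichever is listed first and has an answer decides.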

Comment on lines 90 to 94
# TODO: Add more encrypted fields
# - PositiveBigIntegerField
# - PositiveIntegerField
# - PositiveSmallIntegerField
# - SmallIntegerField
Contributor

Make a JIRA ticket to track this TODO.

Collaborator Author

Added everything except PositiveIntegerField, which seems to have some issue with being a long in MongoDB.

Comment on lines 48 to 60
        if model:
            return db == ("other" if has_encrypted_fields(model) else "default")
        return db == "default"

    def db_for_read(self, model, **hints):
        if has_encrypted_fields(model):
            return "other"
        return "default"

    db_for_write = db_for_read

    def kms_provider(self, model):
        return "local"
Contributor

Should these be hard-coded to database names, or is this just an example?

Collaborator

To me, this is an example that should be moved to the docs.

Collaborator Author

All of the above. The local provider is for local testing, and the "other" database is there because @WaVEV had previously used an "other" database for cache testing.
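One way to address the hard-coding question, sketched under assumptions: the aliases and an alias-to-provider mapping become constructor arguments, and kms_provider() picks the provider from the database the model is written to, which matches the database-level granularity suggested earlier in the review. EncryptedRouterSketch and the has_encrypted class attribute are hypothetical; has_encrypted_fields() here is only a stand-in for the backend helper of that name.

```python
# Sketch: the test router with aliases and providers passed in rather than
# hard-coded. has_encrypted_fields() is a hypothetical stand-in for the
# backend helper of the same name; it reads a class attribute so the example
# runs without Django.


def has_encrypted_fields(model):
    return getattr(model, "has_encrypted", False)


class EncryptedRouterSketch:
    def __init__(self, default_alias="default", encrypted_alias="other", providers=None):
        self.default_alias = default_alias
        self.encrypted_alias = encrypted_alias
        # KMS provider chosen per database alias; "local" is for testing only.
        self.providers = providers if providers is not None else {encrypted_alias: "local"}

    def allow_migrate(self, db, app_label, model_name=None, model=None, **hints):
        if model:
            expected = self.encrypted_alias if has_encrypted_fields(model) else self.default_alias
            return db == expected
        return db == self.default_alias

    def db_for_read(self, model, **hints):
        if has_encrypted_fields(model):
            return self.encrypted_alias
        return self.default_alias

    db_for_write = db_for_read

    def kms_provider(self, model):
        # None for models routed to a database without a configured provider,
        # which lets a router chain fall through to the next router.
        return self.providers.get(self.db_for_write(model))


class PatientRecord:
    has_encrypted = True  # hypothetical stand-in for a model with encrypted fields


router = EncryptedRouterSketch(encrypted_alias="encrypted", providers={"encrypted": "local"})
print(router.db_for_write(PatientRecord))  # encrypted
print(router.kms_provider(PatientRecord))  # local
```

Django's DATABASE_ROUTERS accepts router instances as well as dotted paths, so a configured instance like this could be listed directly in settings.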

Comment on lines 631 to 621
self.connection.ensure_connection()
client = self.connection.connection.admin
build_info = client.command("buildInfo")
is_enterprise = "enterprise" in build_info.get("modules")
Contributor

cc: @ShaneHarvey is this guaranteed to capture enterprise connections?

Collaborator Author

I believe @JamesKovacs confirmed this:

Enterprise 07895885-9f55-4ee3-a362-4a06a321aefa [direct: secondary] test> db.adminCommand("buildInfo")
{
  version: '8.1.1',
  versionArray: [ 8, 1, 1, 0 ],
  gitVersion: 'c441b67d7260844c1422bf259e23c054a33ee7d8',
  modules: [ 'enterprise' ],
…
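Based on the reply above, a hedged sketch of the enterprise check as a pure function. It adds one defensive change that is an assumption, not part of the PR: in the quoted code, build_info.get("modules") returns None if the key is missing, and `"enterprise" in None` raises TypeError, so this version falls back to an empty list. The community_reply and no_modules_reply dicts are hypothetical.

```python
# Sketch: enterprise detection from a buildInfo reply. Differs from the quoted
# code in one assumed-defensive way: `.get("modules") or []` avoids
# `"enterprise" in None` (a TypeError) when the key is missing or null.


def is_enterprise(build_info: dict) -> bool:
    return "enterprise" in (build_info.get("modules") or [])


enterprise_reply = {"version": "8.1.1", "modules": ["enterprise"]}  # as quoted above
community_reply = {"version": "8.1.1", "modules": []}  # hypothetical
no_modules_reply = {"version": "8.1.1"}  # hypothetical: key missing

print(is_enterprise(enterprise_reply))  # True
print(is_enterprise(community_reply))  # False
print(is_enterprise(no_modules_reply))  # False
```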

Comment on lines 53 to 56
data_key = ce.create_data_key(
    kms_provider=kms_provider,
    master_key=master_key,
)
Collaborator

I commented on the design doc, where I was hoping you could say whether or not this is even needed. It's present in the Automatic Queryable Encryption example, but this doesn't follow that example since key_alt_names isn't present.

Comment on lines 17 to 22
If you plan to use :doc:`/topics/queryable-encryption/`, you will also need to install
the optional dependencies:

.. code-block:: bash

    $ pip install --pre django-mongodb-backend[encryption]==5.2.*
Collaborator

I'd omit this here. This won't be relevant to most users reading the introductory installation documentation and you correctly mentioned it in the encryption howto.

Comment on lines 106 to 113
Queryable Encryption
====================

Consider these
`limitations and restrictions <https://www.mongodb.com/docs/manual/core/queryable-encryption/reference/limitations/>`_
before enabling Queryable Encryption. Some operations are unsupported, and others behave differently.

Also see :ref:`unsupported fields <encrypted-fields-unsupported-fields>`.
Collaborator

This known-issues.rst page is really about limitations of Django's core features. I think we can keep limitations and problems with MongoDB-specific features on their respective pages (e.g., there is some of that for EmbeddedModelField: schema changes not supported, etc.).

Comment on lines 293 to 302
# FIXME: Or remove if wontfix.
#
# This test fails due to
# pymongo.errors.OperationFailure: Index not allowed on, or a prefix
# of, the encrypted field slug
with self.assertRaises(AssertionError):  # noqa: SIM117
    with self.assertRaises(pymongo.errors.OperationFailure):

        class SlugFieldTest(models.Model):
            slug = EncryptedSlugField(EqualityQuery())
Collaborator

I believe the issue is that SlugField has db_index=True. Slugs are typically used in URLs, so it seems they would rarely be sensitive data that needs to be encrypted.

The limitation that encrypted fields can't be indexed seems a point worth documenting.

Collaborator Author

I'm not sure whether they can't be indexed or whether they are indexed automatically. The docs say:

"Queryable Encryption does not support TTL Indexes or Unique Indexes."

Comment on lines 109 to 111
Consider these
`limitations and restrictions <https://www.mongodb.com/docs/manual/core/queryable-encryption/reference/limitations/>`_
before enabling Queryable Encryption. Some operations are unsupported, and others behave differently.
Collaborator

I was hoping for a translation of how those restrictions apply to Django querysets, etc. On a quick scan, that page reads as MongoDB mumbo jumbo that's meaningless to Djangonauts.

Collaborator Author

The tutorial only demonstrates find_one(), and I've been using get() in the tests. I'm not sure whether filter() and/or lookups will work.

@timgraham force-pushed the INTPYTHON-527 branch 3 times, most recently from 6e1b8ed to 3815a70, on August 2, 2025 22:41.

@timgraham (Collaborator)

The encryption tests are passing locally for me on Enterprise and on the Atlas VM.

On GitHub Actions, the first issue was solved by adding "directConnection": True to DATABASES:

  File "/home/runner/work/django-mongodb-backend/django-mongodb-backend/django_repo/django/db/backends/base/base.py", line 197, in check_database_version_supported
    and self.get_database_version() < self.features.minimum_database_version
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/django-mongodb-backend/django-mongodb-backend/django_mongodb_backend/base.py", line 235, in get_database_version
    return tuple(self.connection.admin.command("buildInfo")["versionArray"])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.12/site-packages/pymongo/_csot.py", line 125, in csot_wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.12/site-packages/pymongo/synchronous/database.py", line 926, in command
    with self._client._conn_for_reads(read_preference, session, operation=command_name) as (
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.12/site-packages/pymongo/synchronous/mongo_client.py", line 1864, in _conn_for_reads
    server = self._select_server(read_preference, session, operation)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.12/site-packages/pymongo/synchronous/mongo_client.py", line 1812, in _select_server
    server = topology.select_server(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.12/site-packages/pymongo/synchronous/topology.py", line 409, in select_server
    server = self._select_server(
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.12/site-packages/pymongo/synchronous/topology.py", line 387, in _select_server
    servers = self.select_servers(
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.12/site-packages/pymongo/synchronous/topology.py", line 294, in select_servers
    server_descriptions = self._select_servers_loop(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.12/site-packages/pymongo/synchronous/topology.py", line 344, in _select_servers_loop
    raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: Could not reach any servers in [('3ff0eef351ff', 27017)]. Replica set is configured with internal hostnames or IPs?, Timeout: 30s, Topology Description: <TopologyDescription id: 688e9488ca8c88d98365da45, topology_type: ReplicaSetNoPrimary, servers: [<ServerDescription ('3ff0eef351ff', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('3ff0eef351ff:27017: [Errno -3] Temporary failure in name resolution (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>
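For reference, a hypothetical DATABASES entry with that option. It assumes that django-mongodb-backend passes the keys in OPTIONS to pymongo.MongoClient as keyword arguments; the host, port, and database name are placeholders, not the CI configuration.

```python
# Hypothetical settings fragment. Assumes OPTIONS entries are forwarded to
# pymongo.MongoClient as keyword arguments; HOST, PORT, and NAME are
# placeholders.

DATABASES = {
    "default": {
        "ENGINE": "django_mongodb_backend",
        "HOST": "localhost",
        "PORT": 27017,
        "NAME": "djangotests",
        "OPTIONS": {
            # Connect straight to the given host instead of discovering the
            # replica set members, whose internal hostnames (like the
            # container ID in the traceback above) may not resolve from the
            # CI runner.
            "directConnection": True,
        },
    },
}
```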

But this issue remains:

Creating test database for alias 'encrypted' ('test_djangotests-encrypted')...
/home/runner/.local/lib/python3.12/site-packages/pymongo/daemon.py:147: RuntimeWarning: Failed to start mongocryptd: is it on your $PATH?
Original exception: [Errno 2] No such file or directory: 'mongocryptd'
  _silence_resource_warning(_spawn(sys.argv[1:]))
/home/runner/.local/lib/python3.12/site-packages/pymongo/daemon.py:147: RuntimeWarning: Failed to start mongocryptd: is it on your $PATH?
Original exception: [Errno 2] No such file or directory: 'mongocryptd'
  _silence_resource_warning(_spawn(sys.argv[1:]))
  Applying sites.0002_alter_domain_unique... OK
Operations to perform:
  Synchronize unmigrated apps: auth, contenttypes, encryption_, messages, sessions, staticfiles
  Apply all migrations: admin, sites
Synchronizing apps without migrations:
  Creating tables...
    Creating table encryption__appointment
Traceback (most recent call last):
  File "/home/runner/.local/lib/python3.12/site-packages/pymongo/synchronous/encryption.py", line 286, in mark_command
    res = self.mongocryptd_client[database].command(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.12/site-packages/pymongo/_csot.py", line 125, in csot_wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.12/site-packages/pymongo/synchronous/database.py", line 926, in command
    with self._client._conn_for_reads(read_preference, session, operation=command_name) as (
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.12/site-packages/pymongo/synchronous/mongo_client.py", line 1864, in _conn_for_reads
    server = self._select_server(read_preference, session, operation)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.12/site-packages/pymongo/synchronous/mongo_client.py", line 1812, in _select_server
    server = topology.select_server(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.12/site-packages/pymongo/synchronous/topology.py", line 409, in select_server
    server = self._select_server(
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.12/site-packages/pymongo/synchronous/topology.py", line 387, in _select_server
    servers = self.select_servers(
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.12/site-packages/pymongo/synchronous/topology.py", line 294, in select_servers
    server_descriptions = self._select_servers_loop(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.12/site-packages/pymongo/synchronous/topology.py", line 344, in _select_servers_loop
    raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: localhost:27020: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 10.0s, Topology Description: <TopologyDescription id: 688e9f76697ba6965a378048, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27020) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27020: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>

self.assertEqual(
    PatientRecord.objects.get(ssn="123-45-6789").profile_picture, b"image data"
)
with self.assertRaises(AssertionError):
Collaborator

What's your thinking about the usefulness of this assertion? If assertEqual(profile_picture, b"image data") passed, then of course asserting it's not equal to something else is going to work? (Incidentally, assertNotEqual() is more natural than assertEqual() + assertRaises().)

More generally, it seems like you weren't sure exactly what to test here, so you wrote various things that came to mind. Maybe we need to define the test conditions so we can have some more standardized testing.
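A minimal sketch of the assertNotEqual() spelling suggested above. FakeRecord is a hypothetical stand-in for PatientRecord, and b"bad image data" is a made-up comparison value, since the body of the original assertRaises block isn't shown. As the review points out, once assertEqual() passes, the negative check adds little; this only shows the more natural form.

```python
# Sketch of the suggested spelling: assertNotEqual() instead of wrapping
# assertEqual() in assertRaises(AssertionError). FakeRecord is a hypothetical
# stand-in for PatientRecord so the example runs without a database.
import unittest


class FakeRecord:
    profile_picture = b"image data"


class ProfilePictureTests(unittest.TestCase):
    def test_profile_picture(self):
        record = FakeRecord()
        self.assertEqual(record.profile_picture, b"image data")
        # Same check as:
        #     with self.assertRaises(AssertionError):
        #         self.assertEqual(record.profile_picture, b"bad image data")
        # b"bad image data" is a made-up value for illustration.
        self.assertNotEqual(record.profile_picture, b"bad image data")
```

Saved as a module, this runs with `python -m unittest`.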

Collaborator Author

Since we're relying on the encryption algorithm to detect changes, I wanted to see the check both pass and fail. And yes, the plan was to add fields to the patient-themed test suite, expanding on the tutorial example.

Comment on lines +381 to +383
# FIXME: pymongo.errors.EncryptionError: Cannot encrypt element of type int
# because schema requires that type is one of: [ long ]
# pos_int=1,
Collaborator

The data type for each field is determined by this mapping:

data_types = {
    "AutoField": "int",
    "BigAutoField": "long",
    "BinaryField": "binData",
    "BooleanField": "bool",
    "CharField": "string",
    "DateField": "date",
    "DateTimeField": "date",
    "DecimalField": "decimal",
    "DurationField": "long",
    "FileField": "string",
    "FilePathField": "string",
    "FloatField": "double",
    "IntegerField": "int",
    "BigIntegerField": "long",
    "GenericIPAddressField": "string",
    "JSONField": "object",
    "OneToOneField": "int",
    "PositiveBigIntegerField": "int",
    "PositiveIntegerField": "long",
    "PositiveSmallIntegerField": "int",
    "SlugField": "string",
    "SmallAutoField": "int",
    "SmallIntegerField": "int",
    "TextField": "string",
    "TimeField": "date",
    "UUIDField": "string",
}

It seems the mapping has some mistakes. For example, PositiveBigIntegerField should be long (64-bit), I think. That said, this should be corrected in a separate PR.

The question remains how to send the value to the database as a long to avoid this error. Frankly, I wouldn't expect any special handling to be needed, but maybe Jib has some idea. (Was this already discussed when you ran into the error for DurationField?)

Collaborator Author

It has not been discussed, and now that you mention it, DurationField may have had a similar issue.
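A sketch of one plausible explanation for the FIXME's error, based on how PyMongo encodes integers: a Python int that fits in 32 bits is sent as BSON int32 ("int"), and only larger values become int64 ("long"). A schema that requires "long" would therefore reject pos_int=1, and DurationField (mapped to "long") may hit the same thing with small values. The helpers below only model that encoding rule without calling PyMongo; bson_type_for() and matches_schema() are hypothetical names.

```python
# Sketch modeling PyMongo's integer encoding rule (an assumption stated in the
# lead-in): ints that fit in 32 bits become BSON "int", larger ones "long".
# These helpers do not call PyMongo.

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def bson_type_for(value: int) -> str:
    """BSON type name a plain Python int would be encoded as."""
    if INT32_MIN <= value <= INT32_MAX:
        return "int"
    if INT64_MIN <= value <= INT64_MAX:
        return "long"
    raise OverflowError("value does not fit in a BSON integer type")


def matches_schema(value: int, allowed: list) -> bool:
    return bson_type_for(value) in allowed


# A small value is encoded as "int", so a schema allowing only "long" rejects
# it, consistent with the FIXME's EncryptionError.
print(matches_schema(1, ["long"]))  # False

# PositiveBigIntegerField's upper bound (2**63 - 1) only fits in "long",
# consistent with the suspicion that mapping it to "int" is wrong.
print(bson_type_for(9223372036854775807))  # long
```

If that is the cause, bson.int64.Int64 is PyMongo's wrapper for forcing int64 encoding of a value; where the backend would apply it is left open here.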

6 participants