-
Notifications
You must be signed in to change notification settings - Fork 26
INTPYTHON-527 Add Queryable Encryption support #329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
django_mongodb_backend/management/commands/get_encrypted_fields_map.py
Outdated
Show resolved
Hide resolved
django_mongodb_backend/management/commands/get_encrypted_fields_map.py
Outdated
Show resolved
Hide resolved
django_mongodb_backend/management/commands/get_encrypted_fields_map.py
Outdated
Show resolved
Hide resolved
django_mongodb_backend/management/commands/get_encrypted_fields_map.py
Outdated
Show resolved
Hide resolved
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
django_mongodb_backend/management/commands/get_encrypted_fields_map.py
Outdated
Show resolved
Hide resolved
2d672f2
to
10ae20e
Compare
4acc5ab
to
796855e
Compare
affa499
to
29a1d97
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added my first round of comments. I haven't looked over the tests yet but plan to do so next week. A big area I'll dig more into is around the discussion on KMS_PROVIDERS and KMS_CREDENTIALS
# Avoid using PyMongo to check the database version or require | ||
# pymongocrypt>=1.14.2 which will contain a fix for the `buildInfo` | ||
# command. https://jira.mongodb.org/browse/PYTHON-5429 | ||
return tuple(self.connection.admin.command("buildInfo")["versionArray"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One caveat is depending on auth status, the admin and commands may not be available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would apply to the PyMongo
API as well. This change only addresses the libmongocrypt
issue that we have not worked around in PyMongo.
django_mongodb_backend/routers.py
Outdated
def kms_provider(self, model, *args, **kwargs): | ||
for router in self.routers: | ||
func = getattr(router, "kms_provider", None) | ||
if func and callable(func): | ||
result = func(model, *args, **kwargs) | ||
if result is not None: | ||
return result | ||
if getattr(model, "encrypted", False): | ||
raise ImproperlyConfigured("No kms_provider found in database router.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this allows a KMS_PROVIDER to be defined at the Database Router level? Why not let it only be something done in the settings.py?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create_encrypted_collection()
takes the kms_provider
argument, so this allows the provider to be selected per model, if needed. If that level of granularity isn't required, we can re-think the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering we're already enforcing a separate database for encrypted collections, I think it's best to keep it at the granularity of the database. Then, a model would defer to the custom routing for the kms provider. It seems excessive to have multiple KMS providers just for a collection.
As well, since there's no way to compare encrypted keys across collections, having multiple databases in the configuration still feels like the right way to go.
If customers end up needing this fulfilled I can accept that, but even in that case, they could just have database level configurations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I wouldn't say that this is per database router, but it is per database. It could be a Django setting, but why set a KMS_PROVIDER
in Django when you only really need it for the encrypted database. The per-model flexibility is a function of database routers, not a function of KMS provider configuration and folks could take advantage of that ability for this feature or any feature in which a custom router is configured.
# TODO: Add more encrypted fields | ||
# - PositiveBigIntegerField | ||
# - PositiveIntegerField | ||
# - PositiveSmallIntegerField | ||
# - SmallIntegerField |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make a JIRA to track this TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added everything but PositiveIntegerField
which seems to have some issue with being a long
in MongoDB.
django_mongodb_backend/encryption.py
Outdated
if model: | ||
return db == ("other" if has_encrypted_fields(model) else "default") | ||
return db == "default" | ||
|
||
def db_for_read(self, model, **hints): | ||
if has_encrypted_fields(model): | ||
return "other" | ||
return "default" | ||
|
||
db_for_write = db_for_read | ||
|
||
def kms_provider(self, model): | ||
return "local" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should these be hard-coded to database name? Or is this just an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, this is an example that should be moved to the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the above. The local
provider is for local testing and the other
database was because @WaVEV had previously used an other
database for cache testing.
django_mongodb_backend/features.py
Outdated
self.connection.ensure_connection() | ||
client = self.connection.connection.admin | ||
build_info = client.command("buildInfo") | ||
is_enterprise = "enterprise" in build_info.get("modules") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc: @ShaneHarvey is this guaranteed to capture enterprise connections?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe @JamesKovacs confirmed this
Enterprise 07895885-9f55-4ee3-a362-4a06a321aefa [direct: secondary] test> db.adminCommand("buildInfo")
{
version: '8.1.1',
versionArray: [ 8, 1, 1, 0 ],
gitVersion: 'c441b67d7260844c1422bf259e23c054a33ee7d8',
modules: [ 'enterprise' ],
…
django_mongodb_backend/management/commands/get_encrypted_fields_map.py
Outdated
Show resolved
Hide resolved
django_mongodb_backend/routers.py
Outdated
def kms_provider(self, model, *args, **kwargs): | ||
for router in self.routers: | ||
func = getattr(router, "kms_provider", None) | ||
if func and callable(func): | ||
result = func(model, *args, **kwargs) | ||
if result is not None: | ||
return result | ||
if getattr(model, "encrypted", False): | ||
raise ImproperlyConfigured("No kms_provider found in database router.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create_encrypted_collection()
takes the kms_provider
argument, so this allows the provider to be selected per model, if needed. If that level of granularity isn't required, we can re-think the API.
django_mongodb_backend/encryption.py
Outdated
if model: | ||
return db == ("other" if has_encrypted_fields(model) else "default") | ||
return db == "default" | ||
|
||
def db_for_read(self, model, **hints): | ||
if has_encrypted_fields(model): | ||
return "other" | ||
return "default" | ||
|
||
db_for_write = db_for_read | ||
|
||
def kms_provider(self, model): | ||
return "local" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, this is an example that should be moved to the docs.
data_key = ce.create_data_key( | ||
kms_provider=kms_provider, | ||
master_key=master_key, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I commented on the design doc, where I was hoping you could say whether or not this is even needed. It's present in the Automatic Queryable Encryption example, but this doesn't follow that example since key_alt_names
isn't present.
89205af
to
293656a
Compare
docs/source/intro/install.rst
Outdated
If you plan to use :doc:`/topics/queryable-encryption/`, you will also need to install | ||
the optional dependencies: | ||
|
||
.. code-block:: bash | ||
$ pip install --pre django-mongodb-backend[encryption]==5.2.* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd omit this here. This won't be relevant to most users reading the introductory installation documentation and you correctly mentioned it in the encryption howto.
docs/source/topics/known-issues.rst
Outdated
Queryable Encryption | ||
==================== | ||
|
||
Consider these | ||
`limitations and restrictions <https://www.mongodb.com/docs/manual/core/queryable-encryption/reference/limitations/>`_ | ||
before enabling Queryable Encryption. Some operations are unsupported, and others behave differently. | ||
|
||
Also see :ref:`unsupported fields <encrypted-fields-unsupported-fields>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really this known-issues.rst page is limitations of Django's core features. I think we can keep limitations and problems with MongoDB specific features on their respective pages (e.g. there is some of that with EmbeddedModelField: schema changes not supported, etc.).
tests/encryption_/tests.py
Outdated
# FIXME: Or remove if wontfix. | ||
# | ||
# This test fails due to | ||
# pymongo.errors.OperationFailure: Index not allowed on, or a prefix | ||
# of, the encrypted field slug | ||
with self.assertRaises(AssertionError): # noqa: SIM117 | ||
with self.assertRaises(pymongo.errors.OperationFailure): | ||
|
||
class SlugFieldTest(models.Model): | ||
slug = EncryptedSlugField(EqualityQuery()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the issue is that SlugField
has db_index=True
. Slugs are generally used in URLs and it seems they would generally not be sensitive data that needs to be encrypted.
The limitation that encrypted fields can't be indexed seems a point worth documenting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if they can't be indexed or if they are auto indexed. The docs say
Queryable Encryption does not support TTL Indexes or Unique Indexes.
docs/source/topics/known-issues.rst
Outdated
Consider these | ||
`limitations and restrictions <https://www.mongodb.com/docs/manual/core/queryable-encryption/reference/limitations/>`_ | ||
before enabling Queryable Encryption. Some operations are unsupported, and others behave differently. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping for a translation of how those restrictions apply to Django querysets, etc. Quickly scanning that page, it reads as MongoDB mumbo jumbo that's meaningless to Djangonauts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tutorial only demonstrates find_one
and I've been using get
in the tests. I'm not sure if filter
and/or lookups are going to work.
270727c
to
f0d4e92
Compare
f0d4e92
to
4ef6b84
Compare
6e1b8ed
to
3815a70
Compare
3815a70
to
a6a12e7
Compare
The encryption tests are passing locally for me on Enterprise and on the Atlas VM. On GitHub actions, this first issue was solved by adding
But this issue remains:
|
self.assertEqual( | ||
PatientRecord.objects.get(ssn="123-45-6789").profile_picture, b"image data" | ||
) | ||
with self.assertRaises(AssertionError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's your thinking about the usefulness of this assertion? If assertEqual(profile_picture, b"image data")
passed, then of course asserting it's not equal to something else is going to work? (Incidentally, assertNotEqual()
is more natural than assertEqual() + assertRaises()
.)
More generally, it seems like you weren't sure exactly what to test here, so you wrote various things that came to mind. Maybe we need to define the test conditions so we can have some more standardized testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're relying on the encryption algorithm to detect changes I wanted to see it pass and fail, and yes the plan was to add fields to the patient-themed test suite, expanding on the tutorial example.
# FIXME: pymongo.errors.EncryptionError: Cannot encrypt element of type int | ||
# because schema requires that type is one of: [ long ] | ||
# pos_int=1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data type for each field is determined by this mapping:
django-mongodb-backend/django_mongodb_backend/base.py
Lines 36 to 63 in d5aa1a7
data_types = { | |
"AutoField": "int", | |
"BigAutoField": "long", | |
"BinaryField": "binData", | |
"BooleanField": "bool", | |
"CharField": "string", | |
"DateField": "date", | |
"DateTimeField": "date", | |
"DecimalField": "decimal", | |
"DurationField": "long", | |
"FileField": "string", | |
"FilePathField": "string", | |
"FloatField": "double", | |
"IntegerField": "int", | |
"BigIntegerField": "long", | |
"GenericIPAddressField": "string", | |
"JSONField": "object", | |
"OneToOneField": "int", | |
"PositiveBigIntegerField": "int", | |
"PositiveIntegerField": "long", | |
"PositiveSmallIntegerField": "int", | |
"SlugField": "string", | |
"SmallAutoField": "int", | |
"SmallIntegerField": "int", | |
"TextField": "string", | |
"TimeField": "date", | |
"UUIDField": "string", | |
} |
It seems the mapping has some mistakes. For example, PositiveBigIntegerField
should be long
(64-bit) [I think]. That said, this is an issue that should be correct in a separate PR.
The question remains how to send the value to the database as a long to avoid this error. Frankly, I wouldn't expect any special handling to be needed, but maybe Jib has some idea. (Was this already discussed when you ran into the error for DurationField?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has not been discussed and now that you mention it, DurationField
may have been similar.
Previous attempts and additional context here: