Skip to content

Commit 6249fb3

Browse files
committed
MAINT: Document and test XMP security
1 parent b9f66ab commit 6249fb3

File tree

3 files changed

+72
-4
lines changed

3 files changed

+72
-4
lines changed

.github/SECURITY.md

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -12,9 +12,11 @@ automatically inform all relevant team members. Otherwise, please
1212
get in touch with stefan6419846 through e-mail (current maintainer,
1313
address in GitHub profile).
1414

15+
Please have a look at our [corresponding user documentation](https://pypdf.readthedocs.io/en/stable/user/security.html)
16+
as well, which includes some information about possibly invalid reports as well.
17+
1518
We will try to find a fix in a timely manner and will then issue a security
16-
advisory together with the update via GitHub
19+
advisory together with the update via GitHub, as well as requesting a CVE
1720
([example](https://github.com/py-pdf/pypdf/security/advisories/GHSA-xcjx-m2pj-8g79)).
1821

19-
If you don't get a reaction within 30 days, please open a public issue on
20-
GitHub.
22+
If you do not get a reaction within 30 days, please open a public issue on GitHub.

docs/user/security.md

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -56,3 +56,15 @@ We receive reports about possibly insecure cryptography from time to time. This
5656

5757
These are requirements of the PDF standard, which we need to achieve the greatest compatibility with.
5858
Although some of them might be deprecated in PDF 2.0, the PDF 2.0 adoption rate is very low and legacy documents need to be supported.
59+
60+
### XML parsing
61+
62+
We use `xml.minidom` for parsing XMP information. Given recent Python versions built against recent Expat versions, the usual attacks
63+
(exponential entity expansion and external entity expansion) should not be possible. We have corresponding tests in place to ensure
64+
this for the platforms our tests run against.
65+
66+
For some details, see [the official documentation](https://docs.python.org/3/library/xml.html#xml-security) and the
67+
[README for defusedxml](https://github.com/tiran/defusedxml/blob/main/README.md#python-xml-libraries).
68+
69+
Please note that automated scanners tend to still flag any direct imports of XML modules from the Python standard library as unsafe.
70+
There have been discussions about this being outdated already, but they are still being flagged.

tests/test_xmp.py

Lines changed: 55 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
import pypdf.xmp
99
from pypdf import PdfReader, PdfWriter
1010
from pypdf.errors import PdfReadError, XmpDocumentError
11-
from pypdf.generic import NameObject, StreamObject
11+
from pypdf.generic import ContentStream, NameObject, StreamObject
1212
from pypdf.xmp import XmpInformation
1313

1414
from . import RESOURCE_ROOT, SAMPLE_ROOT, get_data_from_url
@@ -887,3 +887,57 @@ def test_xmp_information__create_and_set_metadata():
887887
assert xmp.dc_contributor == ["test1"]
888888
assert xmp.dc_creator == ["test2"]
889889
assert xmp.dc_title == {"x-default": "test3"}
890+
891+
892+
def test_xmp_information__external_entity_expansion(tmpdir):
893+
path = tmpdir / "secret.txt"
894+
path.write("VERY SECRET")
895+
896+
stream = ContentStream(pdf=None, stream=None)
897+
stream.set_data(f"""<?xml version="1.0"?>
898+
<!DOCTYPE foo [
899+
<!ENTITY xxe SYSTEM "file://{path}">
900+
]>
901+
<x:xmpmeta xmlns:x="adobe:ns:meta/">
902+
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
903+
<rdf:Description rdf:about="">
904+
<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">&xxe;abc</dc:creator>
905+
</rdf:Description>
906+
</rdf:RDF>
907+
</x:xmpmeta>""".encode())
908+
909+
xmp = XmpInformation(stream)
910+
assert xmp.dc_creator == ["abc"]
911+
912+
913+
@pytest.mark.timeout(10)
914+
def test_xmp_information__exponential_entity_expansion():
915+
stream = ContentStream(pdf=None, stream=None)
916+
stream.set_data(b"""<?xml version="1.0"?>
917+
<!DOCTYPE lolz [
918+
<!ENTITY lol "lol">
919+
<!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
920+
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
921+
<!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
922+
<!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
923+
<!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
924+
<!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
925+
<!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
926+
<!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
927+
]>
928+
<x:xmpmeta xmlns:x="adobe:ns:meta/">
929+
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
930+
<rdf:Description rdf:about="">
931+
<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">&lol9;</dc:title>
932+
</rdf:Description>
933+
</rdf:RDF>
934+
</x:xmpmeta>""")
935+
936+
with pytest.raises(
937+
expected_exception=PdfReadError,
938+
match=(
939+
r"^XML in XmpInformation was invalid: limit on input amplification factor "
940+
r"\(from DTD and entities\) breached: line 16, column 60$"
941+
)
942+
):
943+
XmpInformation(stream)

0 commit comments

Comments
 (0)