Support text, JSON, XML and YAML `DocumentUrl` and `BinaryContent` on OpenAI #2851

pulphix · 2025-09-10T13:03:32Z

Problem addressed

Closes #1161 (Using BinaryContent with CSV media type throws an error).

OpenAI rejects text/plain content as file parts, while Anthropic and Gemini accept document URLs directly.
This created an inconsistent developer experience (DX) across providers.

New types

MagicDocumentUrl: subclass of DocumentUrl, adds optional filename and a magic marker.
MagicBinaryContent: subclass of BinaryContent, adds optional filename and a magic marker.

Both preserve their original types in serialized history so users can filter for them.
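To make the subclass-plus-marker design concrete, here is a minimal sketch of how a type-preserving subclass lets callers filter history with `isinstance`. The `BinaryContent` and `MagicBinaryContent` dataclasses below are simplified, hypothetical stand-ins (not the pydantic_ai classes), and the `filename`/`is_magic` field names are assumptions based on the description above.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical, simplified stand-in for the real base class.
@dataclass
class BinaryContent:
    data: bytes
    media_type: str


# Hypothetical subclass: adds an optional display name and a marker flag.
@dataclass
class MagicBinaryContent(BinaryContent):
    filename: str | None = None
    is_magic: bool = True


history = [
    BinaryContent(data=b'%PDF-1.7', media_type='application/pdf'),
    MagicBinaryContent(data=b'hello', media_type='text/plain', filename='notes.txt'),
]

# Because the subclass type is preserved, callers can filter for it directly.
magic_items = [item for item in history if isinstance(item, MagicBinaryContent)]
print([item.filename for item in magic_items])  # ['notes.txt']
```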

OpenAI-specific handling

If media_type == text/plain:
- MagicDocumentUrl → downloaded, decoded as UTF-8, and converted to a single text UserContent with a clear delimiter:
```
-----BEGIN FILE filename="<name>" type="text/plain"-----
<file contents>
-----END FILE-----
```
- MagicBinaryContent → decoded as UTF-8 and converted to the same text format.
Non-text content (PDF, images, etc.) → sent as OpenAI file parts (base64 + strict MIME + filename), as before.
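A minimal sketch of the OpenAI-side branching described above, assuming the content has already been downloaded into bytes. `map_for_openai` is a hypothetical helper, and the dict shapes only approximate OpenAI chat-completion content parts (a `text` part, and a `file` part carrying a base64 `data:` URL plus a filename); they are an assumption, not the PR's code.

```python
from __future__ import annotations

import base64


def map_for_openai(data: bytes, media_type: str, filename: str) -> dict[str, object]:
    """Hypothetical helper: inline text/plain as a fenced text part, else send a file part."""
    if media_type == 'text/plain':
        # Text is decoded and wrapped in BEGIN/END FILE delimiters.
        text = data.decode('utf-8')
        block = '\n'.join([
            f'-----BEGIN FILE filename="{filename}" type="{media_type}"-----',
            text,
            '-----END FILE-----',
        ])
        return {'type': 'text', 'text': block}
    # Everything else (PDF, images, ...) goes out as base64 with its MIME type and filename.
    encoded = base64.b64encode(data).decode('ascii')
    return {
        'type': 'file',
        'file': {'file_data': f'data:{media_type};base64,{encoded}', 'filename': filename},
    }
```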

Other providers

Anthropic/Gemini: Magic* are effectively pass-through (treated like their base classes), so PDFs/text URLs keep working without special casing.

Serialization/history

We keep the Magic* classes in the message history (with an is_magic marker) so users can filter by type even if OpenAI saw inline text at request time.

Tests

Added tests ensuring:

MagicBinaryContent (text/plain) → inline text with delimiter on OpenAI.
MagicBinaryContent (PDF) → file part on OpenAI.
MagicDocumentUrl (text/plain) → inline text with delimiter on OpenAI (mocked download).
MagicDocumentUrl (PDF) → file part on OpenAI (mocked download).

All functional tests pass; the repo's global 100% coverage target is handled outside these changes.

Example

examples/pydantic_ai_examples/magic_files.py demonstrates both MagicDocumentUrl and MagicBinaryContent with OpenAI and Anthropic.
Loads API keys via python-dotenv if available (load_dotenv()).

How to use

Import:

```python
from pydantic_ai import Agent, MagicDocumentUrl, MagicBinaryContent
```

DouweM · 2025-09-10T14:33:28Z

@pulphix Thanks Fabio, I'll hold off on reviewing this pending #1161 (comment).

DouweM · 2025-09-11T17:44:44Z

@pulphix I discussed with @Kludex, and we'd like the automatic behavior for `text/*` MIME type BinaryContents and DocumentUrls to be to inline them as text parts with HTTP multipart-style fencing. Can you please update the PR to remove the magic subclasses?
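The `text/*` rule, extended to the JSON, XML and YAML types named in the final PR title, could be sketched as a simple media-type check. The `EXTRA_TEXT_TYPES` set and the `is_text_like` name are assumptions for illustration; the exact list of MIME types the merged code treats as text may differ.

```python
from __future__ import annotations

# Assumed non-text/* MIME types that still carry plain text.
EXTRA_TEXT_TYPES = frozenset({
    'application/json',
    'application/xml',
    'application/yaml',
    'application/x-yaml',
})


def is_text_like(media_type: str) -> bool:
    """Return True if content of this media type should be inlined as text."""
    # Ignore parameters such as "; charset=utf-8" and compare case-insensitively.
    base = media_type.split(';', 1)[0].strip().lower()
    return base.startswith('text/') or base in EXTRA_TEXT_TYPES
```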

pulphix · 2025-09-12T08:47:11Z

@DouweM @Kludex I updated the PR by removing the Magic Classes and porting the logic to BinaryContent and DocumentUrl.

I’m wondering whether it makes sense to include the file name in the BinaryContent prompt.
For example:

```
-----BEGIN FILE filename="file.xsl" type="application/xml"-----\n<a>1</a>\n-----END FILE-----
```

If multiple text files are uploaded as binary, they will all share the same name, which might create issues with LLM reasoning.
One option could be to use the identifier to make the file name unique.
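One way to act on the identifier idea, sketched under the assumption that the identifier is a short hash of the file contents: identical bytes get the same id, while distinct files get distinct ids (barring hash collisions), so the model can tell the fenced blocks apart. `short_identifier` and the 6-character length are hypothetical choices.

```python
import hashlib


def short_identifier(data: bytes, length: int = 6) -> str:
    """Hypothetical helper: a truncated SHA-1 hex digest of the file contents."""
    return hashlib.sha1(data).hexdigest()[:length]


# Two XML files uploaded as binary would otherwise be indistinguishable by name.
files = [(b'<a>1</a>', 'application/xml'), (b'<b>2</b>', 'application/xml')]
headers = [
    f'-----BEGIN FILE id="{short_identifier(data)}" type="{media_type}"-----'
    for data, media_type in files
]
```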

pulphix · 2025-09-12T09:20:15Z

Proposal

Add an optional filename field to both BinaryContent and DocumentUrl:

For BinaryContent, this allows specifying the original filename when encoding a file to base64
For DocumentUrl, this ensures that when a URL is resolved into a file, the adapter can preserve or supply a human-readable name.
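A sketch of the proposed optional field, using a hypothetical `TextFile` dataclass as a stand-in for `BinaryContent`/`DocumentUrl`. Falling back to the identifier when no filename is given is an assumption about how an adapter might choose the label.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TextFile:
    """Hypothetical stand-in for BinaryContent/DocumentUrl with the proposed field."""

    data: bytes
    media_type: str
    identifier: str
    filename: str | None = None  # proposed optional, human-readable name

    def label(self) -> str:
        # Prefer the human-readable name; fall back to the identifier.
        return self.filename if self.filename is not None else self.identifier
```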

pulphix · 2025-09-15T07:47:40Z

Removed reference to filename and added URL for DocumentUrl.

examples/pydantic_ai_examples/textlike_file_mapping.py

pydantic_ai_slim/pydantic_ai/__init__.py

pydantic_ai_slim/pydantic_ai/models/openai.py

tests/models/cassettes/test_model_names/test_known_model_names.yaml

tests/models/test_openai_textlike_mapping.py

pulphix · 2025-09-16T16:25:56Z

@DouweM I rebased from the main branch, but I encountered some failures.


DouweM · 2025-09-17T19:57:34Z

pydantic_ai_slim/pydantic_ai/models/openai.py

```diff
+    @staticmethod
+    def _inline_file_block(media_type: str, text: str, identifier: str | None) -> str:
+        id_attr = f' id="{identifier}"' if identifier else ''
+        return ''.join(['-----BEGIN FILE', id_attr, ' type="', media_type, '"-----\n', text, '\n-----END FILE-----'])
```


Suggested change

```diff
-        return ''.join(['-----BEGIN FILE', id_attr, ' type="', media_type, '"-----\n', text, '\n-----END FILE-----'])
+        return '\n'.join([
+            f'-----BEGIN FILE{id_attr} type="{media_type}"-----',
+            text,
+            f'-----END FILE {id_attr}-----',
+        ])
```
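A self-contained sketch of the newline-joined format from the suggestion. One detail worth noting: `id_attr` already begins with a space, so the suggested `f'-----END FILE {id_attr}-----'` would render two spaces before `id=`; the sketch below drops the extra space so the BEGIN and END markers match. The function name mirrors the diff, but this is an illustration, not the merged code.

```python
from __future__ import annotations


def inline_file_block(media_type: str, text: str, identifier: str | None) -> str:
    # id_attr carries its own leading space, or is empty when there is no identifier.
    id_attr = f' id="{identifier}"' if identifier else ''
    return '\n'.join([
        f'-----BEGIN FILE{id_attr} type="{media_type}"-----',
        text,
        # No extra space before id_attr: it already starts with one.
        f'-----END FILE{id_attr}-----',
    ])
```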

DouweM · 2025-09-17T19:57:57Z

pydantic_ai_slim/pydantic_ai/models/openai.py

```diff
+
+    @staticmethod
+    def _inline_file_block(media_type: str, text: str, identifier: str | None) -> str:
+        id_attr = f' id="{identifier}"' if identifier else ''
```


Would there always be an identifier?

@DouweM Based on the DocumentUrl object, `identifier` can be None.

Hmm, I agree the type states it can be, but the implementation ensures it always has a value, as far as I can see. Can we add an `assert identifier is not None` here, as there's not much use in giving the model an inline text part if there is no way to identify it?
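The requested guard could look like this minimal sketch (hypothetical `inline_text_file`): the `assert` fails fast when no identifier is available, and type checkers then narrow `identifier` from `str | None` to `str`, so the `id` attribute can be emitted unconditionally. Note that `assert` statements are stripped under `python -O`.

```python
from __future__ import annotations


def inline_text_file(media_type: str, text: str, identifier: str | None) -> str:
    # An inline part the model cannot refer back to is of little use, so fail fast.
    assert identifier is not None, 'inline text files need an identifier'
    # From here on, type checkers treat identifier as str.
    return '\n'.join([
        f'-----BEGIN FILE id="{identifier}" type="{media_type}"-----',
        text,
        f'-----END FILE id="{identifier}"-----',
    ])
```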

DouweM · 2025-09-17T20:00:57Z

tests/models/test_openai.py

```diff
+
+    model = OpenAIChatModel('gpt-4o', provider=OpenAIProvider(api_key=openai_api_key))
+    # Monkeypatch the client's create method
+    model.client.chat.completions.create = fake_create
```


In this particular case I think it's acceptable to directly call the private `_map_user_message` method (and add a `# type: ignore[reportPrivateUsage]` comment), as it's a lot easier to follow than this mocking.

DouweM · 2025-09-17T20:01:28Z

tests/models/test_openai.py

```diff
+    assert parts[0]['type'] == 'text'
+    text = parts[0]['text']
+    assert text.startswith(f'-----BEGIN FILE id="{identifier}" type="{media_type}"-----')
+    assert text.rstrip().endswith('-----END FILE-----')
```


Can we do a direct `assert text == ...` so that we can verify the newlines, etc.?
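A minimal sketch of the exact-equality style of test requested here, using a hypothetical helper in place of the real mapping code. The last three assertions show why `startswith`/`endswith` checks are weaker: a stray blank line before the END marker still passes them.

```python
from __future__ import annotations


def inline_file_block(media_type: str, text: str, identifier: str) -> str:
    # Hypothetical helper under test; mirrors the fenced format discussed in review.
    return '\n'.join([
        f'-----BEGIN FILE id="{identifier}" type="{media_type}"-----',
        text,
        f'-----END FILE id="{identifier}"-----',
    ])


def test_inline_file_block_exact() -> None:
    text = inline_file_block('text/csv', 'a,b\n1,2', 'abc123')
    # A full equality check pins down every newline.
    assert text == (
        '-----BEGIN FILE id="abc123" type="text/csv"-----\n'
        'a,b\n1,2\n'
        '-----END FILE id="abc123"-----'
    )
    # The looser prefix/suffix checks would also accept a stray blank line.
    sloppy = text.replace('\n-----END', '\n\n-----END')
    assert sloppy.startswith('-----BEGIN FILE id="abc123" type="text/csv"-----')
    assert sloppy.rstrip().endswith('-----END FILE id="abc123"-----')
    assert sloppy != text


test_inline_file_block_exact()
```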

DouweM · 2025-09-17T20:02:35Z

pydantic_ai_slim/pydantic_ai/models/openai.py

```diff
-                    assert_never(item)
-        return chat.ChatCompletionUserMessageParam(role='user', content=content)
+            # Fallback: unknown type — return empty parts to avoid type-checker Never error
+            return []
```


I'd rather replace `object` with a more specific type hint, so we can use `assert_never(item)` here.
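A minimal sketch of the exhaustiveness pattern the reviewer is asking for: once the parameter is typed as a precise union rather than `object`, a type checker can verify that `assert_never(item)` is unreachable, and at runtime an unexpected value raises instead of silently returning empty parts. `TextItem`, `FileItem` and `map_item` are hypothetical names. `typing.assert_never` needs Python 3.11+, so the sketch falls back to `typing_extensions` on older versions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union

try:
    from typing import assert_never  # Python 3.11+
except ImportError:  # older Pythons
    from typing_extensions import assert_never


@dataclass
class TextItem:
    text: str


@dataclass
class FileItem:
    data: bytes
    media_type: str


# A precise union instead of `object` lets the type checker prove exhaustiveness.
UserItem = Union[TextItem, FileItem]


def map_item(item: UserItem) -> list[dict[str, str]]:
    if isinstance(item, TextItem):
        return [{'type': 'text', 'text': item.text}]
    elif isinstance(item, FileItem):
        return [{'type': 'file', 'media_type': item.media_type}]
    else:
        # Unreachable for well-typed callers; raises AssertionError at runtime otherwise.
        assert_never(item)
```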

DouweM · 2025-09-17T20:03:49Z

tests/models/test_openai.py

```diff
+        )
+
+
+async def test_openai_map_single_item_unknown_returns_empty_branch(
```


I'm not sure why we need this test and the next one. What lines would be uncovered without them?

Changed name of the test


pulphix · 2025-09-24T16:10:03Z

@DouweM Updated PR based on requests.


DouweM · 2025-09-30T00:01:18Z

pydantic_ai_slim/pydantic_ai/models/openai.py

```diff
+
+    @staticmethod
+    def _inline_file_block(media_type: str, text: str, identifier: str | None) -> str:
+        id_attr = f' id="{identifier}"' if identifier else ''
```


Hmm, I agree the type states it can be, but the implementation ensures it always has a value, as far as I can see. Can we add an `assert identifier is not None` here, as there's not much use in giving the model an inline text part if there is no way to identify it?


DouweM · 2025-09-30T21:48:04Z

@pulphix Thanks for your work on this Fabio! I made a couple more code organization tweaks and will get this out in a release today.

pulphix mentioned this pull request Sep 10, 2025

Using BinaryContent with CSV media type throws an error #1161 (Closed)

DouweM self-assigned this Sep 10, 2025

pulphix force-pushed the pulphix/implemented_text_file_support_for_openai branch from 0125d8a to 2ff0bfd on September 11, 2025 08:19

DouweM added the awaiting author revision label Sep 11, 2025

pulphix force-pushed the pulphix/implemented_text_file_support_for_openai branch from 9883cd8 to affe38a on September 12, 2025 08:29

pulphix changed the title ~~Added support for text file for OpenAI using MagicDocumentUrl and MagicBinaryContent~~ Added support for text file for OpenAI on DocumentUrl and BinaryContent Sep 12, 2025

pulphix force-pushed the pulphix/implemented_text_file_support_for_openai branch 2 times, most recently from 414cc86 to 83b8e15 on September 15, 2025 07:04

DouweM requested changes Sep 15, 2025


pulphix force-pushed the pulphix/implemented_text_file_support_for_openai branch 2 times, most recently from 476cd9d to f7ef1ae on September 16, 2025 12:01

pulphix requested a review from DouweM September 16, 2025 12:59

pulphix force-pushed the pulphix/implemented_text_file_support_for_openai branch from 0e3214d to 9885418 on September 16, 2025 16:05

pulphix added 11 commits September 16, 2025 20:43

Added support for text file for OpenAI using MagicDocumentUrl and MagicBinaryContent

5793b44

Updated tests to fix Pyright errors

e60161e

Fixed failing checks

c7d258c

Fixed Errors on check pre commit

544ae22

Fixed tests file based on pyright feedback

63edc4a

Added 2 tests to cover Images with Magic Classes

e98df12

Fixed Pylint error

8b00447

Fixed missing space on import

cf45451

Removed pragma no cover on VideoUrl

f245f50

Removed Magic Classes logic implemented directly on BinaryContent and DocumentUrl

4780ec2

Removed filename and added url only for DocumentUrl

69a322a

Added # type: ignore[reportPrivateUsage]

fb63178

pulphix force-pushed the pulphix/implemented_text_file_support_for_openai branch from 9885418 to 5b60fc0 on September 16, 2025 18:43

Reverted file

c0c5775

pulphix force-pushed the pulphix/implemented_text_file_support_for_openai branch from 5b60fc0 to c0c5775 on September 17, 2025 09:29

DouweM requested changes Sep 17, 2025


pulphix added 3 commits September 24, 2025 16:52

Merge branch 'main' into pulphix/implemented_text_file_support_for_openai

25f0286

Updated code based on review requests

bdbc30c

Updated _inline_file_block based on feedback

cbe96ef

pulphix requested a review from DouweM September 25, 2025 15:15

Merge branch 'main' into pulphix/implemented_text_file_support_for_openai

a471dec

DouweM requested changes Sep 30, 2025


pulphix added 11 commits September 30, 2025 16:28

Updated based on requests

b01f4fc

Merge branch 'pulphix/implemented_text_file_support_for_openai' of github.com:pulphix/pydantic-ai into pulphix/implemented_text_file_support_for_openai

5fa678a

Merge commit 'f5602ddb4a7aa86b5c3e5a3ce53ae5ad7b2efc9a' into pulphix/implemented_text_file_support_for_openai

f248dc7

Merge commit 'f5602ddb4a7aa86b5c3e5a3ce53ae5ad7b2efc9a' into pulphix/implemented_text_file_support_for_openai

c9d1be2

Added assert identifier is not None

3d7464a

Added test_openai_map_user_prompt_video_url_raises_not_implemented

b6f0257

Added test_openai_map_user_prompt_video_url_raises_not_implemented

cd6458d

Fixed pylint error

45bdd64

Added test with updated cassette

d429f6c

Updated Dummy text file

8e85d92

Merge branch 'main' into pulphix/implemented_text_file_support_for_openai

224458e

pulphix requested a review from DouweM September 30, 2025 20:02

pulphix added 2 commits September 30, 2025 22:02

Merge branch 'main' into pulphix/implemented_text_file_support_for_openai

d9d44f7

Merge branch 'main' into pulphix/implemented_text_file_support_for_openai

afede9d

DouweM changed the title ~~Added support for text file for OpenAI on DocumentUrl and BinaryContent~~ Support text, JSON, XML and YAML DocumentUrl and BinaryContent on OpenAI Sep 30, 2025

Clean up

85c69bb

DouweM merged commit 5287abf into pydantic:main Sep 30, 2025
30 checks passed


pulphix commented Sep 10, 2025 • edited by DouweM
