BUG FIX: to_json() with JSON Table Schema work correctly with string dtype. #61900

khemkaran10 · 2025-07-18T17:23:32Z

For consistency, to_json() with orient="table" now writes "type": "string" instead of "type": "any" in the schema when the data has dtype="str".

Before Fix:

>>> pd.Series(["a", "b", None], dtype="str").to_json(orient="table", index=False)
'{"schema":{"fields":[{"name":"values","type":"any","extDtype":"str"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'

After Fix:

>>> pd.Series(["a", "b", None], dtype="str").to_json(orient="table", index=False)
'{"schema":{"fields":[{"name":"values","type":"string","extDtype":"str"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'
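To make the difference easier to see, here is a small sketch that parses the two outputs above with the standard library and compares them. The helper name `field_types` is only illustrative and is not part of pandas.

```python
import json

# Outputs copied verbatim from the "Before Fix" and "After Fix" examples above.
before = '{"schema":{"fields":[{"name":"values","type":"any","extDtype":"str"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'
after = '{"schema":{"fields":[{"name":"values","type":"string","extDtype":"str"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'


def field_types(payload):
    """Map each schema field name to its declared Table Schema type."""
    fields = json.loads(payload)["schema"]["fields"]
    return {field["name"]: field["type"] for field in fields}


print(field_types(before))  # {'values': 'any'}
print(field_types(after))   # {'values': 'string'}

# The rows are identical; only the schema entry changes.
assert json.loads(before)["data"] == json.loads(after)["data"]
```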

closes BUG: make to_json with JSON Table Schema work correctly with string dtype #61889
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

jorisvandenbossche · 2025-07-21T19:47:40Z

pandas/io/json/_table_schema.py

@@ -197,7 +195,7 @@ def convert_json_field_to_pandas_type(field) -> str | CategoricalDtype:
    """
    typ = field["type"]
    if typ == "string":
-        return "object"
+        return field.get("extDtype", "object")


Suggested change

return field.get("extDtype", "object")

return field.get("extDtype", "object")

Should that now default to "str" instead of "object"?

Yes. When converting back to pandas from JSON, extDtype is missing for object dtype, so this returns "object". For the str case, extDtype is "str".

Suggested change

-        return field.get("extDtype", "object")
+        return field.get("extDtype", None)

As another alternative: keep the dtype as None here, which I think will mean in practice that we keep the inferred dtype from the construction.

Right now, because of the logic above that maps the "string" type to object, data written by older pandas has no extDtype, so string columns are converted to object when read:

>>> pd.options.future.infer_string = False
>>> df = pd.DataFrame({"col_str": ["a", "b", "c", None], "col_object": ["str", 1, 1.5, None]})
>>> output = df.to_json(orient="table")
>>> output
'{"schema":{"fields":[{"name":"index","type":"integer"},{"name":"col_str","type":"string"},{"name":"col_object","type":"string"}],"primaryKey":["index"],"pandas_version":"1.4.0"},"data":[{"index":0,"col_str":"a","col_object":"str"},{"index":1,"col_str":"b","col_object":1},{"index":2,"col_str":"c","col_object":1.5},{"index":3,"col_str":null,"col_object":null}]}'
>>> pd.options.future.infer_string = True
>>> df2 = pd.read_json(StringIO(output), orient="table")
>>> df2
  col_str col_object
0       a        str
1       b          1
2       c        1.5
3     NaN        NaN
>>> df2.dtypes
col_str       object
col_object    object
dtype: object

Ideally, df2 above would have str dtype for the col_str column.
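The three fallbacks discussed in this thread ("object", "str", and None) can be compared with a small stand-alone sketch. `string_field_dtype` is a hypothetical helper that mirrors only the "string" branch of convert_json_field_to_pandas_type. It is not the pandas implementation, and what read_json then does with a None dtype is not modeled here.

```python
# Hypothetical helper for illustration; it mirrors only the "string" branch
# of convert_json_field_to_pandas_type and is not the pandas implementation.
def string_field_dtype(field, default):
    if field["type"] != "string":
        raise ValueError("this sketch only handles 'string' fields")
    # A field written with an explicit extDtype always wins over the fallback.
    return field.get("extDtype", default)


# Field as written after this PR for a str column.
new_field = {"name": "col_str", "type": "string", "extDtype": "str"}
# Field without extDtype, as in the older-pandas output shown above.
old_field = {"name": "col_str", "type": "string"}

for default in ("object", "str", None):
    print(
        repr(default),
        string_field_dtype(new_field, default),
        string_field_dtype(old_field, default),
    )
```

Only the field without extDtype is affected by the choice of fallback, which is why the question matters mainly for data written by older pandas.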

jorisvandenbossche · 2025-07-21T19:53:34Z

pandas/tests/io/json/test_json_table_schema.py

@@ -70,7 +70,7 @@ def test_build_table_schema(self, df_schema, using_infer_string):
            "primaryKey": ["idx"],
        }
        if using_infer_string:
-            expected["fields"][2] = {"name": "B", "type": "any", "extDtype": "str"}
+            expected["fields"][2] = {"name": "B", "type": "string", "extDtype": "str"}


I am wondering if we should still include the "extDtype": "str" here for the default string dtype. While having this extDtype for actual (opt-in) extension dtypes is very useful to support proper roundtripping, I am not sure it is useful for a default data type.

Yes, I removed it at first, but then converting back to pandas caused issues: I could not distinguish between object and str dtype because both produced exactly the same JSON.

Ah, that is a good point.
For newly written data, I wonder whether we should use "type": "any" for true object dtype, so we can make the distinction that way.
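A rough sketch of that idea, assuming the writer emits "string" for the default str dtype and "any" for object dtype, and the reader decides from the type alone. Both function names are hypothetical and this is not what pandas currently does.

```python
# Hypothetical writer/reader pair for the idea above; the names are
# illustrative and this is not the pandas implementation.
def field_for_dtype(name, dtype_name):
    """Writer side: pick a Table Schema type from a dtype name."""
    if dtype_name == "str":
        return {"name": name, "type": "string"}
    if dtype_name == "object":
        return {"name": name, "type": "any"}
    raise ValueError(f"sketch does not cover dtype {dtype_name!r}")


def dtype_for_field(field):
    """Reader side: recover the dtype name from the type alone."""
    return {"string": "str", "any": "object"}[field["type"]]


for dtype_name in ("str", "object"):
    field = field_for_dtype("B", dtype_name)
    # Round trip works without an extDtype entry.
    assert dtype_for_field(field) == dtype_name
```

One caveat of this scheme: files from older pandas that wrote "type": "string" for object columns would be read back as str under this reader.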

Khemkaran and others added 2 commits July 18, 2025 22:21

to_json() with JSON table schema type fix

2ad36c9

Merge branch 'main' into issue_61889

359f74e

khemkaran10 mentioned this pull request Jul 21, 2025

BUG: make to_json with JSON Table Schema work correctly with string dtype #61889

Open

mroeschke added the "IO JSON" (read_json, to_json, json_normalize) and "Strings" (String extension data type and string data) labels Jul 21, 2025

mroeschke reviewed Jul 21, 2025


khemkaran10 and others added 2 commits July 22, 2025 00:08

Merge branch 'main' into issue_61889

164e716

removed redundant check in as_json_table_type

a048e69

jorisvandenbossche reviewed Jul 21, 2025
