BUG: fix padding for string categories in CategoricalIndex repr #61894

jorisvandenbossche · 2025-07-17T19:10:04Z

Resolving some xfails: getting back the same padding as we had before.

On current main with string dtype:

>>> pd.CategoricalIndex(["a", "bb", "ccc"] * 10)
CategoricalIndex([  'a',  'bb', 'ccc',   'a',  'bb', 'ccc',   'a',  'bb',
                  'ccc',   'a',  'bb', 'ccc',   'a',  'bb', 'ccc',   'a',
                   'bb', 'ccc',   'a',  'bb', 'ccc',   'a',  'bb', 'ccc',
                    'a',  'bb', 'ccc',   'a',  'bb', 'ccc'],
                 categories=['a', 'bb', 'ccc'], ordered=False, dtype='category')

With this PR, which matches the output with object dtype:

>>> pd.CategoricalIndex(["a", "bb", "ccc"] * 10)
CategoricalIndex(['a', 'bb', 'ccc', 'a', 'bb', 'ccc', 'a', 'bb', 'ccc', 'a',
                  'bb', 'ccc', 'a', 'bb', 'ccc', 'a', 'bb', 'ccc', 'a', 'bb',
                  'ccc', 'a', 'bb', 'ccc', 'a', 'bb', 'ccc', 'a', 'bb', 'ccc'],
                 categories=['a', 'bb', 'ccc'], ordered=False, dtype='category')
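
To make the difference between the two layouts concrete, here is a minimal pure-Python sketch of the two joining strategies. It is not the pandas implementation; the helper names `join_padded` and `join_plain` are hypothetical and only illustrate right-justifying each item to a common width versus joining the reprs as they are.

```python
def join_padded(values):
    """Right-justify every item to the width of the longest repr."""
    items = [repr(v) for v in values]
    width = max(len(item) for item in items)
    return ", ".join(item.rjust(width) for item in items)


def join_plain(values):
    """Join the reprs as they are, without any alignment."""
    return ", ".join(repr(v) for v in values)


values = ["a", "bb", "ccc"]
print(join_padded(values))  # prints:   'a',  'bb', 'ccc'
print(join_plain(values))   # prints: 'a', 'bb', 'ccc'
```

The padded variant reproduces the spacing of the first row of the output on main, and the plain variant reproduces the spacing with this PR.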

jbrockmendel · 2025-07-17T21:18:07Z

On second look, I retract my claim that the old padding is nicer. No preference.

jorisvandenbossche · 2025-07-18T06:44:48Z

I think the non-aligned version (i.e. how it was before, and how it still is with object dtype) is better, especially when the categories have different lengths. The example above only has 1 vs 3 characters, but consider for example:

# on main with str dtype / without this PR
>>> pd.CategoricalIndex(["low", "intermediate", "high", "low"] * 10)
CategoricalIndex([         'low', 'intermediate',         'high',
                           'low',          'low', 'intermediate',
                          'high',          'low',          'low',
                  'intermediate',         'high',          'low',
                           'low', 'intermediate',         'high',
                           'low',          'low', 'intermediate',
                          'high',          'low',          'low',
                  'intermediate',         'high',          'low',
                           'low', 'intermediate',         'high',
                           'low',          'low', 'intermediate',
                          'high',          'low',          'low',
                  'intermediate',         'high',          'low',
                           'low', 'intermediate',         'high',
                           'low'],
                 categories=[high, intermediate, low], ordered=False, dtype='category')

vs

# with object dtype / with str dtype with this PR
>>> pd.CategoricalIndex(["low", "intermediate", "high", "low"] * 10)
CategoricalIndex(['low', 'intermediate', 'high', 'low', 'low', 'intermediate',
                  'high', 'low', 'low', 'intermediate', 'high', 'low', 'low',
                  'intermediate', 'high', 'low', 'low', 'intermediate', 'high',
                  'low', 'low', 'intermediate', 'high', 'low', 'low',
                  'intermediate', 'high', 'low', 'low', 'intermediate', 'high',
                  'low', 'low', 'intermediate', 'high', 'low', 'low',
                  'intermediate', 'high', 'low'],
                 categories=['high', 'intermediate', 'low'], ordered=False, dtype='category')

Of course this can also happen with non-strings like integers, but I think it is a lot less common.
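
As a rough way to quantify the cost of padding, the sketch below counts the characters needed for the items of each layout, ignoring line wrapping and the surrounding `CategoricalIndex(...)` text. The helper `layout_widths` is hypothetical and not part of pandas; it assumes every item is padded to the width of the longest repr, as in the aligned output shown earlier.

```python
def layout_widths(values):
    """Return (plain, padded) character counts for a comma-separated listing."""
    items = [repr(v) for v in values]
    separators = 2 * (len(items) - 1)  # one ", " between consecutive items
    plain = sum(len(item) for item in items) + separators
    padded = max(len(item) for item in items) * len(items) + separators
    return plain, padded


labels = ["low", "intermediate", "high", "low"] * 10
print(layout_widths(labels))   # (378, 638)

numbers = [1, 20, 300] * 10
print(layout_widths(numbers))  # (118, 148)
```

Under these assumptions, padding adds roughly 70% to the string example but only about 25% to the integer one, since the integer widths differ by at most two characters.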

BUG: fix padding for string categories in CategoricalIndex repr

85c52fc

jorisvandenbossche added the Bug and Output-Formatting (__repr__ of pandas objects, to_string) labels Jul 17, 2025

jorisvandenbossche mentioned this pull request Jul 17, 2025

Output formatting: the repr of the Categorical categories (quoted or unquoted strings?) #61890

Closed

jorisvandenbossche merged commit 8de38e8 into pandas-dev:main Jul 19, 2025
50 of 51 checks passed

jorisvandenbossche deleted the string-dtype-categorical-index-repr-justify branch July 19, 2025 10:34

eicchen pushed a commit to eicchen/pandas that referenced this pull request Aug 19, 2025

BUG: fix padding for string categories in CategoricalIndex repr (pand…

132c397

…as-dev#61894)
