Perf: Page.to_dataframe / to_polars goes through per-row Pydantic dumps #264

Description

@TexasCoding

Summary

Both DataFrame paths call model_dump(mode="python") on every item. For a 1000-row trades_history page this means 1000 nested-dict allocations plus per-field Python-level conversion, after which pandas/polars re-infers dtypes from the records. That is about an order of magnitude slower than column-oriented construction.
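A rough way to measure the gap described above, as a sketch only: the Trade dataclass is a hypothetical stdlib stand-in for the real Pydantic item model, and dataclasses.asdict stands in for per-item model_dump. Absolute numbers will differ from the real code path, so treat the output as indicative at best.

```python
import timeit
from dataclasses import asdict, dataclass, fields
from decimal import Decimal


@dataclass
class Trade:  # hypothetical stand-in for a Pydantic item model
    ticker: str
    price: Decimal
    count: int


def rows_per_item(items):
    # Row-oriented: allocate one dict per item (analogous to per-item model_dump).
    return [asdict(item) for item in items]


def columns_once(items):
    # Column-oriented: build one list per field, with no per-row dicts.
    names = [f.name for f in fields(Trade)]
    return {name: [getattr(item, name) for item in items] for name in names}


# 1000 items, matching the page size mentioned in the summary.
items = [Trade("TICK", Decimal("0.42"), i) for i in range(1000)]

row_time = timeit.timeit(lambda: rows_per_item(items), number=20)
col_time = timeit.timeit(lambda: columns_once(items), number=20)
print(f"rows: {row_time:.4f}s  columns: {col_time:.4f}s")
```

A real benchmark would swap in the actual item model and also time the pd.DataFrame / pl.DataFrame construction step, since dtype inference from records is part of the cost being reported.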

Location

  • kalshi/models/common.py:44-55 (to_dataframe)
  • kalshi/models/common.py:57-68 (to_polars)

Evidence

def to_dataframe(self) -> pandas.DataFrame:
    ...
    records = [item.model_dump(mode="python") for item in self.items]
    return pd.DataFrame(records)

def to_polars(self) -> polars.DataFrame:
    ...
    records = [item.model_dump(mode="python") for item in self.items]
    return pl.DataFrame(records)

Recommended fix

Build the columns once, e.g. {field: [getattr(item, field) for item in self.items] for field in cls.model_fields} (cls here presumably being the item model class, not Page), and pass the result to pd.DataFrame(columns_dict) / pl.from_dict(...). This preserves the #225 Decimal contract and avoids per-row dict construction. Also add a bench_page_to_dataframe.py benchmark.
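A minimal sketch of the column-building step, under stated assumptions: build_columns is a hypothetical helper name, and the Trade dataclass is a stdlib stand-in so the snippet runs without pydantic, pandas, or polars installed. With a Pydantic v2 model, the field names would come from iterating the item class's model_fields instead.

```python
from dataclasses import dataclass, fields
from decimal import Decimal
from typing import Any, Iterable, Sequence


@dataclass
class Trade:  # hypothetical stand-in for the real item model
    trade_id: str
    yes_price: Decimal
    count: int


def build_columns(items: Sequence[Any], field_names: Iterable[str]) -> dict[str, list[Any]]:
    """Return {field: [values...]} with one list per field and no per-row dicts."""
    # Taking field names from the class (not from items[0]) means an empty
    # page still yields a dict with the expected, empty columns.
    return {name: [getattr(item, name) for item in items] for name in field_names}


page_items = [
    Trade("t1", Decimal("0.55"), 10),
    Trade("t2", Decimal("0.56"), 3),
]
columns = build_columns(page_items, [f.name for f in fields(Trade)])

# Values are passed through untouched, so Decimal stays Decimal.
print(columns["yes_price"])
```

The resulting dict is the shape that pd.DataFrame(columns) and pl.from_dict(columns) accept. One difference to check before switching: getattr returns nested model instances as-is, whereas model_dump converts them to dicts, so items with nested models may need extra handling.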

Severity & category

medium / performance


Labels

performance: Performance / hot-path concern
