BUG: Raise ParserWarning when on_bad_lines is callable and index_col is set (GH#61882) #61902
Open: wants to merge 2 commits into base: main
4 changes: 4 additions & 0 deletions bad_lines.csv
@@ -0,0 +1,4 @@
name,age
Alice,30
Bob,not_a_number
Charlie,40,extra_column
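The pandas documentation describes a "bad line" for `on_bad_lines` as a line with too many fields. The following is a minimal sketch, using only the standard-library `csv` module rather than pandas itself, of which rows in this fixture would qualify under that definition. The helper name `find_overlong_rows` is hypothetical and is not part of the PR.

```python
import csv
import io

# Contents of bad_lines.csv as added in this PR.
CSV_TEXT = """name,age
Alice,30
Bob,not_a_number
Charlie,40,extra_column
"""


def find_overlong_rows(text):
    """Return (line_number, fields) for rows with more fields than the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    overlong = []
    # Data rows start on line 2 because line 1 is the header.
    for line_number, fields in enumerate(reader, start=2):
        if len(fields) > len(header):
            overlong.append((line_number, fields))
    return overlong


overlong_rows = find_overlong_rows(CSV_TEXT)
print(overlong_rows)
```

Under this field-count rule only the `Charlie` row is flagged. The `Bob,not_a_number` row has the expected two fields, so it is a value problem rather than a structural one and would not be expected to reach an `on_bad_lines` handler.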
324 changes: 324 additions & 0 deletions n_bad_lines-parserwarning

Large diffs are not rendered by default.

29 changes: 16 additions & 13 deletions pandas/io/parsers/readers.py
@@ -668,23 +668,26 @@ def _validate_names(names: Sequence[Hashable] | None) -> None:
            raise ValueError("Names should be an ordered collection.")


def _read(
    filepath_or_buffer: FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str], kwds
) -> DataFrame | TextFileReader:
    """Generic reader of line files."""
    # if we pass a date_format and parse_dates=False, we should not parse the
    # dates GH#44366
    if kwds.get("parse_dates", None) is None:
        if kwds.get("date_format", None) is None:
            kwds["parse_dates"] = False
        else:
            kwds["parse_dates"] = True
def _read(filepath_or_buffer, kwds):
    import warnings

    from pandas.errors import ParserWarning

    # Extract some of the arguments (pass chunksize on).
    iterator = kwds.get("iterator", False)
    chunksize = kwds.get("chunksize", None)

    # Check type of encoding_errors
    # Your inserted warning
    on_bad_lines = kwds.get("on_bad_lines", "error")
    index_col = kwds.get("index_col", None)

    if callable(on_bad_lines) and index_col is not None:
        warnings.warn(
            "When using a callable for on_bad_lines with index_col set, "
            "ParserWarning should be explicitly handled. This behavior may change.",
            ParserWarning,
            stacklevel=3,
        )

    errors = kwds.get("encoding_errors", "strict")
    if not isinstance(errors, str):
        raise ValueError(
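
The condition added in this hunk can be exercised in isolation. The sketch below is an illustration, not the PR's code: `warn_if_callable_with_index_col` and `triggered` are hypothetical helpers, the local `ParserWarning` class is a stand-in for `pandas.errors.ParserWarning` so the snippet runs without pandas, and the warning message is shortened.

```python
import warnings


class ParserWarning(Warning):
    """Stand-in for pandas.errors.ParserWarning so the sketch runs without pandas."""


def warn_if_callable_with_index_col(kwds):
    """Hypothetical helper mirroring the condition added to _read in this diff."""
    on_bad_lines = kwds.get("on_bad_lines", "error")
    index_col = kwds.get("index_col", None)
    if callable(on_bad_lines) and index_col is not None:
        warnings.warn(
            "on_bad_lines is callable and index_col is set",
            ParserWarning,
            stacklevel=2,
        )


def triggered(kwds):
    """Return True if the check emits a ParserWarning for these keyword arguments."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones Python would normally deduplicate.
        warnings.simplefilter("always")
        warn_if_callable_with_index_col(kwds)
    return any(issubclass(w.category, ParserWarning) for w in caught)


def handler(bad_line):
    """Example callable that drops every bad line."""
    return None


print(triggered({"on_bad_lines": handler, "index_col": 0}))      # callable, index_col set
print(triggered({"on_bad_lines": handler}))                      # callable, no index_col
print(triggered({"on_bad_lines": "skip", "index_col": 0}))       # string option, not callable
print(triggered({"on_bad_lines": handler, "index_col": False}))  # False is not None
```

The four calls print `True`, `False`, `False`, `True`. The last case may be worth a reviewer's attention: `read_csv` documents `index_col=False` as a way to force pandas not to use the first column as the index, yet an `is not None` test treats it the same as a real index column.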
14 changes: 14 additions & 0 deletions pandas/tests/io/parser/test_read_csv_warn.py
@@ -0,0 +1,14 @@
import pandas as pd


def my_bad_line_handler(bad_line):
    print("Bad line encountered:", bad_line)


df = pd.read_csv(
    "test.csv",  # make sure this file exists in same folder or adjust the path
    on_bad_lines=my_bad_line_handler,
    index_col=0,
    engine="python",  # ✅ add this line
)
print(df)