Commit f33207b

BUG: read_csv skipping a line with double quotes (#62855)
Co-authored-by: u7397058 <[email protected]>
Co-authored-by: Zephy <[email protected]>
1 parent 17b20d2 commit f33207b

3 files changed, 51 additions and 0 deletions

doc/source/whatsnew/v3.0.0.rst

Lines changed: 1 addition & 0 deletions
@@ -1124,6 +1124,7 @@ I/O
 - Bug in :meth:`read_csv` for the ``c`` and ``python`` engines where parsing numbers with large exponents caused overflows. Now, numbers with large positive exponents are parsed as ``inf`` or ``-inf`` depending on the sign of the mantissa, while those with large negative exponents are parsed as ``0.0`` (:issue:`62617`, :issue:`38794`, :issue:`62740`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``index_col`` is specified and ``na_values`` is a dict containing the key ``None``. (:issue:`57547`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``nrows`` and ``iterator`` are specified without specifying a ``chunksize``. (:issue:`59079`)
+- Bug in :meth:`read_csv` where it did not appropriately skip a line when instructed, causing Empty Data Error (:issue:`62739`)
 - Bug in :meth:`read_csv` where the order of the ``na_values`` makes an inconsistency when ``na_values`` is a list non-string values. (:issue:`59303`)
 - Bug in :meth:`read_csv` with ``c`` and ``python`` engines reading big integers as strings. Now reads them as python integers. (:issue:`51295`)
 - Bug in :meth:`read_csv` with ``engine="c"`` reading large float numbers with preceding integers as strings. Now reads them as floats. (:issue:`51295`)

pandas/io/parsers/python_parser.py

Lines changed: 8 additions & 0 deletions
@@ -218,6 +218,14 @@ class MyDialect(csv.Dialect):
 
             if sep is not None:
                 dia.delimiter = sep
+                # Skip rows at file level before csv.reader sees them
+                # prevents CSV parsing errors on lines that will be discarded
+                if self.skiprows is not None:
+                    while self.skipfunc(self.pos):
+                        line = f.readline()
+                        if not line:
+                            break
+                        self.pos += 1
             else:
                 # attempt to sniff the delimiter from the first valid line,
                 # i.e. no comment line and not in skiprows
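
The loop added in this hunk can be read in isolation. The following is a minimal sketch of the same pattern outside pandas, not the library's code: `skip_leading` is a hypothetical helper, and the predicate passed as `skipfunc` stands in for whatever pandas builds from `skiprows` (assumed here to be a simple set-membership check). It suggests that only the leading run of skipped positions is consumed at file level, and that the end-of-file check keeps an always-true predicate from looping forever.

```python
import io


def skip_leading(f, skipfunc, pos=0):
    """Consume lines from f while skipfunc(pos) is true; return the new position.

    Hypothetical helper mirroring the loop in the hunk above.
    """
    while skipfunc(pos):
        line = f.readline()
        if not line:  # end of file reached before the skip run ended
            break
        pos += 1
    return pos


# Assumed stand-in for skiprows: positions 0, 1 and 3 are to be skipped.
skip = {0, 1, 3}
f = io.StringIO("junk0\njunk1\nkeep2\njunk3\nkeep4\n")

pos = skip_leading(f, lambda i: i in skip)
remaining = f.read()  # everything the reader would still see

# An always-true predicate stops at end of file instead of spinning.
pos_at_eof = skip_leading(io.StringIO("a\n"), lambda i: True)
```

In this sketch the line at position 3 is left in the stream, since the loop stops at the first position that is not skipped; any later skipped rows would have to be handled by the row-level logic.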

pandas/tests/io/parser/test_python_parser_only.py

Lines changed: 42 additions & 0 deletions
@@ -599,3 +599,45 @@ def fixer(bad_line):
     )
 
     tm.assert_frame_equal(result, expected)
+
+
+def test_read_csv_leading_quote_skip(python_parser_only):
+    # GH 62739
+    tbl = """\
+"
+a b
+1 3
+"""
+    parser = python_parser_only
+    result = parser.read_csv(
+        StringIO(tbl),
+        delimiter=" ",
+        skiprows=1,
+    )
+    expected = DataFrame({"a": [1], "b": [3]})
+    tm.assert_frame_equal(result, expected)
+
+
+def test_read_csv_unclosed_double_quote_in_data_still_errors(python_parser_only):
+    # GH 62739
+    tbl = """\
+a b
+"
+1 3
+"""
+    parser = python_parser_only
+    with pytest.raises(ParserError, match="unexpected end of data"):
+        parser.read_csv(StringIO(tbl), delimiter=" ", skiprows=1)
+
+
+def test_read_csv_skiprows_zero(python_parser_only):
+    # GH 62739
+    tbl = """\
+"
+a b
+1 3
+"""
+    parser = python_parser_only
+    # don't skip anything
+    with pytest.raises(ParserError, match="unexpected end of data"):
+        parser.read_csv(StringIO(tbl), delimiter=" ", skiprows=0, engine="python")
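
The three scenarios in these tests can be approximated with the standard library alone, which may help when reasoning about why the skip has to happen before `csv.reader` sees the line. This is a sketch under stated assumptions, not pandas' implementation: `parse` and `outcome` are hypothetical helpers, and `strict=True` is chosen here so that an unterminated quote raises `csv.Error` rather than being silently accepted.

```python
import csv
import io


def parse(text, skip):
    """Discard `skip` raw lines from the stream, then parse the rest strictly."""
    f = io.StringIO(text)
    for _ in range(skip):
        f.readline()  # raw read: quote characters are not interpreted here
    return list(csv.reader(f, delimiter=" ", strict=True))


def outcome(text, skip):
    """Return the parsed rows, or the csv.Error message as a string."""
    try:
        return parse(text, skip)
    except csv.Error as exc:
        return f"csv.Error: {exc}"


leading_quote = '"\na b\n1 3\n'   # lone quote on the line to be skipped
quote_in_data = 'a b\n"\n1 3\n'   # lone quote on a line that is kept

skipped = outcome(leading_quote, skip=1)      # quote line never reaches csv
in_data = outcome(quote_in_data, skip=1)      # quote line is parsed
not_skipped = outcome(leading_quote, skip=0)  # nothing skipped
```

Only the first call yields rows; in the other two the lone `"` opens a quoted field that runs to the end of the input, so the reader reports an error.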
