gh-64612: Update error handlers list under open() #137304

Open: wants to merge 2 commits into base: main
2 changes: 2 additions & 0 deletions Doc/library/codecs.rst
@@ -350,6 +350,8 @@ error handling schemes by accepting the *errors* string argument:
The following error handlers can be used with all Python
:ref:`standard-encodings` codecs:

.. The following tables are reproduced on the library/functions page under open.

.. tabularcolumns:: |l|L|

+-------------------------+-----------------------------------------------+
69 changes: 41 additions & 28 deletions Doc/library/functions.rst
@@ -1423,37 +1423,50 @@ are always available. They are listed here in alphabetical order.
*errors* is an optional string that specifies how encoding and decoding
errors are to be handled—this cannot be used in binary mode.
A variety of standard error handlers are available
(listed under :ref:`error-handlers`), though any
error handling name that has been registered with
(listed under :ref:`error-handlers`, and summarized below for convenience),
though any error handling name that has been registered with
:func:`codecs.register_error` is also valid. The standard names
include:

* ``'strict'`` to raise a :exc:`ValueError` exception if there is
an encoding error. The default value of ``None`` has the same
effect.

* ``'ignore'`` ignores errors. Note that ignoring encoding errors
can lead to data loss.

* ``'replace'`` causes a replacement marker (such as ``'?'``) to be inserted
where there is malformed data.

* ``'surrogateescape'`` will represent any incorrect bytes as low
surrogate code units ranging from U+DC80 to U+DCFF.
These surrogate code units will then be turned back into
the same bytes when the ``surrogateescape`` error handler is used
when writing data. This is useful for processing files in an
unknown encoding.

* ``'xmlcharrefreplace'`` is only supported when writing to a file.
Characters not supported by the encoding are replaced with the
appropriate XML character reference :samp:`&#{nnn};`.

* ``'backslashreplace'`` replaces malformed data by Python's backslashed
escape sequences.

* ``'namereplace'`` (also only supported when writing)
replaces unsupported characters with ``\N{...}`` escape sequences.
.. list-table::
:header-rows: 1

* - Error handler
- Description
* - ``'strict'``
- Raise a :exc:`UnicodeError` (or a subclass) exception if there is
an error. The default value of ``None`` has the same effect.
* - ``'ignore'``
- Ignore the malformed data and continue without further notice.
Note that ignoring encoding errors can lead to data loss.
* - ``'replace'``
- Replace malformed data with a replacement marker.
On writing, use ``?`` (ASCII character 63).
On reading, use ``�`` (U+FFFD, the official REPLACEMENT CHARACTER).
* - ``'backslashreplace'``
- Replace malformed data with backslashed escape sequences.
On writing, use the hexadecimal form of the Unicode code point, in one of
the formats :samp:`\\x{hh}`, :samp:`\\u{xxxx}` or :samp:`\\U{xxxxxxxx}`.
On reading, use the hexadecimal form of the byte value, in the format :samp:`\\x{hh}`.
* - ``'surrogateescape'``
- Represent any incorrect bytes as low
surrogate code units ranging from ``U+DC80`` to ``U+DCFF``.
These surrogate code units will then be turned back into
the same bytes when the ``'surrogateescape'`` error handler is used
when writing data. This is useful for processing files in an
unknown encoding.
* - ``'surrogatepass'``
- Only available for Unicode codecs.
Member

Aren't these all Unicode codecs?

Suggested change
- - Only available for Unicode codecs.
+ - Only available for UTF-8, UTF-16 and UTF-32 codecs.

Member Author

The codecs documentation lists the little/big endian variants, though I think we can be less specific here.

Allow encoding and decoding surrogate code points
(``U+D800`` - ``U+DFFF``) as normal code points. Otherwise these codecs
treat the presence of surrogate code points in :class:`str` as an error.
* - ``'xmlcharrefreplace'``
- Only supported when writing.
Characters not supported by the encoding are replaced with the
appropriate XML character reference :samp:`&#{nnn};`.
* - ``'namereplace'``
- Only supported when writing. Replaces unsupported characters with
``\N{...}`` escape sequences.

.. index::
single: universal newlines; open() built-in function
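
For anyone checking the table rows against actual behavior, the following is a minimal sketch and not part of the proposed change. It exercises the handlers through ``str.encode`` and ``bytes.decode``, which accept the same *errors* names as ``open()``, and then round-trips undecodable bytes through ``open()`` with ``'surrogateescape'``. The sample strings and the helper name ``roundtrip_unknown_bytes`` are illustrative assumptions.

```python
import os
import tempfile

# Sample text: "caf\u00e9 \u20ac" contains two characters ASCII cannot encode.
text = "caf\u00e9 \u20ac"

# Writing (encoding): handlers applied to characters the codec cannot represent.
encoded = {
    name: text.encode("ascii", errors=name)
    for name in ("ignore", "replace", "backslashreplace",
                 "xmlcharrefreplace", "namereplace")
}

# Sample bytes: 0xFF is never valid in UTF-8.
raw = b"abc\xff"

# Reading (decoding): handlers applied to malformed bytes.
decoded = {
    name: raw.decode("utf-8", errors=name)
    for name in ("ignore", "replace", "backslashreplace", "surrogateescape")
}

# 'strict' (the default) raises a UnicodeError subclass.
strict_error = None
try:
    raw.decode("utf-8")
except UnicodeError as exc:
    strict_error = type(exc).__name__

# 'surrogatepass' lets a lone surrogate through the UTF-8 codec.
lone_surrogate = "\ud800".encode("utf-8", errors="surrogatepass")


def roundtrip_unknown_bytes(data: bytes) -> bytes:
    """Read bytes as text and write them back using 'surrogateescape'.

    Hypothetical helper for illustration; newline="" disables newline
    translation so the bytes are compared exactly.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.bin")
        dst = os.path.join(tmp, "out.bin")
        with open(src, "wb") as f:
            f.write(data)
        with open(src, encoding="utf-8", errors="surrogateescape",
                  newline="") as f:
            as_text = f.read()
        with open(dst, "w", encoding="utf-8", errors="surrogateescape",
                  newline="") as f:
            f.write(as_text)
        with open(dst, "rb") as f:
            return f.read()
```

Calling ``roundtrip_unknown_bytes(b"abc\xff\n")`` should return the input unchanged, which is the property the ``'surrogateescape'`` row describes.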