96 changes: 13 additions & 83 deletions docs/http-authentication.rst
@@ -3,7 +3,8 @@ Password-Protected Feeds

:program:`Universal Feed Parser` supports downloading and parsing
password-protected feeds that are protected by :abbr:`HTTP (Hypertext Transfer Protocol)`
authentication. Both basic and digest authentication are supported.
basic authentication. For any other type of authentication, you can handle the
authentication yourself and then parse the retrieved feed.


Downloading a feed protected by basic authentication (the easy way)
@bobwhitelock (Author), Jul 2, 2025:

Basic auth still seems to work fine, as requests supports this type of URL. For anything else it seems like you need to handle authentication yourself.

@@ -17,89 +18,23 @@ In this example, the username is test and the password is basic.
.. code-block:: pycon

>>> import feedparser
>>> d = feedparser.parse('http://test:basic@feedparser.org/docs/examples/basic_auth.xml')
>>> d = feedparser.parse('http://test:basic@$READTHEDOCS_CANONICAL_URL/examples/basic_auth.xml')
@bobwhitelock (Author):
The previous URL was no longer available, so the example didn't work

I believe this should get expanded to https://feedparser.readthedocs.io/en/latest/examples/basic_auth.xml instead, which should work. This feed doesn't actually seem to be protected by basic auth (and the same goes for https://feedparser.readthedocs.io/en/latest/examples/digest_auth.xml), but I checked that this works with a different URL that does have basic auth.

>>> d.feed.title
'Sample Feed'

The same technique works for digest authentication. (Technically,
:program:`Universal Feed Parser` will attempt basic authentication first, but
if that fails and the server indicates that it requires digest authentication,
:program:`Universal Feed Parser` will automatically re-request the feed with
the appropriate digest authentication headers. *This means that this technique
will send your password to the server in an easily decryptable form.*)

Downloading a feed with other types of authentication
-----------------------------------------------------

.. _example.auth.inline.digest:
For any other type of authentication, you should retrieve the feed yourself and
handle authentication as needed (e.g. via `requests
<https://requests.readthedocs.io>`_, which is what :program:`Universal Feed Parser`
uses internally), and then call ``feedparser.parse`` on the retrieved feed
content.
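As a minimal sketch of this approach (the URL, credentials, and use of
``HTTPDigestAuth`` below are placeholder assumptions, not a real endpoint), you
might fetch the feed yourself and hand the content to ``feedparser.parse``:

.. code-block:: python

    import feedparser

    # Retrieval step, sketched as comments -- the URL and credentials
    # here are hypothetical:
    #
    #     import requests
    #     from requests.auth import HTTPDigestAuth
    #     response = requests.get(
    #         'https://example.com/feed.xml',
    #         auth=HTTPDigestAuth('myuser', 'mypassword'),
    #     )
    #     feed_xml = response.text

    # Once you have the feed content as a string, pass it straight
    # to parse():
    feed_xml = """<?xml version="1.0"?>
    <rss version="2.0">
      <channel>
        <title>Sample Feed</title>
        <item><title>First entry</title></item>
      </channel>
    </rss>"""

    d = feedparser.parse(feed_xml)
    print(d.feed.title)

``parse`` accepts a string of feed content as well as a URL, so the same call
works however the content was retrieved.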

Downloading a feed protected by digest authentication (the easy but horribly insecure way)
------------------------------------------------------------------------------------------

In this example, the username is test and the password is digest.

.. code-block:: pycon

>>> import feedparser
>>> d = feedparser.parse('http://test:digest@feedparser.org/docs/examples/digest_auth.xml')
>>> d.feed.title
'Sample Feed'



You can also construct a HTTPBasicAuthHandler that contains the password
information, then pass that as a handler to the ``parse`` function.
HTTPBasicAuthHandler is part of the standard `urllib2 <http://docs.python.org/lib/module-urllib2.html>`_ module.

Downloading a feed protected by :abbr:`HTTP (Hypertext Transfer Protocol)` basic authentication (the hard way)
--------------------------------------------------------------------------------------------------------------

.. code-block:: python

import urllib2, feedparser

# Construct the authentication handler
auth = urllib2.HTTPBasicAuthHandler()

# Add password information: realm, host, user, password.
# A single handler can contain passwords for multiple sites;
# urllib2 will sort out which passwords get sent to which sites
# based on the realm and host of the URL you're retrieving
auth.add_password('BasicTest', 'feedparser.org', 'test', 'basic')

# Pass the authentication handler to the feed parser.
# handlers is a list because there might be more than one
# type of handler (urllib2 defines lots of different ones,
# and you can build your own)
d = feedparser.parse(
'$READTHEDOCS_CANONICAL_URL/examples/basic_auth.xml',
handlers=[auth],
)



Digest authentication is handled in much the same way, by constructing an
HTTPDigestAuthHandler and populating it with the necessary realm, host, user,
and password information. This is more secure than
:ref:`stuffing the username and password in the URL <example.auth.inline.digest>`,
since the password will be encrypted before being sent to the server.


Downloading a feed protected by :abbr:`HTTP (Hypertext Transfer Protocol)` digest authentication (the secure way)
-----------------------------------------------------------------------------------------------------------------

.. code-block:: python

import urllib2, feedparser

auth = urllib2.HTTPDigestAuthHandler()
auth.add_password('DigestTest', 'feedparser.org', 'test', 'digest')
d = feedparser.parse(
'$READTHEDOCS_CANONICAL_URL/examples/digest_auth.xml',
handlers=[auth],
)


The examples so far have assumed that you know in advance that the feed is
password-protected. But what if you don't know?
Determining that a feed is password-protected
---------------------------------------------

If you try to download a password-protected feed without sending all the proper
password information, the server will return an
@@ -113,12 +48,7 @@ you will need to parse it yourself. Everything before the first space is the
type of authentication (probably ``Basic`` or ``Digest``), which controls which
type of handler you'll need to construct. The realm name is given as
realm="foo" -- so foo would be your first argument to auth.add_password. Other
information in the www-authenticate header is probably safe to ignore; the
:file:`urllib2` module will handle it for you.
@bobwhitelock (Author):

I don't think this part of the sentence makes sense, since urllib2 isn't being used.



Determining that a feed is password-protected
---------------------------------------------
information in the www-authenticate header is probably safe to ignore.

.. code-block:: pycon

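The header parsing described above can be sketched in plain Python (the header
value here is a made-up example):

.. code-block:: python

    import re

    # A made-up WWW-Authenticate value of the kind described above.
    header = 'Basic realm="BasicTest"'

    # Everything before the first space is the authentication type.
    auth_type, _, rest = header.partition(' ')

    # The realm is given as realm="..." in the remainder.
    match = re.search(r'realm="([^"]*)"', rest)
    realm = match.group(1) if match else None

    print(auth_type, realm)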
31 changes: 13 additions & 18 deletions docs/http-other.rst
@@ -1,28 +1,13 @@
Other :abbr:`HTTP (Hypertext Transfer Protocol)` Headers
========================================================

You can specify additional :abbr:`HTTP (Hypertext Transfer Protocol)` request
headers as a dictionary. When you download a feed from a remote web server,
When you download a feed from a remote web server,
:program:`Universal Feed Parser` exposes the complete set of
:abbr:`HTTP (Hypertext Transfer Protocol)` response headers as a dictionary.


.. _example.http.headers.request:

Sending custom :abbr:`HTTP (Hypertext Transfer Protocol)` request headers
-------------------------------------------------------------------------

.. code-block:: python

import feedparser
d = feedparser.parse(
'$READTHEDOCS_CANONICAL_URL/examples/atom03.xml',
request_headers={'Cache-control': 'max-age=0'},
)


Accessing other :abbr:`HTTP (Hypertext Transfer Protocol)` response headers
---------------------------------------------------------------------------
Accessing :abbr:`HTTP (Hypertext Transfer Protocol)` response headers
---------------------------------------------------------------------

.. code-block:: pycon

@@ -39,3 +24,13 @@ Accessing other :abbr:`HTTP (Hypertext Transfer Protocol)` response headers
'content-length': '883',
'connection': 'close',
'content-type': 'application/xml'}


Customizing :abbr:`HTTP (Hypertext Transfer Protocol)` request headers
----------------------------------------------------------------------

If you need to customize aspects of requests for a feed, such as the request
headers used, you should retrieve the feed yourself with any settings you need
(e.g. via `requests <https://requests.readthedocs.io>`_, which is what
:program:`Universal Feed Parser` uses internally), and then call
``feedparser.parse`` on the retrieved feed content.
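As a sketch of that flow (the URL below is a placeholder, not a real endpoint),
you might fetch with a custom header and then parse the retrieved content:

.. code-block:: python

    import feedparser

    # Retrieval with a custom request header, sketched as comments --
    # the URL here is hypothetical:
    #
    #     import requests
    #     response = requests.get(
    #         'https://example.com/feed.xml',
    #         headers={'Cache-Control': 'max-age=0'},
    #     )
    #     feed_xml = response.text

    # The retrieved content parses the same way a URL would:
    feed_xml = '<rss version="2.0"><channel><title>Sample Feed</title></channel></rss>'
    d = feedparser.parse(feed_xml)
    print(d.feed.title)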
23 changes: 0 additions & 23 deletions docs/http-useragent.rst
@@ -19,19 +19,6 @@ you should change the User-Agent to your application name and
Customizing the User-Agent
@bobwhitelock (Author):

Setting feedparser.USER_AGENT should still work, so I have left that section, but the ``agent`` parameter is no longer present.

--------------------------

.. code-block:: pycon

>>> import feedparser
>>> d = feedparser.parse('$READTHEDOCS_CANONICAL_URL/examples/atom10.xml',
... agent='MyApp/1.0 +http://example.com/')

You can also set the User-Agent once, globally, and then call the ``parse``
function normally.


Customizing the User-Agent permanently
--------------------------------------

.. code-block:: pycon

>>> import feedparser
@@ -44,13 +31,3 @@ download a feed from a web server. This is discouraged, because it is a
violation of `RFC 2616 <http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.36>`_.
The default behavior is to send a blank referrer, and you should never need to
override this.


Customizing the referrer
------------------------

.. code-block:: pycon

>>> import feedparser
>>> d = feedparser.parse('$READTHEDOCS_CANONICAL_URL/examples/atom10.xml',
... referrer='http://example.com/')