diff --git a/docs/http-authentication.rst b/docs/http-authentication.rst index d53ef6d4..0cd7e965 100644 --- a/docs/http-authentication.rst +++ b/docs/http-authentication.rst @@ -3,7 +3,8 @@ Password-Protected Feeds :program:`Universal Feed Parser` supports downloading and parsing password-protected feeds that are protected by :abbr:`HTTP (Hypertext Transfer Protocol)` -authentication. Both basic and digest authentication are supported. +basic authentication. For any other types of authentication, you can handle the +authentication yourself and then parse the retrieved feed. Downloading a feed protected by basic authentication (the easy way) @@ -17,89 +18,23 @@ In this example, the username is test and the password is basic. .. code-block:: pycon >>> import feedparser - >>> d = feedparser.parse('http://test:basic@feedparser.org/docs/examples/basic_auth.xml') + >>> d = feedparser.parse('http://test:basic@$READTHEDOCS_CANONICAL_URL/examples/basic_auth.xml') >>> d.feed.title 'Sample Feed' -The same technique works for digest authentication. (Technically, -:program:`Universal Feed Parser` will attempt basic authentication first, but -if that fails and the server indicates that it requires digest authentication, -:program:`Universal Feed Parser` will automatically re-request the feed with -the appropriate digest authentication headers. *This means that this technique -will send your password to the server in an easily decryptable form.*) +Downloading a feed with other types of authentication +----------------------------------------------------- -.. _example.auth.inline.digest: +For any other type of authentication, you should retrieve the feed yourself and +handle authentication as needed (e.g. via `requests +` - this is what :program:`Universal Feed Parser` +uses internally), and then you can just call ``feedparser.parse`` on the +retrieved feed content. 
-Downloading a feed protected by digest authentication (the easy but horribly insecure way) ------------------------------------------------------------------------------------------- -In this example, the username is test and the password is digest. - -.. code-block:: pycon - - >>> import feedparser - >>> d = feedparser.parse('http://test:digest@feedparser.org/docs/examples/digest_auth.xml') - >>> d.feed.title - 'Sample Feed' - - - -You can also construct a HTTPBasicAuthHandler that contains the password -information, then pass that as a handler to the ``parse`` function. -HTTPBasicAuthHandler is part of the standard `urllib2 `_ module. - -Downloading a feed protected by :abbr:`HTTP (Hypertext Transfer Protocol)` basic authentication (the hard way) --------------------------------------------------------------------------------------------------------------- - -.. code-block:: python - - import urllib2, feedparser - - # Construct the authentication handler - auth = urllib2.HTTPBasicAuthHandler() - - # Add password information: realm, host, user, password. - # A single handler can contain passwords for multiple sites; - # urllib2 will sort out which passwords get sent to which sites - # based on the realm and host of the URL you're retrieving - auth.add_password('BasicTest', 'feedparser.org', 'test', 'basic') - - # Pass the authentication handler to the feed parser. - # handlers is a list because there might be more than one - # type of handler (urllib2 defines lots of different ones, - # and you can build your own) - d = feedparser.parse( - '$READTHEDOCS_CANONICAL_URL/examples/basic_auth.xml', - handlers=[auth], - ) - - - -Digest authentication is handled in much the same way, by constructing an -HTTPDigestAuthHandler and populating it with the necessary realm, host, user, -and password information. This is more secure than -:ref:`stuffing the username and password in the URL `, -since the password will be encrypted before being sent to the server. 
- - -Downloading a feed protected by :abbr:`HTTP (Hypertext Transfer Protocol)` digest authentication (the secure way) ------------------------------------------------------------------------------------------------------------------ - -.. code-block:: python - - import urllib2, feedparser - - auth = urllib2.HTTPDigestAuthHandler() - auth.add_password('DigestTest', 'feedparser.org', 'test', 'digest') - d = feedparser.parse( - '$READTHEDOCS_CANONICAL_URL/examples/digest_auth.xml', - handlers=[auth], - ) - - -The examples so far have assumed that you know in advance that the feed is -password-protected. But what if you don't know? +Determining that a feed is password-protected +--------------------------------------------- If you try to download a password-protected feed without sending all the proper password information, the server will return an @@ -113,12 +48,7 @@ you will need to parse it yourself. Everything before the first space is the type of authentication (probably ``Basic`` or ``Digest``), which controls which type of handler you'll need to construct. The realm name is given as realm="foo" -- so foo would be your first argument to auth.add_password. Other -information in the www-authenticate header is probably safe to ignore; the -:file:`urllib2` module will handle it for you. - - -Determining that a feed is password-protected ---------------------------------------------- +information in the www-authenticate header is probably safe to ignore. .. code-block:: pycon diff --git a/docs/http-other.rst b/docs/http-other.rst index 8da48c3d..4ea6e733 100644 --- a/docs/http-other.rst +++ b/docs/http-other.rst @@ -1,28 +1,13 @@ Other :abbr:`HTTP (Hypertext Transfer Protocol)` Headers ======================================================== -You can specify additional :abbr:`HTTP (Hypertext Transfer Protocol)` request -headers as a dictionary. 
When you download a feed from a remote web server, +When you download a feed from a remote web server, :program:`Universal Feed Parser` exposes the complete set of :abbr:`HTTP (Hypertext Transfer Protocol)` response headers as a dictionary. -.. _example.http.headers.request: - -Sending custom :abbr:`HTTP (Hypertext Transfer Protocol)` request headers -------------------------------------------------------------------------- - -.. code-block:: python - - import feedparser - d = feedparser.parse( - '$READTHEDOCS_CANONICAL_URL/examples/atom03.xml', - request_headers={'Cache-control': 'max-age=0'}, - ) - - -Accessing other :abbr:`HTTP (Hypertext Transfer Protocol)` response headers ---------------------------------------------------------------------------- +Accessing :abbr:`HTTP (Hypertext Transfer Protocol)` response headers +--------------------------------------------------------------------- .. code-block:: pycon @@ -39,3 +24,13 @@ Accessing other :abbr:`HTTP (Hypertext Transfer Protocol)` response headers 'content-length': '883', 'connection': 'close', 'content-type': 'application/xml'} + + +Customizing :abbr:`HTTP (Hypertext Transfer Protocol)` request headers +---------------------------------------------------------------------- + +If you need to customize aspects of requests for a feed, such as the request +headers used, you should retrieve the feed yourself with any settings you need +(e.g. via `requests ` - this is what +:program:`Universal Feed Parser` uses internally), and then you can just call +``feedparser.parse`` on the retrieved feed content. diff --git a/docs/http-useragent.rst b/docs/http-useragent.rst index fe557c8b..0795b53b 100644 --- a/docs/http-useragent.rst +++ b/docs/http-useragent.rst @@ -19,19 +19,6 @@ you should change the User-Agent to your application name and Customizing the User-Agent -------------------------- -.. 
code-block:: pycon - - >>> import feedparser - >>> d = feedparser.parse('$READTHEDOCS_CANONICAL_URL/examples/atom10.xml', - ... agent='MyApp/1.0 +http://example.com/') - -You can also set the User-Agent once, globally, and then call the ``parse`` -function normally. - - -Customizing the User-Agent permanently --------------------------------------- - .. code-block:: pycon >>> import feedparser @@ -44,13 +31,3 @@ download a feed from a web server. This is discouraged, because it is a violation of `RFC 2616 `_. The default behavior is to send a blank referrer, and you should never need to override this. - - -Customizing the referrer ------------------------- - -.. code-block:: pycon - - >>> import feedparser - >>> d = feedparser.parse('$READTHEDOCS_CANONICAL_URL/examples/atom10.xml', - ... referrer='http://example.com/')
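
Note on the ``www-authenticate`` paragraph kept in docs/http-authentication.rst: the parsing it describes (the authentication type is everything before the first space, and the realm is given as ``realm="foo"``) can be sketched with the standard library alone. This is a minimal sketch, not what :program:`Universal Feed Parser` does internally. The function names ``parse_www_authenticate`` and ``build_opener_for`` are hypothetical, and it uses ``urllib.request`` (the Python 3 successor to ``urllib2``) instead of ``requests`` so that it has no third-party dependencies.

```python
import re
import urllib.request


def parse_www_authenticate(header):
    """Split a www-authenticate value into (scheme, realm).

    Hypothetical helper: the scheme is everything before the first
    space, and the realm is read from realm="..." if it is present.
    """
    scheme, _, params = header.strip().partition(" ")
    match = re.search(r'realm="([^"]*)"', params, re.IGNORECASE)
    realm = match.group(1) if match else None
    return scheme, realm


def build_opener_for(header, url, username, password):
    """Return a urllib opener that can answer the given challenge.

    Hypothetical helper: only Basic and Digest are handled here; any
    other scheme raises ValueError so the caller can deal with it.
    """
    scheme, realm = parse_www_authenticate(header)
    handler_types = {
        "basic": urllib.request.HTTPBasicAuthHandler,
        "digest": urllib.request.HTTPDigestAuthHandler,
    }
    handler_type = handler_types.get(scheme.lower())
    if handler_type is None:
        raise ValueError("unsupported authentication scheme: %r" % scheme)
    handler = handler_type()
    # Arguments are realm, URI, user, password.
    handler.add_password(realm, url, username, password)
    return urllib.request.build_opener(handler)
```

Under these assumptions, the bytes read from ``opener.open(url)`` are the "retrieved feed content" that the new documentation text says to pass to ``feedparser.parse``.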