Commit 1cfb7db

prepare version 1 (#84)
1 parent 9dde0f0 commit 1cfb7db

4 files changed

Lines changed: 21 additions & 2 deletions

HISTORY.md

Lines changed: 8 additions & 0 deletions
@@ -1,6 +1,14 @@
 ## History / Changelog


+### 1.0.0
+
+- license change from GPLv3+ to Apache 2.0 (#81)
+- UrlStore: `write()` method and `load_store()` function added (#83)
+- add parameter `trailing_slash` to keep or discard slashes at the end of URLs (#52)
+- maintenance: fix whitespace in `clean_url()` (#77), simplify code (#79)
+
+
 ### 0.9.5

 - IRI to URI normalization: encode path, query and fragments (#58, #60)

README.rst

Lines changed: 10 additions & 0 deletions
@@ -114,6 +114,12 @@ All useful operations chained in ``check_url(url)``:
 # check for redirects (HEAD request)
 >>> url, domain_name = check_url(my_url, with_redirects=True)

+# include navigation pages instead of discarding them
+>>> check_url('http://www.example.org/page/10/', with_nav=True)
+
+# remove trailing slash
+>>> check_url('https://github.com/adbar/courlan/', trailing_slash=False)
+

 Language-aware heuristics, notably internationalization in URLs, are available in ``lang_filter(url, language)``:

@@ -311,6 +317,10 @@ The ``UrlStore`` class allow for storing and retrieving domain-classified URLs,
 - ``download_threshold_reached(threshold)``: Find out if the download limit (in seconds) has been reached for one of the websites in store.
 - ``unvisited_websites_number()``: Return the number of websites for which there are still URLs to visit.
 - ``is_exhausted_domain(domain)``: Tell if all known URLs for the website have been visited.
+- Persistence
+- ``write(filename)``: Save the store to disk.
+- ``load_store(filename)``: Read a UrlStore from disk (separate function, not class method).
+

 Optional settings:
 - ``compressed=True``: activate compression of URLs and rules
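
The persistence entries in this hunk describe a save/load round trip: an instance method writes the store, and a separate module-level function reads it back. The sketch below only mirrors that shape. `TinyStore` and its `urls` attribute are hypothetical stand-ins, and JSON is an assumption chosen for readability, since this diff does not show how `UrlStore.write()` actually serializes its data.

```python
import json


class TinyStore:
    """Hypothetical stand-in for a URL store (not courlan's UrlStore)."""

    def __init__(self, urls=None):
        # Map each domain to the list of URL paths known for it.
        self.urls = dict(urls or {})

    def add(self, domain, path):
        """Register a path under its domain."""
        self.urls.setdefault(domain, []).append(path)

    def write(self, filename):
        """Save the store to disk (JSON is an assumption of this sketch)."""
        with open(filename, "w", encoding="utf-8") as output:
            json.dump(self.urls, output)


def load_store(filename):
    """Read a store back from disk: a plain function, not a class method."""
    with open(filename, "r", encoding="utf-8") as source:
        return TinyStore(json.load(source))
```

Keeping the loader outside the class means a caller can restore a store without first constructing an empty instance, which matches the "separate function, not class method" note in the README entry.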

courlan/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -6,8 +6,8 @@
 __title__ = "courlan"
 __author__ = "Adrien Barbaresi"
 __license__ = "Apache-2.0"
-__copyright__ = "Copyright 2020-2023, Adrien Barbaresi"
-__version__ = "0.9.5"
+__copyright__ = "Copyright 2020-2024, Adrien Barbaresi"
+__version__ = "1.0.0"


 # imports

courlan/core.py

Lines changed: 1 addition & 0 deletions
@@ -54,6 +54,7 @@ def check_url(
 with_redirects: set to True for redirection test (per HTTP HEAD request)
 language: set target language (ISO 639-1 codes)
 with_nav: set to True to include navigation pages instead of discarding them
+trailing_slash: set to False to trim trailing slashes

 Returns:
 A tuple consisting of canonical URL and extracted domain
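
To make the new docstring line concrete, here is a minimal stdlib-only sketch of what trimming trailing slashes from a URL can look like. The helper `strip_trailing_slash` is hypothetical and not part of courlan. It does not reproduce the other checks `check_url` performs, and leaving a bare root path `/` untouched is an assumption of this sketch, not documented behaviour.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_trailing_slash(url: str) -> str:
    """Hypothetical helper: trim trailing slashes from the path of a URL."""
    parts = urlsplit(url)
    # Assumption of this sketch: a bare root path "/" is left untouched.
    if parts.path == "/":
        return url
    # Only the path is modified; query string and fragment are preserved.
    trimmed = parts.path.rstrip("/")
    return urlunsplit(parts._replace(path=trimmed))


print(strip_trailing_slash("https://github.com/adbar/courlan/"))
# prints: https://github.com/adbar/courlan
```

Working on the parsed path instead of the raw string keeps a slash that appears just before a query string or fragment from being missed.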
