Release 4.1.1
What's new
This release focuses on integrating Chonkie for semantic chunking, improving test reliability, and enhancing code quality through comprehensive linting.
Features
- Chonkie Semantic Chunking Integration
  - Implemented ChonkieSemanticSplitter using semantic chunking with memoization ([081e81a])
  - Added transform_documents method to ChonkieSemanticSplitter ([534cc90])
  - Replaced RecursiveCharacterTextSplitter with ChonkieSemanticSplitter in summarize.py ([77f1652])
  - Added chonkie to requirements ([7234f86])
  - Merged chonkie branch into dev ([f89390a])
Fixes
- Logging & Display
- Parsing & Type Hints
Refactor
- Split batch file loader into two files ([a0420fd])
- Comprehensive ruff linter run across codebase ([d9f7eac])
- Switched from black to ruff ([2d8a51b])
- Made ruff configuration less strict ([e04fc8d])
Tests
- DDG Test Improvements
Chore
- Updated pyfiglet font ([fd49cca])
- Cleaned up completed TODO items from README ([113c008], [ea4a99b], [8656327], [7206234], [4a4f4e8])
- Minor improvements ([b873469], [ca55c13])
Commit details since the last release
- [766c373] by @thiswillbeyourgithub, 4 seconds ago:
bump version 4.1.0 -> 4.1.1
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [e1b2a87] by @thiswillbeyourgithub, 66 minutes ago:
test: finally fixed the ddg error not capturing the output
Signed-off-by: thiswillbeyourgithub [email protected]
tests/test_cli.sh
- [a02ffbc] by @thiswillbeyourgithub, 2 hours ago:
test: don't use alias of grep
Signed-off-by: thiswillbeyourgithub [email protected]
tests/test_cli.sh
- [66bf47c] by @thiswillbeyourgithub, 2 hours ago:
test: print output before the error message
Signed-off-by: thiswillbeyourgithub [email protected]
tests/test_cli.sh
- [4a4f4e8] by @thiswillbeyourgithub, 2 hours ago:
todo: done the chonkie integration
Signed-off-by: thiswillbeyourgithub [email protected]
README.md
- [d9c5ae9] by @thiswillbeyourgithub, 2 hours ago:
test: capture the ddg output
Signed-off-by: thiswillbeyourgithub [email protected]
tests/test_cli.sh
- [96c5186] by @thiswillbeyourgithub, 2 hours ago:
test: better way to print the output
Signed-off-by: thiswillbeyourgithub [email protected]
tests/test_cli.sh
- [ccdffd1] by @thiswillbeyourgithub, 5 hours ago:
test: set max ddg results to 10 because it fails too often
Signed-off-by: thiswillbeyourgithub [email protected]
tests/test_cli.sh
tests/test_wdoc.py
- [a0420fd] by @thiswillbeyourgithub, 5 hours ago:
new: split batch file loader into two files
Signed-off-by: thiswillbeyourgithub [email protected]
wdoc/utils/batch_file_loader.py
wdoc/utils/load_recursive.py
- [7234f86] by @thiswillbeyourgithub, 6 hours ago:
add chonkie to requirements
Signed-off-by: thiswillbeyourgithub [email protected]
setup.py
- [f89390a] by @thiswillbeyourgithub, 25 hours ago:
Merge branch 'chonkie' into dev
- [534cc90] by @thiswillbeyourgithub, 27 hours ago:
feat: add transform_documents method to ChonkieSemanticSplitter
Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4.5) [email protected]
wdoc/utils/misc.py
- [77f1652] by @thiswillbeyourgithub, 29 hours ago:
refactor: replace RecursiveCharacterTextSplitter with ChonkieSemanticSplitter in summarize.py
Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4.5) [email protected]
wdoc/utils/tasks/summarize.py
- [081e81a] by @thiswillbeyourgithub, 29 hours ago:
feat: implement ChonkieSemanticSplitter using semantic chunking with memoization
Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4.5) [email protected]
wdoc/utils/misc.py
- [615828a] by @thiswillbeyourgithub, 3 days ago:
fix: typehint error for topk autoincrease
Signed-off-by: thiswillbeyourgithub [email protected]
wdoc/utils/tasks/query.py
- [fd49cca] by @thiswillbeyourgithub, 3 days ago:
better pyfiglet font
Signed-off-by: thiswillbeyourgithub [email protected]
wdoc/wdoc.py
- [99502e7] by @thiswillbeyourgithub, 3 days ago:
fix: colors were not appearing in loguru
Signed-off-by: thiswillbeyourgithub [email protected]
wdoc/utils/logger.py
- [83e7fb9] by @thiswillbeyourgithub, 3 days ago:
fix wrong logic for stdout color
Signed-off-by: thiswillbeyourgithub [email protected]
wdoc/utils/logger.py
- [b873469] by @thiswillbeyourgithub, 3 days ago:
minor
Signed-off-by: thiswillbeyourgithub [email protected]
wdoc/utils/logger.py
- [d2bca84] by @thiswillbeyourgithub, 3 days ago:
fix: allow llm to mention thinking inside its thinking
Signed-off-by: thiswillbeyourgithub [email protected]
wdoc/utils/misc.py
- [a50ec42] by @thiswillbeyourgithub, 3 days ago:
fix: error message when parsing thinking
Signed-off-by: thiswillbeyourgithub [email protected]
wdoc/utils/misc.py
- [ca55c13] by @thiswillbeyourgithub, 4 days ago:
minor
Signed-off-by: thiswillbeyourgithub [email protected]
README.md
- [113c008] by @thiswillbeyourgithub, 4 days ago:
todo: no need to mention karakeep because it will be a loader
Signed-off-by: thiswillbeyourgithub [email protected]
README.md
- [ea4a99b] by @thiswillbeyourgithub, 4 days ago:
todo: no more need to make an llm plugin because we support pipes now
Signed-off-by: thiswillbeyourgithub [email protected]
README.md
- [8656327] by @thiswillbeyourgithub, 4 days ago:
todo: remove todo to move the task to their own file because it's done
Signed-off-by: thiswillbeyourgithub [email protected]
README.md
- [7206234] by @thiswillbeyourgithub, 4 days ago:
todo: remove need for using dataclass to store tasks as it's done
Signed-off-by: thiswillbeyourgithub [email protected]
README.md
- [d9f7eac] by @thiswillbeyourgithub, 4 days ago:
run ruff linter everywhere
scripts/AnkiFiltered/AnkiFilteredDeckCreator.py
scripts/NtfySummarizer/NtfySummarizer.py
scripts/TheFiche/TheFiche.py
tests/test_parsing.py
tests/test_vectorstores.py
tests/test_wdoc.py
wdoc/main.py
wdoc/utils/batch_file_loader.py
wdoc/utils/customs/binary_faiss_vectorstore.py
wdoc/utils/customs/litellm_embeddings.py
wdoc/utils/embeddings.py
wdoc/utils/env.py
wdoc/utils/filters.py
wdoc/utils/interact.py
wdoc/utils/llm.py
wdoc/utils/loaders/__init__.py
wdoc/utils/loaders/anki.py
wdoc/utils/loaders/local_audio.py
wdoc/utils/loaders/local_html.py
wdoc/utils/loaders/local_video.py
wdoc/utils/loaders/logseq_markdown.py
wdoc/utils/loaders/online_media.py
wdoc/utils/loaders/pdf.py
wdoc/utils/loaders/shared_audio.py
wdoc/utils/loaders/youtube.py
wdoc/utils/misc.py
wdoc/utils/prompts.py
wdoc/utils/retrievers.py
wdoc/utils/tasks/parse.py
wdoc/utils/tasks/query.py
wdoc/utils/tasks/shared_query_search.py
wdoc/utils/tasks/types.py
wdoc/wdoc.py
- [e04fc8d] by @thiswillbeyourgithub, 4 days ago:
less strict ruff
Signed-off-by: thiswillbeyourgithub [email protected]
.pre-commit-config.yaml
- [2d8a51b] by @thiswillbeyourgithub, 4 days ago:
switch from black to ruff
Signed-off-by: thiswillbeyourgithub [email protected]
.pre-commit-config.yaml
setup.py