Shorten cache filenames to fit eCryptfs 143-byte NAME_MAX by Chessing234 · Pull Request #566 · allenai/scispacy

Chessing234 · 2026-04-08T03:23:17Z

Summary

url_to_filename() was appending the full trailing URL path component (e.g. tfidf_vectors_sparse.npz) to the hash-based filename, producing names up to 154 characters
This exceeds the 143-byte NAME_MAX on eCryptfs-encrypted filesystems (common on Ubuntu encrypted home directories), causing OSError: [Errno 36] File name too long
Now only the file extension is preserved (e.g. .npz), keeping the worst-case filename (including .json metadata sidecar) under 143 bytes
_find_existing_cache_file() matches both old-format and new-format filenames for backward compatibility — existing caches continue to work

Fixes #539, related to #447

Changes

scispacy/file_cache.py: url_to_filename() now appends only the file extension instead of the full trailing path component; added _find_existing_cache_file() helper that supports both old and new filename formats
tests/test_file_cache.py: Added test verifying all actual scispacy linker URLs produce filenames under the 143-byte limit
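
A backward-compatible lookup of the kind described for `_find_existing_cache_file()` might look roughly like the sketch below. The real helper's signature and matching logic are not shown in this description, so `candidate_names` and `find_existing` are hypothetical names, and the hash-based naming scheme is an assumption.

```python
import hashlib
import os
import tempfile
from typing import List, Optional


def candidate_names(url: str, etag: Optional[str] = None) -> List[str]:
    # Assumed scheme: sha256(url)[.sha256(etag)] plus a suffix.
    prefix = hashlib.sha256(url.encode("utf-8")).hexdigest()
    if etag:
        prefix += "." + hashlib.sha256(etag.encode("utf-8")).hexdigest()
    last_part = url.split("/")[-1]
    extension = os.path.splitext(last_part)[1]
    # New format first (extension only), then old format (full last component).
    return [prefix + extension, prefix + "." + last_part]


def find_existing(cache_dir: str, url: str, etag: Optional[str] = None) -> Optional[str]:
    # Return the first cached file that exists under either naming format.
    for name in candidate_names(url, etag):
        path = os.path.join(cache_dir, name)
        if os.path.exists(path):
            return path
    return None


# Usage: a file cached under the old, long name is still found.
url = "https://example.org/data/tfidf_vectors_sparse.npz"  # hypothetical URL
with tempfile.TemporaryDirectory() as cache_dir:
    old_path = os.path.join(cache_dir, candidate_names(url)[1])
    with open(old_path, "wb") as handle:
        handle.write(b"cached bytes")
    found_old = find_existing(cache_dir, url)
    found_other = find_existing(cache_dir, "https://example.org/data/other.npz")
    old_file_found = found_old == old_path
```

Checking the new format first means fresh downloads take priority, while old-format files remain usable without a re-download.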

Test plan

python -m pytest tests/test_file_cache.py -v passes
Existing cached files (old format) are still found without re-download
New downloads produce shorter filenames that work on eCryptfs

🤖 Generated with Claude Code

url_to_filename() was appending the full trailing URL path component (e.g. tfidf_vectors_sparse.npz) to the hash-based filename, producing names up to 154 characters. This exceeds the 143-byte NAME_MAX on eCryptfs-encrypted filesystems, causing OSError: File name too long.

Now only the file extension is preserved (e.g. .npz), keeping the worst-case filename (including .json sidecar) under 143 bytes. _find_existing_cache_file() matches both old and new filename formats for backward compatibility.

Fixes allenai#539

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Chessing234 wants to merge 1 commit into allenai:main from Chessing234:fix/cache-filename-too-long