feat: implement GPU-backed FAISS support and dynamic tokenization scaling #209
Open
wgergely wants to merge 21 commits into yichuan-w:main from
Conversation
…ding computation

- Force use_server=False to prevent ZMQ connection issues
- Add explicit logger for better debugging
- Improve code structure and comments
Implements a standalone embedding server for the FAISS backend to prevent ZMQ deadlocks that occur when mixing direct embedding computation (build) and server-based computation (search).

- Adds faiss_embedding_server.py: a specialized server reusing leann-core logic.
- Updates __init__.py: exports and registers the new server module.
Adds:

- gitignore-parser: for robust .gitignore handling in the CLI.
- einops: required for the nomic-embed-text-v1.5 custom implementation.
- api.py: explicitly separate server mode (search) from direct mode (build) to ensure stability.
- embedding_compute.py: add parallel tokenization, adaptive batch sizing, and support for nomic-embed-text-v1.5.
- tests: add token truncation tests.
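The adaptive batch sizing mentioned above could work along these lines — a minimal sketch, assuming the goal is to keep the total token count per batch under a fixed budget (the function name, budget, and cap are illustrative, not the PR's actual implementation):

```python
def adaptive_batch_size(num_texts: int, avg_tokens: int,
                        token_budget: int = 32768,
                        max_batch: int = 128) -> int:
    """Hypothetical sketch: shrink the batch as average sequence length
    grows, keeping batch_size * avg_tokens under a fixed token budget."""
    if avg_tokens <= 0:
        return max_batch
    # Largest batch that fits the budget, but at least one item.
    fit = max(1, token_budget // avg_tokens)
    # Never exceed the hard cap or the number of texts available.
    return min(max_batch, fit, max(1, num_texts))
```

Short inputs then embed at the full batch size, while very long inputs automatically fall back to small batches instead of exhausting memory.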
- Add gitignore-parser integration for correct file exclusion.
- Add a suppress_cpp_output context manager to silence noisy FAISS/HNSW backend logs.
- Add a code-optimized SentenceSplitter configuration.
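A suppress_cpp_output context manager typically has to work at the file-descriptor level, because native libraries like FAISS write to C-level stdout/stderr and bypass Python's sys.stdout. A minimal sketch of that pattern (this is an assumption about the approach, not the PR's actual code):

```python
import contextlib
import os
import sys

@contextlib.contextmanager
def suppress_cpp_output():
    """Hypothetical sketch: temporarily redirect the C-level stdout/stderr
    file descriptors (1 and 2) to /dev/null so chatty native backends
    (e.g. FAISS/HNSW) stay quiet, then restore them on exit."""
    sys.stdout.flush()
    sys.stderr.flush()
    devnull = os.open(os.devnull, os.O_WRONLY)
    saved_out, saved_err = os.dup(1), os.dup(2)
    try:
        os.dup2(devnull, 1)
        os.dup2(devnull, 2)
        yield
    finally:
        # Restore the original descriptors even if the body raised.
        os.dup2(saved_out, 1)
        os.dup2(saved_err, 2)
        os.close(devnull)
        os.close(saved_out)
        os.close(saved_err)
```

Redirecting the descriptors (rather than reassigning sys.stdout) is what makes this effective for C++ extensions, which inherit fd 1 and 2 directly.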
- metadata_filter.py: implements comprehensive filtering (comparison, membership, string, and boolean operators) for search results.
- tests: add a test suite for the metadata filtering logic.
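An operator-based metadata filter of this kind can be sketched as a small dispatch table — the filter shape ({field: {op: value}}) and the operator names here are assumptions for illustration, not the PR's actual schema:

```python
def matches(metadata: dict, filters: dict) -> bool:
    """Hypothetical sketch of comparison/membership/string/boolean
    filtering over a search result's metadata dict."""
    ops = {
        "==": lambda a, b: a == b,
        "!=": lambda a, b: a != b,
        "<": lambda a, b: a < b,
        ">": lambda a, b: a > b,
        "in": lambda a, b: a in b,             # membership
        "contains": lambda a, b: b in a,       # string containment
        "starts_with": lambda a, b: str(a).startswith(b),
        "is_true": lambda a, b: bool(a) is b,  # boolean check
    }
    for field, conditions in filters.items():
        if field not in metadata:
            return False  # missing fields never match
        for op, expected in conditions.items():
            if not ops[op](metadata[field], expected):
                return False
    return True
```

Conditions across fields combine with AND semantics; a result must satisfy every clause to pass the filter.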
Author

Successfully updated the PR with the latest stabilization fixes, metadata enrichment, and MCP protocol v2025 updates.
…ontext

- CodeAnalyzer: added robust import resolution for JS/TS and Python relative paths
- CodeAnalyzer: improved AST parsing resilience with tree-sitter bindings
- Chunking: integrated context headers for better semantic search retrieval
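The context-header idea can be sketched as follows — prepend file-level context to each chunk before embedding, so semantically similar code from different files stays distinguishable (the function names and header fields below are illustrative assumptions, not the PR's API):

```python
def build_context_header(file_path: str, imports: list[str], scope: str) -> str:
    """Hypothetical sketch: summarize where a chunk lives (file, imports,
    enclosing scope) as comment lines the embedding model can see."""
    lines = [f"# File: {file_path}"]
    if imports:
        lines.append("# Imports: " + ", ".join(imports))
    if scope:
        lines.append(f"# Scope: {scope}")
    return "\n".join(lines)

def chunk_with_context(chunk: str, header: str) -> str:
    """Prepend the context header to the raw chunk text."""
    return header + "\n" + chunk
```

At query time the header tokens contribute to the chunk's embedding, which is what improves retrieval for queries that mention file names or surrounding classes.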
- API: standardized search interfaces and error handling
- Chat: improved RAG context injection flow
- Embedding Server: robust startup/shutdown and process management
- CLI: consistency updates for downstream consumers
WHAT:

- Add compute_embeddings_voyage() for the Voyage AI API with a 32K context
- Add resolve_voyage_api_key() to settings.py for API key resolution
- Update EMBEDDING_MODEL_LIMITS with voyage-code-3 (32000 tokens)
- Add 'voyage' and 'gemini' to the CLI --embedding-mode choices
- Fix AST chunking import: chunking_utils is in the parent package

WHY:

- Voyage Code 3 achieves a 77% CoIR score for code retrieval
- The 32K context window enables a Late Chunking strategy
- AST chunking was failing due to a wrong relative import path (from .chunking_utils should be from ..chunking_utils)

IMPACT:

- Users can now use: --embedding-mode voyage --embedding-model voyage-code-3
- AST-aware chunking now works correctly in the analysis module
- Fallback chunking is no longer needed when AST is available
PR Update Draft - Leann GPU Backend & Metadata Enrichment
This update builds upon the initial FAISS GPU support by adding robust metadata extraction and stabilizing the environment for production use.
What's New?
1. Metadata-Rich Indexing (Context Headers)
We've added a CodeAnalyzer that uses tree-sitter to extract global context from files. Every code chunk now includes a "Context Header" prepended to its text.

2. FAISS Stability (ZMQ Fixes)
To prevent ZMQ deadlocks observed in high-concurrency scenarios, we've implemented an in-process embedding strategy for the FAISS backend. Search operations now compute query embeddings within the same process by default.
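The default described here can be sketched as a simple mode switch — direct in-process computation unless server mode is explicitly requested (the function signature and error are illustrative assumptions, not the PR's actual api.py code):

```python
from typing import Callable, Sequence

def compute_query_embedding(
    query: str,
    encode: Callable[[Sequence[str]], list],
    use_server: bool = False,
):
    """Hypothetical sketch: compute the query embedding in-process by
    default, so search never round-trips through the ZMQ server that
    the build path may already be holding open."""
    if use_server:
        # Server mode stays behind an explicit opt-in; mixing it with
        # direct build-time computation is what caused the deadlocks.
        raise RuntimeError("server mode must be requested explicitly")
    return encode([query])[0]
```

Keeping the server path opt-in means a single process never blocks on a ZMQ socket it also needs to serve.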
3. MCP Protocol v2025 Upgrade
Standardized the codebase to support the latest MCP protocol version (2025-11-25).

4. Better Environment Control
- Standardized LEANN_HOME and LEANN_DOCS handling across the CLI and Server modules; the system now strictly respects these environment variables when provided.
- Enabled trust_remote_code=True to support nomic-embed-text-v1.5 out of the box.
- Added tree-sitter (0.23+) and gitignore-parser to core requirements.
- Switched to ProcessPoolExecutor for true CPU parallelism.

Verification
Full test suite passed, including new integration tests for the FAISS ZMQ server and metadata analyzer.
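The environment handling described under "Better Environment Control" can be sketched as a strict-preference lookup — a minimal illustration, where the fallback location (~/.leann) is an assumption; only the LEANN_HOME variable itself comes from the PR:

```python
import os
from pathlib import Path

def resolve_leann_home() -> Path:
    """Hypothetical sketch: strictly prefer LEANN_HOME when set,
    otherwise fall back to a default under the user's home directory."""
    env = os.environ.get("LEANN_HOME")
    if env:
        return Path(env).expanduser()
    return Path.home() / ".leann"
```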