Commit 73969a7

Update Rust extension documentation to indicate experimental status
- Add prominent warning that this is NOT PRODUCTION READY
- Document test status: 86 core tests passing, ~85 tests skipped
- List unimplemented features:
  - Custom type encoders (TypeEncoder, TypeRegistry, FallbackEncoder)
  - RawBSONDocument codec options
  - Some DBRef edge cases
  - Complete type checking support
- Document skip mechanism using @skip_if_rust_bson marker
- Update module structure documentation (6 modules)
- Change status from 'production-ready' to 'experimental'
- Clarify performance limitations (~5x slower than C)
- Add clear recommendation to use C extension for production
1 parent 90397a0 commit 73969a7

2 files changed: +129 -37 lines changed


bson/_rbson/README.md

Lines changed: 114 additions & 32 deletions
@@ -1,43 +1,87 @@
11
# Rust BSON Extension Module
22

3+
⚠️ **NOT PRODUCTION READY** - This is an experimental implementation with incomplete feature support and performance limitations. See [Test Status](#test-status) and [Performance Analysis](#performance-analysis) sections below.
4+
35
This directory contains a Rust-based implementation of BSON encoding/decoding for PyMongo, developed as part of [PYTHON-5683](https://jira.mongodb.org/browse/PYTHON-5683).
46

57
## Overview
68

7-
The Rust extension (`_rbson`) provides the same interface as the C extension (`_cbson`) but is implemented in Rust using:
9+
The Rust extension (`_rbson`) provides a **partial implementation** of the C extension (`_cbson`) interface, implemented in Rust using:
810
- **PyO3**: Python bindings for Rust
911
- **bson crate**: MongoDB's official Rust BSON library
1012
- **Maturin**: Build tool for Rust Python extensions
1113

14+
## Test Status
15+
16+
### ✅ Core BSON Tests: 86 passed, 2 skipped
17+
The basic BSON encoding/decoding functionality works correctly (`test/test_bson.py`).
18+
19+
### ⏭️ Skipped Tests: ~85 tests across multiple test files
20+
The following features are **not implemented** and tests are skipped when using the Rust extension:
21+
22+
#### Custom Type Encoders (test/test_custom_types.py)
23+
- **`TypeEncoder` and `TypeRegistry`** - Custom type encoding/decoding
24+
- **`FallbackEncoder`** - Fallback encoding for unknown types
25+
- **Tests skipped**: All tests in `TestBSONFallbackEncoder`, `TestCustomPythonBSONTypeToBSONMonolithicCodec`, `TestCustomPythonBSONTypeToBSONMultiplexedCodec`
26+
- **Reason**: Rust extension doesn't support custom type encoders or fallback encoders
27+
28+
#### RawBSONDocument (test/test_raw_bson.py)
29+
- **`RawBSONDocument` codec options** - Raw BSON document handling
30+
- **Tests skipped**: All tests in `TestRawBSONDocument`
31+
- **Reason**: Rust extension doesn't implement RawBSONDocument codec options
32+
33+
#### DBRef Edge Cases (test/test_dbref.py)
34+
- **DBRef validation and edge cases**
35+
- **Tests skipped**: Some DBRef tests
36+
- **Reason**: Incomplete DBRef handling in Rust extension
37+
38+
#### Type Checking (test/test_typing.py)
39+
- **Type hints and mypy validation**
40+
- **Tests skipped**: Some typing tests
41+
- **Reason**: Type checking issues with Rust extension
42+
43+
### Skip Mechanism
44+
Tests are skipped using the `@skip_if_rust_bson` pytest marker defined in `test/__init__.py`:
45+
```python
46+
skip_if_rust_bson = pytest.mark.skipif(
47+
_use_rust_bson(), reason="Rust BSON extension does not support this feature"
48+
)
49+
```
50+
51+
This marker is applied to test classes and methods that use unimplemented features.
52+
1253
## Implementation History
1354

1455
This implementation was developed through [PR #2695](https://github.com/mongodb/mongo-python-driver/pull/2695) to investigate using Rust as an alternative to C for Python extension modules.
1556

1657
### Key Milestones
1758

18-
1. **Initial Implementation** - Complete BSON type support with 100% test compatibility (88/88 tests passing)
59+
1. **Initial Implementation** - Basic BSON type support with core functionality
1960
2. **Performance Optimizations** - Type caching, fast paths for common types, direct byte operations
20-
3. **Architectural Analysis** - Identified fundamental performance differences between Rust and C approaches
61+
3. **Modular Refactoring** - Split monolithic lib.rs into 6 well-organized modules
62+
4. **Test Integration** - Added skip markers for unimplemented features (~85 tests skipped)
2163

2264
## Features
2365

2466
### Supported BSON Types
2567

26-
The Rust extension supports all BSON types:
68+
The Rust extension supports basic BSON types:
2769
- **Primitives**: Double, String, Int32, Int64, Boolean, Null
2870
- **Complex Types**: Document, Array, Binary, ObjectId, DateTime
2971
- **Special Types**: Regex, Code, Timestamp, Decimal128, MinKey, MaxKey
3072
- **Deprecated Types**: DBPointer (decodes to DBRef)
3173

3274
### CodecOptions Support
3375

34-
Full support for PyMongo's `CodecOptions`:
35-
- `document_class` - Custom document classes
36-
- `tz_aware` - Timezone-aware datetime handling
37-
- `tzinfo` - Timezone conversion
38-
- `uuid_representation` - UUID encoding/decoding modes
39-
- `datetime_conversion` - DateTime handling modes (AUTO, CLAMP, MS)
40-
- `unicode_decode_error_handler` - UTF-8 error handling
76+
**Partial** support for PyMongo's `CodecOptions`:
77+
- `document_class` - Custom document classes (basic support)
78+
- `tz_aware` - Timezone-aware datetime handling
79+
- `tzinfo` - Timezone conversion
80+
- `uuid_representation` - UUID encoding/decoding modes
81+
- `datetime_conversion` - DateTime handling modes (AUTO, CLAMP, MS)
82+
- `unicode_decode_error_handler` - UTF-8 error handling
83+
- `type_registry` - Custom type encoders/decoders (NOT IMPLEMENTED)
84+
- ❌ RawBSONDocument support (NOT IMPLEMENTED)
4185

4286
### Runtime Selection
4387

@@ -144,8 +188,8 @@ The Copilot POC's claimed 2.89x speedup was likely due to:
144188

145189
When these missing features were added to achieve 100% compatibility, the true performance cost of the Rust `Bson` enum architecture became apparent.
146190

147-
### Current Implementation (PR #2695) - Production-Ready
148-
**Status**: 88/88 tests passing (100%)
191+
### Current Implementation (PR #2695) - Experimental
192+
**Status**: 86/88 core BSON tests passing, ~85 feature tests skipped
149193

150194
**Build System**: `maturin build --release` (proper wheel generation)
151195
- Uses Maturin for proper Python packaging
@@ -154,9 +198,9 @@ When these missing features were added to achieve 100% compatibility, the true p
154198
- Located in `bson/_rbson/` directory (proper module structure)
155199

156200
**Improvements over Copilot POC:**
157-
- **100% test compatibility** (88/88 vs 53/88)
158-
- **Complete CodecOptions support**:
159-
- `document_class` - Custom document classes
201+
- **Core BSON functionality** (86/88 tests passing in test_bson.py)
202+
- **Basic CodecOptions support**:
203+
- `document_class` - Custom document classes (basic support)
160204
- `tzinfo` - Timezone conversion with astimezone()
161205
- `datetime_conversion` - All modes (AUTO, CLAMP, MS)
162206
- `unicode_decode_error_handler` - Fallback to Python for non-strict handlers
@@ -166,22 +210,24 @@ When these missing features were added to achieve 100% compatibility, the true p
166210
- Fast paths for common types (int, str, bool, None)
167211
- Direct byte operations where possible
168212
- PyDict fast path with pre-allocation
169-
- **Production-ready error handling** (matches C extension error messages exactly)
213+
- **Modular code structure** (6 well-organized Rust modules)
170214
- **Proper module structure** (`bson/_rbson/` with build.sh and maturin)
171215
- **Runtime selection** via PYMONGO_USE_RUST environment variable
172-
- **Comprehensive testing** (cross-compatibility tests, performance benchmarks)
216+
- **Test skip markers** for unimplemented features
173217
- **Same Rust architecture**: PyO3 0.23 + bson 2.13 crate (Python → Bson enum → bytes)
174218

219+
**Missing Features** (see [Test Status](#test-status)):
220+
- **Custom type encoders** (`TypeEncoder`, `TypeRegistry`, `FallbackEncoder`)
221+
- **RawBSONDocument** codec options
222+
- **Some DBRef edge cases**
223+
- **Complete type checking support**
224+
175225
**Performance Reality**: ~0.21x (5x slower than C) - see Performance Analysis section
176226

177227
**Key Insights**:
178228
1. **Same Architecture, Different Results**: Both implementations use the same Rust architecture (PyO3 + bson crate with intermediate `Bson` enum), so the build system (cargo vs maturin) is not the cause of the performance difference.
179-
2. **Incomplete vs Complete**: The POC's speed claims were based on incomplete functionality (60% test pass rate). Achieving 100% compatibility revealed the true performance cost of:
180-
- Complete CodecOptions handling (timezone conversions, datetime modes, etc.)
181-
- BSON validation (size checks, null terminators, extra bytes)
182-
- Production-ready error handling
183-
- Edge case handling for all 88 tests
184-
3. **The Fundamental Issue**: Both implementations suffer from the same architectural limitation (Python → Bson enum → bytes), but it only becomes a significant bottleneck when you implement all the features required for production use.
229+
2. **Incomplete Implementation**: The current implementation has ~85 tests skipped due to unimplemented features (custom type encoders, RawBSONDocument, etc.). This is an experimental implementation, not production-ready.
230+
3. **The Fundamental Issue**: The Rust architecture (Python → Bson enum → bytes) has inherent performance limitations compared to the C extension's direct byte-writing approach.
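
To make the architectural point concrete, the following toy sketch contrasts a two-pass encoder (build an intermediate tagged tree, then serialize it) with a single-pass encoder that writes bytes directly. It is written in Python purely for illustration and handles only flat documents of string keys and int32 values; all function names are hypothetical, and it is not the code of either extension. The byte layout follows the BSON specification: a little-endian int32 total length, the elements, and a trailing NUL.

```python
import struct


def write_int32_element(buf, key, value):
    """Append one BSON int32 element: type byte 0x10, NUL-terminated key, value."""
    buf.extend(b"\x10" + key.encode("utf-8") + b"\x00" + struct.pack("<i", value))


def finish_document(body):
    """Wrap element bytes with the 4-byte length prefix and the trailing NUL."""
    return struct.pack("<i", len(body) + 5) + bytes(body) + b"\x00"


def encode_direct(doc):
    """Single pass: walk the dict and write bytes immediately."""
    body = bytearray()
    for key, value in doc.items():
        write_int32_element(body, key, value)
    return finish_document(body)


def encode_via_intermediate(doc):
    """Two passes: build a tagged tree first (a stand-in for the Bson enum), then serialize it."""
    tree = [("int32", key, value) for key, value in doc.items()]  # extra allocation
    body = bytearray()
    for tag, key, value in tree:
        if tag != "int32":
            raise TypeError(f"unsupported tag in this sketch: {tag}")
        write_int32_element(body, key, value)
    return finish_document(body)
```

Both functions produce identical bytes; the difference is that the second allocates and traverses an extra intermediate structure for every document, which is the kind of overhead the insight above attributes to the `Bson` enum path.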
185231

186232
## Direct Byte-Writing Performance Results
187233

@@ -317,34 +363,70 @@ maturin develop --release
317363

318364
## Testing
319365

320-
Run the test suite with the Rust extension:
366+
Run the core BSON test suite with the Rust extension:
367+
```bash
368+
PYMONGO_USE_RUST=1 python -m pytest test/test_bson.py -v
369+
# Expected: 86 passed, 2 skipped
370+
```
371+
372+
Run all tests (including skipped tests):
321373
```bash
322-
PYMONGO_USE_RUST=1 python -m pytest test/
374+
PYMONGO_USE_RUST=1 python -m pytest test/ -v
375+
# Expected: Many tests passed, ~85 tests skipped due to unimplemented features
323376
```
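
The commands above rely on `PYMONGO_USE_RUST` to pick the Rust extension at runtime. The actual selection logic is not part of this diff, so the sketch below is only an assumption about how an environment-variable switch with fallbacks could look; `pick_bson_backend` and its parameters are hypothetical names, not PyMongo APIs.

```python
import os


def pick_bson_backend(environ=os.environ, rust_available=True, c_available=True):
    """Hypothetical helper: choose a BSON backend name from an environment mapping.

    Assumption: the Rust extension is opt-in via PYMONGO_USE_RUST=1, the C
    extension is the default, and pure Python is the last resort.
    """
    wants_rust = environ.get("PYMONGO_USE_RUST", "") == "1"
    if wants_rust and rust_available:
        return "rust"
    if c_available:
        return "c"
    return "python"
```

Passing the environment as a parameter keeps the helper easy to test without mutating `os.environ`.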
324377

325378
Run performance benchmarks:
326379
```bash
327380
python test/performance/perf_test.py
328381
```
329382

383+
## Module Structure
384+
385+
The Rust codebase is organized into 6 well-structured modules (refactored from a single 3,117-line file):
386+
387+
- **`lib.rs`** (76 lines) - Module exports and public API
388+
- **`types.rs`** (266 lines) - Type cache and BSON type markers
389+
- **`errors.rs`** (56 lines) - Error handling utilities
390+
- **`utils.rs`** (154 lines) - Utility functions (datetime, regex, validation)
391+
- **`encode.rs`** (1,545 lines) - BSON encoding functions
392+
- **`decode.rs`** (1,141 lines) - BSON decoding functions
393+
394+
This modular structure improves:
395+
- Code organization and maintainability
396+
- Compilation times (parallel module compilation)
397+
- Code navigation and testing
398+
- Clear separation of concerns
399+
330400
## Conclusion
331401

332402
The Rust extension demonstrates that:
333-
1. **Rust can provide a complete, production-ready BSON implementation**
334-
2. **100% compatibility with existing tests and APIs is achievable**
403+
1. **Rust can provide basic BSON encoding/decoding functionality**
404+
2. **Complete feature parity with C extension is not achieved** (~85 tests skipped)
335405
3. **Performance parity with C requires bypassing the `bson` crate**
336406
4. **The engineering effort may not justify the benefits**
337407

338408
### Recommendation
339409

340-
The Rust extension is **production-ready** from a correctness standpoint but **not recommended** for performance-critical applications. The C extension remains the better choice for performance.
410+
⚠️ **NOT PRODUCTION READY** - The Rust extension is **experimental** and has significant limitations:
411+
412+
**Missing Features:**
413+
- Custom type encoders (`TypeEncoder`, `TypeRegistry`, `FallbackEncoder`)
414+
- RawBSONDocument codec options
415+
- Some DBRef edge cases
416+
- Complete type checking support
417+
418+
**Performance Issues:**
419+
- ~5x slower than C extension (0.21x performance)
420+
- Even with direct byte-writing optimizations, still ~2.3x slower (0.43x performance)
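
The paired figures in this list are reciprocals of each other: a speed of r times the C extension means taking 1/r times as long. A small sketch of that arithmetic, using only the two ratios quoted above (the function name is hypothetical):

```python
def slowdown_factor(relative_speed):
    """Convert a speed ratio relative to the C extension into a slowdown factor."""
    if relative_speed <= 0:
        raise ValueError("relative speed must be positive")
    return 1.0 / relative_speed


for label, ratio in [("Bson enum path", 0.21), ("direct byte-writing", 0.43)]:
    print(f"{label}: {ratio:.2f}x of C speed, about {slowdown_factor(ratio):.1f}x slower")
```

This gives roughly 4.8x and 2.3x, which the text rounds to "~5x" and "~2.3x".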
341421

342422
**Use Cases for Rust Extension:**
343-
- Platforms where C compilation is difficult (e.g., WebAssembly)
344-
- Development environments without C toolchain
345-
- Testing and validation purposes
423+
- **Experimental/research purposes only**
424+
- Testing Rust-Python interop with PyO3
425+
- Platforms where C compilation is difficult (with caveats about missing features)
346426
- Future exploration if `bson` crate performance improves
347427

428+
**For production use, the C extension (`_cbson`) is strongly recommended.**
429+
348430
For more details, see:
349431
- [PYTHON-5683 JIRA ticket](https://jira.mongodb.org/browse/PYTHON-5683)
350432
- [PR #2695](https://github.com/mongodb/mongo-python-driver/pull/2695)

bson/_rbson/src/lib.rs

Lines changed: 15 additions & 5 deletions
@@ -14,18 +14,28 @@
1414

1515
//! Rust implementation of BSON encoding/decoding functions
1616
//!
17-
//! This module provides the same interface as the C extension (bson._cbson)
18-
//! but implemented in Rust using PyO3 and the bson library.
17+
//! ⚠️ **NOT PRODUCTION READY** - Experimental implementation with incomplete features.
18+
//!
19+
//! This module provides a **partial implementation** of the C extension (bson._cbson)
20+
//! interface, implemented in Rust using PyO3 and the bson library.
21+
//!
22+
//! # Implementation Status
23+
//!
24+
//! - ✅ Core BSON encoding/decoding: 86/88 tests passing
25+
//! - ❌ Custom type encoders: NOT IMPLEMENTED (~85 tests skipped)
26+
//! - ❌ RawBSONDocument: NOT IMPLEMENTED
27+
//! - ❌ Performance: ~5x slower than C extension
1928
//!
2029
//! # Implementation History
2130
//!
2231
//! This implementation was developed as part of PYTHON-5683 to investigate
2332
//! using Rust as an alternative to C for Python extension modules.
2433
//!
2534
//! See PR #2695 for the complete implementation history, including:
26-
//! - Initial implementation with 100% test compatibility
35+
//! - Initial implementation with core BSON functionality
2736
//! - Performance optimizations (type caching, fast paths, direct conversions)
28-
//! - Architectural analysis comparing Rust vs C extension approaches
37+
//! - Modular refactoring (split into 6 modules)
38+
//! - Test skip markers for unimplemented features
2939
//!
3040
//! # Performance
3141
//!
@@ -59,7 +69,7 @@ fn _test_rust_extension(py: Python) -> PyResult<PyObject> {
5969
let result = PyDict::new(py);
6070
result.set_item("implementation", "rust")?;
6171
result.set_item("version", "0.1.0")?;
62-
result.set_item("status", "production-ready")?;
72+
result.set_item("status", "experimental")?;
6373
result.set_item("pyo3_version", env!("CARGO_PKG_VERSION"))?;
6474
Ok(result.into())
6575
}
