A comprehensive Python SDK for MatrixOne that provides SQLAlchemy-like interface for database operations, vector search, fulltext search, snapshot management, PITR, restore operations, table cloning, and mo-ctl integration.
📚 Complete Documentation | 🚀 Quick Start Guide | 🧠 Vector Search & IVF Monitoring | 📖 API Reference
Note: This SDK implementation is generated and optimized through AI-assisted development, leveraging advanced code generation techniques to ensure comprehensive API coverage, type safety, and modern Python best practices.
- 🚀 High Performance: Optimized for MatrixOne database operations with connection pooling
- 🔄 Async Support: Full async/await support with AsyncClient for non-blocking operations
- 🧠 Vector Search: Advanced vector similarity search with HNSW and IVF indexing algorithms
- Support for f32 and f64 precision vectors
- Multiple distance metrics (L2, Cosine, Inner Product)
- ⭐ IVF Index Health Monitoring with
get_ivf_stats()- Critical for production! - ⭐ IVF LIMIT BY RANK - Fine-grained ranking control (pre/post/force modes)
- Configurable index parameters for performance tuning
- 🔍 Fulltext Search: Powerful fulltext indexing and search with BM25 and TF-IDF algorithms
- Natural language and boolean search modes
- Multi-column fulltext indexes
- Relevance scoring and ranking
- 📊 Metadata Analysis: Comprehensive table and column metadata analysis with statistics
- Table size and row count statistics
- Column data distribution analysis
- Index usage metrics
- 🔍 Secondary Index Verification: Verify consistency of secondary indexes with main table
- Get all secondary index table names
- Get specific index table by index name
- Verify row counts across main table and all indexes in single query
- 📂 Stage Management: Comprehensive external stage operations for centralized data storage
- Create and manage stages (file system, S3, cloud storage)
- Load data from stages with automatic path resolution
- Stage-scoped load operations for convenience
- Transaction support for atomic stage operations
- 📥 Data Loading: Pandas-style bulk data loading with intuitive interface
read_csv(),read_json(),read_parquet()methods (pandas-compatible API)- Load from local files or external stages (
stage://protocol) - Pandas-compatible parameters (
sep,quotechar,skiprows,names,encoding) - Support for inline data loading and compression
- Transaction-aware loading for data consistency
- 📤 Data Export: Pandas-style data export with intuitive interface
to_csv()andto_jsonl()methods (pandas-compatible API)- Export to local files or external stages (
stage://protocol) - Support for raw SQL, SQLAlchemy select(), and MatrixOne queries
- Customizable CSV options (sep, quotechar, lineterminator)
- Transaction-aware exports for data consistency
- 📸 Snapshot Management: Create and manage database snapshots at multiple levels
- ⏰ Point-in-Time Recovery: PITR functionality for precise data recovery
- 🔄 Table Cloning: Clone databases and tables efficiently with data replication
- ⭐ SQLAlchemy-style statement builders (
clone_table(),clone_database()) - Snapshot-based point-in-time cloning
- Cross-tenant cloning with TO ACCOUNT support
- IF NOT EXISTS for idempotent operations
- ⭐ SQLAlchemy-style statement builders (
- 🌿 Branch Management: Git-style version control for databases and tables
- Create isolated branches for development and testing
- Compare differences between branches with count-only optimization
- Merge branches with conflict resolution strategies
- Point-in-time branching from snapshots
- ⭐ SQLAlchemy-style statement builders (
create_table_branch(),diff_table_branch(), etc.) - Advanced DIFF output: COUNT, LIMIT, FILE export, snapshot on both sides
- Cross-tenant branching with TO ACCOUNT support
- 👥 Account Management: Comprehensive user, role, and permission management
- 📊 Pub/Sub: Real-time publication and subscription support
- 🔧 Version Management: Automatic backend version detection and compatibility checking
- 🛡️ Type Safety: Full type hints support with comprehensive documentation
- 📚 SQLAlchemy Integration: Seamless SQLAlchemy integration with enhanced ORM features
- 🔗 Enhanced Query Building: Advanced query building with logical operations (logical_and, logical_or, logical_not)
- 🪝 Connection Hooks: Pre/post connection hooks for custom initialization logic
- 🛠️ mo-diag CLI Tool: Interactive diagnostic tool for MatrixOne database maintenance
- Index health monitoring and verification
- IVF/HNSW vector index status inspection
- Table statistics and metadata analysis
- Interactive shell with Tab completion and command history
- Non-interactive mode for scripting and automation
- Batch operations on tables and indexes
🚨 CRITICAL: Always use lowercase with underscores (snake_case) for column names!
MatrixOne does not support SQL standard double-quoted identifiers in queries, which causes issues with camelCase column names when using SQLAlchemy ORM.
# ❌ DON'T: CamelCase column names (will fail in SELECT queries)
class User(Base):
userName = Column(String(50)) # CREATE succeeds, SELECT fails!
userId = Column(Integer) # Will cause SQL syntax errors
# ✅ DO: Use lowercase with underscores (snake_case)
class User(Base):
user_name = Column(String(50)) # Works perfectly
user_id = Column(Integer) # All operations succeedWhy this matters:
- ✅ CREATE TABLE works with both styles (uses backticks)
- ✅ INSERT works with both styles
- ❌ SELECT fails with camelCase (uses double quotes, not supported by MatrixOne)
Example of the problem:
# CamelCase generates: SELECT "userName" FROM user ❌ Fails!
# snake_case generates: SELECT user_name FROM user ✅ Works!pip install matrixone-python-sdkpip install \
--index-url https://test.pypi.org/simple/ \
--extra-index-url https://pypi.org/simple/ \
matrixone-python-sdkNote: The --extra-index-url is required to install dependencies (PyMySQL, SQLAlchemy, etc.) from the official PyPI.
git clone https://github.com/matrixorigin/matrixone.git
cd matrixone/clients/python
# Create virtual environment
python -m venv venv
# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate
# Quick setup with Makefile (includes all dev dependencies)
make dev-setup
# Or manual setup
pip install -e '.[dev]' # Includes pyarrow, Faker, pytest, etc.git clone https://github.com/matrixorigin/matrixone.git
cd matrixone/clients/python
# Create conda environment
conda create -n matrixone-dev python=3.10
conda activate matrixone-dev
# Quick setup with Makefile
make dev-setup
# Or manual setup
pip install -e '.[dev]'git clone https://github.com/matrixorigin/matrixone.git
cd matrixone/clients/python
# Quick setup with Makefile
make dev-setup
# Or manual setup
pip install -e '.[dev]'# Install test dependencies (includes pyarrow for Parquet support)
pip install -e '.[test]'
# Or use requirements files
pip install -r requirements-sqlalchemy20.txt # SQLAlchemy 2.x
pip install -r requirements-sqlalchemy14.txt # SQLAlchemy 1.4.xThe SDK includes comprehensive LOAD DATA functionality with support for:
- CSV/TSV files: Basic delimited file loading
- JSONLINE files: JSON Lines format with object/array structures
- Parquet files: Columnar format loading (requires
pyarrow) - Inline data: Direct string data loading
- Stage loading: Loading from named external stages
Dependencies for Load Data:
# Required for Parquet file support
pip install pyarrow>=10.0.0
# For generating test data
pip install Faker>=10.0.0
# Or install everything at once
pip install -e '.[dev]' # Includes pyarrow, Faker, pytest, etc.from matrixone import Client
# Create and connect to MatrixOne
client = Client()
client.connect(
host='localhost',
port=6001,
user='root',
password='111',
database='test'
)
# Execute queries
result = client.execute("SELECT 1 as test")
print(result.fetchall())
# Get backend version (auto-detected)
version = client.get_backend_version()
print(f"MatrixOne version: {version}")
client.disconnect()📝 Connection Parameters
The
connect()method requires keyword arguments (not positional):
database- Required, no default valuehost- Default:'localhost'port- Default:6001user- Default:'root'password- Default:'111'Minimal connection (uses all defaults):
client.connect(database='test')By default, all features (IVF, HNSW, fulltext) are automatically enabled via
on_connect=[ConnectionAction.ENABLE_ALL].
Use client.session() for atomic transactions. All operations within a session succeed or fail together:
from matrixone import Client
from sqlalchemy import select, insert, update, delete
client = Client()
client.connect(database='test')
# Basic transaction with automatic commit/rollback
with client.session() as session:
# All operations are atomic
session.execute(insert(User).values(name='Alice', age=30))
session.execute(update(User).where(User.age < 18).values(status='minor'))
# Query within transaction
stmt = select(User).where(User.age > 25)
result = session.execute(stmt)
users = result.scalars().all()
# Commits automatically on success, rolls back on error
client.disconnect()Key Benefits:
- ✅ Atomic operations - all succeed or fail together
- ✅ Automatic rollback on errors
- ✅ Access to all MatrixOne managers (snapshots, clones, load_data, etc.)
- ✅ Full SQLAlchemy ORM support
Load bulk data from CSV, JSON, or Parquet files with pandas-compatible API:
from matrixone import Client
from matrixone.orm import declarative_base, Column, Integer, String
Base = declarative_base()
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True)
name = Column(String(100))
email = Column(String(255))
client = Client()
client.connect(database='test')
# Create table
client.create_table(User)
# Basic CSV load (pandas-style)
client.load_data.read_csv('users.csv', table=User)
# CSV with header (pandas-style)
client.load_data.read_csv('users.csv', table=User, skiprows=1)
# Custom separator (pandas-style)
client.load_data.read_csv('users.txt', table=User, sep='|')
# Tab-separated (pandas-style)
client.load_data.read_csv('users.tsv', table=User, sep='\t')
# JSON Lines (pandas-style)
client.load_data.read_json('events.jsonl', table='events', lines=True, orient='records')
# Parquet (pandas-style)
client.load_data.read_parquet('data.parquet', table=User)
# Load from stage
client.load_data.read_csv('stage://s3_stage/users.csv', table=User)
client.load_data.read_csv_stage('s3_stage', 'users.csv', table=User)
# Inline data
csv_data = "1,Alice,alice@example.com\\n2,Bob,bob@example.com\\n"
client.load_data.read_csv(csv_data, table=User, inline=True)
client.disconnect()Key Features:
- ✅ Pandas-compatible API (
read_csv(),read_json(),read_parquet()) - ✅ Pandas-style parameters (
sep,quotechar,skiprows,names,encoding) - ✅ Load from local files or external stages (
stage://protocol) - ✅ Support for compression, parallel loading, and inline data
Export query results to CSV or JSONL with pandas-compatible API:
from matrixone import Client
from sqlalchemy import select
client = Client()
client.connect(database='test')
# Basic CSV export (pandas-style)
client.export.to_csv('/tmp/users.csv', "SELECT * FROM users")
# With custom separator
client.export.to_csv('/tmp/users.tsv', "SELECT * FROM users", sep='\t')
# Export with SQLAlchemy
stmt = select(User).where(User.age > 25)
client.export.to_csv('/tmp/adults.csv', stmt, sep='|')
# Export to JSONL
client.export.to_jsonl('/tmp/users.jsonl', "SELECT * FROM users")
# Export to external stage (using stage:// protocol)
client.export.to_csv('stage://s3_stage/backup.csv', stmt)
# Export to external stage (using convenience method)
client.export.to_csv_stage('s3_stage', 'backup2.csv', stmt)
# Export within transaction
with client.session() as session:
stmt = select(Product).where(Product.stock > 0)
session.export.to_csv('/tmp/in_stock.csv', stmt)
client.disconnect()Key Features:
- ✅ Pandas-compatible API (
to_csv(),to_jsonl()) - ✅ Convenience methods for stage exports (
to_csv_stage(),to_jsonl_stage()) - ✅ Support for raw SQL, SQLAlchemy, and MatrixOne queries
- ✅ Export to local files or external stages (
stage://protocol) - ✅ Customizable CSV options (sep, quotechar, lineterminator)
If you have existing SQLAlchemy code, you can wrap your sessions with MatrixOne features without refactoring:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from matrixone import Client
from matrixone.session import Session as MatrixOneSession
# Your existing SQLAlchemy setup
engine = create_engine('mysql+pymysql://root:111@localhost:6001/test')
SessionFactory = sessionmaker(bind=engine)
sqlalchemy_session = SessionFactory()
# Create MatrixOne client
mo_client = Client()
mo_client.connect(host='localhost', port=6001, user='root', password='111', database='test')
# Wrap your existing session with MatrixOne features
mo_session = MatrixOneSession(
client=mo_client,
wrap_session=sqlalchemy_session
)
try:
# Your existing SQLAlchemy operations still work
result = mo_session.execute("SELECT * FROM users")
# Now you can also use MatrixOne-specific features
mo_session.stage.create_s3('backup_stage', bucket='my-backups', path='')
mo_session.snapshots.create('daily_backup', level='database')
mo_session.load_data.read_csv('/data/users.csv', table='users')
mo_session.commit()
finally:
mo_session.close()Use Cases:
- 🔄 Gradual migration from pure SQLAlchemy to MatrixOne
- 🏢 Adding MatrixOne features to existing enterprise applications
- 📦 Legacy code modernization without complete refactoring
- 🔧 Testing MatrixOne features alongside existing code
The SDK provides seamless SQLAlchemy integration with ORM-style operations:
from matrixone import Client
from matrixone.orm import Base, Column, Integer, String, select, insert, update, delete
# Define ORM models
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True)
name = Column(String(100))
email = Column(String(255))
age = Column(Integer)
client = Client()
client.connect(database='test')
# Create table from model
client.create_table(User)
# ORM-style INSERT
stmt = insert(User).values(name='John', email='john@example.com', age=30)
client.execute(stmt)
# ORM-style SELECT
stmt = select(User).where(User.age > 25)
result = client.execute(stmt)
for user in result.scalars():
print(f"User: {user.name}, Age: {user.age}")
# ORM-style UPDATE
stmt = update(User).where(User.id == 1).values(email='newemail@example.com')
client.execute(stmt)
# ORM-style DELETE
stmt = delete(User).where(User.age < 18)
client.execute(stmt)
client.disconnect()Recommended Practices:
- ✅ Use
session()for multi-statement transactions - ✅ Use ORM-style statements (
select,insert,update,delete) - ✅ Use
client.execute()for single-statement operations - ✅ Prefer SQLAlchemy statements over raw SQL strings
Full async/await support with AsyncClient and async sessions:
import asyncio
from matrixone import AsyncClient
from sqlalchemy import select, insert, update
async def main():
client = AsyncClient()
await client.connect(database='test')
# Basic async query
result = await client.execute(select(User).where(User.age > 25))
print(result.fetchall())
# Async transaction
async with client.session() as session:
# All operations are atomic
await session.execute(insert(User).values(name='Alice', age=30))
await session.execute(update(User).where(User.age < 18).values(status='minor'))
# Concurrent queries with asyncio.gather
user_result, order_result = await asyncio.gather(
session.execute(select(User)),
session.execute(select(Order))
)
# Commits automatically
await client.disconnect()
asyncio.run(main())Async Benefits:
- ✅ Non-blocking I/O operations
- ✅ Concurrent execution with
asyncio.gather() - ✅ Perfect for FastAPI, aiohttp, and async frameworks
- ✅ Same transaction guarantees as sync version
Create and manage database snapshots. Use session() for atomic snapshot + clone operations:
from matrixone import SnapshotLevel
# Single snapshot operation
snapshot = client.snapshots.create(
name='my_snapshot',
level=SnapshotLevel.DATABASE,
database='production'
)
# List snapshots
snapshots = client.snapshots.list()
for snap in snapshots:
print(f"Snapshot: {snap.name}, Created: {snap.created_at}")
# Atomic snapshot + clone in transaction
with client.session() as session:
# Create snapshot
snapshot = session.snapshots.create(
name='daily_backup',
level=SnapshotLevel.DATABASE,
database='production'
)
# Clone from snapshot atomically
session.clone.clone_database(
target_db='production_copy',
source_db='production',
snapshot_name='daily_backup'
)
# Both operations commit togetherfrom matrixone.sqlalchemy_ext import create_vector_column
from sqlalchemy import Column, BigInteger, String, Text
from matrixone.orm import declarative_base
import numpy as np
# Define vector table model
Base = declarative_base()
class Document(Base):
__tablename__ = 'documents'
# IMPORTANT: HNSW index requires BigInteger primary key
id = Column(BigInteger, primary_key=True, autoincrement=True)
title = Column(String(200))
content = Column(Text)
embedding = create_vector_column(384, precision='f32') # 384-dim vector
# Create table using client API
client.create_table(Document)
# Enable and create HNSW index
client.vector_ops.enable_hnsw()
client.vector_ops.create_hnsw(
'documents', # table name - positional argument
name='idx_embedding',
column='embedding',
m=16,
ef_construction=200
)
# Insert document with vector using client API
client.insert(Document, {
'title': 'AI Research Paper',
'content': 'Advanced machine learning techniques...',
'embedding': np.random.rand(384).tolist()
})
# Vector similarity search
query_vector = np.random.rand(384).tolist()
results = client.vector_ops.similarity_search(
'documents', # table name - positional argument
vector_column='embedding',
query_vector=query_vector,
limit=5,
distance_type='cosine'
)
# ⭐ CRITICAL: Monitor IVF index health for production
stats = client.vector_ops.get_ivf_stats('documents', 'embedding')
counts = stats['distribution']['centroid_count']
balance_ratio = max(counts) / min(counts) if min(counts) > 0 else float('inf')
print(f"Index Health: {len(counts)} centroids, balance ratio: {balance_ratio:.2f}")
if balance_ratio > 2.5:
print("⚠️ Index needs rebuilding for optimal performance!")# Enable and create fulltext index using client API
client.fulltext_index.enable_fulltext()
client.fulltext_index.create(
'documents', # table name - positional argument
name='ftidx_content',
columns=['title', 'content']
)
# Boolean search with encourage (like natural language search)
from matrixone.sqlalchemy_ext.fulltext_search import boolean_match
results = client.query(
Document.title,
Document.content,
boolean_match('title', 'content').encourage('machine learning techniques')
).execute()
# Boolean search with must/should/must_not operators
results = client.query(
Document.title,
Document.content,
boolean_match('title', 'content')
.must('machine')
.must('learning')
.must_not('basic')
).execute()Load data from external sources using stages. Use session() for atomic multi-file loading:
from matrixone.orm import Base, Column, Integer, String
# Define ORM model
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True)
name = Column(String(100))
email = Column(String(255))
# Create S3 stage using simple interface
client.stage.create_s3(
name='data_stage',
bucket='my-bucket',
path='data/',
aws_key_id='your_key',
aws_secret_key='your_secret',
region='us-east-1'
)
# Load data from stage using ORM model (pandas-style)
client.load_data.read_csv_stage('data_stage', 'users.csv', table=User)
# Atomic multi-file loading in transaction
with client.session() as session:
# Create local stage
session.stage.create_local('import_stage', '/data/imports/')
# Load multiple files atomically (pandas-style)
session.load_data.read_csv('/data/users.csv', table=User)
session.load_data.read_csv('/data/orders.csv', table=Order)
# All loads commit togetherStage Benefits:
- ✅ Centralized data source configuration
- ✅ Support for S3, local filesystem, and cloud storage
- ✅ Automatic path resolution
- ✅ Transaction support for atomic operations
# Get table metadata with statistics
metadata_rows = client.metadata.scan(
dbname='test',
tablename='documents',
columns='*' # Get structured MetadataRow objects
)
for row in metadata_rows:
print(f"Column: {row.col_name}")
print(f" Nulls: {row.null_cnt}")
print(f" Rows: {row.rows_cnt}")
print(f" Size: {row.origin_size}")
# Get table statistics
brief_stats = client.metadata.get_table_brief_stats(
dbname='test',
tablename='documents'
)
table_stats = brief_stats['documents']
print(f"Total rows: {table_stats['row_cnt']}")
print(f"Original size: {table_stats['original_size']} bytes")
print(f"Compressed size: {table_stats['compress_size']} bytes")# Get all secondary index tables
index_tables = client.get_secondary_index_tables('my_table')
print(f"Found {len(index_tables)} secondary indexes")
# Get specific index by name
physical_table = client.get_secondary_index_table_by_name('my_table', 'idx_name')
print(f"Index 'idx_name' -> {physical_table}")
# Verify all indexes have same count as main table
try:
count = client.verify_table_index_counts('my_table')
print(f"✓ Verification passed! Row count: {count}")
except ValueError as e:
print(f"✗ Index mismatch detected:")
print(e)
# Error includes all count details for debugging# Check if feature is available
if client.is_feature_available('snapshot_creation'):
snapshot = client.snapshots.create('my_snapshot', 'cluster')
else:
hint = client.get_version_hint('snapshot_creation')
print(f"Feature not available: {hint}")
# Check version compatibility
if client.check_version_compatibility('3.0.0', '>='):
print("Backend supports 3.0.0+ features")The SDK automatically detects MatrixOne backend versions and handles compatibility:
- Development Version:
8.0.30-MatrixOne-v→999.0.0(highest priority) - Release Version:
8.0.30-MatrixOne-v3.0.0→3.0.0 - Legacy Format:
MatrixOne 3.0.1→3.0.1
# Check if running development version
if client.is_development_version():
print("Running development version - all features available")
else:
print(f"Running release version: {client.get_backend_version()}")The mo-diag command-line tool provides an interactive shell for diagnosing and maintaining MatrixOne databases, with a special focus on vector indexes, secondary indexes, and table statistics.
After installing the SDK, mo-diag is automatically available as a command:
pip install matrixone-python-sdk
mo-diag --helpLaunch the interactive shell to execute multiple diagnostic commands:
# Connect to default localhost
mo-diag --database test
# Connect to remote database
mo-diag --host 192.168.1.100 --port 6001 --user admin --password secret --database productionInteractive Features:
- 🔍 Tab Completion: Press
Tabto auto-complete commands, table names, and database names - ⬆️⬇️ Command History: Use arrow keys to browse command history
- 🔎 History Search: Press
Ctrl+Rto search command history - 🎨 Colored Output: Clear visual feedback with syntax highlighting
- 💾 Persistent History: Command history saved to
~/.mo_diag_history
Execute single commands directly for scripting and automation:
# Check IVF index status
mo-diag -d test -c "show_ivf_status"
# Get detailed table statistics
mo-diag -d test -c "show_table_stats my_table -a -d"
# Execute SQL query
mo-diag -d test -c "sql SELECT COUNT(*) FROM my_table"
# Flush table and all indexes
mo-diag -d test -c "flush_table my_table"Manage Change Data Capture directly from the shell or scripts. The interactive
commands (cdc_tasks, cdc_task, cdc_create, cdc_drop) now have matching
non-interactive helpers via mo-diag cdc.
# Summaries (same as running cdc_tasks inside the shell)
mo-diag cdc show
# Inspect a task with watermark threshold control
mo-diag cdc show nightly_sync --details --threshold=5m
# Launch the guided creator with a forced mode
mo-diag cdc create --table-level
# Drop a task with explicit confirmation override
mo-diag cdc drop nightly_sync --forceshow_indexes <table> [database]
- Display all indexes for a table including IVF, HNSW, Fulltext, and regular indexes
- Shows physical table names, index types, and statistics
- Includes object counts, row counts, and sizes for vector/fulltext indexes
MO-DIAG[test]> show_indexes ivf_health_demo_docs
📊 Secondary Indexes for 'test.ivf_health_demo_docs'
*************************** 1. row ***************************
Index Name: idx_embedding_ivf_v2
Algorithm: ivfflat
Table Type: metadata
Physical Table: __mo_index_secondary_0199e725-0a7a-77b8-b689-ccdd0a33f581
Columns: embedding
Statistics:
- Objects: 1
- Rows: 7
- Compressed Size: 940 B
- Original Size: 1.98 KB
*************************** 2. row ***************************
Index Name: idx_embedding_ivf_v2
Algorithm: ivfflat
Table Type: centroids
Physical Table: __mo_index_secondary_0199e725-0a7b-706e-8f0a-a50edc3621a1
Columns: embedding
Statistics:
- Objects: 1
- Rows: 17
- Compressed Size: 3.09 KB
- Original Size: 6.83 KB
*************************** 3. row ***************************
Index Name: idx_embedding_ivf_v2
Algorithm: ivfflat
Table Type: entries
Physical Table: __mo_index_secondary_0199e725-0a7c-77f4-8d0b-48fd8258098a
Columns: embedding
Statistics:
- Objects: 1
- Rows: 1,000
- Compressed Size: 156.34 KB
- Original Size: 176.07 KB
Total: 3 index tables (1 ivfflat with 3 physical tables)
Note: IVF indexes have 3 physical tables (metadata, centroids, entries)
show_all_indexes [database]
- Health report for all tables with secondary indexes in the database
- Row count consistency checks
- IVF/HNSW/Fulltext index status
MO-DIAG[test]> show_all_indexes
📊 Index Health Report for Database 'test':
========================================================================================================================
✓ HEALTHY (3 tables)
------------------------------------------------------------------------------------------------------------------------
Table Name | Indexes | Row Count | Notes
------------------------------------------------------------------------------------------------------------------------
cms_all_content_chunk_info | 3 | ✓ 32,712 rows | -
demo_mixed_indexes | 4 | ✓ 20 rows | -
ivf_health_demo_docs | 1 | 300 rows | IVF: 17 centroids, 300 vectors
========================================================================================================================
Summary:
✓ 3 healthy tables
Total: 3 tables with indexes
💡 Tip: Use 'verify_counts <table>' or 'show_ivf_status' for detailed diagnostics
verify_counts <table> [database]
- Verify row count consistency between main table and all its secondary indexes
- Highlights any mismatches
MO-DIAG[test]> verify_counts my_table
📊 Row Count Verification for 'test.my_table'
════════════════════════════════════════════════════════════════════════════
Main table: 10,000 rows
────────────────────────────────────────────────────────────────────────────
✓ __mo_index_secondary_xxx: 10,000 rows
✓ __mo_index_unique_yyy: 10,000 rows
════════════════════════════════════════════════════════════════════════════
✅ PASSED: All index tables match (10,000 rows)
show_ivf_status [database] [-v] [-t table]
- Display IVF index building status and centroid distribution
-v: Verbose mode with detailed centroid information-t <table>: Filter by specific table
MO-DIAG[test]> show_ivf_status
📊 IVF Index Status in 'test':
════════════════════════════════════════════════════════════════════════════
Table | Index | Column | Centroids | Vectors | Balance | Status
────────────────────────────────────────────────────────────────────────────
ivf_health_demo_docs | idx_embedding_ivf_v2 | embedding | 17 | 1,000 | 2.35 | ✓ active
show_table_stats <table> [database] [-t] [-a] [-d]
- Display table metadata and statistics
-t: Include tombstone statistics-a: Include all indexes (hierarchical view with -d)-d: Show detailed object lists
MO-DIAG[test]> show_table_stats ivf_health_demo_docs -a -d
📊 Detailed Table Statistics for 'test.ivf_health_demo_docs':
======================================================================================================================================================
Table: ivf_health_demo_docs
Objects: 1 | Rows: 1,000 | Null: 0 | Original: 176.03 KB | Compressed: 156.24 KB
Objects:
Object Name | Rows | Null Cnt | Original Size | Compressed Size
----------------------------------------------------------------------------------------------------------------------------------------------------
0199e729-642e-71e0-b338-67c4980ee294_00000 | 1000 | 0 | 176.03 KB | 156.24 KB
Index: idx_embedding_ivf_v2
└─ (metadata): __mo_index_secondary_0199e725-0a7a-77b8-b689-ccdd0a33f581:272851
Objects: 1 | Rows: 7 | Null: 0 | Original: 1.98 KB | Compressed: 940 B
Objects:
Object Name | Rows | Null Cnt | Original Size | Compressed Size
----------------------------------------------------------------------------------------------------------------------------------------------------
0199e729-642a-7d36-ac37-0ae17325f7ec_00000 | 7 | 0 | 1.98 KB | 940 B
└─ (centroids): __mo_index_secondary_0199e725-0a7b-706e-8f0a-a50edc3621a1:272852
Objects: 1 | Rows: 17 | Null: 0 | Original: 6.83 KB | Compressed: 3.09 KB
Objects:
Object Name | Rows | Null Cnt | Original Size | Compressed Size
----------------------------------------------------------------------------------------------------------------------------------------------------
0199e729-6431-794d-9c13-d10f2e4a59e7_00000 | 17 | 0 | 6.83 KB | 3.09 KB
└─ (entries): __mo_index_secondary_0199e725-0a7c-77f4-8d0b-48fd8258098a:272853
Objects: 1 | Rows: 1,000 | Null: 0 | Original: 696.11 KB | Compressed: 626.37 KB
Objects:
Object Name | Rows | Null Cnt | Original Size | Compressed Size
----------------------------------------------------------------------------------------------------------------------------------------------------
0199e729-6438-7db1-bcb6-7b74c59b0aee_00000 | 1000 | 0 | 696.11 KB | 626.37 KB
Note:
- Hierarchical structure: Table → Index → Physical Tables → Objects
- IVF index has 3 physical tables: metadata (7 rows), centroids (17 rows), entries (1,000 rows)
- Physical table name format:
table_name:table_id - Use
-tflag to include tombstone statistics for deleted/updated data
flush_table <table> [database]
- Flush main table and all its secondary index physical tables
- Includes IVF metadata/centroids/entries, HNSW, Fulltext, and regular indexes
- Requires sys user privileges
MO-DIAG[test]> flush_table ivf_health_demo_docs
🔄 Flushing table: test.ivf_health_demo_docs
✓ Main table flushed: ivf_health_demo_docs
📋 Found 3 index physical tables
Index: idx_embedding_ivf_v2
✓ metadata: __mo_index_secondary_0199e725-0a7a-77b8-b689-ccdd0a33f581
✓ centroids: __mo_index_secondary_0199e725-0a7b-706e-8f0a-a50edc3621a1
✓ entries: __mo_index_secondary_0199e725-0a7c-77f4-8d0b-48fd8258098a
📊 Summary:
Main table: ✓ flushed
Index tables: 3/3 flushed successfully
Note:
- Automatically discovers and flushes all index physical tables
- For IVF indexes: flushes metadata, centroids, and entries tables
- Requires sys user privileges
- Recommended before backups or after bulk operations
tables [database]
- List all tables in current or specified database
databases
- List all databases (highlights current database)
use <database>
- Switch to a different database
sql <SQL statement>
- Execute arbitrary SQL query
MO-DIAG[test]> sql SELECT COUNT(*) FROM my_table
col0
----
10000
1 row(s) returned
history [n | -c]
- Show last n commands (default: 20)
-c: Clear command history
help [command]
- Show help for all commands or specific command
exit / quit
- Exit the interactive shell
usage: mo-diag [-h] [--host HOST] [--port PORT] [--user USER]
[--password PASSWORD] [--database DATABASE]
[--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
[--command COMMAND]
options:
--host HOST Database host (default: localhost)
--port PORT Database port (default: 6001)
--user USER Database user (default: root)
--password PASSWORD Database password (default: 111)
--database DATABASE Database name (optional)
-d DATABASE Short form of --database
--log-level LEVEL Logging level (default: ERROR)
--command COMMAND Execute single command and exit
-c COMMAND Short form of --command
# Quick check on index status
mo-diag -d production -c "show_ivf_status -v"
# Detailed index inspection with physical table stats
mo-diag -d production -c "show_indexes my_vector_table"
# Verify centroid distribution balance
mo-diag -d production -c "show_ivf_status -t my_vector_table -v"# Check row consistency
mo-diag -d test -c "verify_counts my_table"
# Get health report for all indexes
mo-diag -d test -c "show_all_indexes"
# Flush to sync if needed
mo-diag -d test -c "flush_table my_table"# Get table statistics with all indexes
mo-diag -d test -c "show_table_stats my_table -a"
# Detailed object-level analysis
mo-diag -d test -c "show_table_stats my_table -a -d"
# Include tombstone statistics
mo-diag -d test -c "show_table_stats my_table -a -t -d"#!/bin/bash
# daily_index_check.sh
DATABASES=("prod_db1" "prod_db2" "prod_db3")
for db in "${DATABASES[@]}"; do
echo "Checking $db..."
mo-diag -d "$db" -c "show_all_indexes" >> /var/log/mo_diag_daily.log
mo-diag -d "$db" -c "show_ivf_status" >> /var/log/mo_diag_daily.log
done- Regular Health Checks: Run
show_all_indexesdaily to catch index issues early - Monitor IVF Balance: Use
show_ivf_status -vto ensure even centroid distribution - Before Major Operations: Always
verify_countsbefore bulk updates or migrations - Production Debugging: Use non-interactive mode for logging and monitoring
- Tab Completion: Leverage Tab completion to avoid typos in table/database names
- Command History: Use
Ctrl+Rto quickly find and re-execute previous diagnostic commands - Flush Regularly: If you notice index count mismatches,
flush_tablecan help sync
Issue: Tab completion not working
- Solution: Ensure
prompt_toolkit>=3.0.0is installed:pip install prompt_toolkit
Issue: "Unknown database" error
- Solution: Create the database first:
mo-diag -d test -c "sql CREATE DATABASE IF NOT EXISTS test"
Issue: Permission denied for flush operations
- Solution: Connect as sys user or a user with sufficient privileges
Issue: IVF index shows 0 centroids
- Solution: The index might be empty or still building. Check with
show_table_stats <table> -a
pitr = client.pitr.create_cluster_pitr( name='cluster_pitr', range_value=7, range_unit='d' )
restore_point = datetime(2024, 1, 15, 10, 30, 0) client.pitr.restore_to_time(restore_point)
### Account Management
```python
# Create user
user = client.account.create_user(
username='newuser',
password='password123',
description='New user account'
)
# Create role
role = client.account.create_role(
role_name='analyst',
description='Data analyst role'
)
# Grant privileges
client.account.grant_privilege(
user='newuser',
role='analyst',
privileges=['SELECT', 'INSERT']
)
# Create publication
publication = client.pubsub.create_publication(
name='data_changes',
tables=['users', 'orders'],
description='User and order changes'
)
# Create subscription
subscription = client.pubsub.create_subscription(
name='data_sync',
publication_name='data_changes',
target_tables=['users_backup', 'orders_backup']
)# Create a local file system stage (simplified)
stage = client.stage.create_local('my_data_stage', './data/', comment='Main data import stage')
# Or use the full method with explicit URL
stage = client.stage.create('my_data_stage', 'file:///path/to/data/', comment='Main data import stage')
# Create an S3 stage with simplified credentials
s3_stage = client.stage.create_s3(
'production_stage',
'my-bucket/data/',
aws_key='AKIAIOSFODNN7EXAMPLE',
aws_secret='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
aws_region='us-east-1',
comment='Production data stage'
)
# Or use environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
s3_stage = client.stage.create_s3('production_stage', 'my-bucket/data/')
# List all stages
stages = client.stage.list()
for stage in stages:
print(f"{stage.name}: {stage.url}")
# Load data from stage (Method 1: via client.load_data, pandas-style)
client.load_data.read_csv_stage('my_data_stage', 'users.csv', table=User)
# Load data from stage (Method 2: via stage object - simpler!)
stage = client.stage.get('my_data_stage')
stage.load_csv('users.csv', User)
stage.load_json('events.jsonl', Event)
stage.load_parquet('orders.parq', Order)
# Batch load multiple files
results = stage.load_files({
'users.csv': User,
'events.jsonl': Event,
'orders.parq': Order
})
# Modify stage
client.stage.alter('my_data_stage', comment='Updated comment')
# Drop stage
client.stage.drop('my_data_stage')# Create PITR for cluster
## Configuration
### Connection Parameters
```python
client = Client(
connection_timeout=30,
query_timeout=300,
auto_commit=True,
charset='utf8mb4',
sql_log_mode='auto', # 'off', 'simple', 'auto', 'full'
slow_query_threshold=1.0
)from matrixone import MatrixOneLogger
logger = MatrixOneLogger(
level=logging.INFO,
sql_log_mode='auto', # 'off', 'simple', 'auto', 'full'
slow_query_threshold=1.0,
max_sql_display_length=500
)
client = Client(logger=logger)The SDK provides comprehensive error handling with helpful messages:
from matrixone.exceptions import (
ConnectionError,
QueryError,
VersionError,
SnapshotError
)
try:
snapshot = client.snapshots.create('test', 'cluster')
except VersionError as e:
print(f"Version compatibility error: {e}")
except SnapshotError as e:
print(f"Snapshot operation failed: {e}")Check out the examples/ directory for comprehensive usage examples:
Basic Examples:
example_01_basic_connection.py- Basic database operations and connectivityexample_02_account_management.py- User and role managementexample_03_async_operations.py- Async/await operations with AsyncClientexample_04_transaction_management.py- Transaction handling and rollbackexample_05_snapshot_restore.py- Snapshot creation and restore operationsexample_06_sqlalchemy_integration.py- SQLAlchemy ORM integrationexample_07_advanced_features.py- Advanced SDK featuresexample_08_pubsub_operations.py- Pub/Sub operationsexample_09_logger_integration.py- Logging configuration and integrationexample_10_version_management.py- Version compatibility managementexample_11_matrixone_version_demo.py- MatrixOne version detection
Vector Search Examples:
example_12_vector_basics.py- Vector data types and basic distance functionsexample_13_vector_indexes.py- IVF and HNSW index management and performanceexample_14_vector_search.py- Vector similarity search operationsexample_15_vector_advanced.py- Advanced vector operations and optimization
ORM and Advanced Examples:
example_18_snapshot_orm.py- Snapshot operations with ORMexample_19_sqlalchemy_style_orm.py- SQLAlchemy-style ORM patternsexample_20_sqlalchemy_engine_integration.py- SQLAlchemy engine integrationexample_21_advanced_orm_features.py- Advanced ORM featuresexample_22_unified_sql_builder.py- Unified SQL query builderexample_24_query_update.py- Query and update operationsexample_25_metadata_operations.py- Table metadata analysis and statistics
Load Data Examples:
example_23_load_data_operations.py- Comprehensive LOAD DATA operations (CSV, TSV, JSONLINE, Parquet, Inline, Stage)
Stage Management Examples:
example_26_stage_operations.py- External stage management and data loading from stages
Branch Builder Examples:
example_branch_builder.py- SQLAlchemy-style branch statement builders (create, diff, merge)
Clone Builder Examples:
example_clone_builder.py- SQLAlchemy-style clone statement builders (table, database)
Specialized Examples:
example_connection_hooks.py- Connection hooks for custom initializationexample_dynamic_logging.py- Dynamic logging configurationexample_ivf_stats_complete.py- IVF index statistics and analysis
Run all examples:
make examplesRun specific examples:
make example-basic # Basic connection and database operations
make example-async # Async/await operations
make example-vector # Vector search and similarityRun individual example files:
# Basic examples
python examples/example_01_basic_connection.py
python examples/example_03_async_operations.py
# Vector examples
python examples/example_12_vector_basics.py
python examples/example_13_vector_indexes.py
python examples/example_14_vector_search.py
# Load Data examples
python examples/example_23_load_data_operations.py
# Stage management examples
python examples/example_26_stage_operations.py
# Metadata and advanced examples
python examples/example_25_metadata_operations.py
python examples/example_22_unified_sql_builder.pyYou can override the default connection parameters using environment variables:
# Set custom connection parameters
export MATRIXONE_HOST=localhost
export MATRIXONE_PORT=6001
export MATRIXONE_USER=root
export MATRIXONE_PASSWORD=111
export MATRIXONE_DATABASE=test
# Run examples with custom parameters
make examplesOr set them inline:
MATRIXONE_HOST=localhost MATRIXONE_USER=admin make example-basicAvailable environment variables:
MATRIXONE_HOST: Database host (default: 127.0.0.1)MATRIXONE_PORT: Database port (default: 6001)MATRIXONE_USER: Database username (default: root)MATRIXONE_PASSWORD: Database password (default: 111)MATRIXONE_DATABASE: Database name (default: test)MATRIXONE_CHARSET: Character set (default: utf8mb4)MATRIXONE_CONNECT_TIMEOUT: Connection timeout (default: 30)MATRIXONE_AUTOCOMMIT: Auto-commit mode (default: true)
The MatrixOne Python SDK includes comprehensive testing with support for multiple Python and SQLAlchemy versions.
📖 For detailed testing documentation, see TESTING.md
# Check environment and dependencies
make check-env
# Install test dependencies
make install-sqlalchemy14 # For SQLAlchemy 1.4.x
make install-sqlalchemy20 # For SQLAlchemy 2.0.x
# Run tests
make test-offline # Fast mock-based tests
make test-online # Database integration tests (requires MatrixOne)
make test-sqlalchemy14 # Full test suite with SQLAlchemy 1.4.x
make test-sqlalchemy20 # Full test suite with SQLAlchemy 2.0.x
make test-matrix # Test both SQLAlchemy versions- Offline Tests: Mock-based unit tests (450+ tests, ~3-5 seconds)
- Online Tests: Database integration tests (100+ tests, ~20-30 seconds)
- Matrix Tests: Cross-version compatibility testing (Python 3.10/3.11 + SQLAlchemy 1.4/2.0)
# Check environment requirements
make check-env
# Setup development environment (includes all dev dependencies)
make dev-setup
# Install specific dependency sets
make install-deps-dev # Install development dependencies
make install-deps-test # Install test dependencies
make install-sqlalchemy14 # Install SQLAlchemy 1.4.x dependencies
make install-sqlalchemy20 # Install SQLAlchemy 2.0.x dependencies
# Run all tests
make test
# Run examples
make examplesDependency Management:
make install-deps-dev: Installs all development dependencies includingpyarrow,Faker,pytest, etc.make install-deps-test: Installs testing dependencies includingpyarrow(required for Parquet)make install-sqlalchemy14/20: Installs version-specific SQLAlchemy dependencies
Important: pyarrow>=10.0.0 is required for LOAD DATA Parquet functionality. Install via make dev-setup, make install-deps-dev, or make install-deps-test.
For detailed testing instructions, environment setup, and troubleshooting, see TESTING.md.
Full Documentation on ReadTheDocs ⭐
Complete documentation with examples, API reference, and production best practices.
Getting Started:
- Quick Start: Get started quickly with common use cases
- Installation Guide: Installation and setup instructions
- Configuration Guide: Connection and client configuration
Core Features:
- Vector Guide: Vector search, indexing, and ⭐ IVF health monitoring
- Fulltext Guide: Fulltext search and indexing
- ORM Guide: SQLAlchemy ORM integration and patterns
- Metadata Guide: Table metadata analysis and statistics
Advanced Features:
- Snapshot & Restore: Snapshot and PITR operations
- Clone Guide: Database and table cloning
- Account Guide: User and role management
- Pub/Sub Guide: Publication and subscription operations
- MoCTL Guide: MoCTL integration
Production Guide:
- Best Practices: Production-ready patterns using SDK APIs
- Connection Hooks: Pre/post connection hooks
Reference:
- API Reference: Complete API reference
- Examples: Comprehensive usage examples
- Testing Guide: Testing instructions and environment setup
Generate HTML documentation:
make docs # Generate HTML documentation
make docs-serv # Start documentation server at http://localhost:8000We welcome contributions!
- Fork the repository
- Set up development environment
git clone <your-fork-url> cd matrixone/clients/python # Create virtual environment python -m venv venv source venv/bin/activate # On macOS/Linux # Install development dependencies make dev-setup
- Create a feature branch
git checkout -b feature/your-feature-name
- Make your changes
- Add tests for new functionality
- Run pre-commit checks
make pre-commit # format + lint + test - Submit a pull request
# Always work in virtual environment
source venv/bin/activate
# Before making changes
make test # Ensure all tests pass
# After making changes
make format # Format code
make lint # Check code quality
make test # Run tests
make examples # Test examples
# Before submitting PR
make release-check # Complete pre-release checksThis project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- 📧 Email: contact@matrixorigin.cn
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📖 Documentation:
See CHANGELOG.md for version history and changes.
MatrixOne Python SDK - Making MatrixOne database operations simple and powerful in Python.