Skip to content

Latest commit

 

History

History
1812 lines (1496 loc) · 67.1 KB

File metadata and controls

1812 lines (1496 loc) · 67.1 KB

Fulltext Search Guide

This guide covers fulltext search capabilities in the MatrixOne Python SDK, including fulltext indexing, search modes, and advanced features.

Overview

MatrixOne provides powerful fulltext search capabilities for text analysis and information retrieval:

  • Multiple Algorithms: Support for TF-IDF and BM25 algorithms
  • Search Modes: Natural language and boolean search modes
  • Client Interface: Easy-to-use client methods for fulltext operations
  • ORM Integration: Seamless integration with SQLAlchemy models
  • Multi-language Support: Support for various languages including Chinese
  • JSON Search: Fulltext search within JSON documents

Fulltext Index Setup

Creating Tables with Text Content

from matrixone import Client

# Create client and connect (see configuration_guide for connection options)
client = Client()
client.connect(host="127.0.0.1", port=6001, user="root", password="111", database="test")

# Create table with text content using create_table API
client.create_table("articles", {
    "id": "int",
    "title": "varchar(200)",
    "content": "text",
    "summary": "text",
    "author": "varchar(100)",
    "category": "varchar(50)",
    "tags": "varchar(200)",
    "metadata": "json"
}, primary_key="id")

# Create another table for documents
client.create_table("documents", {
    "id": "int",
    "filename": "varchar(255)",
    "file_content": "text",
    "description": "text",
    "file_type": "varchar(50)"
}, primary_key="id")

Creating Fulltext Indexes

Basic Fulltext Indexes

# Create fulltext index using fulltext_index API
client.fulltext_index.create("articles", name="idx_content", columns="content", algorithm="BM25")
client.fulltext_index.create("articles", name="idx_title", columns="title", algorithm="TF-IDF")
client.fulltext_index.create("documents", name="idx_file_content", columns="file_content", algorithm="BM25")

# Create fulltext index on multiple columns
client.fulltext_index.create("articles", name="idx_title_content", columns=["title", "content"], algorithm="BM25")

# Drop a fulltext index
client.fulltext_index.drop("articles", "idx_title")

Fulltext Indexes with Parsers

MatrixOne supports specialized parsers for different content types:

JSON Parser - For searching within JSON documents:

Note

JSON Parser Key Points:

  • The JSON parser indexes VALUES in JSON documents, not keys
  • Perfect for searching product specifications, user preferences, configuration data
  • Supports nested JSON structures
  • Works with both JSON columns and TEXT columns containing JSON
  • Can be combined with regular SQL filters for powerful queries
from matrixone import Client, FulltextParserType
from matrixone.orm import declarative_base
from sqlalchemy import Column, BigInteger, Text

client = Client()
client.connect(host="127.0.0.1", port=6001, user="root", password="111", database="test")

# Enable fulltext functionality
client.fulltext_index.enable_fulltext()

# Method 1: Using ORM with JSON parser
Base = declarative_base()

class Product(Base):
    __tablename__ = "products"
    id = Column(BigInteger, primary_key=True)
    details = Column(Text)  # Stores JSON as text

    # Define fulltext index with JSON parser in table definition
    __table_args__ = (
        FulltextIndex("ftidx_details", "details", parser=FulltextParserType.JSON),
    )

# Create table with ORM (index is created automatically)
client.create_table(Product)

# Insert JSON data
products = [
    {"id": 1, "details": '{"name": "Laptop", "brand": "Dell", "price": 1200}'},
    {"id": 2, "details": '{"name": "Phone", "brand": "Apple", "price": 800}'},
    {"id": 3, "details": '{"name": "Tablet", "brand": "Samsung", "price": 600}'},
]
client.batch_insert(Product, products)

# Search within JSON content
result = client.query(Product).filter(
    boolean_match(Product.details).must("Dell")
).execute()

for row in result.fetchall():
    print(f"Found: {row.details}")

# Method 2: Create JSON index on existing table
client.execute("CREATE TABLE json_docs (id INT PRIMARY KEY, data TEXT)")
client.execute(
    "CREATE FULLTEXT INDEX ftidx_json ON json_docs (data) WITH PARSER json"
)

# Insert and search JSON data
client.execute(
    "INSERT INTO json_docs VALUES "
    "(1, '{\"title\": \"Python Tutorial\", \"tags\": [\"python\", \"programming\"]}'), "
    "(2, '{\"title\": \"Java Guide\", \"tags\": [\"java\", \"programming\"]}'))"
)

result = client.execute(
    "SELECT * FROM json_docs WHERE MATCH(data) AGAINST('python' IN BOOLEAN MODE)"
)
for row in result.fetchall():
    print(f"ID: {row[0]}, Data: {row[1]}")

NGRAM Parser - For Chinese and other Asian languages:

from matrixone import Client, FulltextParserType
from matrixone.orm import declarative_base
from matrixone.sqlalchemy_ext import FulltextIndex, natural_match, boolean_match
from sqlalchemy import Column, Integer, String, Text

client = Client()
client.connect(host="127.0.0.1", port=6001, user="root", password="111", database="test")

# Enable fulltext functionality
client.fulltext_index.enable_fulltext()

# Method 1: Using ORM with NGRAM parser for Chinese content
Base = declarative_base()

class ChineseArticle(Base):
    __tablename__ = "chinese_articles"
    id = Column(Integer, primary_key=True, autoincrement=True)
    title = Column(String(200))
    body = Column(Text)

    # Define fulltext index with NGRAM parser for Chinese tokenization
    __table_args__ = (
        FulltextIndex("ftidx_chinese", ["title", "body"], parser=FulltextParserType.NGRAM),
    )

# Create table with ORM (index is created automatically)
client.create_table(ChineseArticle)

# Insert Chinese content
articles = [
    {"id": 1, "title": "神雕侠侣 第一回", "body": "越女采莲秋水畔,窄袖轻罗,暗露双金钏"},
    {"id": 2, "title": "神雕侠侣 第二回", "body": "正自发痴,忽听左首屋中传出一人喝道"},
    {"id": 3, "title": "神雕侠侣 第三回", "body": "郭靖在舟中潜运神功,数日间伤势便已痊愈了大半"},
]
client.batch_insert(ChineseArticle, articles)

# Search Chinese content with natural language mode
result = client.query(ChineseArticle).filter(
    natural_match(ChineseArticle.title, ChineseArticle.body, query="神雕侠侣")
).execute()

print(f"Found {len(result.fetchall())} articles about 神雕侠侣")

# Search with boolean mode
result = client.query(ChineseArticle).filter(
    boolean_match(ChineseArticle.title, ChineseArticle.body).must("郭靖")
).execute()

for row in result.fetchall():
    print(f"Title: {row.title}, Body: {row.body[:20]}...")

# Method 2: Create NGRAM index on existing table
client.execute(
    "CREATE TABLE chinese_docs ("
    "id INT PRIMARY KEY, "
    "title VARCHAR(200), "
    "content TEXT"
    ")"
)

client.execute(
    "CREATE FULLTEXT INDEX ftidx_ngram ON chinese_docs (title, content) "
    "WITH PARSER ngram"
)

# Insert and search Chinese content
client.execute(
    "INSERT INTO chinese_docs VALUES "
    "(1, 'MO全文索引示例', '这是一个关于MO全文索引的例子'), "
    "(2, 'ngram解析器', 'ngram解析器允许MO对中文进行分词')"
)

result = client.execute(
    "SELECT * FROM chinese_docs "
    "WHERE MATCH(title, content) AGAINST('全文索引' IN NATURAL LANGUAGE MODE)"
)

for row in result.fetchall():
    print(f"Title: {row[1]}, Content: {row[2]}")

Mixed Content (English + Chinese):

from matrixone import Client, FulltextParserType
from matrixone.orm import declarative_base
from matrixone.sqlalchemy_ext import FulltextIndex, natural_match
from sqlalchemy import Column, Integer, String, Text

client = Client()
client.connect(host="127.0.0.1", port=6001, user="root", password="111", database="test")
client.fulltext_index.enable_fulltext()

Base = declarative_base()

class MixedContent(Base):
    __tablename__ = "mixed_articles"
    id = Column(Integer, primary_key=True, autoincrement=True)
    title = Column(String(255))
    content = Column(Text)

    # NGRAM parser works well for mixed English/Chinese content
    __table_args__ = (
        FulltextIndex("ftidx_mixed", ["title", "content"], parser=FulltextParserType.NGRAM),
    )

client.create_table(MixedContent)

# Insert mixed content
articles = [
    {"id": 1, "title": "MO全文索引示例", "content": "这是关于MO fulltext index的例子"},
    {"id": 2, "title": "Python教程", "content": "Learn Python programming with 中文教程"},
]
client.batch_insert(MixedContent, articles)

# Search for Chinese terms
result = client.query(MixedContent).filter(
    natural_match(MixedContent.title, MixedContent.content, query="全文索引")
).execute()

# Search for English terms
result = client.query(MixedContent).filter(
    natural_match(MixedContent.title, MixedContent.content, query="Python")
).execute()

Inserting Text Data

# Insert articles using insert API
articles = [
    {
        "id": 1,
        "title": "Introduction to Machine Learning",
        "content": "Machine learning is a subset of artificial intelligence that focuses on algorithms and statistical models. It enables computers to learn and make decisions from data without being explicitly programmed.",
        "summary": "An overview of machine learning concepts and applications",
        "author": "John Doe",
        "category": "Technology",
        "tags": "AI, ML, algorithms",
        "metadata": '{"language": "English", "difficulty": "beginner"}'
    },
    {
        "id": 2,
        "title": "Deep Learning Fundamentals",
        "content": "Deep learning uses neural networks with multiple layers to model and understand complex patterns in data. It has revolutionized fields like computer vision, natural language processing, and speech recognition.",
        "summary": "Understanding deep learning and neural networks",
        "author": "Jane Smith",
        "category": "Technology",
        "tags": "deep learning, neural networks, AI",
        "metadata": '{"language": "English", "difficulty": "intermediate"}'
    }
]

for article in articles:
    client.insert("articles", article)

# Insert documents using batch_insert API
documents = [
    {
        "id": 1,
        "filename": "research_paper.pdf",
        "file_content": "This research paper discusses advanced machine learning techniques and their applications in real-world scenarios.",
        "description": "Academic research paper on ML",
        "file_type": "PDF"
    }
]

client.batch_insert("documents", documents)

Basic Fulltext Search

# Natural language search - automatically handles stopwords, stemming, and relevance scoring
# This mode is ideal for user queries and general search applications
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content",
    "articles.author"
).filter(natural_match("content", query="machine learning")).execute()
print("Natural language search results:")
for row in result.fetchall():
    print(f"  {row[1]} by {row[3]}")

# Boolean search with phrase matching - provides precise control over search terms
# Use phrase() for exact phrase matching, encourage() for boosting relevance
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content",
    "articles.author"
).filter(
    boolean_match("content").phrase("deep learning").encourage("networks")
).execute()
print("Boolean search results:")
for row in result.fetchall():
    print(f"  {row[1]} by {row[3]}")

# Search with relevance scoring - returns a relevance score for ranking results
# Higher scores indicate better matches; useful for search result ranking
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content",
    "articles.author",
    natural_match("content", query="artificial intelligence").label("relevance")
).execute()
print("Search with relevance scoring:")
for row in result.fetchall():
    print(f"  {row[1]} (Relevance: {row[4]:.4f})")

# Simple search without ordering - just get matching results
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content"
).filter(boolean_match("content").must("artificial intelligence")).execute()
print("Simple search results:")
for row in result.fetchall():
    print(f"  {row[1]}")

# Using ORM models for fulltext search
from sqlalchemy import Column, Integer, String, Text
from matrixone.orm import declarative_base

Base = declarative_base()

class Article(Base):
    __tablename__ = 'articles'
    id = Column(Integer, primary_key=True)
    title = Column(String(200))
    content = Column(Text)
    author = Column(String(100))
    category = Column(String(50))

# Natural language search with model
result = client.query(Article).filter(
    natural_match(Article.content, query="machine learning")
).execute()
print("Natural language search with model:")
for row in result.fetchall():
    print(f"  {row[1]} by {row[3]}")

# Boolean search with model
result = client.query(Article).filter(
    boolean_match(Article.content).phrase("deep learning").encourage("networks")
).execute()
print("Boolean search with model:")
for row in result.fetchall():
    print(f"  {row[1]} by {row[3]}")

# Search with scoring using model
result = client.query(
    Article.id,
    Article.title,
    Article.content,
    Article.author,
    natural_match(Article.content, query="artificial intelligence").label("relevance")
).execute()
print("Search with scoring using model:")
for row in result.fetchall():
    print(f"  {row[1]} (Relevance: {row[4]:.4f})")

Fulltext Search Modes and Operators

MatrixOne supports two fulltext search modes with different operators and use cases:

Natural Language Mode

Best for: User-facing search, Google-style queries, general search applications

Features:

  • Automatic stopword removal (e.g., "the", "a", "is", "to")
  • Automatic word stemming (e.g., "running" → "run")
  • Natural relevance scoring (BM25 or TF-IDF)
  • No special operators needed
  • Perfect for search boxes and user queries

Usage:

from matrixone.sqlalchemy_ext import natural_match

# Simple keyword search
result = client.query(Article).filter(
    natural_match(Article.content, query="machine learning")
).execute()

# Multi-word natural query
result = client.query(Article).filter(
    natural_match(Article.title, Article.content,
                 query="how to learn python programming")
).execute()

# Question-like queries (stopwords handled automatically)
result = client.query(Article).filter(
    natural_match(Article.content, query="what is deep learning")
).execute()

Best for: User queries, search boxes, general search

Boolean Mode

Best for: Precise control, advanced filters, complex logic

Boolean Mode Operators:

  1. MUST (+ operator, required terms):

    from matrixone.sqlalchemy_ext import boolean_match
    
    # Must contain "machine" AND "learning"
    result = client.query(Article).filter(
        boolean_match(Article.content).must("machine", "learning")
    ).execute()
    
    # Must contain "Python"
    result = client.query(Article).filter(
        boolean_match(Article.content).must("Python")
    ).execute()
  2. MUST_NOT (- operator, excluded terms):

    # Contains "programming" but NOT "legacy"
    result = client.query(Article).filter(
        boolean_match(Article.content)
        .must("programming")
        .must_not("legacy")
    ).execute()
    
    # Contains "learning" but NOT "deep"
    result = client.query(Article).filter(
        boolean_match(Article.content)
        .must("learning")
        .must_not("deep")
    ).execute()
  3. ENCOURAGE (boost relevance, optional terms):

    # Must have "Python", boost if has "data" or "science"
    result = client.query(Article).filter(
        boolean_match(Article.content)
        .must("Python")
        .encourage("data", "science")
    ).execute()
    
    # Articles with encouraged terms rank higher
  4. DISCOURAGE (~ operator, reduce relevance):

    # Must have "Python", discourage "legacy" (still matches but ranks lower)
    result = client.query(Article).filter(
        boolean_match(Article.content)
        .must("Python")
        .encourage("modern")
        .discourage("legacy")
    ).execute()
  5. PHRASE ("" operator, exact phrase matching):

    # Exact phrase "neural networks"
    result = client.query(Article).filter(
        boolean_match(Article.content).phrase("neural networks")
    ).execute()
    
    # Exact phrase "best practices"
    result = client.query(Article).filter(
        boolean_match(Article.content).phrase("best practices")
    ).execute()
  6. GROUP (combine terms with OR logic):

    from matrixone.sqlalchemy_ext.fulltext_search import group
    
    # Must contain either "programming" OR "development"
    result = client.query(Article).filter(
        boolean_match(Article.content)
        .must(group().medium("programming", "development"))
    ).execute()

Complex Boolean Queries:

# Complex example: All operators combined
result = client.query(Article).filter(
    boolean_match(Article.content)
    .must("learning")                              # Required term
    .must(group().medium("machine", "deep"))       # Required: either "machine" OR "deep"
    .encourage("tutorial")                          # Optional: boosts relevance
    .discourage("advanced")                         # Optional: reduces relevance
    .must_not("legacy")                            # Excluded term
).execute()

# Practical example: Python articles
result = client.query(Article).filter(
    boolean_match(Article.title, Article.content)
    .must("Python")                                # Must contain Python
    .encourage("beginner", "tutorial", "guide")    # Prefer beginner content
    .must_not("deprecated", "outdated")            # Exclude old content
).execute()

Quick Comparison:

Feature Natural Mode Boolean Mode
Query Style "how to learn python" must("python")
Processing Auto (stopwords/stem) Exact terms
Best For Search boxes Advanced filters

Advanced Fulltext Search

# Multi-column search - searches across multiple text columns simultaneously
# The columns must match exactly what's defined in your fulltext index
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content",
    "articles.author"
).filter(natural_match("title", "content", query="machine learning")).execute()
print("Multi-column search results:")
for row in result.fetchall():
    print(f"  {row[1]} by {row[3]}")

# Combined search with SQL filters - combines fulltext search with regular SQL conditions
# This allows you to filter by metadata while searching text content

# Method 1: Multiple conditions in single filter()
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content",
    "articles.author"
).filter(
    natural_match("content", query="AI"),
    "articles.category = 'Technology'"
).execute()
print("Filtered search results (single filter):")
for row in result.fetchall():
    print(f"  {row[1]} by {row[3]}")

# Method 2: Chained filter() calls
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content",
    "articles.author"
).filter(boolean_match("content").must("AI")).filter("articles.category = 'Technology'").execute()
print("Filtered search results (chained filters):")
for row in result.fetchall():
    print(f"  {row[1]} by {row[3]}")

# Method 3: Complex filtering with multiple conditions
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content",
    "articles.author",
    "articles.category"
).filter(
    boolean_match("content").encourage("programming"),
    "articles.category = 'Programming'",
    "articles.id > 1"
).execute()
print("Complex filtered search results:")
for row in result.fetchall():
    print(f"  {row[1]} by {row[3]} - {row[4]}")

# Paginated search results - useful for large result sets
# LIMIT controls how many results to return, OFFSET skips the first N results
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content",
    "articles.author",
    natural_match("content", query="learning").label("relevance")
).limit(2).offset(1).execute()
print("Paginated search results:")
for row in result.fetchall():
    print(f"  {row[1]} by {row[3]} (Score: {row[4]:.4f})")

# Simple pagination without ordering - just get next N results
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content"
).filter(boolean_match("content").must("learning")).limit(2).offset(1).execute()
print("Simple paginated results:")
for row in result.fetchall():
    print(f"  {row[1]}")

Combining Fulltext Search with Other Filters

You can combine fulltext search with regular SQL filters in several ways:

# Method 1: Multiple conditions in single filter() call
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content",
    "articles.author",
    "articles.category"
).filter(
    boolean_match("content").must("python"),           # Fulltext condition
    "articles.category = 'Programming'",               # SQL condition 1
    "articles.id > 1",                                 # SQL condition 2
    "articles.author LIKE '%Smith%'"                   # SQL condition 3
).execute()

# Method 2: Chained filter() calls (more readable for complex queries)
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content",
    "articles.author",
    "articles.category"
).filter(boolean_match("content").encourage("programming"))  # Fulltext condition
 .filter("articles.category = 'Programming'")                # SQL condition 1
 .filter("articles.id > 1")                                  # SQL condition 2
 .filter("articles.author LIKE '%Smith%'")                   # SQL condition 3
 .execute()

# Method 3: Using ORM model attributes (when available)
from sqlalchemy import Column, Integer, String, Text
from matrixone.orm import declarative_base

# Define ORM model
Base = declarative_base()

class Article(Base):
    __tablename__ = 'articles'
    id = Column(Integer, primary_key=True)
    title = Column(String(200))
    content = Column(Text)
    author = Column(String(100))
    category = Column(String(50))

# Using model class in queries
result = client.query(Article).filter(
    boolean_match(Article.content).must("python"),
    Article.category == "Programming",
    Article.id > 1,
    Article.author.like("%Smith%")
).execute()

# Using model with natural_match
result = client.query(Article).filter(
    natural_match(Article.title, Article.content, query="machine learning")
).execute()

# Using model with scoring
result = client.query(
    Article.id,
    Article.title,
    Article.content,
    boolean_match(Article.content).encourage("python").label("score")
).execute()

# Method 4: Complex filtering with IN, BETWEEN, and other operators
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content",
    "articles.category",
    "articles.tags"
).filter(
    natural_match("title", "content", query="machine learning"),
    "articles.category IN ('AI', 'Technology', 'Programming')",
    "articles.id BETWEEN 1 AND 10",
    "articles.tags LIKE '%tutorial%'",
    "articles.author IS NOT NULL"
).execute()

# Method 5: Combining with scoring
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content",
    "articles.category",
    boolean_match("title", "content").encourage("python").label("score")
).filter(
    "articles.category = 'Programming'",
    "articles.id > 1"
).limit(5).execute()

# Method 6: Using SQLAlchemy and_/or_ for complex conditions
from sqlalchemy import and_, or_

# Logical AND: Combine fulltext search with category filter
fulltext_condition = boolean_match("title", "content").must("python")
category_condition = "articles.category = 'Programming'"

result = client.query(
    "articles.id",
    "articles.title",
    "articles.content",
    "articles.category"
).filter(and_(fulltext_condition, category_condition)).execute()

# Logical OR: Combine different category conditions
programming_condition = "articles.category = 'Programming'"
ai_condition = "articles.category = 'AI'"

result = client.query(
    "articles.id",
    "articles.title",
    "articles.content",
    "articles.category"
).filter(or_(programming_condition, ai_condition)).execute()

# Filter by multiple values using SQL IN
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content",
    "articles.category",
    "articles.author"
).filter("articles.category IN ('Programming', 'AI', 'Technology')").execute()

# SQL IN with fulltext search
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content",
    "articles.author"
).filter(
    boolean_match("title", "content").encourage("python"),
    "articles.author IN ('John Doe', 'Jane Smith', 'Bob Wilson')"
).execute()

# Complex nested logical conditions
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content",
    "articles.category",
    "articles.author"
).filter(
    and_(
        boolean_match("title", "content").encourage("programming"),
        or_(
            "articles.category IN ('Programming', 'AI')",
            "articles.category = 'Technology'"
        ),
        "articles.author IN ('John Doe', 'Jane Smith')",
        "articles.id > 1"
    )
).execute()

Boolean Search Operators

# AND operator - both terms must be present in the document
# Use must() for required terms (AND logic)
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content"
).filter(boolean_match("content").must("machine", "learning")).execute()
print("AND search results:")
for row in result.fetchall():
    print(f"  {row[1]}")

# OR operator - at least one of the terms must be present
# Use group().medium() for OR logic within required conditions
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content"
).filter(boolean_match("content").must(group().medium("deep", "neural"))).execute()
print("OR search results:")
for row in result.fetchall():
    print(f"  {row[1]}")

# NOT operator (exclusion) - documents containing the excluded term are filtered out
# Use must_not() to exclude documents with specific terms
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content"
).filter(boolean_match("content").must("machine").must_not("learning")).execute()
print("NOT search results:")
for row in result.fetchall():
    print(f"  {row[1]}")

# Phrase search - exact phrase matching
# Use phrase() for exact phrase matching
result = client.query(
    "articles.id",
    "articles.title",
    "articles.content"
).filter(boolean_match("content").phrase("artificial intelligence")).execute()
print("Phrase search results:")
for row in result.fetchall():
    print(f"  {row[1]}")

# Using ORM models with boolean search operators
from sqlalchemy import Column, Integer, String, Text
from matrixone.orm import declarative_base

Base = declarative_base()

class Article(Base):
    __tablename__ = 'articles'
    id = Column(Integer, primary_key=True)
    title = Column(String(200))
    content = Column(Text)
    author = Column(String(100))
    category = Column(String(50))

# AND operator with model
result = client.query(Article).filter(
    boolean_match(Article.content).must("machine", "learning")
).execute()
print("AND search with model:")
for row in result.fetchall():
    print(f"  {row[1]}")

# OR operator with model
result = client.query(Article).filter(
    boolean_match(Article.content).must(group().medium("deep", "neural"))
).execute()
print("OR search with model:")
for row in result.fetchall():
    print(f"  {row[1]}")

# NOT operator with model
result = client.query(Article).filter(
    boolean_match(Article.content).must("machine").must_not("learning")
).execute()
print("NOT search with model:")
for row in result.fetchall():
    print(f"  {row[1]}")

# Phrase search with model
result = client.query(Article).filter(
    boolean_match(Article.content).phrase("artificial intelligence")
).execute()
print("Phrase search with model:")
for row in result.fetchall():
    print(f"  {row[1]}")

Async Fulltext Operations

import asyncio
from matrixone import AsyncClient
from matrixone.config import get_connection_params

async def async_fulltext_example():
    # Get connection parameters
    host, port, user, password, database = get_connection_params()

    client = AsyncClient()
    await client.connect(host=host, port=port, user=user, password=password, database=database)

    # Create table using async create_table API
    await client.create_table("async_articles", {
        "id": "int",
        "title": "varchar(200)",
        "content": "text",
        "author": "varchar(100)"
    }, primary_key="id")

    # Create fulltext index using async fulltext_index API
    await client.fulltext_index.create("async_articles", name="idx_content", columns="content", algorithm="BM25")

    # Insert data using async insert API
    await client.insert("async_articles", {
        "id": 1,
        "title": "Async Article",
        "content": "This is an article created using async operations for fulltext search testing.",
        "author": "Async Author"
    })

    # Fulltext search using async query API
result = await client.query(
    "async_articles.id",
    "async_articles.title",
    "async_articles.content",
    "async_articles.author"
).filter(natural_match("content", query="async operations")).execute()
    print("Async fulltext search results:")
    for row in result.fetchall():
        print(f"  {row[1]} by {row[3]}")

# Using ORM models with async fulltext search
from sqlalchemy import Column, Integer, String, Text
from matrixone.orm import declarative_base

Base = declarative_base()

class AsyncArticle(Base):
    __tablename__ = 'async_articles'
    id = Column(Integer, primary_key=True)
    title = Column(String(200))
    content = Column(Text)
    author = Column(String(100))
    category = Column(String(50))

# Async search with model using boolean_match
result = await client.query(AsyncArticle).filter(
    boolean_match(AsyncArticle.content).must("async")
).execute()
print("Async search with model (boolean_match):")
for row in result.fetchall():
    print(f"  {row[1]} by {row[3]}")

# Async search with model using natural_match
result = await client.query(AsyncArticle).filter(
    natural_match(AsyncArticle.title, AsyncArticle.content, query="async operations")
).execute()
print("Async search with model (natural_match):")
for row in result.fetchall():
    print(f"  {row[1]} by {row[3]}")

# Async search with model and scoring
result = await client.query(
    AsyncArticle.id,
    AsyncArticle.title,
    AsyncArticle.content,
    boolean_match(AsyncArticle.content).encourage("async").label("score")
).execute()
print("Async search with model and scoring:")
for row in result.fetchall():
    print(f"  {row[1]} (Score: {row[3]:.4f})")

    # Clean up
    await client.drop_table("async_articles")
    await client.disconnect()

asyncio.run(async_fulltext_example())

ORM with Fulltext Search

from sqlalchemy import Column, Integer, String, Text
from matrixone.orm import declarative_base
from sqlalchemy.orm import sessionmaker
from matrixone import Client
from matrixone.config import get_connection_params
from matrixone.sqlalchemy_ext.fulltext_search import boolean_match, natural_match

# Define ORM models
Base = declarative_base()

class Article(Base):
    __tablename__ = 'orm_articles'

    id = Column(Integer, primary_key=True, autoincrement=True)
    title = Column(String(200), nullable=False)
    content = Column(Text, nullable=False)
    author = Column(String(100))
    category = Column(String(50))

def orm_fulltext_example():
    # Get connection parameters
    host, port, user, password, database = get_connection_params()
    client = Client()
    client.connect(host=host, port=port, user=user, password=password, database=database)

    # Create table using ORM model
    client.create_table(Article)

    # Create fulltext index
    client.fulltext_index.create("orm_articles", name="idx_content", columns="content", algorithm="BM25")

    # Create session
    Session = sessionmaker(bind=client.get_sqlalchemy_engine())
    session = Session()

    # Insert data using ORM
    article1 = Article(
        title="ORM Article 1",
        content="This article demonstrates fulltext search with ORM models in MatrixOne.",
        author="ORM Author",
        category="Technology"
    )

    session.add(article1)
    session.commit()

    # Natural language search - automatically processes query for optimal results
    # Handles synonyms, stemming, and stopword removal automatically
    result = client.query(Article).filter(natural_match(Article.content, "fulltext search")).execute()
    print("Natural language search results:")
    for row in result.fetchall():
        print(f"  {row[1]} by {row[3]}")

    # Boolean search with must conditions - both terms are required
    # Chain multiple must() calls for AND logic; all terms must be present
    result = client.query(Article).filter(boolean_match(Article.content).must("fulltext").must("search")).execute()
    print("Boolean search results:")
    for row in result.fetchall():
        print(f"  {row[1]} by {row[3]}")

    # Complex boolean search with multiple operators
    # must() = required, encourage() = preferred but optional, must_not() = excluded
    result = client.query(Article).filter(
        boolean_match(Article.content)
        .must("fulltext")           # Required: must contain "fulltext"
        .encourage("search")        # Preferred: boost relevance if present
        .must_not("legacy")         # Excluded: filter out documents with "legacy"
    ).execute()
    print("Complex boolean search results:")
    for row in result.fetchall():
        print(f"  {row[1]} by {row[3]}")

    # Clean up
    client.drop_table(Article)
    session.close()
    client.disconnect()

orm_fulltext_example()

Advanced ORM-Style Fulltext Queries

Modern ORM-style fulltext queries with boolean_match and natural_match:

from sqlalchemy import Column, Integer, String, Text
from matrixone.orm import declarative_base
from matrixone import Client
from matrixone.config import get_connection_params
from matrixone.sqlalchemy_ext.fulltext_search import boolean_match, natural_match, group

# Define ORM models
Base = declarative_base()

class Article(Base):
    __tablename__ = 'advanced_articles'

    id = Column(Integer, primary_key=True, autoincrement=True)
    title = Column(String(200), nullable=False)
    content = Column(Text, nullable=False)
    tags = Column(String(500))
    category = Column(String(50))

def advanced_orm_fulltext_example():
    host, port, user, password, database = get_connection_params()
    client = Client()
    client.connect(host=host, port=port, user=user, password=password, database=database)

    # Create table using ORM model
    client.create_table(Article)

    # Create fulltext index
    client.fulltext_index.create("advanced_articles", name="idx_content_tags", columns=["content", "tags"], algorithm="BM25")

    # Insert test data
    articles = [
        {"title": "Python Programming Guide", "content": "Learn Python programming from basics to advanced concepts.", "tags": "python,programming,tutorial", "category": "Programming"},
        {"title": "Machine Learning with Python", "content": "Introduction to machine learning using Python and scikit-learn.", "tags": "python,machine-learning,AI", "category": "AI"},
        {"title": "Web Development Tutorial", "content": "Build modern web applications with Python and Django framework.", "tags": "python,web,django", "category": "Web"}
    ]
    client.batch_insert(Article, articles)

    # 1. Natural language search - user-friendly, handles variations automatically
    # Best for end-user search interfaces; processes "python programming" intelligently
    result = client.query(Article).filter(natural_match(Article.content, "python programming")).execute()
    print("Natural language search results:")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[4]}")

    # 2. Basic boolean search - exact term matching with required conditions
    # Must contain "python" - strict matching without stemming or variations
    result = client.query(Article).filter(boolean_match(Article.content).must("python")).execute()
    print("\nBoolean search - must contain 'python':")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[4]}")

    # 3. Boolean search with exclusion - filter out unwanted results
    # Required: "python", Excluded: "django" - finds Python articles without Django
    result = client.query(Article).filter(
        boolean_match(Article.content).must("python").must_not("django")
    ).execute()
    print("\nBoolean search - must have 'python', must not have 'django':")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[4]}")

    # 4. Boolean search with preference - boost relevance without filtering
    # Required: "python", Preferred: "tutorial" - boosts tutorial results in ranking
    result = client.query(Article).filter(
        boolean_match(Article.content).must("python").encourage("tutorial")
    ).execute()
    print("\nBoolean search - must have 'python', encourage 'tutorial':")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[4]}")

    # 5. Boolean search with discouragement - lower ranking for certain terms
    # Required: "python", Discouraged: "legacy" - lowers ranking of legacy content
    result = client.query(Article).filter(
        boolean_match(Article.content).must("python").discourage("legacy")
    ).execute()
    print("\nBoolean search - must have 'python', discourage 'legacy':")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[4]}")

    # 6. Group search - logical OR within required conditions
    # Must contain either "programming" OR "machine" - flexible matching
    result = client.query(Article).filter(
        boolean_match(Article.content).must(group().medium("programming", "machine"))
    ).execute()
    print("\nGroup search - must contain either 'programming' or 'machine':")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[4]}")

    # 7. Phrase search - exact phrase matching
    # Finds documents containing the exact phrase "machine learning"
    result = client.query(Article).filter(
        boolean_match(Article.content).phrase("machine learning")
    ).execute()
    print("\nPhrase search - exact phrase 'machine learning':")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[4]}")

    # 8. Prefix search - wildcard matching for word beginnings
    # Finds words starting with "python" (e.g., "pythonic", "pythonista")
    result = client.query(Article).filter(
        boolean_match(Article.content).prefix("python")
    ).execute()
    print("\nPrefix search - words starting with 'python':")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[4]}")

    # 9. Complex boolean search - combining multiple operators for sophisticated queries
    # Required: "python" AND (either "programming" OR "machine")
    # Preferred: "tutorial", Discouraged: "legacy" - advanced ranking control
    result = client.query(Article).filter(
        boolean_match(Article.content)
        .must("python")                                    # Must contain "python"
        .must(group().medium("programming", "machine"))    # Must contain either term
        .encourage("tutorial")                             # Boost tutorial content
        .discourage("legacy")                              # Lower legacy content ranking
    ).execute()
    print("\nComplex boolean search:")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[4]}")

    # 10. Combined fulltext and SQL filters - mix fulltext search with metadata filtering
    # Fulltext search for content + SQL filter for category metadata
    result = client.query(Article).filter(
        boolean_match(Article.content).must("python")      # Fulltext search
    ).filter(
        Article.category == "Programming"                  # SQL filter
    ).execute()
    print("\nCombined with regular filters:")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[4]}")

    # 11. Limited results - control result presentation
    # Return only top 2 results
    result = client.query(Article).filter(
        boolean_match(Article.content).must("python")
    ).limit(2).execute()
    print("\nLimited results:")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[4]}")

    # Clean up
    client.drop_table(Article)
    client.disconnect()

advanced_orm_fulltext_example()

Complete ORM-Style Fulltext Search Examples

Here are comprehensive examples showing all available operators in action:

from matrixone import Client
from matrixone.sqlalchemy_ext.fulltext_search import boolean_match, natural_match, group
from matrixone.config import get_connection_params

def complete_fulltext_examples():
    host, port, user, password, database = get_connection_params()
    client = Client()
    client.connect(host=host, port=port, user=user, password=password, database=database)

    # Create table and index
    client.create_table("complete_articles", {
        "id": "int",
        "title": "varchar(200)",
        "content": "text",
        "tags": "varchar(500)",
        "category": "varchar(50)"
    }, primary_key="id")

    client.fulltext_index.create("complete_articles", name="idx_complete", columns=["title", "content", "tags"], algorithm="BM25")

    # Insert test data
    articles = [
        {"id": 1, "title": "Python Programming Guide", "content": "Learn Python programming from basics to advanced concepts.", "tags": "python,programming,tutorial", "category": "Programming"},
        {"id": 2, "title": "Machine Learning with Python", "content": "Introduction to machine learning using Python and scikit-learn.", "tags": "python,machine-learning,AI", "category": "AI"},
        {"id": 3, "title": "Web Development Tutorial", "content": "Build modern web applications with Python and Django framework.", "tags": "python,web,django", "category": "Web"},
        {"id": 4, "title": "Legacy Python Code", "content": "This is deprecated Python code that should be avoided.", "tags": "python,legacy,deprecated", "category": "Legacy"}
    ]
    client.batch_insert("complete_articles", articles)

    # 1. Natural language search with relevance scoring
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content",
        natural_match("title", "content", query="python programming").label("relevance")
    ).execute()
    print("Natural language search with scoring:")
    for row in result.fetchall():
        print(f"  {row[1]} (Score: {row[3]:.4f})")

    # 1b. Natural language search without ordering (simpler)
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content"
    ).filter(natural_match("title", "content", query="python programming")).execute()
    print("Natural language search (simple):")
    for row in result.fetchall():
        print(f"  {row[1]}")

    # 2. Boolean search with must conditions (AND logic)
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content"
    ).filter(boolean_match("title", "content").must("python", "programming")).execute()
    print("\nBoolean search - must contain 'python' AND 'programming':")
    for row in result.fetchall():
        print(f"  {row[1]}")

    # 3. Boolean search with exclusion (NOT logic)
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content"
    ).filter(boolean_match("title", "content").must("python").must_not("legacy")).execute()
    print("\nBoolean search - must have 'python', must not have 'legacy':")
    for row in result.fetchall():
        print(f"  {row[1]}")

    # 4. Boolean search with preference (encourage)
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content",
        boolean_match("title", "content").must("python").encourage("tutorial").label("score")
    ).execute()
    print("\nBoolean search - must have 'python', encourage 'tutorial':")
    for row in result.fetchall():
        print(f"  {row[1]} (Score: {row[3]:.4f})")

    # 5. Boolean search with discouragement
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content",
        boolean_match("title", "content").must("python").discourage("legacy").label("score")
    ).execute()
    print("\nBoolean search - must have 'python', discourage 'legacy':")
    for row in result.fetchall():
        print(f"  {row[1]} (Score: {row[3]:.4f})")

    # 6. Group search with OR logic
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content"
    ).filter(boolean_match("title", "content").must(group().medium("programming", "machine"))).execute()
    print("\nGroup search - must contain either 'programming' OR 'machine':")
    for row in result.fetchall():
        print(f"  {row[1]}")

    # 7. Weighted group search
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content",
        boolean_match("title", "content").encourage(group().high("tutorial").low("basic")).label("score")
    ).execute()
    print("\nWeighted group search - prefer 'tutorial' over 'basic':")
    for row in result.fetchall():
        print(f"  {row[1]} (Score: {row[3]:.4f})")

    # 8. Phrase search
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content"
    ).filter(boolean_match("title", "content").phrase("machine learning")).execute()
    print("\nPhrase search - exact phrase 'machine learning':")
    for row in result.fetchall():
        print(f"  {row[1]}")

    # 9. Prefix search
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content"
    ).filter(boolean_match("title", "content").prefix("python")).execute()
    print("\nPrefix search - words starting with 'python':")
    for row in result.fetchall():
        print(f"  {row[1]}")

    # 10. Complex boolean search combining multiple operators
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content",
        boolean_match("title", "content")
        .must("python")                                    # Must contain "python"
        .must(group().medium("programming", "machine"))    # Must contain either term
        .encourage("tutorial")                             # Boost tutorial content
        .discourage("legacy")                              # Lower legacy content ranking
        .label("complex_score")
    ).execute()
    print("\nComplex boolean search:")
    for row in result.fetchall():
        print(f"  {row[1]} (Score: {row[3]:.4f})")

    # 11. Combined fulltext and SQL filters (single filter with multiple conditions)
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content",
        "complete_articles.category"
    ).filter(
        boolean_match("title", "content").must("python"),  # Fulltext search
        "complete_articles.category = 'Programming'"        # SQL filter
    ).execute()
    print("\nCombined with regular filters (single filter):")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[3]}")

    # 11b. Chained filter calls
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content",
        "complete_articles.category"
    ).filter(boolean_match("title", "content").encourage("programming")).filter("complete_articles.category = 'Programming'").execute()
    print("\nCombined with regular filters (chained):")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[3]}")

    # 11c. Complex filtering with multiple SQL conditions
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content",
        "complete_articles.category",
        "complete_articles.tags"
    ).filter(
        boolean_match("title", "content").must("python"),
        "complete_articles.category = 'Programming'",
        "complete_articles.id > 1",
        "complete_articles.tags LIKE '%tutorial%'"
    ).execute()
    print("\nComplex filtering with multiple conditions:")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[3]} - {row[4]}")

    # 11d. Filtering with IN conditions
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content",
        "complete_articles.category"
    ).filter(
        boolean_match("title", "content").encourage("python"),
        "complete_articles.category IN ('Programming', 'AI')"
    ).execute()
    print("\nFiltering with IN conditions:")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[3]}")

    # 11e. Filtering with range conditions
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content"
    ).filter(
        boolean_match("title", "content").must("python"),
        "complete_articles.id BETWEEN 1 AND 3"
    ).execute()
    print("\nFiltering with range conditions:")
    for row in result.fetchall():
        print(f"  {row[1]}")

    # 11f. Using SQL IN for multiple value filtering
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content",
        "complete_articles.category"
    ).filter(
        boolean_match("title", "content").encourage("python"),
        "complete_articles.category IN ('Programming', 'AI', 'Technology')"
    ).execute()
    print("\nFiltering with SQL IN:")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[3]}")

    # 11g. Complex logical conditions with and_/or_
    from sqlalchemy import and_, or_

    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content",
        "complete_articles.category",
        "complete_articles.tags"
    ).filter(
        and_(
            boolean_match("title", "content").must("python"),
            or_(
                "complete_articles.category IN ('Programming', 'AI')",
                "complete_articles.category = 'Technology'"
            ),
            "complete_articles.id > 1"
        )
    ).execute()
    print("\nComplex logical conditions:")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[3]}")

    # 11h. Using ORM models with fulltext search
    from sqlalchemy import Column, Integer, String, Text
    from matrixone.orm import declarative_base

    # Define ORM model
    Base = declarative_base()

    class ArticleModel(Base):
        __tablename__ = 'complete_articles'
        id = Column(Integer, primary_key=True)
        title = Column(String(200))
        content = Column(Text)
        tags = Column(String(500))
        category = Column(String(50))

    # Using model with boolean_match
    result = client.query(ArticleModel).filter(
        boolean_match(ArticleModel.content).must("python")
    ).execute()
    print("\nUsing ORM model with boolean_match:")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[4]}")

    # Using model with natural_match
    result = client.query(ArticleModel).filter(
        natural_match(ArticleModel.title, ArticleModel.content, query="machine learning")
    ).execute()
    print("\nUsing ORM model with natural_match:")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[4]}")

    # Using model with scoring and ordering
    result = client.query(
        ArticleModel.id,
        ArticleModel.title,
        ArticleModel.content,
        boolean_match(ArticleModel.content).encourage("python").label("score")
    ).execute()
    print("\nUsing ORM model with scoring:")
    for row in result.fetchall():
        print(f"  {row[1]} (Score: {row[3]:.4f})")

    # Using model with SQLAlchemy and_
    from sqlalchemy import and_

    result = client.query(ArticleModel).filter(
        and_(
            boolean_match(ArticleModel.content).must("python"),
            ArticleModel.category.in_(["Programming", "AI"]),
            ArticleModel.id > 1
        )
    ).execute()
    print("\nUsing ORM model with and_:")
    for row in result.fetchall():
        print(f"  {row[1]} - {row[4]}")

    # 12. Limited results
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content"
    ).filter(boolean_match("title", "content").must("python")).limit(2).execute()
    print("\nLimited results:")
    for row in result.fetchall():
        print(f"  {row[1]}")

    # 12b. Simple limited results without ordering
    result = client.query(
        "complete_articles.id",
        "complete_articles.title",
        "complete_articles.content"
    ).filter(boolean_match("title", "content").must("python")).limit(2).execute()
    print("\nSimple limited results:")
    for row in result.fetchall():
        print(f"  {row[1]}")

    # Clean up
    client.drop_table("complete_articles")
    client.disconnect()

complete_fulltext_examples()

Boolean Match Operators Reference

The boolean_match function provides powerful operators for precise fulltext search control:

Core Operators:

  • `.must(term)` - Required term (AND logic) - Document must contain this term - Chain multiple .must() calls for AND conditions - Example: .must("python").must("programming") = "python AND programming"
  • `.must_not(term)` - Excluded term (NOT logic) - Document must NOT contain this term - Filters out unwanted results - Example: .must("python").must_not("legacy") = "python NOT legacy"
  • `.encourage(term)` - Preferred term (positive weight) - Boosts relevance score if term is present - Does not filter results if term is absent - Example: .must("python").encourage("tutorial") = "python, prefer tutorial"
  • `.discourage(term)` - Discouraged term (negative weight) - Lowers relevance score if term is present - Does not filter results if term is absent - Example: .must("python").discourage("legacy") = "python, avoid legacy"

Group Operators:

  • `.must(group().medium(term1, term2))` - Required group (OR logic) - Document must contain at least one term from the group - Example: .must(group().medium("python", "java")) = "python OR java"
  • `.encourage(group().high(term1).low(term2))` - Weighted group - .high() gives higher weight, .low() gives lower weight - Example: .encourage(group().high("tutorial").low("basic")) = "prefer tutorial over basic"

Special Operators:

  • `.phrase("exact phrase")` - Exact phrase matching - Finds documents containing the exact phrase - Example: .phrase("machine learning") = exact phrase match
  • `.prefix("prefix")` - Prefix/wildcard matching - Finds words starting with the prefix - Example: .prefix("python") = matches "python", "pythonic", "pythonista"

Usage Patterns:

# Basic required search
boolean_match(Article.content).must("python")

# Multiple requirements (AND)
boolean_match(Article.content).must("python").must("programming")

# Required with exclusion (AND NOT)
boolean_match(Article.content).must("python").must_not("legacy")

# Required with preference (AND, prefer X)
boolean_match(Article.content).must("python").encourage("tutorial")

# Required with discouragement (AND, avoid X)
boolean_match(Article.content).must("python").discourage("deprecated")

# Group requirements (AND (OR))
boolean_match(Article.content).must(group().medium("python", "java"))

# Complex combination
boolean_match(Article.content)
    .must("programming")
    .must(group().medium("python", "java"))
    .encourage("tutorial")
    .discourage("legacy")
    .phrase("best practices")

Error Handling

from matrixone import Client
from matrixone.exceptions import QueryError, ConnectionError
from matrixone.config import get_connection_params

def error_handling_example():
    client = None

    try:
        host, port, user, password, database = get_connection_params()

        # Create client with error handling
        client = Client()
        client.connect(host=host, port=port, user=user, password=password, database=database)

        # Create table with error handling
        try:
            client.create_table("error_articles", {
                "id": "int",
                "content": "text"
            }, primary_key="id")
            print("✓ Table created successfully")
        except QueryError as e:
            print(f"❌ Table creation failed: {e}")

        # Create fulltext index with error handling
        try:
            client.fulltext_index.create("error_articles", name="idx_content", columns="content", algorithm="BM25")
            print("✓ Fulltext index created successfully")
        except QueryError as e:
            print(f"❌ Fulltext index creation failed: {e}")

        # Insert data with error handling
        try:
            client.insert("error_articles", {"id": 1, "content": "Test content for fulltext search"})
            print("✓ Data inserted successfully")
        except QueryError as e:
            print(f"❌ Data insertion failed: {e}")

        # Fulltext search with error handling
        try:
            result = client.query(
                "error_articles.id",
                "error_articles.content"
            ).filter(natural_match("content", query="test content")).execute()
            print(f"✓ Fulltext search successful: {len(result.fetchall())} results")
        except QueryError as e:
            print(f"❌ Fulltext search failed: {e}")

    except ConnectionError as e:
        print(f"❌ Connection failed: {e}")
    except Exception as e:
        print(f"❌ Unexpected error: {e}")
    finally:
        # Always clean up
        if client:
            try:
                client.drop_table("error_articles")
                client.disconnect()
                print("✓ Cleanup completed")
            except Exception as e:
                print(f"⚠️ Cleanup warning: {e}")

error_handling_example()

Best Practices

  1. Choose the right algorithm: - BM25: Best for general fulltext search, handles modern document collections well - TF-IDF: Good for specific use cases, traditional approach with proven reliability - Recommendation: Start with BM25 for new applications
  2. Optimize index creation: - Create indexes after data insertion: Avoid rebuilding indexes during data loading - Use appropriate column types: TEXT for large content, VARCHAR for shorter text - Match index columns exactly: Columns in MATCH() must exactly match fulltext index definition - Consider multi-column indexes: Index related text columns together for better performance
  3. Use appropriate search modes: - Natural language mode: Best for user-facing search interfaces, handles variations automatically - Boolean mode: Best for programmatic queries, provides precise control over search terms - ORM boolean_match: Use for type-safe, chainable queries with modern syntax
  4. Pagination: - With scoring: Use .label("score") to get relevance scores for ranking results - Without scoring: Skip scoring for simple searches where ranking isn't important - Pagination: Use .limit() and .offset() for pagination (ordering is optional) - Performance: Scoring may be slower than simple searches, but provides better relevance
  5. Optimize search queries: - Use encourage() over must(): When terms are preferred but not required - Use discourage() for ranking: Lower unwanted content without filtering it out - Combine with SQL filters: Mix fulltext search with metadata filtering for better results - Use phrases for exact matches: Wrap exact phrases in quotes or use .phrase()
  6. Filter combination strategies: - Single filter() with multiple conditions: More efficient for simple combinations - Chained filter() calls: Better readability for complex queries - Use appropriate operators: IN, BETWEEN, LIKE, IS NULL for different filtering needs - Combine with scoring: Use .label() for ranked results - Performance consideration: More filters = more precise results but potentially slower queries
  7. SQLAlchemy logical operators for complex conditions: - and_(): Combine multiple conditions with AND logic (from sqlalchemy) - or_(): Combine multiple conditions with OR logic (from sqlalchemy) - not_(): Negate conditions (from sqlalchemy) - SQL IN: Filter by multiple values using SQL IN clause - Nested combinations: Use SQLAlchemy operators for complex nested conditions - Fulltext + logical operators: Combine fulltext search with SQLAlchemy expressions - Performance: Logical operators provide more control but may impact query performance
  8. Handle errors gracefully: - Always use try-catch blocks: Fulltext operations can fail due to index issues - Provide meaningful error messages: Help users understand what went wrong - Clean up resources properly: Always disconnect clients and close sessions - Validate query syntax: Check boolean operators before executing complex queries
  9. Performance optimization: - Use batch operations: Insert large datasets with batch_insert() instead of individual inserts - Create indexes strategically: Only index columns that will be searched - Limit result sets: Use LIMIT and OFFSET for pagination with large result sets - Monitor index usage: Regularly check which indexes are being used effectively

Next Steps