I'm interested in understanding how formal proof engines like Coq work, and how large language models (LLMs) can be used to translate natural language queries into formal representations that such systems can process.
This project is inspired by the work of Symbolica and George Morgan, with the long-term goal of grasping and contributing to the kind of reasoning systems theyโre building.
A novel hybrid approach to mathematical problem solving that combines Large Language Models (LLMs) with formal reasoning engines to achieve both natural language understanding and mathematical rigor.
Hybrid Architecture: LLMs handle the ambiguous natural language โ formal representation mapping, while deterministic algorithms ensure mathematical rigor and correctness in the reasoning process.
Natural Language Input
โ [LLM Agent: Problem Extraction]
Formal Problem Representation
โ [Validation Engine]
Verified Facts, Rules, Goals
โ [Forward Chaining Proof Engine]
Proof Path or Failure
โ [LLM Agent: Explanation Generation]
Human-Readable Proof
- ๐งฎ Natural Language Processing: Azure OpenAI GPT-4o for problem extraction and explanation
- ๐ฌ Formal Mathematical Reasoning: Deterministic proof engine with pattern matching
- ๐ Expression Parsing: SymPy-integrated parser for mathematical expressions
- ๐ Rule System: 25+ mathematical rules across algebra, arithmetic, and number theory
- โก Forward Chaining: Systematic proof search with termination guarantees
- ๐ Web Interface: Streamlit-based interactive demo
-
Linear Algebraic Equations
If x + 5 = 12, find xIf 3y = 21, what is y?
-
Number Theory Proofs
If n is even, prove that 2n is evenIf a is odd and b is odd, prove that a + b is even
-
Inequality Reasoning
If a > b and b > c, prove a > cIf x > 5 and 5 > 2, prove x > 2
- Backend: Python 3.11+
- LLM Integration: Azure OpenAI GPT-4o API
- Mathematical Processing: SymPy for expression handling
- Web Framework: Streamlit for rapid prototyping
- Data Structures: NetworkX for proof graph representation
- Testing: pytest with 65+ comprehensive tests
- Package Management: uv for fast dependency resolution
- Python 3.11+
- Azure OpenAI API access
- uv package manager
# Clone the repository
git clone <repository-url>
cd MiniAxiom
# Install dependencies
uv sync
# Set up environment variables
cp .env.template .env
# Edit .env with your Azure OpenAI credentialsCreate a .env file with your Azure OpenAI credentials:
AZURE_OPENAI_API_KEY=your_api_key_here
AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-02-01
AZURE_OPENAI_DEPLOYMENT_NAME=your-deployment-nameuv run streamlit run streamlit_app.pyOpen http://localhost:8501 in your browser for the interactive interface.
from src.mathgraph import MathGraphAPI
# Initialize the system
api = MathGraphAPI()
# Solve a problem
result = api.solve("If x + 7 = 15, find x")
print(result["explanation"])# Core system demo
uv run python demo_core_system.py
# Proof engine demo
uv run python demo_proof_engine.py
# LLM integration demo
uv run python demo_llm_integration.py
# Complete system demo
uv run python demo_complete_system.pyBased on comprehensive testing:
- Success Rate: 100% on supported problem types
- Average Processing Time: ~8-12 seconds per problem
- Proof Steps: Typically 1-3 steps for target problems
- Extraction Confidence: 0.9-0.95 for well-formed problems
-
Expression System (
src/formal/expressions.py)- Mathematical expression hierarchy
- Fact and rule representations
- Problem containers
-
Parser (
src/formal/parser.py)- SymPy-integrated expression parsing
- Predicate and rule parsing
- Validation and error handling
-
Pattern Matching (
src/reasoning/pattern_matching.py)- Variable substitution system
- Rule application matching
- Proof step validation
-
Rule System (
src/reasoning/rules.py)- 25+ mathematical inference rules
- Categorized rule management
- Dynamic rule activation
-
Proof Engine (
src/reasoning/proof_engine.py)- Forward chaining algorithm
- Proof search with termination
- Step-by-step proof tracking
-
LLM Integration (
src/extraction/&src/explanation/)- Natural language problem extraction
- Formal proof explanation generation
- Azure OpenAI API integration
-
System Integration (
src/mathgraph.py)- End-to-end pipeline
- Error handling and recovery
- Performance monitoring
The system uses a restricted formal language with these predicates:
- Equality & Comparison:
eq,gt,lt,gte,lte - Number Properties:
even,odd,prime,positive,negative - Arithmetic Relations:
divides,multiple
Sample rules implemented:
R1: eq(X + A, B) โ eq(X, B - A) [Subtraction Property]
R2: eq(A * X, B) โ eq(X, B / A) [Division Property]
R3: eq(X, Y) โง eq(Y, Z) โ eq(X, Z) [Transitivity]
R4: even(X) โ even(X * Y) [Even Multiplication]
R5: gt(X, Y) โง gt(Y, Z) โ gt(X, Z) [Greater Than Transitivity]
The system successfully demonstrates:
- Linear Equations:
If x + 7 = 15, find xโx = 8 - Multiplication:
If 3y = 21, what is y?โy = 7 - Number Theory:
If n is even, prove 2n is evenโ Valid proof - Inequalities:
If a > b and b > c, prove a > cโ Transitivity proof
- 100% Success Rate on target problem types
- ~1-3 Proof Steps for typical problems
- 8-12 Second Processing including LLM calls
- High Confidence Extraction (0.9-0.95)
- Backward Chaining: Goal-directed proof search
- Rule Learning: Extract patterns from successful proofs
- Multi-step Algebra: More complex equation solving
- Higher-order Logic: Quantifiers and functions
- Geometric Reasoning: Basic geometric predicates
- Calculus Integration: Derivatives and limits
- Group Theory: Convert generic theorems to group theory problems
- Educational Platform: Interactive tutoring system
- Research Tool: Mathematician proof assistance
- Automated Grading: Homework evaluation
MiniAxiom/
โโโ src/
โ โโโ formal/ # Core mathematical structures
โ โ โโโ expressions.py # Expression hierarchy
โ โ โโโ parser.py # Mathematical parsing
โ โโโ reasoning/ # Proof engine components
โ โ โโโ pattern_matching.py # Rule application
โ โ โโโ rules.py # Mathematical rules
โ โ โโโ proof_engine.py # Forward chaining
โ โโโ extraction/ # LLM problem extraction
โ โ โโโ llm_client.py # Azure OpenAI client
โ โ โโโ problem_extractor.py # Natural language processing
โ โโโ explanation/ # Proof explanation
โ โ โโโ proof_explainer.py # Natural language generation
โ โโโ mathgraph.py # Main system integration
โโโ tests/ # Comprehensive test suite
โโโ examples/ # Demo problems
โโโ streamlit_app.py # Web interface
โโโ demo_*.py # Various demo scripts
โโโ README.md # This file
This is my vibecoding and experimenting this prototype demonstrating hybrid AI mathematical reasoning. Eventually this needs to be formalized for hypergraphss. Where can we use this?
- Educational Tools: Interactive math tutoring
- Research Platforms: Formal reasoning experiments
- Production Systems: Automated mathematical problem solving
MathGraph v1.0 - Demonstrating the future of AI mathematics through hybrid LLM + formal reasoning approaches.