From e734a543e9c394dd922c982fb64a8144557b5baf Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Fri, 1 Aug 2025 17:02:11 +0000
Subject: [PATCH 1/2] Initial plan

From 923c845534349f572a9e8edb9556d9a165f82eba Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Fri, 1 Aug 2025 17:18:44 +0000
Subject: [PATCH 2/2] Complete comprehensive AI agent documentation for Musoq

Co-authored-by: Puchaczov <6973258+Puchaczov@users.noreply.github.com>
---
 .copilot/INDEX.md                       | 148 +++++
 .copilot/README.md                      | 586 ++++++++++++++++++++
 .copilot/api-usage-examples.md          | 690 ++++++++++++++++++++++++
 .copilot/architecture-deep-dive.md      | 367 +++++++++++++
 .copilot/development-debugging-guide.md | 682 +++++++++++++++++++++++
 .copilot/plugin-development-guide.md    | 616 +++++++++++++++++++++
 6 files changed, 3089 insertions(+)
 create mode 100644 .copilot/INDEX.md
 create mode 100644 .copilot/README.md
 create mode 100644 .copilot/api-usage-examples.md
 create mode 100644 .copilot/architecture-deep-dive.md
 create mode 100644 .copilot/development-debugging-guide.md
 create mode 100644 .copilot/plugin-development-guide.md

diff --git a/.copilot/INDEX.md b/.copilot/INDEX.md
new file mode 100644
index 00000000..ae99d289
--- /dev/null
+++ b/.copilot/INDEX.md
@@ -0,0 +1,148 @@
+# .copilot Documentation Index
+
+This directory contains comprehensive documentation for AI agents working with the Musoq SQL query engine codebase.
+
+## Documentation Files
+
+### 📖 [README.md](./README.md)
+**Main documentation entry point** - Comprehensive overview covering architecture, components, development workflow, and testing strategies. Start here for a complete understanding of the Musoq system.
+
+**Key Topics:**
+- Quick start and essential understanding
+- Architecture overview and design principles
+- Core component deep dive (Parser, Schema, Converter, Evaluator, Plugins)
+- Plugin development basics
+- Query processing pipeline
+- Development workflow and project structure
+- API usage patterns
+- Testing strategies
+- Build and deployment
+- Troubleshooting guide
+- Key files reference
+
+### 🏗️ [architecture-deep-dive.md](./architecture-deep-dive.md)
+**Detailed technical architecture** - In-depth analysis of each component's internal architecture, design patterns, and integration strategies.
+
+**Key Topics:**
+- Component architecture details
+- Parser module internals (AST, lexing, precedence)
+- Schema module abstractions (ISchema, RowSource, data flow)
+- Converter build chain pipeline
+- Evaluator compilation and execution
+- Plugin system architecture
+- Integration patterns
+- Performance considerations
+- Memory management and optimization
+
+### 🔌 [plugin-development-guide.md](./plugin-development-guide.md)
+**Complete plugin development reference** - Step-by-step guide for creating data sources and function libraries.
+ +**Key Topics:** +- Data source plugin creation (schema, row source, entities) +- Advanced row source patterns (chunking, parameterization) +- Dynamic schema support and runtime column discovery +- Function library development (basic, generic, aggregation) +- Complex function examples (JSON, HTTP/Web) +- Plugin registration and deployment +- Testing plugin development +- Best practices for performance and error handling + +### 🛠️ [development-debugging-guide.md](./development-debugging-guide.md) +**Development environment and debugging** - Comprehensive guide for setting up development environment and debugging techniques. + +**Key Topics:** +- Development environment setup +- Component-specific development cycles +- Debugging techniques (AST, code generation, execution, schema resolution) +- Testing strategies (unit, integration, performance) +- Build and CI/CD configuration +- Troubleshooting common issues (parse errors, schema failures, compilation errors) +- Performance optimization and profiling + +### 💻 [api-usage-examples.md](./api-usage-examples.md) +**Practical API usage and examples** - Real-world examples and usage patterns for integrating Musoq. + +**Key Topics:** +- Core API overview (InstanceCreator, analysis API) +- Schema provider implementation patterns +- Data source examples (in-memory, REST API, database) +- Advanced usage patterns (parameterization, dynamic registration, streaming) +- Query analysis and optimization +- Error handling and logging +- Testing API usage + +## Quick Navigation + +### For New Contributors +1. Start with [README.md](./README.md) - Quick Start section +2. Read [architecture-deep-dive.md](./architecture-deep-dive.md) - Component Architecture +3. Follow [development-debugging-guide.md](./development-debugging-guide.md) - Development Environment Setup + +### For Plugin Developers +1. Review [README.md](./README.md) - Plugin Development section +2. Follow [plugin-development-guide.md](./plugin-development-guide.md) - Complete guide +3. Reference [api-usage-examples.md](./api-usage-examples.md) - Data source examples + +### For API Integration +1. Study [api-usage-examples.md](./api-usage-examples.md) - Core API and examples +2. Reference [README.md](./README.md) - API Usage Patterns +3. Check [development-debugging-guide.md](./development-debugging-guide.md) - Testing strategies + +### For Debugging Issues +1. Check [development-debugging-guide.md](./development-debugging-guide.md) - Troubleshooting section +2. Review [README.md](./README.md) - Troubleshooting Guide +3. Use [architecture-deep-dive.md](./architecture-deep-dive.md) - Component internals + +### For Performance Optimization +1. Review [architecture-deep-dive.md](./architecture-deep-dive.md) - Performance Considerations +2. Check [development-debugging-guide.md](./development-debugging-guide.md) - Performance Optimization +3. 
Study [plugin-development-guide.md](./plugin-development-guide.md) - Best Practices + +## Key Concepts Quick Reference + +### Core Architecture +- **Parser**: SQL → AST (Abstract Syntax Tree) +- **Schema**: Data source abstraction and contracts +- **Converter**: AST → C# code generation +- **Evaluator**: Dynamic compilation and execution +- **Plugins**: Extensible function library + +### Data Flow +``` +SQL Query → Lexer → Parser → AST → Converter → C# Code → Compiler → Assembly → Executor → Results +``` + +### Essential Classes +- `Parser` - Main parsing logic +- `ISchema` - Plugin interface for data sources +- `RowSource` - Data iteration abstraction +- `CompiledQuery` - Executable query wrapper +- `InstanceCreator` - Main API entry point + +### Plugin Pattern +```csharp +ISchema → SchemaBase → YourSchema + ↓ +RowSource → YourRowSource + ↓ +Your Entity Classes +``` + +### Testing Pattern +```csharp +[TestClass] +public class YourTests : BasicEntityTestBase +{ + [TestMethod] + public void Should_Test_Feature() + { + var vm = CreateAndRunVirtualMachine(query, data); + var results = vm.Run(); + // Assertions + } +} +``` + +--- + +This documentation is designed specifically for AI agents and provides comprehensive coverage of the Musoq codebase for development, debugging, and extension purposes. \ No newline at end of file diff --git a/.copilot/README.md b/.copilot/README.md new file mode 100644 index 00000000..6b78fd92 --- /dev/null +++ b/.copilot/README.md @@ -0,0 +1,586 @@ +# Musoq: SQL Query Engine for Everything - AI Agent Documentation + +## Table of Contents + +1. [Quick Start](#quick-start) +2. [Architecture Overview](#architecture-overview) +3. [Core Components](#core-components) +4. [Plugin Development](#plugin-development) +5. [Query Processing Pipeline](#query-processing-pipeline) +6. [Development Workflow](#development-workflow) +7. [API Usage Patterns](#api-usage-patterns) +8. [Testing Strategies](#testing-strategies) +9. [Build and Deployment](#build-and-deployment) +10. [Troubleshooting Guide](#troubleshooting-guide) +11. [Key Files Reference](#key-files-reference) + +## Quick Start + +Musoq is a SQL-like query engine that can query diverse data sources without requiring a traditional database. It transforms SQL queries into executable C# code through a sophisticated compilation pipeline. + +### Essential Understanding + +```mermaid +graph LR + A[SQL Query] --> B[Parser] --> C[Converter] --> D[Evaluator] --> E[Results] + B --> F[AST] + C --> G[C# Code] + D --> H[Compiled Assembly] +``` + +### Quick Test Commands + +```bash +# Build the solution +dotnet build + +# Run all tests +dotnet test + +# Run specific test project +dotnet test Musoq.Parser.Tests + +# Run with specific filter +dotnet test --filter "TestCategory=Integration" +``` + +## Architecture Overview + +### High-Level Architecture + +Musoq follows a pipeline architecture with clear separation of concerns: + +1. **Parser** (`Musoq.Parser`) - Lexical analysis and AST generation +2. **Schema** (`Musoq.Schema`) - Data source abstraction and type system +3. **Converter** (`Musoq.Converter`) - AST to C# code transformation +4. **Evaluator** (`Musoq.Evaluator`) - Dynamic compilation and execution +5. 
**Plugins** (`Musoq.Plugins`) - Extensible function library and data sources + +### Data Flow + +``` +SQL Text → Lexer → Tokens → Parser → AST → Converter → C# Code → Compiler → Assembly → Executor → Results +``` + +### Key Design Principles + +- **Extensibility First**: Plugin-based architecture for data sources +- **Performance Focus**: Dynamic compilation for optimal execution +- **Type Safety**: Strong typing throughout the pipeline +- **SQL Compatibility**: Standard SQL syntax with extensions + +## Core Components + +### 1. Parser Module (`Musoq.Parser`) + +**Purpose**: Transforms SQL text into Abstract Syntax Tree (AST) + +**Key Files**: +- `Parser.cs` - Main parsing logic with precedence handling +- `Lexing/Lexer.cs` - Tokenization and lexical analysis +- `Nodes/` - AST node hierarchy +- `Tokens/` - Token definitions + +**Key Concepts**: +```csharp +// Entry point for parsing +var parser = new Parser(lexer); +var rootNode = parser.ComposeAll(); +``` + +**AST Node Types**: +- `SelectNode` - SELECT clauses +- `FromNode` - FROM clauses and data sources +- `WhereNode` - WHERE conditions +- `OrderByNode` - ORDER BY clauses +- `GroupByNode` - GROUP BY clauses + +### 2. Schema System (`Musoq.Schema`) + +**Purpose**: Defines data source contracts and metadata management + +**Key Files**: +- `ISchema.cs` - Main schema interface for plugins +- `DataSources/SchemaBase.cs` - Base class for schema implementations +- `DataSources/RowSource.cs` - Data iteration abstraction +- `RuntimeContext.cs` - Query execution context + +**Plugin Contract**: +```csharp +public interface ISchema +{ + string Name { get; } + ISchemaTable GetTableByName(string name, RuntimeContext context, params object[] parameters); + RowSource GetRowSource(string name, RuntimeContext context, params object[] parameters); + bool TryResolveMethod(string method, Type[] parameters, Type entityType, out MethodInfo methodInfo); +} +``` + +**Data Source Implementation Pattern**: +```csharp +public class MySchema : SchemaBase +{ + public MySchema() : base("myschema", new MethodsAggregator()) + { + AddSource("data"); + } +} +``` + +### 3. Converter Module (`Musoq.Converter`) + +**Purpose**: Transforms AST into executable C# code + +**Key Files**: +- `Build/BuildChain.cs` - Chain of responsibility pattern for transformations +- `Build/CreateTree.cs` - Initial AST processing +- `Build/TranformTree.cs` - Semantic transformations +- `Build/TurnQueryIntoRunnableCode.cs` - Final C# code generation + +**Transformation Pipeline**: +1. **CreateTree**: Initial AST processing and validation +2. **TranformTree**: Apply semantic transformations and optimizations +3. **TurnQueryIntoRunnableCode**: Generate executable C# code + +### 4. Evaluator Module (`Musoq.Evaluator`) + +**Purpose**: Compiles and executes generated code + +**Key Files**: +- `CompiledQuery.cs` - Executable query wrapper +- `IRunnable.cs` - Interface for executable queries +- `Tables/Table.cs` - Result set representation +- `Runtime/` - Runtime function library + +**Execution Flow**: +```csharp +var compiledQuery = new CompiledQuery(runnable); +var results = compiledQuery.Run(); +``` + +### 5. Plugin System (`Musoq.Plugins`) + +**Purpose**: Extensible function library + +**Key Files**: +- `Lib/LibraryBase*.cs` - Standard function implementations +- `Assembly.cs` - Assembly utilities +- `Attributes/` - Plugin metadata attributes + +## Plugin Development + +### Creating a Data Source Plugin + +1. 
**Create Schema Class**: +```csharp +public class MyDataSchema : SchemaBase +{ + public MyDataSchema() : base("mydata", new MethodsAggregator()) + { + AddSource("source"); + AddTable("table"); + } +} +``` + +2. **Implement Row Source**: +```csharp +public class MyDataSource : RowSourceBase +{ + public MyDataSource(string connectionString, RuntimeContext context) + : base(context) + { + // Initialize data source + } + + public override IEnumerable GetRows(CancellationToken cancellationToken) + { + // Yield data rows + } +} +``` + +3. **Define Entity Type**: +```csharp +public class MyEntity +{ + public string Name { get; set; } + public int Value { get; set; } + public DateTime CreatedAt { get; set; } +} +``` + +4. **Register Schema**: +```csharp +var provider = new SchemaProvider(); +provider.RegisterSchema("mydata", new MyDataSchema()); +``` + +### Function Library Extension + +```csharp +public class MyLibrary : LibraryBase +{ + [BindableMethod] + public string ProcessText(string input, string pattern) + { + // Custom function implementation + return result; + } +} +``` + +## Query Processing Pipeline + +### 1. Lexical Analysis +``` +"SELECT Name FROM #mydata.source('param')" → Tokens +``` + +### 2. Parsing +``` +Tokens → AST (SelectNode + FromNode + ...) +``` + +### 3. Semantic Analysis +``` +AST → Type Inference → Schema Resolution → Optimized AST +``` + +### 4. Code Generation +```csharp +// Generated C# code example +public class Query_12345 : IRunnable +{ + public Table Run(CancellationToken token) + { + var source = schema.GetRowSource("source", context, "param"); + var results = new List(); + + foreach(var row in source.GetRows(token)) + { + results.Add(new object[] { row.Name }); + } + + return new Table(results); + } +} +``` + +### 5. Compilation and Execution +``` +C# Code → Dynamic Compilation → Assembly → Execution → Results +``` + +## Development Workflow + +### Setting Up Development Environment + +1. **Prerequisites**: + - .NET 8.0 SDK + - Visual Studio or VS Code + - Git + +2. **Clone and Build**: +```bash +git clone https://github.com/Puchaczov/Musoq.git +cd Musoq +dotnet restore +dotnet build +``` + +3. **Run Tests**: +```bash +dotnet test --verbosity normal +``` + +### Project Structure + +``` +Musoq/ +├── Musoq.Parser/ # SQL parsing and AST generation +├── Musoq.Schema/ # Data source abstraction +├── Musoq.Converter/ # AST to C# transformation +├── Musoq.Evaluator/ # Code compilation and execution +├── Musoq.Plugins/ # Standard function library +├── *.Tests/ # Unit and integration tests +├── Musoq.Benchmarks/ # Performance benchmarks +└── docs/ # Documentation +``` + +### Making Changes + +1. **Component-Specific Changes**: + - Parser: Modify lexing/parsing logic + - Schema: Add new data source types + - Converter: Modify code generation + - Evaluator: Change execution behavior + - Plugins: Add new functions + +2. **Testing Strategy**: + - Unit tests for individual components + - Integration tests for query execution + - Performance tests for optimization + +3. 
**Build and Test Loop**: +```bash +# Make changes +dotnet build +dotnet test --filter "TestCategory=Unit" +dotnet test --filter "TestCategory=Integration" +``` + +## API Usage Patterns + +### Basic Query Execution + +```csharp +// Using test infrastructure pattern +var schemaProvider = new BasicSchemaProvider(dataSources); +var compiledQuery = InstanceCreator.CompileForExecution( + "SELECT Name, Value FROM #schema.table('param')", + Guid.NewGuid().ToString(), + schemaProvider, + loggerResolver); + +var results = compiledQuery.Run(); +``` + +### Advanced Query with Multiple Sources + +```csharp +var query = @" + SELECT a.Name, b.Value + FROM #source1.data() a + INNER JOIN #source2.data() b ON a.Id = b.Id + WHERE a.CreatedAt > '2023-01-01' + ORDER BY a.Name"; + +var compiledQuery = InstanceCreator.CompileForExecution(query, ...); +var table = compiledQuery.Run(); +``` + +### Custom Schema Registration + +```csharp +var provider = new SchemaProvider(); +provider.RegisterSchema("mydata", new MyDataSchema()); +provider.RegisterSchema("files", new FileSystemSchema()); + +var compiledQuery = InstanceCreator.CompileForExecution(query, ...); +``` + +## Testing Strategies + +### Unit Testing Pattern + +```csharp +[TestClass] +public class MyFeatureTests : BasicEntityTestBase +{ + [TestMethod] + public void Should_Parse_Complex_Query() + { + // Arrange + var query = "SELECT Name FROM #test.data()"; + + // Act + var buildItems = CreateBuildItems(query); + + // Assert + Assert.IsNotNull(buildItems); + // Additional assertions + } +} +``` + +### Integration Testing Pattern + +```csharp +[TestMethod] +public void Should_Execute_Query_With_Results() +{ + // Arrange + var data = new Dictionary> + { + ["entities"] = new[] + { + new BasicEntity { Name = "Test", Value = 1 } + } + }; + + // Act + var vm = CreateAndRunVirtualMachine("SELECT Name FROM #basic.entities()", data); + var table = vm.Run(); + + // Assert + Assert.AreEqual(1, table.Count); + Assert.AreEqual("Test", table[0][0]); +} +``` + +### Test Data Setup + +```csharp +protected static BasicEntity[] CreateBasicEntities() +{ + return new[] + { + new BasicEntity { Id = 1, Name = "Entity1", Value = 100 }, + new BasicEntity { Id = 2, Name = "Entity2", Value = 200 } + }; +} +``` + +## Build and Deployment + +### Build Configuration + +- **Target Framework**: .NET 8.0 +- **C# Version**: Latest +- **NuGet Packages**: Managed via PackageReference + +### Build Commands + +```bash +# Clean build +dotnet clean +dotnet restore +dotnet build --configuration Release + +# Pack NuGet packages +dotnet pack --configuration Release + +# Run benchmarks +dotnet run --project Musoq.Benchmarks --configuration Release +``` + +### Project Dependencies + +``` +Musoq.Evaluator +├── Musoq.Converter +├── Musoq.Schema +├── Musoq.Parser +└── Musoq.Plugins + +Musoq.Converter +├── Musoq.Schema +└── Musoq.Parser + +Musoq.Schema +└── (Base dependencies only) + +Musoq.Parser +└── (Base dependencies only) +``` + +### NuGet Publishing + +Each component is published as a separate NuGet package: +- `Musoq.Parser` - SQL parsing library +- `Musoq.Schema` - Schema abstraction +- `Musoq.Converter` - Code generation +- `Musoq.Evaluator` - Query execution +- `Musoq.Plugins` - Standard functions + +## Troubleshooting Guide + +### Common Issues + +1. **Parse Errors**: + - Check SQL syntax + - Verify token definitions in `Tokens/` + - Debug lexer output + +2. **Schema Resolution Failures**: + - Verify schema registration + - Check method signatures + - Validate parameter types + +3. 
**Compilation Errors**: + - Review generated C# code + - Check type inference + - Validate method resolution + +4. **Runtime Errors**: + - Check data source connectivity + - Validate parameter values + - Review cancellation token usage + +### Debugging Techniques + +1. **AST Inspection**: +```csharp +var parser = new Parser(lexer); +var rootNode = parser.ComposeAll(); +// Set breakpoint and inspect AST structure +``` + +2. **Generated Code Inspection**: +```csharp +var buildItems = InstanceCreator.CreateForAnalyze(query, ...); +// Inspect generated C# code in buildItems +``` + +3. **Query Execution Tracing**: +```csharp +var compiledQuery = InstanceCreator.CompileForExecution(query, ...); +// Enable logging to trace execution +var results = compiledQuery.Run(); +``` + +### Performance Analysis + +1. **Benchmark Tests**: +```bash +dotnet run --project Musoq.Benchmarks --configuration Release +``` + +2. **Memory Profiling**: + - Use dotMemory or PerfView + - Focus on row enumeration + - Check for memory leaks in data sources + +3. **Query Optimization**: + - Review generated C# code + - Optimize data source implementations + - Consider parallel execution + +## Key Files Reference + +### Parser Module +- `Parser.cs` - Main parsing logic +- `Lexing/Lexer.cs` - Tokenization +- `Nodes/QueryNode.cs` - Base AST node +- `Tokens/TokenType.cs` - Token definitions + +### Schema Module +- `ISchema.cs` - Plugin interface +- `DataSources/SchemaBase.cs` - Base implementation +- `DataSources/RowSource.cs` - Data iteration +- `RuntimeContext.cs` - Execution context + +### Converter Module +- `Build/BuildChain.cs` - Transformation pipeline +- `Build/TurnQueryIntoRunnableCode.cs` - Code generation +- `InstanceCreator.cs` - Main API entry point + +### Evaluator Module +- `CompiledQuery.cs` - Query execution +- `IRunnable.cs` - Execution interface +- `Tables/Table.cs` - Result representation + +### Plugin Module +- `Lib/LibraryBase.cs` - Function base class +- `Assembly.cs` - Assembly utilities +- `Attributes/BindableMethodAttribute.cs` - Method binding + +### Test Infrastructure +- `Musoq.Tests.Common/Culture.cs` - Test culture setup +- `Musoq.Evaluator.Tests/Schema/Basic/BasicEntityTestBase.cs` - Test base +- `Musoq.Evaluator.Tests/Components/` - Test utilities + +--- + +This documentation provides a comprehensive guide for AI agents working with the Musoq codebase. It covers architecture, development patterns, testing strategies, and practical examples for understanding and extending the query engine. \ No newline at end of file diff --git a/.copilot/api-usage-examples.md b/.copilot/api-usage-examples.md new file mode 100644 index 00000000..2c6b782e --- /dev/null +++ b/.copilot/api-usage-examples.md @@ -0,0 +1,690 @@ +# API Usage and Examples + +## Core API Overview + +Musoq provides several entry points for different use cases. This document covers the main APIs and usage patterns. + +## Primary API Entry Points + +### 1. InstanceCreator - Main API + +The `InstanceCreator` class provides the primary interface for compiling and executing queries. 
+ +```csharp +using Musoq.Converter; +using Musoq.Evaluator; +using Musoq.Schema; + +// Basic query compilation and execution +var query = "SELECT Name, Count(*) FROM #schema.source() GROUP BY Name"; +var schemaProvider = new MySchemaProvider(); +var loggerResolver = new LoggerResolver(); + +var compiledQuery = InstanceCreator.CompileForExecution( + query, + Guid.NewGuid().ToString(), + schemaProvider, + loggerResolver); + +var results = compiledQuery.Run(); +``` + +### 2. Analysis API + +For analyzing queries without execution: + +```csharp +var buildItems = InstanceCreator.CreateForAnalyze( + query, + Guid.NewGuid().ToString(), + schemaProvider, + loggerResolver); + +// Access generated C# code +var generatedCode = buildItems.BuildedCode; + +// Access AST +var rootNode = buildItems.RawQuery; + +// Access query metadata +var queryInformation = buildItems.QueryInformation; +``` + +## Schema Provider Implementation + +### Basic Schema Provider + +```csharp +public class CustomSchemaProvider : ISchemaProvider +{ + private readonly Dictionary _schemas = new(); + + public CustomSchemaProvider() + { + RegisterDefaultSchemas(); + } + + public void RegisterSchema(string name, ISchema schema) + { + _schemas[name.ToLowerInvariant()] = schema; + } + + public ISchema GetSchema(string name) + { + return _schemas.TryGetValue(name.ToLowerInvariant(), out var schema) ? schema : null; + } + + private void RegisterDefaultSchemas() + { + RegisterSchema("system", new SystemSchema()); + RegisterSchema("files", new FileSystemSchema()); + RegisterSchema("memory", new InMemorySchema()); + } +} +``` + +### Multi-Schema Provider with Dynamic Loading + +```csharp +public class DynamicSchemaProvider : ISchemaProvider +{ + private readonly Dictionary _schemas = new(); + private readonly List _pluginAssemblies = new(); + + public void LoadSchemasFromAssembly(Assembly assembly) + { + _pluginAssemblies.Add(assembly); + + var schemaTypes = assembly.GetTypes() + .Where(t => typeof(ISchema).IsAssignableFrom(t) && !t.IsAbstract); + + foreach (var schemaType in schemaTypes) + { + var schema = (ISchema)Activator.CreateInstance(schemaType); + RegisterSchema(schema.Name, schema); + } + } + + public void LoadSchemasFromDirectory(string pluginDirectory) + { + var assemblies = Directory.GetFiles(pluginDirectory, "*.dll") + .Select(Assembly.LoadFrom); + + foreach (var assembly in assemblies) + { + LoadSchemasFromAssembly(assembly); + } + } +} +``` + +## Data Source Implementation Examples + +### 1. In-Memory Data Source + +```csharp +public class InMemorySchema : SchemaBase +{ + private readonly Dictionary> _dataSources = new(); + + public InMemorySchema() : base("memory", new MethodsAggregator()) + { + AddSource("table"); + } + + public void AddDataSource(string name, IEnumerable data) + { + _dataSources[name.ToLowerInvariant()] = data; + } + + public override RowSource GetRowSource(string name, RuntimeContext runtimeContext, params object[] parameters) + { + var tableName = parameters.Length > 0 ? 
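+            // use the first argument as the table name when one is provided, otherwise fall back to the source name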
parameters[0]?.ToString() : name; + + if (_dataSources.TryGetValue(tableName.ToLowerInvariant(), out var data)) + { + return new InMemoryRowSource(data, runtimeContext); + } + + throw new ArgumentException($"Data source '{tableName}' not found"); + } +} + +public class InMemoryRowSource : RowSourceBase +{ + private readonly IEnumerable _data; + + public InMemoryRowSource(IEnumerable data, RuntimeContext context) + : base(context) + { + _data = data; + } + + public override IEnumerable GetRows(CancellationToken cancellationToken) + { + foreach (var item in _data) + { + cancellationToken.ThrowIfCancellationRequested(); + yield return item; + } + } +} +``` + +### 2. REST API Data Source + +```csharp +public class RestApiSchema : SchemaBase +{ + private static readonly HttpClient _httpClient = new(); + + public RestApiSchema() : base("api", new MethodsAggregator()) + { + AddSource("endpoint"); + } + + public override RowSource GetRowSource(string name, RuntimeContext runtimeContext, params object[] parameters) + { + if (name == "endpoint" && parameters.Length > 0) + { + var url = parameters[0]?.ToString(); + var headers = parameters.Length > 1 ? parameters[1] as Dictionary : null; + + return new RestApiRowSource(url, headers, runtimeContext); + } + + throw new ArgumentException($"Invalid parameters for '{name}'"); + } +} + +public class RestApiRowSource : RowSourceBase +{ + private readonly string _url; + private readonly Dictionary _headers; + + public RestApiRowSource(string url, Dictionary headers, RuntimeContext context) + : base(context) + { + _url = url; + _headers = headers ?? new Dictionary(); + } + + public override IEnumerable GetRows(CancellationToken cancellationToken) + { + var request = new HttpRequestMessage(HttpMethod.Get, _url); + + foreach (var header in _headers) + { + request.Headers.Add(header.Key, header.Value); + } + + var response = _httpClient.SendAsync(request, cancellationToken).Result; + response.EnsureSuccessStatusCode(); + + var jsonContent = response.Content.ReadAsStringAsync().Result; + var jsonDocument = JsonDocument.Parse(jsonContent); + + if (jsonDocument.RootElement.ValueKind == JsonValueKind.Array) + { + foreach (var element in jsonDocument.RootElement.EnumerateArray()) + { + cancellationToken.ThrowIfCancellationRequested(); + yield return JsonElementToDynamic(element); + } + } + else + { + yield return JsonElementToDynamic(jsonDocument.RootElement); + } + } + + private dynamic JsonElementToDynamic(JsonElement element) + { + var expando = new ExpandoObject() as IDictionary; + + foreach (var property in element.EnumerateObject()) + { + expando[property.Name] = property.Value.ValueKind switch + { + JsonValueKind.String => property.Value.GetString(), + JsonValueKind.Number => property.Value.GetDecimal(), + JsonValueKind.True => true, + JsonValueKind.False => false, + JsonValueKind.Null => null, + _ => property.Value.ToString() + }; + } + + return expando; + } +} +``` + +### 3. 
Database Data Source + +```csharp +public class DatabaseSchema : SchemaBase +{ + public DatabaseSchema() : base("db", new MethodsAggregator()) + { + AddSource("query"); + AddSource("table"); + } + + public override RowSource GetRowSource(string name, RuntimeContext runtimeContext, params object[] parameters) + { + return name switch + { + "query" => new DatabaseRowSource( + parameters[0]?.ToString(), // connection string + parameters[1]?.ToString(), // SQL query + parameters.Skip(2).ToArray(), // parameters + runtimeContext), + "table" => new DatabaseRowSource( + parameters[0]?.ToString(), // connection string + $"SELECT * FROM {parameters[1]}", // table name + new object[0], + runtimeContext), + _ => throw new ArgumentException($"Unknown source: {name}") + }; + } +} + +public class DatabaseRowSource : RowSourceBase +{ + private readonly string _connectionString; + private readonly string _query; + private readonly object[] _parameters; + + public DatabaseRowSource(string connectionString, string query, object[] parameters, RuntimeContext context) + : base(context) + { + _connectionString = connectionString; + _query = query; + _parameters = parameters; + } + + public override IEnumerable GetRows(CancellationToken cancellationToken) + { + using var connection = new SqlConnection(_connectionString); + connection.Open(); + + using var command = new SqlCommand(_query, connection); + + for (int i = 0; i < _parameters.Length; i++) + { + command.Parameters.AddWithValue($"@p{i}", _parameters[i] ?? DBNull.Value); + } + + using var reader = command.ExecuteReader(); + + while (reader.Read()) + { + cancellationToken.ThrowIfCancellationRequested(); + + var row = new ExpandoObject() as IDictionary; + + for (int i = 0; i < reader.FieldCount; i++) + { + var fieldName = reader.GetName(i); + var fieldValue = reader.IsDBNull(i) ? null : reader.GetValue(i); + row[fieldName] = fieldValue; + } + + yield return row; + } + } +} +``` + +## Advanced Usage Patterns + +### 1. Query Parameterization + +```csharp +public class ParameterizedQueryExample +{ + public Table ExecuteParameterizedQuery(string userInput, DateTime fromDate, DateTime toDate) + { + // Use parameters to prevent injection and improve reusability + var query = @" + SELECT Name, CreatedAt, Status + FROM #system.events() + WHERE Name LIKE @userPattern + AND CreatedAt BETWEEN @fromDate AND @toDate + ORDER BY CreatedAt DESC"; + + var schemaProvider = new CustomSchemaProvider(); + + // Create a parameterized context + var context = new RuntimeContext(); + context.SetParameter("userPattern", $"%{userInput}%"); + context.SetParameter("fromDate", fromDate); + context.SetParameter("toDate", toDate); + + var compiledQuery = InstanceCreator.CompileForExecution( + query, + Guid.NewGuid().ToString(), + schemaProvider, + new LoggerResolver()); + + return compiledQuery.Run(); + } +} +``` + +### 2. 
Dynamic Schema Registration + +```csharp +public class DynamicQueryEngine +{ + private readonly DynamicSchemaProvider _schemaProvider; + + public DynamicQueryEngine() + { + _schemaProvider = new DynamicSchemaProvider(); + } + + public void RegisterDataSource(string schemaName, string sourceName, IEnumerable data) + { + var schema = new GenericInMemorySchema(schemaName, sourceName, data); + _schemaProvider.RegisterSchema(schemaName, schema); + } + + public Table ExecuteQuery(string query) + { + var compiledQuery = InstanceCreator.CompileForExecution( + query, + Guid.NewGuid().ToString(), + _schemaProvider, + new LoggerResolver()); + + return compiledQuery.Run(); + } +} + +// Usage +var engine = new DynamicQueryEngine(); + +// Register different data sources +engine.RegisterDataSource("sales", "orders", salesData); +engine.RegisterDataSource("inventory", "products", productData); +engine.RegisterDataSource("users", "customers", customerData); + +// Execute complex queries across multiple sources +var results = engine.ExecuteQuery(@" + SELECT + o.OrderId, + c.CustomerName, + p.ProductName, + o.Quantity * p.Price as TotalValue + FROM #sales.orders() o + INNER JOIN #users.customers() c ON o.CustomerId = c.Id + INNER JOIN #inventory.products() p ON o.ProductId = p.Id + WHERE o.OrderDate >= '2023-01-01' + ORDER BY TotalValue DESC"); +``` + +### 3. Streaming Large Datasets + +```csharp +public class StreamingQueryProcessor +{ + public async IAsyncEnumerable ExecuteStreamingQuery( + string query, + ISchemaProvider schemaProvider, + [EnumeratorCancellation] CancellationToken cancellationToken = default) + { + var compiledQuery = InstanceCreator.CompileForExecution( + query, + Guid.NewGuid().ToString(), + schemaProvider, + new LoggerResolver()); + + // Execute query and stream results + var table = await compiledQuery.RunAsync(cancellationToken); + + foreach (var row in table) + { + cancellationToken.ThrowIfCancellationRequested(); + yield return row; + } + } +} + +// Usage +var processor = new StreamingQueryProcessor(); + +await foreach (var row in processor.ExecuteStreamingQuery( + "SELECT * FROM #large.dataset() WHERE Status = 'Active'", + schemaProvider, + cancellationToken)) +{ + // Process row without loading entire result set into memory + ProcessRow(row); +} +``` + +### 4. 
Query Analysis and Optimization + +```csharp +public class QueryAnalyzer +{ + public QueryAnalysisResult AnalyzeQuery(string query, ISchemaProvider schemaProvider) + { + var buildItems = InstanceCreator.CreateForAnalyze( + query, + Guid.NewGuid().ToString(), + schemaProvider, + new LoggerResolver()); + + var result = new QueryAnalysisResult + { + IsValid = buildItems.IsValid, + GeneratedCode = buildItems.BuildedCode, + UsedSchemas = ExtractUsedSchemas(buildItems), + EstimatedComplexity = CalculateComplexity(buildItems), + OptimizationSuggestions = GenerateOptimizationSuggestions(buildItems) + }; + + return result; + } + + private string[] ExtractUsedSchemas(BuildItems buildItems) + { + return buildItems.QueryInformation.Values + .Select(qi => qi.FromNode) + .OfType() + .Select(sfn => sfn.Schema) + .Distinct() + .ToArray(); + } + + private int CalculateComplexity(BuildItems buildItems) + { + // Analyze AST for complexity indicators + var complexity = 0; + + // Count JOINs + complexity += CountJoins(buildItems.RawQuery) * 2; + + // Count subqueries + complexity += CountSubqueries(buildItems.RawQuery) * 3; + + // Count aggregations + complexity += CountAggregations(buildItems.RawQuery); + + return complexity; + } + + private string[] GenerateOptimizationSuggestions(BuildItems buildItems) + { + var suggestions = new List(); + + // Analyze for common optimization opportunities + if (HasUnnecessarySelectStar(buildItems.RawQuery)) + { + suggestions.Add("Consider selecting only required columns instead of SELECT *"); + } + + if (HasMissingWhereClause(buildItems.RawQuery)) + { + suggestions.Add("Consider adding WHERE clause to filter data early"); + } + + return suggestions.ToArray(); + } +} + +public class QueryAnalysisResult +{ + public bool IsValid { get; set; } + public string GeneratedCode { get; set; } + public string[] UsedSchemas { get; set; } + public int EstimatedComplexity { get; set; } + public string[] OptimizationSuggestions { get; set; } +} +``` + +### 5. Error Handling and Logging + +```csharp +public class RobustQueryExecutor +{ + private readonly ILogger _logger; + private readonly ISchemaProvider _schemaProvider; + + public RobustQueryExecutor(ISchemaProvider schemaProvider, ILogger logger) + { + _schemaProvider = schemaProvider; + _logger = logger; + } + + public async Task ExecuteQuerySafelyAsync( + string query, + TimeSpan timeout = default, + CancellationToken cancellationToken = default) + { + var result = new QueryExecutionResult { Query = query }; + + try + { + _logger.LogInformation("Starting query execution: {Query}", query); + + var stopwatch = Stopwatch.StartNew(); + + // Set up timeout + using var timeoutCts = timeout != default + ? 
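+                // a default timeout means no deadline; the chosen token source is linked with the caller's token below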
new CancellationTokenSource(timeout) : new CancellationTokenSource();
+
+            using var combinedCts = CancellationTokenSource.CreateLinkedTokenSource(
+                cancellationToken, timeoutCts.Token);
+
+            // Compile query
+            var compiledQuery = InstanceCreator.CompileForExecution(
+                query,
+                Guid.NewGuid().ToString(),
+                _schemaProvider,
+                new LoggerResolver());
+
+            result.CompilationTimeMs = stopwatch.ElapsedMilliseconds;
+
+            // Execute query
+            var table = await compiledQuery.RunAsync(combinedCts.Token);
+
+            stopwatch.Stop();
+
+            result.IsSuccess = true;
+            result.Results = table;
+            result.ExecutionTimeMs = stopwatch.ElapsedMilliseconds - result.CompilationTimeMs;
+            result.TotalTimeMs = stopwatch.ElapsedMilliseconds;
+            result.RowCount = table.Count;
+
+            _logger.LogInformation(
+                "Query executed successfully in {TotalTime}ms, returned {RowCount} rows",
+                result.TotalTimeMs,
+                result.RowCount);
+        }
+        catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
+        {
+            result.IsSuccess = false;
+            result.ErrorMessage = "Query execution was cancelled";
+            _logger.LogWarning("Query execution was cancelled: {Query}", query);
+        }
+        catch (OperationCanceledException) when (timeout != default)
+        {
+            result.IsSuccess = false;
+            result.ErrorMessage = $"Query execution timed out after {timeout.TotalSeconds} seconds";
+            _logger.LogWarning("Query execution timed out: {Query}", query);
+        }
+        catch (Exception ex)
+        {
+            result.IsSuccess = false;
+            result.ErrorMessage = ex.Message;
+            result.Exception = ex;
+            _logger.LogError(ex, "Query execution failed: {Query}", query);
+        }
+
+        return result;
+    }
+}
+
+public class QueryExecutionResult
+{
+    public string Query { get; set; }
+    public bool IsSuccess { get; set; }
+    public Table Results { get; set; }
+    public string ErrorMessage { get; set; }
+    public Exception Exception { get; set; }
+    public long CompilationTimeMs { get; set; }
+    public long ExecutionTimeMs { get; set; }
+    public long TotalTimeMs { get; set; }
+    public int RowCount { get; set; }
+}
+```
+
+## Testing API Usage
+
+### Unit Testing with Mock Data
+
+```csharp
+[TestClass]
+public class ApiUsageTests
+{
+    [TestMethod]
+    public async Task Should_Execute_Query_With_Mock_Data()
+    {
+        // Arrange
+        var testData = new[]
+        {
+            new { Name = "Alice", Age = 30, Department = "Engineering" },
+            new { Name = "Bob", Age = 25, Department = "Marketing" },
+            new { Name = "Charlie", Age = 35, Department = "Engineering" }
+        };
+
+        var schemaProvider = new CustomSchemaProvider();
+        var schema = new InMemorySchema();
+        schema.AddDataSource("employees", testData);
+        schemaProvider.RegisterSchema("test", schema);
+
+        var query = @"
+            SELECT Department, COUNT(*) as EmployeeCount, AVG(Age) as AverageAge
+            FROM #test.table('employees')
+            GROUP BY Department
+            ORDER BY EmployeeCount DESC";
+
+        // Act
+        var executor = new RobustQueryExecutor(schemaProvider, Mock.Of<ILogger>());
+        var result = await executor.ExecuteQuerySafelyAsync(query);
+
+        // Assert
+        Assert.IsTrue(result.IsSuccess);
+        Assert.AreEqual(2, result.Results.Count); // 2 departments
+        Assert.IsTrue(result.Results.Columns.Any(c => c.ColumnName == "Department"));
+        Assert.IsTrue(result.Results.Columns.Any(c => c.ColumnName == "EmployeeCount"));
+    }
+}
+```
+
+This comprehensive API guide covers the main usage patterns and provides practical examples for integrating Musoq into applications.
\ No newline at end of file
diff --git a/.copilot/architecture-deep-dive.md b/.copilot/architecture-deep-dive.md
new file mode 100644
index 00000000..5bc06792
--- /dev/null
+++ b/.copilot/architecture-deep-dive.md
@@ -0,0 +1,367 @@
+# Core Architecture Deep Dive
+
+## Overview
+
+Musoq's architecture is designed around a pipeline pattern where each component has a specific responsibility in the query processing workflow. This document provides detailed insights into each component's internal architecture.
+
+## Component Architecture
+
+### 1. Parser Module Architecture
+
+```
+Lexer → Token Stream → Parser → AST → Validation
+```
+
+#### Key Classes and Responsibilities
+
+**`Parser.cs`**:
+- Main entry point for parsing operations
+- Implements recursive descent parsing with operator precedence
+- Manages token consumption and AST node creation
+- Handles precedence dictionary for arithmetic operations
+
+```csharp
+private readonly Dictionary<TokenType, (int, Associativity)> _precedenceDictionary = new()
+{
+    {TokenType.Plus, (1, Associativity.Left)},
+    {TokenType.Hyphen, (1, Associativity.Left)},
+    {TokenType.Star, (2, Associativity.Left)},
+    {TokenType.FSlash, (2, Associativity.Left)},
+    {TokenType.Mod, (2, Associativity.Left)},
+    {TokenType.Dot, (3, Associativity.Left)}
+};
+```
+
+**`Lexing/Lexer.cs`**:
+- Tokenizes input SQL text
+- Recognizes keywords, operators, literals, and identifiers
+- Handles string escaping and numeric literals
+- Manages token position tracking for error reporting
+
+**AST Node Hierarchy**:
+```
+Node (abstract base)
+├── QueryNode (abstract)
+│   ├── SelectNode
+│   ├── WhereNode
+│   ├── OrderByNode
+│   └── GroupByNode
+├── ExpressionNode (abstract)
+│   ├── ArithmeticNode
+│   ├── ColumnNode
+│   ├── LiteralNode
+│   └── MethodNode
+└── FromNode (abstract)
+    ├── SchemaFromNode
+    ├── JoinFromNode
+    └── CteFromNode
+```
+
+#### Parser Features
+
+1. **SQL Syntax Support**:
+   - Standard SELECT, FROM, WHERE, ORDER BY, GROUP BY
+   - JOIN operations (INNER, LEFT, RIGHT, FULL OUTER)
+   - Subqueries and CTEs (Common Table Expressions)
+   - Set operations (UNION, EXCEPT, INTERSECT)
+   - Window functions and aggregations
+
+2. **Extended Syntax**:
+   - Schema-prefixed data sources (`#schema.table()`)
+   - CROSS APPLY and OUTER APPLY operations
+   - Custom function calls
+   - Dynamic parameter passing
+
+3. **Error Handling**:
+   - Syntax error reporting with position information
+   - Graceful recovery from parse errors
+   - Detailed error messages for debugging
+
+### 2.
Schema Module Architecture + +``` +ISchema Interface → SchemaBase → Concrete Schema Implementation + ↓ +RowSource → Data Iteration → Entity Objects +``` + +#### Core Abstractions + +**`ISchema`**: +- Primary contract for data source plugins +- Defines methods for table/source resolution +- Handles method binding for data source operations + +**`SchemaBase`**: +- Abstract base class for schema implementations +- Provides common functionality for source/table registration +- Manages constructor information and method aggregation +- Implements default behaviors for schema operations + +```csharp +public abstract class SchemaBase : ISchema +{ + protected SchemaBase(string name, MethodsAggregator methodsAggregator) + { + Name = name; + _aggregator = methodsAggregator; + AddSource("empty"); + AddTable("empty"); + } + + public void AddSource(string name, params object[] args) + { + var sourceName = $"{name.ToLowerInvariant()}{SourcePart}"; + AddToConstructors(sourceName); + AdditionalArguments.Add(sourceName, args); + } +} +``` + +**`RowSource`**: +- Abstract base for data iteration +- Provides cancellation token support +- Manages data streaming and buffering +- Supports chunked data processing + +#### Data Flow Architecture + +1. **Schema Resolution**: + - Query analyzer identifies schema references + - Schema provider resolves schema by name + - Schema creates appropriate table/source instances + +2. **Data Source Initialization**: + - Parameters passed to constructors + - Connection establishment + - Metadata collection and validation + +3. **Data Iteration**: + - RowSource implements IEnumerable + - Lazy evaluation for memory efficiency + - Cancellation token monitoring + - Error handling and resource cleanup + +### 3. Converter Module Architecture + +``` +AST → BuildChain Pipeline → C# Code Generation → Compilation Ready Code +``` + +#### Build Chain Pattern + +The converter uses a chain of responsibility pattern for AST transformations: + +**`BuildChain.cs`**: +```csharp +public abstract class BuildChain(BuildChain successor) +{ + protected readonly BuildChain Successor = successor; + public abstract void Build(BuildItems items); +} +``` + +**Pipeline Stages**: + +1. **`CreateTree`**: + - Initial AST processing + - Symbol table creation + - Type information gathering + - Schema validation + +2. **`TranformTree`**: + - AST transformations and optimizations + - Expression rewriting + - Join optimization + - Predicate pushdown + +3. **`TurnQueryIntoRunnableCode`**: + - C# code generation + - Method binding resolution + - Runtime library integration + - Assembly preparation + +#### Code Generation Strategy + +1. **Template-Based Generation**: + - Uses string templates for code structure + - Dynamic method insertion + - Type-safe code generation + +2. **Runtime Integration**: + - Links with Musoq.Plugins library + - Provides access to standard functions + - Manages schema provider integration + +3. **Optimization Techniques**: + - Expression tree optimization + - Dead code elimination + - Loop unrolling for simple operations + +### 4. Evaluator Module Architecture + +``` +Generated Code → Dynamic Compilation → Assembly → IRunnable → Execution → Table Results +``` + +#### Compilation Process + +**`CompiledQuery.cs`**: +- Wraps IRunnable instances +- Provides synchronous execution interface +- Manages cancellation tokens +- Handles execution context + +**Dynamic Compilation Features**: +1. 
**In-Memory Compilation**: + - Uses Roslyn compiler + - No file I/O required + - Fast compilation for simple queries + +2. **Assembly Management**: + - Temporary assembly creation + - Memory-efficient disposal + - Debugging support + +3. **Execution Context**: + - Thread-safe execution + - Cancellation token propagation + - Error handling and reporting + +#### Result Management + +**`Tables/Table.cs`**: +- Represents query results +- Supports indexing and enumeration +- Provides column metadata +- Memory-efficient storage + +### 5. Plugin System Architecture + +``` +LibraryBase → Standard Functions → Aggregations → Custom Methods + ↓ +BindableMethodAttribute → Method Resolution → Runtime Binding +``` + +#### Function Registration + +**`LibraryBase`**: +- Base class for function libraries +- Provides standard implementations +- Supports method overloading +- Handles type conversions + +**Method Binding Process**: +1. **Attribute Discovery**: + - Scans for `[BindableMethod]` attributes + - Collects method metadata + - Validates method signatures + +2. **Type Resolution**: + - Matches parameter types + - Handles generic methods + - Supports nullable types + +3. **Runtime Invocation**: + - Dynamic method calls + - Parameter conversion + - Exception handling + +#### Standard Library Organization + +**Function Categories**: +- **String Functions** (`LibraryBaseStrings.cs`) +- **Math Functions** (`LibraryBaseMath.cs`) +- **Date/Time Functions** (`LibraryBaseDate.cs`) +- **Conversion Functions** (`LibraryBaseConverting.cs`) +- **Aggregation Functions** (`LibraryBaseSum.cs`, `LibraryBaseCount.cs`, etc.) + +## Integration Patterns + +### Schema Provider Integration + +```csharp +public class CustomSchemaProvider : ISchemaProvider +{ + private readonly Dictionary _schemas = new(); + + public void RegisterSchema(string name, ISchema schema) + { + _schemas[name.ToLowerInvariant()] = schema; + } + + public ISchema GetSchema(string name) + { + return _schemas.TryGetValue(name.ToLowerInvariant(), out var schema) ? schema : null; + } +} +``` + +### Query Execution Pipeline + +```csharp +// 1. Parse Query +var lexer = new Lexer(sqlText); +var parser = new Parser(lexer); +var rootNode = parser.ComposeAll(); + +// 2. Build Execution Plan +var buildItems = new BuildItems(rootNode, schemaProvider, loggerResolver); +var buildChain = new CreateTree(new TranformTree(new TurnQueryIntoRunnableCode(null))); +buildChain.Build(buildItems); + +// 3. Compile and Execute +var compiledQuery = new CompiledQuery(buildItems.Runnable); +var results = compiledQuery.Run(); +``` + +### Error Handling Strategy + +1. **Parse-Time Errors**: + - Syntax validation + - Token recognition failures + - Grammar rule violations + +2. **Build-Time Errors**: + - Schema resolution failures + - Type checking errors + - Method binding failures + +3. **Runtime Errors**: + - Data source connection issues + - Type conversion failures + - Resource exhaustion + +## Performance Considerations + +### Memory Management + +1. **Lazy Evaluation**: + - Deferred query execution + - Streaming data processing + - Minimal memory footprint + +2. **Resource Cleanup**: + - Automatic disposal of data sources + - Cancellation token monitoring + - Memory leak prevention + +### Optimization Opportunities + +1. **Query Planning**: + - Predicate pushdown to data sources + - Join order optimization + - Index usage hints + +2. **Code Generation**: + - Expression tree optimization + - Inlined method calls + - Type-specific optimizations + +3. 
**Parallel Execution**: + - Multi-threaded data processing + - Parallel aggregations + - Async data source support + +This architecture enables Musoq to be both flexible and performant, supporting diverse data sources while maintaining type safety and SQL compatibility. \ No newline at end of file diff --git a/.copilot/development-debugging-guide.md b/.copilot/development-debugging-guide.md new file mode 100644 index 00000000..d05ef80f --- /dev/null +++ b/.copilot/development-debugging-guide.md @@ -0,0 +1,682 @@ +# Development and Debugging Guide + +## Development Environment Setup + +### Prerequisites + +```bash +# Required software +- .NET 8.0 SDK or later +- Git +- Visual Studio 2022 or VS Code with C# extension +- Optional: dotMemory, PerfView for performance analysis +``` + +### Local Development Setup + +```bash +# Clone the repository +git clone https://github.com/Puchaczov/Musoq.git +cd Musoq + +# Restore dependencies +dotnet restore + +# Build the solution +dotnet build + +# Run tests to verify setup +dotnet test --verbosity normal +``` + +### IDE Configuration + +#### Visual Studio 2022 +- Enable nullable reference types warnings +- Configure code analysis rules +- Set up debugging for dynamic assemblies + +#### VS Code +```json +// .vscode/settings.json +{ + "dotnet.defaultSolution": "Musoq.sln", + "omnisharp.enableRoslynAnalyzers": true, + "csharp.semanticHighlighting.enabled": true +} +``` + +## Development Workflow + +### Component Development Cycle + +```mermaid +graph LR + A[Identify Component] --> B[Write Tests] + B --> C[Implement Feature] + C --> D[Run Unit Tests] + D --> E[Run Integration Tests] + E --> F[Performance Testing] + F --> G[Code Review] + G --> H[Merge] +``` + +### Making Changes + +#### 1. Parser Module Changes + +When modifying SQL parsing logic: + +```bash +# Run parser-specific tests +dotnet test Musoq.Parser.Tests --verbosity detailed + +# Test with real queries +dotnet test Musoq.Evaluator.Tests --filter "TestCategory=Parser" +``` + +**Key areas to test:** +- Token recognition for new keywords +- AST node generation +- Operator precedence +- Error reporting + +#### 2. Schema Module Changes + +When adding new schema features: + +```bash +# Test schema functionality +dotnet test Musoq.Schema.Tests + +# Test integration with evaluator +dotnet test Musoq.Evaluator.Tests --filter "TestCategory=Schema" +``` + +**Key considerations:** +- Method resolution +- Type inference +- Runtime context handling +- Performance implications + +#### 3. Converter Module Changes + +When modifying code generation: + +```bash +# Test code generation +dotnet test Musoq.Converter.Tests + +# Verify generated code compiles +dotnet test Musoq.Evaluator.Tests --filter "TestCategory=CodeGeneration" +``` + +**Debugging generated code:** +```csharp +// Enable code inspection in tests +var buildItems = InstanceCreator.CreateForAnalyze(query, ...); +var generatedCode = buildItems.BuildedCode; +Console.WriteLine(generatedCode); // Inspect generated C# +``` + +#### 4. Evaluator Module Changes + +When modifying execution logic: + +```bash +# Test query execution +dotnet test Musoq.Evaluator.Tests + +# Run performance benchmarks +dotnet run --project Musoq.Benchmarks --configuration Release +``` + +## Debugging Techniques + +### 1. 
AST Debugging + +```csharp +// Debug parser output +public void DebugParseTree() +{ + var lexer = new Lexer("SELECT Name FROM #test.data()"); + var parser = new Parser(lexer); + var rootNode = parser.ComposeAll(); + + // Set breakpoint and inspect AST structure + PrintAST(rootNode, 0); +} + +private void PrintAST(Node node, int depth) +{ + var indent = new string(' ', depth * 2); + Console.WriteLine($"{indent}{node.GetType().Name}"); + + foreach (var child in node.Children) + { + PrintAST(child, depth + 1); + } +} +``` + +### 2. Code Generation Debugging + +```csharp +// Inspect generated C# code +public void DebugCodeGeneration() +{ + var query = "SELECT Name, Count(*) FROM #test.data() GROUP BY Name"; + var buildItems = InstanceCreator.CreateForAnalyze( + query, + Guid.NewGuid().ToString(), + schemaProvider, + loggerResolver); + + // Generated code available in buildItems + var generatedCode = buildItems.BuildedCode; + File.WriteAllText("debug_generated.cs", generatedCode); + + // Compile and inspect assembly + var compiledQuery = InstanceCreator.CompileForExecution(query, ...); + var assembly = compiledQuery.GetType().Assembly; +} +``` + +### 3. Query Execution Debugging + +```csharp +// Debug query execution with logging +public void DebugQueryExecution() +{ + var loggerResolver = new TestsLoggerResolver(); + var logger = loggerResolver.GetLogger("Debug"); + + var compiledQuery = InstanceCreator.CompileForExecution( + query, + Guid.NewGuid().ToString(), + schemaProvider, + loggerResolver); + + // Enable execution tracing + using var cts = new CancellationTokenSource(); + var results = compiledQuery.Run(cts.Token); + + // Inspect results + foreach (var row in results) + { + logger.LogInformation($"Row: {string.Join(", ", row)}"); + } +} +``` + +### 4. 
Schema Resolution Debugging + +```csharp +// Debug schema and method resolution +public void DebugSchemaResolution() +{ + var schema = new MyCustomSchema(); + var context = new RuntimeContext(); + + // Test table resolution + var table = schema.GetTableByName("test", context, "param1", "param2"); + + // Test method resolution + var methodFound = schema.TryResolveMethod( + "CustomMethod", + new[] { typeof(string), typeof(int) }, + typeof(MyEntity), + out var methodInfo); + + if (methodFound) + { + Console.WriteLine($"Resolved method: {methodInfo.Name}"); + } +} +``` + +## Testing Strategies + +### Unit Testing Patterns + +#### Parser Tests +```csharp +[TestClass] +public class CustomParserTests +{ + [TestMethod] + public void Should_Parse_New_Syntax() + { + // Arrange + var query = "SELECT * FROM #new.syntax('param') WITH OPTIONS"; + var lexer = new Lexer(query); + var parser = new Parser(lexer); + + // Act + var rootNode = parser.ComposeAll(); + + // Assert + Assert.IsInstanceOfType(rootNode, typeof(RootNode)); + // Additional assertions for AST structure + } +} +``` + +#### Schema Tests +```csharp +[TestClass] +public class CustomSchemaTests +{ + [TestMethod] + public void Should_Resolve_Custom_Method() + { + // Arrange + var schema = new CustomSchema(); + + // Act + var resolved = schema.TryResolveMethod( + "ProcessText", + new[] { typeof(string) }, + typeof(TextEntity), + out var methodInfo); + + // Assert + Assert.IsTrue(resolved); + Assert.AreEqual("ProcessText", methodInfo.Name); + } +} +``` + +#### Integration Tests +```csharp +[TestClass] +public class FullQueryIntegrationTests : BasicEntityTestBase +{ + [TestMethod] + public void Should_Execute_Complex_Query() + { + // Arrange + var data = CreateTestData(); + var query = @" + WITH processed AS ( + SELECT Name, ProcessText(Description) as CleanText + FROM #test.data() + ) + SELECT Name, Count(*) as WordCount + FROM processed + WHERE Length(CleanText) > 10 + GROUP BY Name + ORDER BY WordCount DESC"; + + // Act + var vm = CreateAndRunVirtualMachine(query, data); + var results = vm.Run(); + + // Assert + Assert.IsTrue(results.Count > 0); + // Verify result structure and content + } +} +``` + +### Performance Testing + +#### Benchmark Setup +```csharp +[MemoryDiagnoser] +[SimpleJob(RuntimeMoniker.Net80)] +public class QueryPerformanceBenchmark +{ + private ISchemaProvider _schemaProvider; + private string _complexQuery; + + [GlobalSetup] + public void Setup() + { + _schemaProvider = CreateSchemaProvider(); + _complexQuery = "SELECT ... FROM ... WHERE ... GROUP BY ... 
ORDER BY ..."; + } + + [Benchmark] + public Table ExecuteComplexQuery() + { + var compiledQuery = InstanceCreator.CompileForExecution( + _complexQuery, + Guid.NewGuid().ToString(), + _schemaProvider, + new TestsLoggerResolver()); + + return compiledQuery.Run(); + } +} +``` + +#### Memory Analysis +```csharp +[TestMethod] +public void Should_Not_Leak_Memory() +{ + var initialMemory = GC.GetTotalMemory(true); + + for (int i = 0; i < 1000; i++) + { + var compiledQuery = InstanceCreator.CompileForExecution(query, ...); + var results = compiledQuery.Run(); + // Process results + } + + GC.Collect(); + GC.WaitForPendingFinalizers(); + GC.Collect(); + + var finalMemory = GC.GetTotalMemory(true); + var memoryIncrease = finalMemory - initialMemory; + + Assert.IsTrue(memoryIncrease < 100_000_000, "Memory increase should be minimal"); +} +``` + +## Build and CI/CD + +### Build Scripts + +#### Local Build Script +```bash +#!/bin/bash +# build.sh + +echo "Starting Musoq build process..." + +# Clean previous builds +dotnet clean + +# Restore dependencies +echo "Restoring dependencies..." +dotnet restore + +# Build solution +echo "Building solution..." +dotnet build --configuration Release --no-restore + +# Run tests +echo "Running tests..." +dotnet test --configuration Release --no-build --verbosity normal + +# Pack NuGet packages +echo "Packing NuGet packages..." +dotnet pack --configuration Release --no-build + +echo "Build completed successfully!" +``` + +#### Windows Build Script +```powershell +# build.ps1 + +Write-Host "Starting Musoq build process..." -ForegroundColor Green + +# Clean previous builds +dotnet clean + +# Restore dependencies +Write-Host "Restoring dependencies..." -ForegroundColor Yellow +dotnet restore + +# Build solution +Write-Host "Building solution..." -ForegroundColor Yellow +dotnet build --configuration Release --no-restore + +if ($LASTEXITCODE -ne 0) { + Write-Host "Build failed!" -ForegroundColor Red + exit 1 +} + +# Run tests +Write-Host "Running tests..." -ForegroundColor Yellow +dotnet test --configuration Release --no-build --verbosity normal + +if ($LASTEXITCODE -ne 0) { + Write-Host "Tests failed!" -ForegroundColor Red + exit 1 +} + +# Pack NuGet packages +Write-Host "Packing NuGet packages..." -ForegroundColor Yellow +dotnet pack --configuration Release --no-build + +Write-Host "Build completed successfully!" -ForegroundColor Green +``` + +### Continuous Integration + +#### GitHub Actions Workflow +```yaml +name: Build and Test + +on: + push: + branches: [ main, develop ] + pull_request: + branches: [ main ] + +jobs: + build: + runs-on: ubuntu-latest + + steps: + - uses: actions/checkout@v3 + + - name: Setup .NET + uses: actions/setup-dotnet@v3 + with: + dotnet-version: '8.0.x' + + - name: Restore dependencies + run: dotnet restore + + - name: Build + run: dotnet build --no-restore --configuration Release + + - name: Test + run: dotnet test --no-build --configuration Release --verbosity normal + + - name: Pack + run: dotnet pack --no-build --configuration Release + + - name: Upload artifacts + uses: actions/upload-artifact@v3 + with: + name: nuget-packages + path: '**/*.nupkg' +``` + +## Troubleshooting Common Issues + +### 1. Parse Errors + +**Symptom**: Unexpected token errors during parsing +``` +Error: Unexpected token 'IDENTIFIER' at position 15 +``` + +**Debug Steps**: +```csharp +// 1. 
Check lexer output +var lexer = new Lexer(problematicQuery); +var tokens = new List(); +while (lexer.Current().TokenType != TokenType.EndOfFile) +{ + tokens.Add(lexer.Current()); + lexer.Next(); +} + +// 2. Verify token sequence +foreach (var token in tokens) +{ + Console.WriteLine($"{token.TokenType}: '{token.Value}' at {token.Span}"); +} +``` + +**Common Causes**: +- Missing keywords in lexer +- Incorrect operator precedence +- Invalid character sequences + +### 2. Schema Resolution Failures + +**Symptom**: Schema or method not found errors +``` +Error: Schema 'custom' not found +Error: Method 'ProcessText' could not be resolved +``` + +**Debug Steps**: +```csharp +// 1. Verify schema registration +var schema = schemaProvider.GetSchema("custom"); +if (schema == null) +{ + Console.WriteLine("Schema not registered"); +} + +// 2. Check method signatures +var methodFound = schema.TryResolveMethod( + "ProcessText", + parameterTypes, + entityType, + out var methodInfo); + +if (!methodFound) +{ + Console.WriteLine("Method signature mismatch"); +} +``` + +**Common Causes**: +- Schema not registered in provider +- Method signature mismatch +- Missing BindableMethod attribute + +### 3. Compilation Errors + +**Symptom**: Generated code fails to compile +``` +Error: CS0103: The name 'unknownVariable' does not exist in the current context +``` + +**Debug Steps**: +```csharp +// 1. Inspect generated code +var buildItems = InstanceCreator.CreateForAnalyze(query, ...); +File.WriteAllText("debug.cs", buildItems.BuildedCode); + +// 2. Try manual compilation +var syntaxTree = CSharpSyntaxTree.ParseText(buildItems.BuildedCode); +var compilation = CSharpCompilation.Create("DebugAssembly") + .AddSyntaxTrees(syntaxTree) + .AddReferences(/* required references */); + +var diagnostics = compilation.GetDiagnostics(); +foreach (var diagnostic in diagnostics) +{ + Console.WriteLine(diagnostic); +} +``` + +**Common Causes**: +- Missing using statements +- Type inference errors +- Invalid variable names + +### 4. Runtime Errors + +**Symptom**: Exceptions during query execution +``` +Error: Object reference not set to an instance of an object +Error: Invalid cast exception +``` + +**Debug Steps**: +```csharp +// 1. Enable detailed logging +var loggerResolver = new DetailedLoggerResolver(); + +// 2. Wrap execution in try-catch +try +{ + var results = compiledQuery.Run(); +} +catch (Exception ex) +{ + Console.WriteLine($"Exception: {ex}"); + Console.WriteLine($"Stack trace: {ex.StackTrace}"); +} + +// 3. 
Check data source state +var rowSource = schema.GetRowSource("test", context); +foreach (var row in rowSource.GetRows(CancellationToken.None)) +{ + Console.WriteLine($"Row: {row}"); +} +``` + +**Common Causes**: +- Null reference exceptions in data sources +- Type conversion failures +- Resource disposal issues + +## Performance Optimization + +### Profiling Queries + +```csharp +public class QueryProfiler +{ + public static ProfileResult ProfileQuery(string query, ISchemaProvider provider) + { + var stopwatch = Stopwatch.StartNew(); + var initialMemory = GC.GetTotalMemory(false); + + // Compilation time + var compileStart = stopwatch.ElapsedMilliseconds; + var compiledQuery = InstanceCreator.CompileForExecution(query, Guid.NewGuid().ToString(), provider, new TestsLoggerResolver()); + var compileTime = stopwatch.ElapsedMilliseconds - compileStart; + + // Execution time + var executeStart = stopwatch.ElapsedMilliseconds; + var results = compiledQuery.Run(); + var executeTime = stopwatch.ElapsedMilliseconds - executeStart; + + stopwatch.Stop(); + var finalMemory = GC.GetTotalMemory(false); + + return new ProfileResult + { + TotalTime = stopwatch.ElapsedMilliseconds, + CompileTime = compileTime, + ExecuteTime = executeTime, + MemoryUsed = finalMemory - initialMemory, + RowCount = results.Count + }; + } +} +``` + +### Optimization Strategies + +1. **Query-Level Optimizations**: + - Use WHERE clauses to filter early + - Optimize JOIN order + - Avoid unnecessary SELECT columns + +2. **Data Source Optimizations**: + - Implement efficient data streaming + - Use appropriate data structures + - Cache expensive operations + +3. **Code Generation Optimizations**: + - Inline simple operations + - Use type-specific optimizations + - Minimize object allocations + +This guide provides comprehensive coverage of development and debugging techniques for working with the Musoq codebase effectively. \ No newline at end of file diff --git a/.copilot/plugin-development-guide.md b/.copilot/plugin-development-guide.md new file mode 100644 index 00000000..6b501dbc --- /dev/null +++ b/.copilot/plugin-development-guide.md @@ -0,0 +1,616 @@ +# Plugin Development Guide + +## Overview + +Musoq's plugin system is designed for extensibility, allowing developers to add new data sources and function libraries. This guide provides comprehensive information for developing plugins. + +## Data Source Plugin Development + +### Plugin Architecture + +``` +ISchema (Interface) → SchemaBase (Base Class) → YourSchema (Implementation) + ↓ +RowSource (Data Iterator) → YourRowSource (Implementation) + ↓ +Entity Classes → Your Data Models +``` + +### Step-by-Step Plugin Creation + +#### 1. Define Your Data Entity + +```csharp +public class FileEntity +{ + public string Name { get; set; } + public string FullPath { get; set; } + public long Size { get; set; } + public DateTime LastModified { get; set; } + public string Extension { get; set; } + public bool IsDirectory { get; set; } +} +``` + +#### 2. Create Row Source Implementation + +```csharp +public class FileSystemRowSource : RowSourceBase +{ + private readonly string _path; + private readonly bool _recursive; + + public FileSystemRowSource(string path, bool recursive, RuntimeContext context) + : base(context) + { + _path = path ?? throw new ArgumentNullException(nameof(path)); + _recursive = recursive; + } + + public override IEnumerable GetRows(CancellationToken cancellationToken) + { + var searchOption = _recursive ? 
SearchOption.AllDirectories : SearchOption.TopDirectoryOnly;
+
+        foreach (var filePath in Directory.EnumerateFileSystemEntries(_path, "*", searchOption))
+        {
+            cancellationToken.ThrowIfCancellationRequested();
+
+            var info = new FileInfo(filePath);
+
+            yield return new FileEntity
+            {
+                Name = info.Name,
+                FullPath = info.FullName,
+                Size = info.Exists ? info.Length : 0,
+                LastModified = info.LastWriteTime,
+                Extension = info.Extension,
+                IsDirectory = Directory.Exists(filePath)
+            };
+        }
+    }
+}
+```
+
+#### 3. Create Schema Table
+
+```csharp
+public class FileSystemTable : ISchemaTable
+{
+    public string Name => "files";
+
+    public ISchemaColumn[] Columns { get; } = new ISchemaColumn[]
+    {
+        new SchemaColumn("Name", typeof(string)),
+        new SchemaColumn("FullPath", typeof(string)),
+        new SchemaColumn("Size", typeof(long)),
+        new SchemaColumn("LastModified", typeof(DateTime)),
+        new SchemaColumn("Extension", typeof(string)),
+        new SchemaColumn("IsDirectory", typeof(bool))
+    };
+}
+```
+
+#### 4. Implement Schema Class
+
+```csharp
+public class FileSystemSchema : SchemaBase
+{
+    public FileSystemSchema() : base("fs", new MethodsAggregator())
+    {
+        // Register data sources
+        AddSource<FileSystemRowSource>("files");
+
+        // Register tables
+        AddTable<FileSystemTable>("files");
+    }
+
+    public override ISchemaTable GetTableByName(string name, RuntimeContext runtimeContext, params object[] parameters)
+    {
+        return name.ToLowerInvariant() switch
+        {
+            "files" => new FileSystemTable(),
+            _ => throw new NotSupportedException($"Table '{name}' is not supported.")
+        };
+    }
+
+    public override RowSource GetRowSource(string name, RuntimeContext runtimeContext, params object[] parameters)
+    {
+        return name.ToLowerInvariant() switch
+        {
+            "files" => new FileSystemRowSource(
+                parameters.Length > 0 ? parameters[0]?.ToString() : ".",
+                parameters.Length > 1 && Convert.ToBoolean(parameters[1]),
+                runtimeContext),
+            _ => throw new NotSupportedException($"Source '{name}' is not supported.")
+        };
+    }
+}
+```
+
+### Advanced Row Source Patterns
+
+#### Chunked Data Processing
+
+```csharp
+public class ChunkedFileSystemRowSource : ChunkedSource<FileEntity>
+{
+    private readonly string _path;
+    private readonly bool _recursive;
+
+    public ChunkedFileSystemRowSource(string path, bool recursive, RuntimeContext context)
+        : base(context)
+    {
+        _path = path;
+        _recursive = recursive;
+    }
+
+    protected override IEnumerable<IReadOnlyList<FileEntity>> GetChunks(CancellationToken cancellationToken)
+    {
+        const int chunkSize = 1000;
+        var chunk = new List<FileEntity>(chunkSize);
+
+        foreach (var entity in GetAllFiles())
+        {
+            cancellationToken.ThrowIfCancellationRequested();
+
+            chunk.Add(entity);
+
+            if (chunk.Count >= chunkSize)
+            {
+                yield return chunk;
+                chunk = new List<FileEntity>(chunkSize);
+            }
+        }
+
+        if (chunk.Count > 0)
+            yield return chunk;
+    }
+
+    private IEnumerable<FileEntity> GetAllFiles()
+    {
+        // Implementation here
+        yield break;
+    }
+}
+```
+
+#### Parameterized Data Sources
+
+```csharp
+public class DatabaseRowSource : RowSourceBase<DatabaseEntity>
+{
+    private readonly string _connectionString;
+    private readonly string _query;
+    private readonly object[] _parameters;
+
+    public DatabaseRowSource(
+        string connectionString,
+        string query,
+        RuntimeContext context,
+        params object[] parameters)
+        : base(context)
+    {
+        _connectionString = connectionString;
+        _query = query;
+        _parameters = parameters;
+    }
+
+    public override IEnumerable<DatabaseEntity> GetRows(CancellationToken cancellationToken)
+    {
+        using var connection = new SqlConnection(_connectionString);
+        connection.Open();
+
+        using var command = new SqlCommand(_query, connection);
+
+        // Add parameters
+        for (int i = 0; i < _parameters.Length; i++)
+        {
+            command.Parameters.AddWithValue($"@p{i}", _parameters[i]);
+        }
+
+        using var reader = command.ExecuteReader();
+
+        while (reader.Read())
+        {
+            cancellationToken.ThrowIfCancellationRequested();
+
+            yield return new DatabaseEntity
+            {
+                // Map reader columns to entity properties
+            };
+        }
+    }
+}
+```
+
+### Dynamic Schema Support
+
+#### Runtime Column Discovery
+
+```csharp
+public class DynamicRowSource : RowSourceBase<dynamic>
+{
+    private readonly Func<IEnumerable<dynamic>> _dataProvider;
+    private ISchemaColumn[] _columns;
+
+    public DynamicRowSource(Func<IEnumerable<dynamic>> dataProvider, RuntimeContext context)
+        : base(context)
+    {
+        _dataProvider = dataProvider;
+    }
+
+    public override IEnumerable<dynamic> GetRows(CancellationToken cancellationToken)
+    {
+        var data = _dataProvider();
+
+        // Discover columns from first row
+        if (_columns == null)
+        {
+            var firstRow = data.FirstOrDefault();
+            if (firstRow != null)
+            {
+                _columns = DiscoverColumns(firstRow);
+            }
+        }
+
+        return data;
+    }
+
+    private ISchemaColumn[] DiscoverColumns(dynamic row)
+    {
+        var columns = new List<ISchemaColumn>();
+
+        if (row is IDictionary<string, object> dict)
+        {
+            foreach (var kvp in dict)
+            {
+                var type = kvp.Value?.GetType() ?? typeof(object);
+                columns.Add(new SchemaColumn(kvp.Key, type));
+            }
+        }
+
+        return columns.ToArray();
+    }
+}
+```
+
+## Function Library Development
+
+### Creating Custom Functions
+
+#### Basic Function Implementation
+
+```csharp
+public class TextProcessingLibrary : LibraryBase
+{
+    [BindableMethod]
+    public string RemoveWhitespace(string input)
+    {
+        return string.IsNullOrEmpty(input) ? input : Regex.Replace(input, @"\s+", "");
+    }
+
+    [BindableMethod]
+    public string ExtractEmails(string text)
+    {
+        if (string.IsNullOrEmpty(text))
+            return string.Empty;
+
+        var emailPattern = @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b";
+        var matches = Regex.Matches(text, emailPattern);
+
+        return string.Join(", ", matches.Cast<Match>().Select(m => m.Value));
+    }
+
+    [BindableMethod]
+    public int WordCount(string text)
+    {
+        if (string.IsNullOrEmpty(text))
+            return 0;
+
+        return text.Split(new[] { ' ', '\t', '\n', '\r' },
+            StringSplitOptions.RemoveEmptyEntries).Length;
+    }
+}
+```
+
+#### Generic Functions
+
+```csharp
+public class GenericLibrary : LibraryBase
+{
+    [BindableMethod]
+    public T Coalesce<T>(params T[] values)
+    {
+        return values.FirstOrDefault(v => v != null && !v.Equals(default(T)));
+    }
+
+    [BindableMethod]
+    public bool IsNull<T>(T value)
+    {
+        return value == null || value.Equals(default(T));
+    }
+
+    [BindableMethod]
+    public T IfNull<T>(T value, T replacement)
+    {
+        return IsNull(value) ? replacement : value;
+    }
+}
+```
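+
+One quick way to sanity-check a function library before wiring it into a schema is to call its bindable methods directly from a test, in the same MSTest style used elsewhere in this guide. The test class below is illustrative; the expected values simply restate what the `GenericLibrary` code above does:
+
+```csharp
+[TestClass]
+public class GenericLibraryTests
+{
+    [TestMethod]
+    public void Should_Fall_Back_When_Value_Is_Null_Or_Default()
+    {
+        // Arrange
+        var library = new GenericLibrary();
+
+        // Act & Assert
+        Assert.IsTrue(library.IsNull<string>(null));
+        Assert.IsFalse(library.IsNull("text"));
+        Assert.AreEqual("fallback", library.IfNull<string>(null, "fallback"));
+        Assert.AreEqual("value", library.IfNull("value", "fallback"));
+        Assert.AreEqual("first", library.Coalesce(null, "first", "second"));
+    }
+}
+```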
+
+#### Aggregation Functions
+
+```csharp
+public class StatisticsLibrary : LibraryBase
+{
+    [BindableMethod]
+    public decimal Median(IEnumerable<decimal?> values)
+    {
+        var sorted = values.Where(v => v.HasValue).Select(v => v.Value).OrderBy(v => v).ToArray();
+
+        if (sorted.Length == 0)
+            return 0;
+
+        if (sorted.Length % 2 == 0)
+        {
+            return (sorted[sorted.Length / 2 - 1] + sorted[sorted.Length / 2]) / 2;
+        }
+        else
+        {
+            return sorted[sorted.Length / 2];
+        }
+    }
+
+    [BindableMethod]
+    public decimal StandardDeviation(IEnumerable<decimal?> values)
+    {
+        var array = values.Where(v => v.HasValue).Select(v => v.Value).ToArray();
+
+        if (array.Length <= 1)
+            return 0;
+
+        var mean = array.Average();
+        var variance = array.Sum(v => (v - mean) * (v - mean)) / (array.Length - 1);
+
+        return (decimal)Math.Sqrt((double)variance);
+    }
+}
+```
+
+### Complex Function Examples
+
+#### JSON Processing Functions
+
+```csharp
+public class JsonLibrary : LibraryBase
+{
+    [BindableMethod]
+    public string JsonExtract(string json, string path)
+    {
+        try
+        {
+            var doc = JsonDocument.Parse(json);
+            var pathSegments = path.Split('.');
+
+            JsonElement current = doc.RootElement;
+
+            foreach (var segment in pathSegments)
+            {
+                if (current.ValueKind == JsonValueKind.Object && current.TryGetProperty(segment, out var property))
+                {
+                    current = property;
+                }
+                else if (current.ValueKind == JsonValueKind.Array && int.TryParse(segment, out var index))
+                {
+                    if (index >= 0 && index < current.GetArrayLength())
+                    {
+                        current = current[index];
+                    }
+                    else
+                    {
+                        return null;
+                    }
+                }
+                else
+                {
+                    return null;
+                }
+            }
+
+            return current.ToString();
+        }
+        catch
+        {
+            return null;
+        }
+    }
+
+    [BindableMethod]
+    public bool IsValidJson(string json)
+    {
+        try
+        {
+            JsonDocument.Parse(json);
+            return true;
+        }
+        catch
+        {
+            return false;
+        }
+    }
+}
+```
+
+#### HTTP/Web Functions
+
+```csharp
+public class WebLibrary : LibraryBase
+{
+    private static readonly HttpClient _httpClient = new();
+
+    [BindableMethod]
+    public string HttpGet(string url)
+    {
+        try
+        {
+            var response = _httpClient.GetStringAsync(url).Result;
+            return response;
+        }
+        catch
+        {
+            return null;
+        }
+    }
+
+    [BindableMethod]
+    public string UrlEncode(string input)
+    {
+        return Uri.EscapeDataString(input ?? string.Empty);
+    }
+
+    [BindableMethod]
+    public string UrlDecode(string input)
+    {
+        return Uri.UnescapeDataString(input ?? string.Empty);
+    }
+}
+```
+
+## Plugin Registration and Deployment
+
+### Registration in Schema Provider
+
+```csharp
+public class CustomSchemaProvider : ISchemaProvider
+{
+    private readonly Dictionary<string, ISchema> _schemas = new();
+
+    public CustomSchemaProvider()
+    {
+        // Register built-in schemas
+        RegisterSchema("fs", new FileSystemSchema());
+        RegisterSchema("web", new WebSchema());
+        RegisterSchema("json", new JsonSchema());
+    }
+
+    public void RegisterSchema(string name, ISchema schema)
+    {
+        _schemas[name.ToLowerInvariant()] = schema;
+    }
+
+    public ISchema GetSchema(string name)
+    {
+        return _schemas.TryGetValue(name.ToLowerInvariant(), out var schema) ?
schema : null; + } +} +``` + +### Function Library Registration + +```csharp +public class ExtendedMethodsAggregator : MethodsAggregator +{ + public ExtendedMethodsAggregator() + { + // Register custom libraries + RegisterLibrary(new TextProcessingLibrary()); + RegisterLibrary(new GenericLibrary()); + RegisterLibrary(new StatisticsLibrary()); + RegisterLibrary(new JsonLibrary()); + RegisterLibrary(new WebLibrary()); + } +} +``` + +## Testing Plugin Development + +### Unit Testing Data Sources + +```csharp +[TestClass] +public class FileSystemRowSourceTests +{ + [TestMethod] + public void Should_Return_Files_From_Directory() + { + // Arrange + var tempDir = Path.GetTempPath(); + var context = new RuntimeContext(); + var rowSource = new FileSystemRowSource(tempDir, false, context); + + // Act + var files = rowSource.GetRows(CancellationToken.None).ToList(); + + // Assert + Assert.IsTrue(files.Count > 0); + Assert.IsTrue(files.All(f => !string.IsNullOrEmpty(f.Name))); + } +} +``` + +### Integration Testing with Query Engine + +```csharp +[TestClass] +public class FileSystemSchemaIntegrationTests : BasicEntityTestBase +{ + [TestMethod] + public void Should_Execute_File_Query() + { + // Arrange + var schemaProvider = new CustomSchemaProvider(); + var query = "SELECT Name, Size FROM #fs.files('.', false) WHERE Extension = '.txt'"; + + // Act + var compiledQuery = CreateAndRunVirtualMachine(query, schemaProvider: schemaProvider); + var results = compiledQuery.Run(); + + // Assert + Assert.IsNotNull(results); + Assert.IsTrue(results.Columns.Any(c => c.ColumnName == "Name")); + Assert.IsTrue(results.Columns.Any(c => c.ColumnName == "Size")); + } +} +``` + +### Performance Testing + +```csharp +[TestMethod] +public void Should_Handle_Large_Dataset_Efficiently() +{ + var stopwatch = Stopwatch.StartNew(); + + var rowSource = new FileSystemRowSource("/large/directory", true, new RuntimeContext()); + var count = rowSource.GetRows(CancellationToken.None).Count(); + + stopwatch.Stop(); + + Assert.IsTrue(stopwatch.ElapsedMilliseconds < 5000, "Query should complete within 5 seconds"); + Assert.IsTrue(count > 0, "Should return some results"); +} +``` + +## Best Practices + +### Performance Optimization + +1. **Lazy Evaluation**: Always use `yield return` for data enumeration +2. **Cancellation Support**: Check cancellation tokens frequently +3. **Resource Management**: Implement proper disposal patterns +4. **Memory Efficiency**: Avoid loading entire datasets into memory + +### Error Handling + +1. **Graceful Degradation**: Handle errors without crashing +2. **Meaningful Exceptions**: Provide detailed error messages +3. **Logging Integration**: Use the provided logging infrastructure +4. **Validation**: Validate parameters early and thoroughly + +### Code Quality + +1. **Type Safety**: Use strongly-typed entities where possible +2. **Immutability**: Prefer immutable data structures +3. **Thread Safety**: Ensure thread-safe implementations +4. **Testing**: Write comprehensive unit and integration tests + +This guide provides the foundation for building robust, efficient plugins for the Musoq query engine. \ No newline at end of file
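+
+As a closing illustration of the practices above, the sketch below shows a minimal, hypothetical row source (`CsvRowSource` and `CsvEntity` are illustrative names, not part of Musoq) that follows the same `RowSourceBase`/`GetRows` shape as the earlier examples: it streams rows lazily with `yield return`, checks the cancellation token on every iteration, and releases its reader when enumeration completes:
+
+```csharp
+public class CsvEntity
+{
+    public string[] Fields { get; set; }
+}
+
+public class CsvRowSource : RowSourceBase<CsvEntity>
+{
+    private readonly string _filePath;
+
+    public CsvRowSource(string filePath, RuntimeContext context)
+        : base(context)
+    {
+        _filePath = filePath ?? throw new ArgumentNullException(nameof(filePath));
+    }
+
+    public override IEnumerable<CsvEntity> GetRows(CancellationToken cancellationToken)
+    {
+        // Resource management: the reader lives only as long as the enumeration.
+        using var reader = new StreamReader(_filePath);
+
+        string line;
+        while ((line = reader.ReadLine()) != null)
+        {
+            // Cancellation support: honor the token on every iteration.
+            cancellationToken.ThrowIfCancellationRequested();
+
+            // Lazy evaluation and memory efficiency: yield one row at a time,
+            // never the whole file.
+            yield return new CsvEntity { Fields = line.Split(',') };
+        }
+    }
+}
+```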