2 changes: 2 additions & 0 deletions .github/workflows/benchmark.yml
@@ -11,6 +11,8 @@
#

name: Benchmarks
permissions:
  contents: read

on:
  release:
2 changes: 2 additions & 0 deletions .github/workflows/demo.yml
@@ -11,6 +11,8 @@
#

name: Demo
permissions:
  contents: read

on:
  release:
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -115,7 +115,7 @@ jobs:
uses: articulate/actions-markdownlint@v1
with:
config: .markdownlint.yml
files: '*.md'
files: 'README.md'


verify-php-binary:
2 changes: 2 additions & 0 deletions .github/workflows/publish.yml
@@ -11,6 +11,8 @@
#

name: Publish
permissions:
  contents: read

on:
  release:
135 changes: 135 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,135 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

CSV Blueprint is a CLI tool for validating CSV files based on customizable YAML schemas. It provides more than 330 validation rules that can be applied to individual cells or entire columns, with support for parallel processing and multiple output formats.

### Core Architecture

The project follows a modular architecture with clear separation of concerns:

- **CLI Layer**: `src/Commands/` - Command classes for different operations (ValidateCsv, CreateSchema, etc.)
- **Schema Engine**: `src/Schema.php` - Core schema definition and parsing
- **Validation Rules**: `src/Rules/` - Two types of validation rules:
  - `Cell/` - Individual cell validation rules (~90 rules)
  - `Aggregate/` - Column-wide aggregate validation rules (~44 rules)
- **CSV Processing**: `src/Csv/` - CSV file handling and column management
- **Workers**: `src/Workers/` - Parallel processing implementation
- **Validators**: `src/Validators/` - Validation orchestration and error reporting

### Key Components

- **Schema Definition**: YAML-based schemas define validation rules for CSV columns
- **Rule System**: Extensible rule system with AbstractRule base class
- **Error Reporting**: Multiple output formats (table, text, GitHub Actions, etc.)
- **Parallel Processing**: Multi-threaded validation for large files
- **CLI Interface**: Built with Symfony Console components

## Development Commands

### Build and Install
```bash
make build # Install dependencies in development mode
make build-prod # Install dependencies in production mode
make build-phar-file # Build standalone PHAR executable
```

### Testing
```bash
# Run PHPUnit tests
./vendor/bin/phpunit

# Run specific test
./vendor/bin/phpunit tests/SpecificTest.php

# Run with coverage
./vendor/bin/phpunit --coverage-html build/coverage_html
```

### Code Quality
```bash
# Static analysis with Psalm
./vendor/bin/psalm

# Code style check
./vendor/bin/php-cs-fixer fix --dry-run

# Code style fix
./vendor/bin/php-cs-fixer fix

# Phan static analysis
./vendor/bin/phan
```

### Demo and Validation
```bash
# Run demo validation
make demo

# Validate specific CSV with schema
./csv-blueprint validate-csv --csv=path/to/file.csv --schema=path/to/schema.yml

# Create schema from existing CSV
./csv-blueprint create-schema --csv=path/to/file.csv
```

### Benchmarking
```bash
make bench # Run full benchmark suite
make bench-docker # Run benchmarks in Docker
make bench-create-csv # Generate test CSV files
```

## Schema Structure

Schemas are YAML files that define validation rules:

```yaml
filename_pattern: /pattern\.csv$/
columns:
  - name: "Column Name"
    rules:
      not_empty: true
      length_min: 1
      length_max: 100
    aggregate_rules:
      is_unique: true
      count_min: 1
```

### Rule Categories

- **Cell Rules** (`src/Rules/Cell/`): Validate individual cell values (data types, formats, ranges)
- **Aggregate Rules** (`src/Rules/Aggregate/`): Validate column-wide properties (uniqueness, statistics, counts)

## Testing Strategy

- **Unit Tests**: Located in `tests/` directory
- **Integration Tests**: Test complete validation workflows
- **Benchmark Tests**: Performance testing in `tests/Benchmarks/`
- **Example Schemas**: Test schemas in `tests/schemas/`
- **Fixture Data**: Test CSV files in `tests/fixtures/`

## Docker Support

The project includes Docker support for containerized execution:
- Build: `make docker-build`
- Run: `make docker-demo`
- Interactive: `make docker-in`

## File Organization

- `src/` - Main source code
- `tests/` - Test suite and fixtures
- `schema-examples/` - Example schema files
- `build/` - Build artifacts and tools
- `docker/` - Docker-related files
- `.github/workflows/` - CI/CD pipelines

## PHP Requirements

- PHP 8.3+
- Extensions: mbstring
- Uses modern PHP features (strict types, readonly properties, etc.)
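
The split between cell rules and aggregate rules described in CLAUDE.md can be illustrated with a minimal PHP sketch. Everything in it is hypothetical: the `CellRule` and `AggregateRule` interfaces and the `LengthMax` and `IsUnique` classes are illustrative names, not the project's `AbstractRule` hierarchy, and the checks are only loosely modeled on the `length_max` and `is_unique` schema keys.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch, NOT the project's AbstractRule API.
// A cell rule inspects one value at a time; an aggregate rule inspects the whole column.

interface CellRule
{
    /** Returns an error message, or null when the cell is valid. */
    public function validateCell(string $value): ?string;
}

interface AggregateRule
{
    /** @param list<string> $column Returns an error message, or null when the column is valid. */
    public function validateColumn(array $column): ?string;
}

final class LengthMax implements CellRule
{
    public function __construct(private readonly int $max)
    {
    }

    public function validateCell(string $value): ?string
    {
        // mb_strlen() counts characters, not bytes (requires the mbstring extension).
        $length = mb_strlen($value);

        return $length > $this->max
            ? "length {$length} exceeds {$this->max}"
            : null;
    }
}

final class IsUnique implements AggregateRule
{
    public function validateColumn(array $column): ?string
    {
        // array_unique() drops repeated values; any shrinkage means duplicates exist.
        $duplicates = count($column) - count(array_unique($column));

        return $duplicates > 0 ? "{$duplicates} duplicate value(s)" : null;
    }
}

$column = ['alpha', 'beta', 'alpha', 'ÄÖÜ'];

// Cell rules run per value, so errors can be reported per row.
$lengthRule = new LengthMax(4);
foreach ($column as $row => $cell) {
    $error = $lengthRule->validateCell($cell);
    if ($error !== null) {
        echo "row {$row}: {$error}\n";
    }
}

// Aggregate rules need every value in the column before they can decide.
echo (new IsUnique())->validateColumn($column) ?? 'column is unique', "\n";
```

Because `mb_strlen()` counts characters rather than bytes, the three-character value `ÄÖÜ` passes a limit of 4 even though it occupies six bytes in UTF-8.
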
12 changes: 6 additions & 6 deletions README.md
@@ -170,7 +170,7 @@ You can find launch examples in the [workflow demo](https://github.com/JBZoo/Csv
# Default value: 'no'
skip-schema: 'no'

# Extra options for the CSV Blueprint. Only for debbuging and profiling.
# Extra options for the CSV Blueprint. Only for debugging and profiling.
# Available options:
# Add flag `--parallel` if you want to validate CSV files in parallel.
# Add flag `--dump-schema` if you want to see the final schema after all includes and inheritance.
@@ -565,7 +565,7 @@ columns:
is_consonant: true # Validates if the input contains only consonants. Example: "bcd".
is_alnum: true # Validates whether the input is only alphanumeric. Example: "aBc123".
is_alpha: true # This is similar to `is_alnum`, but it does not allow numbers. Example: "aBc".
is_hex_rgb_color: true # Validates weather the input is a hex RGB color or not. Examples: "#FF0000", "#123", "ffffff", "fff".
is_hex_rgb_color: true # Validates whether the input is a hex RGB color or not. Examples: "#FF0000", "#123", "ffffff", "fff".

# Check if the value is a valid hash. Supported algorithms:
# - md5, md4, md2, sha1, sha224, sha256, sha384, sha512/224, sha512/256, sha512
@@ -1441,7 +1441,7 @@ application of the CLI commands, helping users make the most out of the tool's c
`./csv-blueprint validate-csv --help`

<details>
<summary>CLICK to see validate-csv help messege</summary>
<summary>CLICK to see validate-csv help message</summary>

<!-- auto-update:validate-csv-help -->
```txt
@@ -1517,7 +1517,7 @@ Options:
`./csv-blueprint validate-schema --help`

<details>
<summary>CLICK to see validate-schema help messege</summary>
<summary>CLICK to see validate-schema help message</summary>

<!-- auto-update:validate-schema-help -->
```txt
@@ -1578,7 +1578,7 @@ Options:
`./csv-blueprint dump-schema --help`

<details>
<summary>CLICK to see debug-schema help messege</summary>
<summary>CLICK to see debug-schema help message</summary>

<!-- auto-update:debug-schema-help -->
```txt
@@ -1623,7 +1623,7 @@ Options:
It's beta. Work in progress.

<details>
<summary>CLICK to see create-schema help messege</summary>
<summary>CLICK to see create-schema help message</summary>

<!-- auto-update:create-schema-help -->
```txt
4 changes: 2 additions & 2 deletions action.yml
@@ -53,10 +53,10 @@ inputs:
default: no
required: true

# Only for debbuging and profiling
# Only for debugging and profiling
extra:
description: |
Extra options for the CSV Blueprint. Only for debbuging and profiling.
Extra options for the CSV Blueprint. Only for debugging and profiling.
Available options:
Add flag `--parallel` if you want to validate CSV files in parallel.
Add flag `--dump-schema` if you want to see the final schema after all includes and inheritance.
2 changes: 1 addition & 1 deletion schema-examples/full.yml
@@ -287,7 +287,7 @@ columns:
is_consonant: true # Validates if the input contains only consonants. Example: "bcd".
is_alnum: true # Validates whether the input is only alphanumeric. Example: "aBc123".
is_alpha: true # This is similar to `is_alnum`, but it does not allow numbers. Example: "aBc".
is_hex_rgb_color: true # Validates weather the input is a hex RGB color or not. Examples: "#FF0000", "#123", "ffffff", "fff".
is_hex_rgb_color: true # Validates whether the input is a hex RGB color or not. Examples: "#FF0000", "#123", "ffffff", "fff".

# Check if the value is a valid hash. Supported algorithms:
# - md5, md4, md2, sha1, sha224, sha256, sha384, sha512/224, sha512/256, sha512
2 changes: 1 addition & 1 deletion src/Rules/Cell/IsHexRgbColor.php
@@ -27,7 +27,7 @@ public function getHelpMeta(): array
[
self::DEFAULT => [
'true',
'Validates weather the input is a hex RGB color or not. '
'Validates whether the input is a hex RGB color or not. '
. 'Examples: "#FF0000", "#123", "ffffff", "fff".',
],
],
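
The help text for `is_hex_rgb_color` lists the inputs the rule is meant to accept. A minimal PHP sketch of a check that accepts exactly those examples follows; it assumes a hypothetical `isHexRgbColor()` helper and an illustrative regular expression, and the real `IsHexRgbColor` class may delegate to a different validator.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not the IsHexRgbColor rule class itself.
 * Accepts an optional leading "#" followed by exactly 3 or 6 hex digits,
 * which covers the help-text examples "#FF0000", "#123", "ffffff", "fff".
 */
function isHexRgbColor(string $value): bool
{
    // \A and \z anchor to the whole string; the /i flag makes A-F and a-f equivalent.
    return preg_match('/\A#?(?:[0-9a-f]{3}|[0-9a-f]{6})\z/i', $value) === 1;
}

foreach (['#FF0000', '#123', 'ffffff', 'fff', '#ff00', 'red', '#GGGGGG'] as $candidate) {
    printf("%-8s => %s\n", $candidate, isHexRgbColor($candidate) ? 'valid' : 'invalid');
}
```

Anchoring with `\A` and `\z` rather than `^` and `$` matters here: in PCRE, `$` also matches before a final newline, so `"fff\n"` would otherwise be accepted.
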
2 changes: 1 addition & 1 deletion tests/Tools.php
@@ -90,7 +90,7 @@ public static function realExecution(string $action, array $params = [], string
]),
$params,
$rootDir,
true,
false,
);
}
