| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +CausalML is a Python package for uplift modeling and causal inference with machine learning algorithms. It provides methods to estimate Conditional Average Treatment Effect (CATE) or Individual Treatment Effect (ITE) from experimental or observational data. |
| 8 | + |
| 9 | +## Development Setup |
| 10 | + |
| 11 | +### Environment Setup |
| 12 | +- Requires Python 3.9 or later (3.9-3.12 supported) |
| 13 | +- Uses `uv` as the package manager (preferred) or `pip` |
| 14 | +- Install development dependencies with `make setup_local` (sets up pre-commit hooks) |
| 15 | + |
| 16 | +### Build Commands |
| 17 | +- `make build_ext`: Build Cython extensions (required before running code/tests) |
| 18 | +- `make build`: Build wheel distribution |
| 19 | +- `make install`: Install package locally |
| 20 | +- `make clean`: Clean build artifacts |
| 21 | + |
| 22 | +### Testing |
| 23 | +- `make test`: Run full test suite with coverage |
| 24 | +- `pytest -vs --cov causalml/`: Direct pytest command |
| 25 | +- `pytest tests/test_specific.py`: Run specific test file |
| 26 | +- Optional test flags: |
| 27 | + - `pytest --runtf`: Include TensorFlow tests |
| 28 | + - `pytest --runtorch`: Include PyTorch tests |
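Opt-in flags like `--runtf` and `--runtorch` are usually wired up in a `conftest.py` through pytest's collection hooks. The sketch below shows one common way to do that. The marker names `tf` and `torch` and the `OPTIONAL_BACKENDS` mapping are assumptions made for illustration, not necessarily what this repository's `tests/conftest.py` uses.

```python
import pytest

# Hypothetical marker names ("tf", "torch"); only the flag names come from the list above.
OPTIONAL_BACKENDS = {"tf": "--runtf", "torch": "--runtorch"}


def pytest_addoption(parser):
    """Register one opt-in command-line flag per optional backend."""
    for marker, flag in OPTIONAL_BACKENDS.items():
        parser.addoption(
            flag,
            action="store_true",
            default=False,
            help=f"run tests marked with @pytest.mark.{marker}",
        )


def pytest_configure(config):
    """Declare the markers so pytest does not warn about unknown marks."""
    for marker in OPTIONAL_BACKENDS:
        config.addinivalue_line(
            "markers", f"{marker}: requires the optional {marker} backend"
        )


def pytest_collection_modifyitems(config, items):
    """Skip backend-specific tests unless the matching flag was passed."""
    for marker, flag in OPTIONAL_BACKENDS.items():
        if config.getoption(flag):
            continue  # flag given: leave these tests enabled
        skip = pytest.mark.skip(reason=f"needs {flag} option to run")
        for item in items:
            if marker in item.keywords:
                item.add_marker(skip)
```

Under this scheme, a test decorated with `@pytest.mark.tf` is collected but skipped on a plain `pytest` run and executed only when `--runtf` is passed.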
| 29 | + |
| 30 | +### Code Quality |
| 31 | +- Uses `black` for code formatting |
| 32 | +- Run `black .` before submitting PRs |
| 33 | +- Pre-commit hooks available via `make setup_local` |
| 34 | +- Flake8 is configured in `tox.ini` with a maximum line length of 120 |
| 35 | + |
| 36 | +## Architecture |
| 37 | + |
| 38 | +### Core Module Structure |
| 39 | +``` |
| 40 | +causalml/ |
| 41 | +├── dataset/ # Synthetic data generation |
| 42 | +├── feature_selection/ # Feature selection utilities |
| 43 | +├── inference/ # Main inference algorithms |
| 44 | +│ ├── meta/ # Meta-learners (S, T, X, R, DR learners) |
| 45 | +│ ├── tree/ # Causal trees and uplift trees |
| 46 | +│ ├── tf/ # TensorFlow implementations (DragonNet) |
| 47 | +│ ├── torch/ # PyTorch implementations (CEVAE) |
| 48 | +│ └── iv/ # Instrumental variable methods |
| 49 | +├── metrics/ # Evaluation metrics |
| 50 | +├── optimize/ # Policy learning and optimization |
| 51 | +└── propensity.py # Propensity score modeling |
| 52 | +``` |
| 53 | + |
| 54 | +### Key Components |
| 55 | + |
| 56 | +#### Meta-Learners (`causalml/inference/meta/`) |
| 57 | +- **BaseLearner**: Abstract base class for all meta-learners |
| 58 | +- **S-Learner**: Single model approach |
| 59 | +- **T-Learner**: Two model approach |
| 60 | +- **X-Learner**: Cross-learner with propensity scores |
| 61 | +- **R-Learner**: Residual-on-residual learner based on Robinson's transformation |
| 62 | +- **DR-Learner**: Doubly robust learner |
| 63 | + |
| 64 | +#### Tree-Based Methods (`causalml/inference/tree/`) |
| 65 | +- Causal trees and forests with Cython implementations |
| 66 | +- Uplift trees for classification problems |
| 67 | +- Custom splitting criteria for causal inference |
| 68 | + |
| 69 | +#### Propensity Score Models (`causalml/propensity.py`) |
| 70 | +- **PropensityModel**: Abstract base for propensity estimation |
| 71 | +- Built-in calibration support |
| 72 | +- Clipping bounds to avoid numerical issues |
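To illustrate why clipping bounds matter, here is a minimal NumPy sketch of clipping propensity scores before they are used as denominators in inverse-propensity weights. The function names and the default bounds `(1e-3, 1 - 1e-3)` are assumptions chosen for this example. They are not taken from `causalml/propensity.py`.

```python
import numpy as np


def clip_propensity(p, bounds=(1e-3, 1 - 1e-3)):
    """Clip propensity scores into [low, high] (hypothetical default bounds)."""
    low, high = bounds
    return np.clip(np.asarray(p, dtype=float), low, high)


def inverse_propensity_weights(treatment, p):
    """Return 1/p for treated rows and 1/(1-p) for control rows, after clipping."""
    treatment = np.asarray(treatment)
    p = clip_propensity(p)
    return np.where(treatment == 1, 1.0 / p, 1.0 / (1.0 - p))
```

Without the clipping step, a treated unit with an estimated score of exactly 0 (or a control unit with a score of exactly 1) would produce an infinite weight.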
| 73 | + |
| 74 | +### Cython Extensions |
| 75 | +The package includes Cython-compiled modules for performance: |
| 76 | +- Tree algorithms (`_tree`, `_criterion`, `_splitter`, `_utils`) |
| 77 | +- Causal tree components (`_builder`, causal trees) |
| 78 | +- Always run `make build_ext` after changes to `.pyx` files |
| 79 | + |
| 80 | +## Common Workflows |
| 81 | + |
| 82 | +### Adding New Meta-Learners |
| 83 | +1. Inherit from `BaseLearner` in `causalml/inference/meta/base.py` |
| 84 | +2. Implement `fit()` and `predict()` methods |
| 85 | +3. Add appropriate tests in `tests/test_meta_learners.py` |
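The steps above can be sketched with a toy two-model learner. To keep the example self-contained, it subclasses a hypothetical `StandInBaseLearner` rather than the real `BaseLearner`, and the `fit(X, treatment, y)` / `predict(X)` signatures are assumptions. Check `causalml/inference/meta/base.py` for the actual abstract methods and arguments before writing a real learner.

```python
from abc import ABC, abstractmethod

import numpy as np


class StandInBaseLearner(ABC):
    """Hypothetical stand-in for causalml's BaseLearner (not the real class)."""

    @abstractmethod
    def fit(self, X, treatment, y):
        ...

    @abstractmethod
    def predict(self, X):
        ...


class LinearTwoModelLearner(StandInBaseLearner):
    """Fits one least-squares model per arm; CATE is the prediction difference."""

    def fit(self, X, treatment, y):
        X = np.asarray(X, dtype=float)
        treatment = np.asarray(treatment)
        y = np.asarray(y, dtype=float)
        design = self._add_intercept(X)
        self.coef_ = {}
        for arm in (0, 1):  # 0 = control, 1 = treated
            mask = treatment == arm
            self.coef_[arm], *_ = np.linalg.lstsq(design[mask], y[mask], rcond=None)
        return self

    def predict(self, X):
        design = self._add_intercept(np.asarray(X, dtype=float))
        # Estimated treatment effect: treated outcome minus control outcome.
        return design @ self.coef_[1] - design @ self.coef_[0]

    @staticmethod
    def _add_intercept(X):
        return np.column_stack([np.ones(len(X)), X])
```

A matching test can fit the learner on synthetic data with a known constant effect and assert that `predict()` recovers it within a tolerance.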
| 86 | + |
| 87 | +### Working with Tree Methods |
| 88 | +1. Cython files are in `causalml/inference/tree/` |
| 89 | +2. Rebuild extensions with `make build_ext` after changes |
| 90 | +3. Test with synthetic data from `causalml.dataset` |
| 91 | + |
| 92 | +### Testing Different Backends |
| 93 | +- Core tests run without optional dependencies |
| 94 | +- TensorFlow tests: `pytest --runtf` |
| 95 | +- PyTorch tests: `pytest --runtorch` |
| 96 | +- Tests use fixtures from `tests/conftest.py` for data generation |
| 97 | + |
| 98 | +### Git Operations |
| 99 | +- **Pushing branches**: Use a specific SSH key for authentication: |
| 100 | + ```bash |
| 101 | + GIT_SSH_COMMAND='ssh -i ~/.ssh/github_personal -o IdentitiesOnly=yes' git push -u origin branch_name |
| 102 | + ``` |
| 103 | + |
| 104 | +## Important Notes |
| 105 | + |
| 106 | +- The package uses both pandas DataFrames and numpy arrays internally |
| 107 | +- Propensity scores are clipped by default to avoid division by zero |
| 108 | +- Meta-learners support both single and multiple treatment scenarios |
| 109 | +- Tree methods include built-in visualization capabilities |
| 110 | +- Optional dependencies (TensorFlow, PyTorch) are marked clearly in tests |