plot_mavisp_ddg.py is a command-line tool for visualizing protein mutation stability predictions from MAVISp-generated CSV files. It produces grouped bar plots of ΔΔG (kcal/mol) values for multiple computational methods (FoldX, Rosetta, RaSP), with automatic standard deviation error bars when available.
The script is designed to handle large mutation datasets by splitting them into manageable chunks and exporting both PDF and high-resolution PNG figures for each chunk.
Key Features:
- Detects ΔΔG columns for FoldX, Rosetta, and RaSP
- Detects corresponding standard deviation columns (st. dev, stdev, std, sd)
- Excludes irrelevant columns (e.g. classification, count, rank)
- Converts values to numeric and safely ignores non-numeric entries
- Splits large datasets into configurable chunks
- Generates publication-ready PDF and PNG plots
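The numeric conversion listed above can be illustrated with pandas. This is a minimal sketch, not the script's actual code: it assumes the conversion relies on `pd.to_numeric` with `errors="coerce"`, and the column name and values are hypothetical.

```python
import pandas as pd

# Hypothetical column name and values; real MAVISp headers may differ.
col = "Stability (FoldX5, kcal/mol)"
df = pd.DataFrame({
    "Mutation": ["A10V", "G25D", "L40P"],
    col: ["1.2", "n/a", "-0.4"],
})

# errors="coerce" turns unparsable entries into NaN instead of raising,
# so they can simply be skipped when plotting.
df[col] = pd.to_numeric(df[col], errors="coerce")
n_missing = int(df[col].isna().sum())

print(df)
print(f"{n_missing} non-numeric value(s) ignored")
```

With this approach, a stray text entry leaves a gap in the plot rather than aborting the run.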
Requirements:
- Python 3.10+
- Packages: numpy, pandas, matplotlib
Example module load (if using a module system):

    module load python/3.10/modulefile

Input Requirements:
- Input file must be a CSV
- Must contain a column named Mutation (used as the x-axis)
- ΔΔG columns must include method keywords:
  - foldx
  - rosetta
  - rasp
- Preferred ΔΔG identifiers (optional but recommended):
  - stability, kcal, ddg, ΔΔG, delta, dG
- Standard deviation columns are detected automatically if their names contain:
  - st. dev, stdev, std, or sd
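The keyword rules above can be sketched as a small classifier. This is one plausible reading of the rules, not the script's actual implementation: the function name `classify_columns`, the substring-matching strategy, and the example headers are all assumptions made for illustration.

```python
METHODS = ("foldx", "rosetta", "rasp")
SD_KEYS = ("st. dev", "stdev", "std", "sd")
EXCLUDE_KEYS = ("classification", "count", "rank")


def classify_columns(columns):
    """Return ({method: [ddG columns]}, {method: [std-dev columns]})."""
    ddg = {m: [] for m in METHODS}
    sd = {m: [] for m in METHODS}
    for name in columns:
        low = name.lower()  # case-insensitive matching
        if any(key in low for key in EXCLUDE_KEYS):
            continue  # skip classification/count/rank columns
        for method in METHODS:
            if method in low:
                target = sd if any(key in low for key in SD_KEYS) else ddg
                target[method].append(name)
                break
    return ddg, sd


# Hypothetical headers, for illustration only.
columns = [
    "Mutation",
    "Stability (FoldX5, kcal/mol)",
    "Stability (FoldX5, kcal/mol, st. dev.)",
    "Stability (Rosetta Cartddg2020, kcal/mol)",
    "Stability (RaSP, kcal/mol)",
    "Stability classification, (Rosetta, FoldX)",
]
ddg_cols, sd_cols = classify_columns(columns)
print(ddg_cols)
print(sd_cols)
```

Plain substring matching is simple but permissive: a short key such as `sd` could match unrelated column names, so a stricter implementation might match on word boundaries instead.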
Basic usage (CSV in current directory):

    python3 plot_mavisp_ddg.py -c my_mutations.csv

This will:
- Read my_mutations.csv
- Create an output folder ./my_mutations/
- Generate plots with 10 mutations per figure
CSV in a different directory:

    python3 plot_mavisp_ddg.py -c /full/path/to/my_mutations.csv

Specify output directory:

    python3 plot_mavisp_ddg.py -c my_mutations.csv -o plots/

Output will be saved in ./plots/.
Change chunk size:

    python3 plot_mavisp_ddg.py -c my_mutations.csv -n 15

This plots 15 mutations per figure.
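To show how the chunk size translates into a number of figures, here is a minimal sketch of fixed-size chunking. The helper name `split_into_chunks` and the placeholder mutation labels are hypothetical; the script's own splitting logic may differ.

```python
def split_into_chunks(items, chunk_size=10):
    """Split a sequence into consecutive chunks of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


# 32 placeholder mutation labels, split with a chunk size of 15.
mutations = [f"M{i}" for i in range(1, 33)]
chunks = split_into_chunks(mutations, chunk_size=15)

print(len(chunks))               # 3 figures
print([len(c) for c in chunks])  # [15, 15, 2]
```

The last chunk holds whatever remains, so a dataset of 32 mutations with `-n 15` would yield three figures under this scheme.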
Full example:

    python3 plot_mavisp_ddg.py \
        -c ABI1-simple_mode.csv \
        -o ABI1_plots \
        -n 10

Command-line options:

| Flag | Description |
|---|---|
| -h, --help | Show help message and exit |
| -c CSV, --csv CSV | Input CSV file (simple or ensemble mode) |
| -o OUT, --out OUT | Output directory (default: CSV basename in current directory) |
| -n CHUNK_SIZE, --chunk-size CHUNK_SIZE | Number of mutations per plot (default: 10) |
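For reference, an `argparse` parser that would accept the flags in the table above might look like the following. This is a sketch reconstructed from the help text, not the script's source: the description string and the choice to make `-c` required are assumptions.

```python
import argparse


def build_parser():
    # Description text and required=True are assumptions for illustration.
    parser = argparse.ArgumentParser(
        description="Plot MAVISp ddG predictions as grouped bar charts."
    )
    parser.add_argument(
        "-c", "--csv", required=True,
        help="Input CSV file (simple or ensemble mode)",
    )
    parser.add_argument(
        "-o", "--out", default=None,
        help="Output directory (default: CSV basename in current directory)",
    )
    parser.add_argument(
        "-n", "--chunk-size", type=int, default=10,
        help="Number of mutations per plot (default: 10)",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-c", "my_mutations.csv", "-n", "15"])
print(args.csv, args.out, args.chunk_size)
```

argparse derives the `CSV`, `OUT`, and `CHUNK_SIZE` placeholders shown in the help output from the long option names, and exposes `--chunk-size` as `args.chunk_size`.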
For each chunk of mutations, the script generates two files:
| File | Description |
|---|---|
| CSVNAME_01.pdf | PDF plot for chunk 1 |
| CSVNAME_01.png | High-resolution PNG plot for chunk 1 |
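The naming pattern in the table can be sketched with `pathlib`. The helper `output_paths` is hypothetical, and the two-digit zero padding is inferred from the `CSVNAME_01` example rather than confirmed by the script.

```python
from pathlib import Path


def output_paths(csv_path, out_dir=None, n_chunks=1):
    """Return a (pdf, png) path pair for each chunk, numbered from 1."""
    stem = Path(csv_path).stem  # CSV basename without the extension
    # Default output directory: the CSV basename, relative to the current directory.
    out = Path(out_dir) if out_dir is not None else Path(stem)
    return [
        (out / f"{stem}_{i:02d}.pdf", out / f"{stem}_{i:02d}.png")
        for i in range(1, n_chunks + 1)
    ]


paths = output_paths("/full/path/to/my_mutations.csv", n_chunks=2)
for pdf, png in paths:
    print(pdf.as_posix(), png.as_posix())
```

Note that only the basename of the CSV path is used, so a CSV stored elsewhere still produces its default output folder in the current directory.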
Each plot includes:
- ΔΔG values per mutation
- One bar group per method
- Error bars for standard deviation (if present)
- Horizontal reference line at ΔΔG = 0
- Clear legend and axis labels
Notes:
- Column detection is case-insensitive
- If no standard deviation column is found for a method, error bars are omitted silently
- Multiple ΔΔG columns per method are supported
- Large datasets are automatically split to keep plots readable
- The script does not modify the input CSV
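The plot elements listed above (grouped bars, optional error bars, a zero reference line, legend, and axis labels) can be reproduced with a few lines of matplotlib. This is an illustrative sketch with made-up values, not the script's plotting code: the figure size, bar width, 300 dpi setting, and temporary output directory are all assumptions.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for headless runs
import matplotlib.pyplot as plt
import numpy as np

# Made-up ddG values (kcal/mol) for three hypothetical mutations.
mutations = ["A10V", "G25D", "L40P"]
values = {
    "FoldX": [1.2, -0.3, 4.1],
    "Rosetta": [0.8, 0.1, 3.5],
    "RaSP": [1.0, -0.1, 2.9],
}
errors = {"FoldX": [0.2, 0.1, 0.5]}  # only FoldX has a std-dev column here

x = np.arange(len(mutations))
width = 0.8 / len(values)  # share 80% of each slot among the methods

fig, ax = plt.subplots(figsize=(8, 4))
for i, (method, vals) in enumerate(values.items()):
    offset = (i - (len(values) - 1) / 2) * width  # centre the group on x
    # yerr=None draws no error bars, so methods without std dev are skipped.
    ax.bar(x + offset, vals, width, yerr=errors.get(method), capsize=3, label=method)

ax.axhline(0, color="black", linewidth=0.8)  # reference line at ddG = 0
ax.set_xticks(x)
ax.set_xticklabels(mutations, rotation=45, ha="right")
ax.set_xlabel("Mutation")
ax.set_ylabel("ΔΔG (kcal/mol)")
ax.legend()
fig.tight_layout()

out_dir = Path(tempfile.mkdtemp())
fig.savefig(out_dir / "example_01.pdf")
fig.savefig(out_dir / "example_01.png", dpi=300)
print(f"Saved plots to {out_dir}")
```

Passing `yerr=None` for methods without a standard deviation column mirrors the behaviour described in the notes, where missing error bars are omitted without a warning.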
Workflow
- Run MAVISp and generate a CSV
- Verify the CSV contains a Mutation column
- Run plot_mavisp_ddg.py
- Collect PDF/PNG plots from the output directory
This script provides an automated, reproducible way to visualize mutation stability predictions from FoldX, Rosetta, and RaSP. It needs only a Mutation column and method keywords in the ΔΔG column names, and it splits large datasets across multiple figures to keep each plot readable.