Scripts for processing protein structures and integrating FoldX, Rosetta, and RaSP outputs for mutation impact analysis.

Edenl95/plotting_pipeline

Stability Plot (plot_mavisp_ddg.py)

Overview

plot_mavisp_ddg.py is a command-line tool for visualizing protein mutation stability predictions from MAVISp-generated CSV files. It produces grouped bar plots of ΔΔG (kcal/mol) values for multiple computational methods (FoldX, Rosetta, RaSP), with automatic standard deviation error bars when available.

The script is designed to handle large mutation datasets by splitting them into manageable chunks and exporting both PDF and high-resolution PNG figures for each chunk.
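The chunking idea can be illustrated with a minimal sketch. The helper name `iter_chunks` and the mutation labels are hypothetical and are not taken from the script itself; only the default of 10 mutations per figure comes from this README.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_chunks(items: Sequence[T], chunk_size: int = 10) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


# 23 made-up mutation labels, split into figures of 10 mutations each.
mutations = [f"A{i}V" for i in range(1, 24)]
chunk_lengths = [len(chunk) for chunk in iter_chunks(mutations, chunk_size=10)]
print(chunk_lengths)  # [10, 10, 3]
```

The last chunk simply holds the remainder, so a dataset whose size is not a multiple of the chunk size still ends with one shorter figure.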

Key Features:

  • Detects ΔΔG columns for FoldX, Rosetta, and RaSP
  • Detects corresponding standard deviation columns (st. dev, stdev, std, sd)
  • Excludes irrelevant columns (e.g. classification, count, rank)
  • Converts values to numeric and safely ignores non-numeric entries
  • Splits large datasets into configurable chunks
  • Generates publication-ready PDF and PNG plots
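As an illustration of the numeric conversion step, one common pandas approach is `pd.to_numeric` with `errors="coerce"`, which turns unparseable entries into `NaN`. Whether the script uses exactly this call is an assumption, and the column name and values below are made up.

```python
import pandas as pd

# Made-up example data; the column name is illustrative only.
df = pd.DataFrame({
    "Mutation": ["A10V", "G12D", "L45P"],
    "Stability (FoldX5, kcal/mol)": ["1.25", "n/a", "-0.40"],
})

# Non-numeric entries such as "n/a" become NaN instead of raising an error.
ddg = pd.to_numeric(df["Stability (FoldX5, kcal/mol)"], errors="coerce")
n_missing = int(ddg.isna().sum())
print(n_missing)  # 1
```

Values coerced to `NaN` this way can then be left out of a plot without aborting the run.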

Requirements

  • Python 3.10+
  • Packages: numpy, pandas, matplotlib

Example module load (if using a module system):

module load python/3.10/modulefile

Input Requirements:

  • Input file must be a CSV

  • Must contain a column named Mutation (used as the x-axis)

  • Each ΔΔG column name must contain one of the method keywords:

    • foldx
    • rosetta
    • rasp
  • Preferred ΔΔG identifiers (optional but recommended):

    • stability, kcal, ddg, ΔΔG, delta, dG
  • Standard deviation columns are detected automatically if their names contain:

    • st. dev, stdev, std, or sd
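The naming rules above can be sketched as a small keyword matcher. This is a hedged illustration, not the script's actual logic: the function `classify_columns`, the example column names, and the choice to match `sd` only as a whole word are all assumptions.

```python
import re

METHODS = ("foldx", "rosetta", "rasp")
EXCLUDED = ("classification", "count", "rank")
# Assumption: "sd" is matched as a whole word so it does not fire inside longer words.
STD_PATTERN = re.compile(r"st\. ?dev|stdev|std|\bsd\b", re.IGNORECASE)


def classify_columns(columns):
    """Map each method to its ΔΔG value columns and its std. dev. columns."""
    found = {method: {"values": [], "std": []} for method in METHODS}
    for name in columns:
        lowered = name.lower()  # detection is case-insensitive
        if any(word in lowered for word in EXCLUDED):
            continue
        for method in METHODS:
            if method in lowered:
                kind = "std" if STD_PATTERN.search(name) else "values"
                found[method][kind].append(name)
                break
    return found


# Hypothetical column names for illustration.
columns = [
    "Mutation",
    "Stability (FoldX5, kcal/mol)",
    "Stability (FoldX5, kcal/mol, st. dev.)",
    "Stability (Rosetta Cartddg2020, kcal/mol)",
    "Stability (RaSP, kcal/mol)",
    "Stability classification, (Rosetta, FoldX)",
]
detected = classify_columns(columns)
print(detected["foldx"])
```

In this sketch the classification column is skipped even though it mentions two methods, and only FoldX ends up with a standard deviation column.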

Usage

Basic usage (CSV in current directory)

python3 plot_mavisp_ddg.py -c my_mutations.csv

This will:

  • Read my_mutations.csv
  • Create an output folder ./my_mutations/
  • Generate plots with 10 mutations per figure

CSV in a different directory:

python3 plot_mavisp_ddg.py -c /full/path/to/my_mutations.csv

Specify output directory

python3 plot_mavisp_ddg.py -c my_mutations.csv -o plots/

Output will be saved in ./plots/.

Change chunk size

python3 plot_mavisp_ddg.py -c my_mutations.csv -n 15

This plots 15 mutations per figure.

Full example

python3 plot_mavisp_ddg.py \
  -c ABI1-simple_mode.csv \
  -o ABI1_plots \
  -n 10

Command-Line Options

Flag                                      Description
-h, --help                                Show help message and exit
-c CSV, --csv CSV                         Input CSV file (simple or ensemble mode)
-o OUT, --out OUT                         Output directory (default: CSV basename in current directory)
-n CHUNK_SIZE, --chunk-size CHUNK_SIZE    Number of mutations per plot (default: 10)
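For readers who want to see how such an interface is typically declared, here is a minimal `argparse` sketch that reproduces the flags listed above. It is a reconstruction, not the script's source: treating `-c` as required and resolving the default output directory with `Path.cwd() / csv.stem` are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="plot_mavisp_ddg.py",
        description="Plot ΔΔG predictions from a MAVISp CSV file.",
    )
    # Assumption: the input CSV is mandatory.
    parser.add_argument("-c", "--csv", required=True, type=Path,
                        help="Input CSV file (simple or ensemble mode)")
    parser.add_argument("-o", "--out", type=Path, default=None,
                        help="Output directory (default: CSV basename in current directory)")
    parser.add_argument("-n", "--chunk-size", type=int, default=10,
                        help="Number of mutations per plot (default: 10)")
    return parser


args = build_parser().parse_args(["-c", "my_mutations.csv", "-n", "15"])
# Fall back to a folder named after the CSV when -o is not given.
out_dir = args.out if args.out is not None else Path.cwd() / args.csv.stem
print(args.chunk_size, out_dir.name)  # 15 my_mutations
```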

Output

For each chunk of mutations, the script generates two files:

File              Description
CSVNAME_01.pdf    PDF plot for chunk 1
CSVNAME_01.png    High-resolution PNG plot for chunk 1
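The naming pattern can be sketched as follows. The helper `chunk_output_paths` is hypothetical, and the two-digit zero padding is inferred from the `_01` suffix shown above; how the script numbers more than 99 chunks is not documented here.

```python
import math
from pathlib import Path


def chunk_output_paths(csv_path, out_dir, n_mutations, chunk_size=10):
    """Return one (pdf, png) path pair per chunk, numbered from 01."""
    stem = Path(csv_path).stem
    n_chunks = math.ceil(n_mutations / chunk_size)
    return [
        (Path(out_dir) / f"{stem}_{i:02d}.pdf", Path(out_dir) / f"{stem}_{i:02d}.png")
        for i in range(1, n_chunks + 1)
    ]


# 23 mutations at 10 per figure give three chunks.
pairs = chunk_output_paths("ABI1-simple_mode.csv", "ABI1_plots", n_mutations=23)
names = [pdf.name for pdf, _ in pairs]
print(names)
```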

Each plot includes:

  • ΔΔG values per mutation
  • One bar group per method
  • Error bars for standard deviation (if present)
  • Horizontal reference line at ΔΔG = 0
  • Clear legend and axis labels
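A plot with these elements can be sketched in a few lines of matplotlib. All values below are made up, and the styling choices (bar width, 300 dpi, rotated tick labels, a temporary output folder) are assumptions rather than the script's actual settings.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Made-up values for illustration only.
mutations = ["A10V", "G12D", "L45P"]
ddg = {"FoldX": [1.2, -0.3, 3.1], "Rosetta": [0.8, 0.1, 2.6], "RaSP": [1.0, -0.1, 2.9]}
std = {"FoldX": [0.2, 0.1, 0.4]}  # only one method has a std. dev. column

x = np.arange(len(mutations))
width = 0.8 / len(ddg)  # share 80% of each slot among the methods

fig, ax = plt.subplots(figsize=(6, 4))
for i, (method, values) in enumerate(ddg.items()):
    offset = (i - (len(ddg) - 1) / 2) * width  # centre the group on the tick
    # yerr=None (no std. dev. column) draws the bars without error bars.
    ax.bar(x + offset, values, width, yerr=std.get(method), capsize=3, label=method)

ax.axhline(0, color="black", linewidth=0.8)  # reference line at ΔΔG = 0
ax.set_xticks(x)
ax.set_xticklabels(mutations, rotation=45, ha="right")
ax.set_xlabel("Mutation")
ax.set_ylabel("ΔΔG (kcal/mol)")
ax.legend()
fig.tight_layout()

out_dir = Path(tempfile.mkdtemp())
pdf_path = out_dir / "example_01.pdf"
png_path = out_dir / "example_01.png"
fig.savefig(pdf_path)
fig.savefig(png_path, dpi=300)
```

Passing `std.get(method)` means a method without a standard deviation column is plotted without error bars and without any warning.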

Notes

  • Column detection is case-insensitive
  • If no standard deviation column is found for a method, error bars are omitted silently
  • Multiple ΔΔG columns per method are supported
  • Large datasets are automatically split to keep plots readable
  • The script does not modify the input CSV

Workflow

  1. Run MAVISp and generate a CSV
  2. Verify the CSV contains a Mutation column
  3. Run plot_mavisp_ddg.py
  4. Collect PDF/PNG plots from the output directory
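Step 2 can be checked before plotting with a short standard-library sketch. The helper `has_mutation_column` is hypothetical, and it assumes the header must match `Mutation` exactly; whether the script itself accepts other capitalizations of this column is not stated here.

```python
import csv
import io


def has_mutation_column(csv_file) -> bool:
    """Return True if the header row contains a column named exactly 'Mutation'."""
    header = next(csv.reader(csv_file), [])
    return "Mutation" in header


# In-memory stand-in for a real file; the contents are made up.
sample = io.StringIO('Mutation,"Stability (FoldX5, kcal/mol)"\nA10V,1.25\n')
sample_ok = has_mutation_column(sample)
print(sample_ok)  # True
```

For a file on disk, the same function can be called on a handle opened with `open(path, newline="")`.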

Summary

plot_mavisp_ddg.py turns a MAVISp CSV into grouped ΔΔG bar plots, exported as PDF and PNG, with a single command. It finds FoldX, Rosetta, and RaSP columns by keyword rather than by exact name, adds error bars when standard deviation columns are present, and splits large mutation sets across multiple figures.
