Command line interface

Installing the package puts the command epitopepredict on your path. It is a command line interface to the library that requires no Python coding and provides pre-defined functionality, with settings specified in a text configuration file. Using it you can make MHC predictions with your chosen alleles and predictors. If you use the IEDB prediction tools, they must be installed locally and their paths given in the [iedbtools] section; otherwise ignore those settings. Settings that are left out generally fall back to defaults, so a minimal file like those in the examples is enough.

You can also run additional analysis on the results. Because predicting many sequences or many alleles can take some time, the analysis can be run on existing predictions without repeating them.

Usage

Usage largely involves setting up the config file and preparing your input files. Running the command epitopepredict -c <yourfilename>.conf creates a new config file for you to work from if it doesn't already exist. Edit this file with a text editor, then execute it with:

epitopepredict -c <yourfilename>.conf -r
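
If you want to drive these two steps from a script, they can be wrapped as in the sketch below. It assumes only the -c and -r flags described above; build_command, call_epitopepredict and the file name myproject.conf are illustrative names and are not part of epitopepredict.

```python
import shutil
import subprocess


def build_command(config_file, run=False):
    """Return the argument list for the epitopepredict command line tool."""
    cmd = ["epitopepredict", "-c", config_file]
    if run:
        # -r executes the predictions described in the config file
        cmd.append("-r")
    return cmd


def call_epitopepredict(config_file, run=False):
    """Invoke the tool if it is on the PATH; return its exit code, else None."""
    if shutil.which("epitopepredict") is None:
        return None  # command not available in this environment
    return subprocess.run(build_command(config_file, run)).returncode


if __name__ == "__main__":
    # Step 1 creates the config file if it does not exist yet,
    # step 2 runs the predictions once the file has been edited.
    print(build_command("myproject.conf"))
    print(build_command("myproject.conf", run=True))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.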

Configuration file settings

Configuration files avoid long commands that are hard to remember and prone to mistakes. They can also be kept as a record of the settings used, or copied for another set of files. The options currently available in the file are shown below.

[base]
predictors = tepitope
mhc2_alleles = HLA-DRB1*01:01,HLA-DRB1*04:01
mhc1_alleles = HLA-A*01:01
mhc1_length = 11
mhc2_length = 15
n = 2
cutoff_method = default
cutoff = 4
sequence_file = 
path = results
overwrite = no
verbose = no
names = 
plots = no
genome_analysis = no

[iedbtools]
iedbmhc1_path = 
iedbmhc2_path = 
iedb_mhc1_method = IEDB_recommended
iedb_mhc2_method = IEDB_recommended
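
Because the file uses ordinary INI syntax, a minimal version can also be generated from a script. The sketch below uses Python's standard configparser module and relies on the statement above that omitted settings fall back to defaults. The function names and the output file name are illustrative, and it is an assumption that epitopepredict reads files written this way exactly as it reads hand-edited ones.

```python
import configparser


def write_minimal_config(path, sequence_file, alleles, predictor="tepitope"):
    """Write a config file containing only a handful of [base] settings."""
    config = configparser.ConfigParser()
    config["base"] = {
        "predictors": predictor,
        # alleles are stored as one comma-separated value
        "mhc2_alleles": ",".join(alleles),
        "sequence_file": sequence_file,
        "path": "results",
    }
    with open(path, "w") as handle:
        config.write(handle)
    return path


def read_base_settings(path):
    """Read the [base] section back as a plain dictionary of strings."""
    config = configparser.ConfigParser()
    config.read(path)
    return dict(config["base"])


if __name__ == "__main__":
    conf = write_minimal_config(
        "minimal.conf",  # hypothetical file name
        "zaire-ebolavirus.gb",
        ["HLA-DRB1*01:01", "HLA-DRB1*04:01"],
    )
    print(read_base_settings(conf))
```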

Settings explained:

name	example value	meaning
predictors	tepitope	name of the predictor, which can be: tepitope, iedbmhc1, iedbmhc2, netmhciipan, mhcflurry
mhc1_alleles	HLA-A*01:01,HLA-A*03:01	list of MHC-I alleles or preset name
mhc2_alleles	HLA-DRB1*0101,HLA-DRB1*0103,HLA-DRB1*0401	list of MHC-II alleles or preset name
mhc1_length	11	length of n-mers for MHC-I prediction
mhc2_length	15	length of n-mers for MHC-II prediction
n	3	minimum number of alleles for promiscuous binders
cutoff_method	default	method used to apply the cutoff; the value shown is default
cutoff	4	percentile cutoff for counting promiscuous binders, i.e. top 4 percent
sequence_file	zaire-ebolavirus.gb	set of protein sequences in genbank or fasta format
path	results	folder to save results to, can be empty for current folder
overwrite	no	overwrite the previous results
names	Rv0011c,Rv0019c	protein/sequence/locus tag names to predict in your file, optional
verbose	no	displays more information while running
plots	yes	make plots of protein binders
genome_analysis	no	global analysis for all proteins
iedbmhc1_path		folder where the IEDB MHC-I tools are installed, not required unless used
iedbmhc2_path		folder where the IEDB MHC-II tools are installed, not required unless used
iedb_mhc1_method	IEDB_recommended	predictor to use within the IEDB MHC-I tools (see below)
iedb_mhc2_method	IEDB_recommended	predictor to use within the IEDB MHC-II tools (see below)
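
The way n and cutoff work together can be illustrated with a small sketch. It is not the library's implementation. It assumes that a higher score means a stronger binder, that the percentile is taken per allele over all peptides scored for that allele, and that a peptide is promiscuous when it falls in the top cutoff percent for at least n alleles. The function names and the toy scores are made up for illustration.

```python
import math
from collections import Counter


def top_percent(scores, cutoff):
    """Return the peptides in the top `cutoff` percent by score (higher is better)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, math.ceil(len(ranked) * cutoff / 100))
    return set(ranked[:keep])


def promiscuous_binders(scores_by_allele, n=2, cutoff=4):
    """Count, per peptide, the alleles for which it passes the cutoff.

    Only peptides that pass for at least `n` alleles are returned.
    """
    counts = Counter()
    for scores in scores_by_allele.values():
        counts.update(top_percent(scores, cutoff))
    return {peptide: c for peptide, c in counts.items() if c >= n}


# Toy scores, invented for this example only.
toy_scores = {
    "HLA-DRB1*01:01": {"pep1": 9.0, "pep2": 3.0, "pep3": 2.0, "pep4": 1.0},
    "HLA-DRB1*04:01": {"pep1": 8.0, "pep2": 4.0, "pep3": 2.5, "pep4": 0.5},
    "HLA-DRB1*07:01": {"pep1": 1.0, "pep2": 7.0, "pep3": 2.0, "pep4": 0.2},
}

if __name__ == "__main__":
    # A 25 percent cutoff keeps one of the four peptides per allele.
    print(promiscuous_binders(toy_scores, n=2, cutoff=25))
```

Raising n or lowering cutoff makes the selection stricter, which is why the two settings are usually tuned together.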

Preset allele lists

For convenience there are some lists of common alleles that you can use without having to type allele names into the config file. These have been taken from various sources and are only a rough guide. Use epitopepredict -p to see the available presets.

The current selection is:

name	description
mhc1_supertypes	6 MHC-I supertypes
mhc2_supertypes	7 MHC-II supertypes
us_caucasion_mhc1	30 most common US Caucasian MHC-I alleles
us_african_mhc1	30 most common US African MHC-I alleles
human_common_mhc2	11 most prevalent HLA-DR alleles worldwide
broad_coverage_mhc1	26 alleles providing broad coverage
bovine_like_mhc2	8 HLA-DR alleles chosen to approximate bovine response
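
Since an allele setting can hold either a preset name or a comma-separated list, a script that prepares config files may want to tell the two apart. The sketch below shows one way to do that. The preset names are copied from the table above, so epitopepredict -p remains the authoritative source, and parse_allele_setting is a hypothetical helper, not a library function.

```python
# Preset names copied from the table above; check `epitopepredict -p`
# for the list your installed version actually provides.
PRESET_NAMES = {
    "mhc1_supertypes",
    "mhc2_supertypes",
    "us_caucasion_mhc1",
    "us_african_mhc1",
    "human_common_mhc2",
    "broad_coverage_mhc1",
    "bovine_like_mhc2",
}


def parse_allele_setting(value):
    """Classify a config value as a preset name or an explicit allele list."""
    value = value.strip()
    if value in PRESET_NAMES:
        return ("preset", value)
    # Otherwise treat it as comma-separated allele names, ignoring blanks.
    alleles = [a.strip() for a in value.split(",") if a.strip()]
    return ("alleles", alleles)


if __name__ == "__main__":
    print(parse_allele_setting("human_common_mhc2"))
    print(parse_allele_setting("HLA-DRB1*01:01, HLA-DRB1*04:01"))
```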

IEDB tool methods

ann
comblib_sidney2008
consensus
IEDB_recommended
netmhcpan
smm
smmpmbec

Examples

MHC-II binding predictions for preset alleles of proteins in a genbank file

Using preset allele lists saves you the trouble of writing the alleles out. You can list the built-in presets with -p at the command line. If you provide MHC-I alleles for a class II predictor like tepitope, the program will give an error.

[base]
predictors = tepitope
mhc2_alleles = human_common_mhc2
n = 2
cutoff = 5
sequence_file = zaire-ebolavirus.gb
path = results
names = 
plots = yes
genome_analysis = no
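
To catch a class mismatch before starting a long run, the alleles can be checked against the predictor in advance. This is a rough sketch and not how the program itself validates input. Only tepitope is stated on this page to be a class II predictor; the classes given for the other predictors are assumptions based on their names, and the locus-prefix test applies only to human HLA names.

```python
# tepitope is described above as a class II predictor. The other entries
# are assumptions inferred from the predictor names.
PREDICTOR_CLASS = {
    "tepitope": 2,
    "netmhciipan": 2,
    "iedbmhc2": 2,
    "iedbmhc1": 1,
    "mhcflurry": 1,
}


def hla_class(allele):
    """Guess the MHC class of a human allele name from its locus prefix."""
    name = allele.strip().upper()
    if name.startswith("HLA-D"):
        return 2  # HLA-DR, HLA-DQ and HLA-DP loci
    if name.startswith(("HLA-A", "HLA-B", "HLA-C")):
        return 1
    return None  # non-human or unrecognised name, left unchecked


def mismatched_alleles(predictor, alleles):
    """Return the alleles whose class differs from the predictor's class."""
    expected = PREDICTOR_CLASS[predictor]
    return [a for a in alleles if hla_class(a) not in (expected, None)]


if __name__ == "__main__":
    # The MHC-I allele is reported because tepitope is a class II predictor.
    print(mismatched_alleles("tepitope", ["HLA-A*01:01", "HLA-DRB1*01:01"]))
```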

Outputs

In each results folder you will find csv files with the predictions for each sequence. This is the primary raw output. There is a separate folder for each prediction method. These folders can be re-used as input in the analysis section without re-running predictions.
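
A quick way to check what a run produced is to walk the results folder and count the rows in each csv file. The sketch below assumes a layout of one sub-folder per prediction method with one csv file per sequence, as described above, and that each file starts with a header row. It makes no assumption about column names, and summarise_results is an illustrative helper rather than part of the library.

```python
import csv
from pathlib import Path


def summarise_results(results_dir):
    """Map each method folder to {csv file name: number of data rows}."""
    summary = {}
    for method_dir in sorted(Path(results_dir).iterdir()):
        if not method_dir.is_dir():
            continue  # skip loose files in the top-level folder
        counts = {}
        for csv_file in sorted(method_dir.glob("*.csv")):
            with open(csv_file, newline="") as handle:
                rows = list(csv.reader(handle))
            # subtract the header row; an empty file counts as zero
            counts[csv_file.name] = max(len(rows) - 1, 0)
        summary[method_dir.name] = counts
    return summary


if __name__ == "__main__":
    results = Path("results")  # the value of `path` in the config file
    if results.is_dir():
        for method, files in summarise_results(results).items():
            print(method, files)
```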
