shahcompbio/orfology

Introduction

shahcompbio/orfology is a bioinformatics pipeline that ...

1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with `-profile test` before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

fasta,protein_table,sample,condition
U937_protein.fas,philosopher/protein.tsv,U937,AML
swissprot.fasta,,SwissProt,SwissProt

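Each row represents one of two things: a proteogenomics sample, for which you provide the protein FASTA produced by proteomegenerator2 or proteomegenerator3 together with a protein table from philosopher, or a standalone protein FASTA file you would like to analyze. In the second case the protein_table field is left empty, as in the SwissProt row above.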
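
As a quick sanity check before launching a run, the samplesheet layout described above can be verified with a few lines of Python. This is only a sketch: the helper name `check_samplesheet` is hypothetical, and it assumes that `fasta`, `sample` and `condition` must be filled in on every row while `protein_table` may be blank. The pipeline's own input schema remains the authority on what is accepted.

```python
import csv
import io

# Column names taken from the example samplesheet in this README.
REQUIRED_COLUMNS = ["fasta", "protein_table", "sample", "condition"]


def check_samplesheet(text):
    """Return a list of problems found in samplesheet CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Assumption: these three fields are needed on every row.
        for column in ("fasta", "sample", "condition"):
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: '{column}' is empty")
        # protein_table is allowed to be blank (FASTA-only rows such as SwissProt).
    return problems


example = """fasta,protein_table,sample,condition
U937_protein.fas,philosopher/protein.tsv,U937,AML
swissprot.fasta,,SwissProt,SwissProt
"""
print(check_samplesheet(example))
```

For the example samplesheet shown above, the returned list is empty.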

Now, you can run the pipeline using:

nextflow run shahcompbio/orfology \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>

Warning

Please provide pipeline parameters via the CLI or the Nextflow `-params-file` option. Custom config files, including those provided by the Nextflow `-c` option, can be used to provide any configuration except for parameters; see docs.
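
For reference, a parameters file can be generated from a script. The sketch below writes a JSON file, one of the formats Nextflow accepts for `-params-file`. The parameter names are the ones used on the command line in this README (the two boolean flags are described in the next section); the values, including the `results` output directory and the `params.json` file name, are placeholders chosen for illustration.

```python
import json

# Parameter names mirror the CLI options in this README; values are placeholders.
params = {
    "input": "samplesheet.csv",
    "outdir": "results",          # stands in for <OUTDIR>
    "categorize_proteins": True,  # same effect as passing --categorize_proteins
    "unique_proteins": True,      # same effect as passing --unique_proteins
}

with open("params.json", "w") as handle:
    json.dump(params, handle, indent=2)

# The file can then be passed to Nextflow, for example:
#   nextflow run shahcompbio/orfology \
#      -profile <docker/singularity/.../institute> \
#      -params-file params.json
```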

Classify proteins by transcriptomic origins

If you are using orfology to classify proteins, for which you have detected peptides by mass spectrometry, according to their transcriptomic origins (with the `--categorize_proteins` flag), we recommend also running it with the `--unique_proteins` flag. This flag filters your FASTA files using the Indistinguishable Proteins column of the philosopher protein.tsv tables, keeping only those proteins that are uniquely distinguishable from other proteins. This ensures that each non-canonical protein has a combination of peptides that is distinguishable from other proteins in your analysis.

Output tables produced with this filter start with the prefix unique. Output tables that have not been filtered for uniquely distinguishable proteins have the prefix all or all+unique (the latter are tables of merged outputs from unique and all).

After filtering, proteins are categorized using the following conditional logic:

1. SwissProt if it is an exact sequence match for a SwissProt protein.
2. Alt ORF from canonical transcript if one of the transcripts from which the ORF is predicted has an Ensembl ID.
3. ORF from alt splice transcript if one of the transcripts is a non-canonical splice isoform.
4. ORF from neogene if it comes from a non-canonical transcript that did not match a known gene.
5. Uncategorized if it doesn't fit into any of the above categories.
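
The filtering and the conditional logic above can be sketched in Python as follows. This is an illustration, not the pipeline's actual code. It assumes that a protein counts as uniquely distinguishable when its Indistinguishable Proteins field is empty, and that the categories are tested in the order listed with the first match winning. The boolean field names (`exact_swissprot_match` and so on) and the protein IDs are hypothetical.

```python
def is_unique(row):
    """True when the 'Indistinguishable Proteins' field of a row is empty.

    Assumption: an empty field means no other protein shares the same
    peptide evidence.
    """
    return not (row.get("Indistinguishable Proteins") or "").strip()


def categorize(protein):
    """Test the categories in the listed order; the first match wins (assumed)."""
    if protein["exact_swissprot_match"]:
        return "SwissProt"
    if protein["has_ensembl_transcript"]:
        return "Alt ORF from canonical transcript"
    if protein["has_noncanonical_splice_isoform"]:
        return "ORF from alt splice transcript"
    if protein["from_unmatched_transcript"]:
        return "ORF from neogene"
    return "Uncategorized"


# Made-up rows standing in for a philosopher protein.tsv table.
rows = [
    {"Protein": "P1", "Indistinguishable Proteins": ""},
    {"Protein": "P2", "Indistinguishable Proteins": "P3, P4"},
]
unique_ids = [row["Protein"] for row in rows if is_unique(row)]
print(unique_ids)

# A made-up protein whose only evidence is a non-canonical splice isoform.
example = {
    "exact_swissprot_match": False,
    "has_ensembl_transcript": False,
    "has_noncanonical_splice_isoform": True,
    "from_unmatched_transcript": False,
}
print(categorize(example))
```

Under the first-match assumption, a protein that is both an exact SwissProt match and predicted from an Ensembl transcript is reported as SwissProt.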

You can run this workflow with the following command:

nextflow run shahcompbio/orfology \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --categorize_proteins \
   --unique_proteins \
   --outdir <OUTDIR>

Key outputs here are:

1. classifyproteins/unique_proteins_merged_annotated_info_table.tsv: stratifies proteins by category and records which samples each protein appeared in.
2. blastsummary_pgtools_merged/unique_proteins_merged.tsv: contains the results of the diamond blastp search, merged with the table described in 1.

Credits

shahcompbio/orfology was originally written by Asher Preska Steinberg.

We thank the following people for their extensive assistance in the development of this pipeline:

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
