This pipeline generates expression counts for genes and transposable elements (TEs) from paired-end fastq.gz files. It includes the following steps:
- Use fastp to remove low-quality reads and trim adapters
- Use STAR to align reads to reference genome
- Use TEtranscripts to quantify the expression of genes and transposable elements
After the snakemake pipeline runs, the output is saved in a newly created folder, ./results. The count table, which has the suffix .cntTable, is used for downstream analysis with TEKRABber.
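As a rough illustration of what the count table contains, the sketch below splits a .cntTable into gene rows and TE rows before downstream analysis. It assumes the usual TEtranscripts layout: a tab-separated file whose first column is the feature ID, with TE IDs written as colon-separated name:family:class (for example L1HS:L1:LINE) and gene IDs containing no colon. The function name split_cnt_table and the sample data are hypothetical and not part of this repository.

```python
import csv
import io


def split_cnt_table(handle):
    """Split a TEtranscripts-style count table into gene rows and TE rows.

    Assumption: TE IDs contain ':' (name:family:class); gene IDs do not.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # every column after the feature ID is a sample
    genes, tes = {}, {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        feature_id = row[0].strip('"')
        counts = [int(value) for value in row[1:]]
        target = tes if ":" in feature_id else genes
        target[feature_id] = counts
    return samples, genes, tes


if __name__ == "__main__":
    # Made-up example rows, for illustration only.
    demo = (
        "gene/TE\tsampleA.bam\tsampleB.bam\n"
        "ENSG00000000003\t10\t12\n"
        "L1HS:L1:LINE\t5\t7\n"
    )
    samples, genes, tes = split_cnt_table(io.StringIO(demo))
    print(samples, len(genes), len(tes))
```

To use it on real output, open a .cntTable file from ./results and pass the file handle to split_cnt_table.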
- Install conda, and create a conda environment that can run snakemake (v7.25.0).
- Clone this repository to your working directory
- In the cloned repository, create a data/ folder and put all of your fastq.gz files in it.
- Use the create_yaml.py script to create a config file for snakemake:
    python create_yaml.py -outdir <your_out_put_dir_path> -yaml <your_config_yaml_file_name>
- To execute snakemake from your command line, run:
    snakemake -s preTEKRABber.snake --configfile your-config.yaml -c 1
  If you prefer to run it on a cluster, for example with slurm, modify the runSnake.sh script and then run:
    sbatch runSnake.sh
- The results will be saved inside a newly created folder, ./results, in this repo.
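Because the pipeline expects paired-end input, it can help to check that every sample in data/ has both mates before generating the config. The sketch below is one way to do that. It assumes a hypothetical naming convention of <sample>_1.fastq.gz and <sample>_2.fastq.gz; this README does not say which naming create_yaml.py expects, so adjust the pattern to match your files. The helper pair_fastq is illustrative and not part of this repository.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) naming: <sample>_1.fastq.gz and <sample>_2.fastq.gz
MATE_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.fastq\.gz$")


def pair_fastq(filenames):
    """Group fastq.gz file names by sample and report incomplete pairs."""
    mates = defaultdict(dict)
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match:  # ignore files that do not follow the assumed pattern
            mates[match["sample"]][match["mate"]] = name
    # Samples with both mates present
    paired = {
        sample: (found["1"], found["2"])
        for sample, found in mates.items()
        if len(found) == 2
    }
    # Samples with only one mate present
    unpaired = sorted(sample for sample, found in mates.items() if len(found) != 2)
    return paired, unpaired
```

For a real run, pass the directory listing, for example pair_fastq(os.listdir("data")) after importing os, and resolve any sample reported as unpaired before starting snakemake.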
More detailed package versions can be found in the yml files inside the envs/ folder.
In brief:
snakemake==7.25.0
fastp==0.12.4
star==2.7.10b
tetranscripts==2.2.3
- The envs folder has the conda environment settings for the packages.
- The others folder has some modified, reduced snakefiles derived from preTEKRABber.snake for specific use cases. It is recommended that you start from preTEKRABber.snake.