Estimating phylogenetic trees using 30 microorganisms (previously 6 organisms; review the data folder and notebook). We look at the 16S rRNA region using unsupervised learning, web-based tools and Molecular Evolutionary Genetics Analysis (MEGA7). We also look at motifs and find out what they do.
It is important to know these regions since they can potentially give us clues about which regions to target in DNA therapies.
Make the virtual environment when working on your own system.
python3 -m venv phylo-env
Activate the virtual environment.
source phylo-env/bin/activate
Install packages. You need an email address set in your .bashrc file to run Biopython.
make install run_script
In your terminal, in the directory where you cloned this repository, run this command to open the notebook.
jupyter notebook Phylogenetic_trees_unsupervised_learning.ipynb
Previously, we have not provided a codebook/data description file since one of the headings in the notebook covers that. Otherwise, you can check out the notebook or the HTML file I've provided in the repository.
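The install step above says an email must be in your .bashrc for Biopython. This is most likely because Biopython's Entrez module asks for a contact email when querying NCBI. Here is a minimal sketch, assuming the project reads the address from an environment variable. The name ENTREZ_EMAIL is hypothetical and does not come from this repository, so check the Makefile or notebook for the name actually used.

```shell
# ENTREZ_EMAIL is a hypothetical name used for illustration only;
# replace it with the variable the project actually reads.
# Put the export line in ~/.bashrc so every new shell picks it up.
export ENTREZ_EMAIL="you@example.com"

# Report whether the variable is visible to programs started from this shell.
if [ -n "${ENTREZ_EMAIL:-}" ]; then
    echo "email set: ${ENTREZ_EMAIL}"
else
    echo "email not set" >&2
fi
```

After editing .bashrc, run `source ~/.bashrc` or open a new terminal before running the install step.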
Build the Docker image from the Dockerfile.
sudo docker build -t phylo-exp .
Run the Docker image.
sudo docker run -it -p 8888:8888 --rm phylo-exp:latest
Initialize DVC in the folder so that DVC functionality can be used in the repo. NB: files will be created in the directory.
dvc init
Track changes to the different data files. We do this because these files will change during the experiment, especially if the investigator wants to try other experiments with more data.
dvc add updated_data/data.md
dvc add updated_data/sequences.fasta
dvc add updated_data/sequence_metrics.csv
Commit to save the changes that have occurred in the repository.
dvc commit
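Each `dvc add` above writes a small pointer file next to the data file, named after it with a .dvc suffix (for example updated_data/sequences.fasta.dvc). Below is a minimal sketch for checking that all three data files have a pointer. The helper name report_dvc_tracking is hypothetical and is not part of DVC or this repository.

```shell
# report_dvc_tracking is an illustrative helper, not a DVC command.
# It looks for the <file>.dvc pointer that `dvc add <file>` creates.
report_dvc_tracking() {
    dir="${1:-updated_data}"
    missing=0
    for f in data.md sequences.fasta sequence_metrics.csv; do
        if [ -f "$dir/$f.dvc" ]; then
            echo "tracked: $f"
        else
            echo "not tracked: $f"
            missing=1
        fi
    done
    # Non-zero exit status when at least one pointer file is absent.
    return "$missing"
}
```

Call it from the repository root as `report_dvc_tracking updated_data`. If the repository is also under Git, these .dvc pointer files are the ones to commit to Git, while the data itself stays in the DVC cache.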