A web-based, interactive pangenome visualization tool
A pangenome represents genetic variation in a population as a variation graph, which greatly reduces the reference
bias that comes from using a linear reference genome. However, a linear reference genome is more intuitive to
understand and is what bioinformaticians have traditionally used. To make variation graphs easier to visualize
and interpret, I present pgv, an interactive visualization tool built on top of previous work
that displays structural variants in a variation graph.
Instead of fitting the nodes, edges, and paths of a variation graph in a 2-dimensional space, pgv draws the sequence
graph itself on the x-y plane, and paths are rendered as separate layers that can be interactively selected/highlighted.
- The paths in a variation graph can be interactively selected/highlighted.
A live demo is available at: https://w-gao.github.io/pgv.
If you want to try pgv yourself, you can get started in the following ways. Keep in mind that this project is under
development and may not work well with your own data. If you encounter any issues, please let me know by
opening an issue.
The easiest way to run pgv is through Docker:
$ docker pull wlgao/pgv:latest
To run a container:
$ docker run -d --name pgv \
    -v "$(pwd)/examples":/pgv/ui/examples \
    -p 8000:8000 \
    wlgao/pgv:latest
This creates a container in detached (-d) mode, exposes port 8000, and mounts a volume for the graph files.
You can add additional volumes if you want to construct your own graphs inside the container, or, if you have
vg installed on your local system, you can use the pgv CLI to
pre-process graphs.
If successful, pgv should be running at:
http://localhost:8000
To stop the container, run:
$ docker stop pgv
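If you want to script the Docker invocation, for instance to mount the additional volumes mentioned above, the argument list can be assembled programmatically. The following is a minimal sketch, not part of pgv: the function name `docker_run_args` and the `/data` mount point are hypothetical, while the image name, container name, port, and `/pgv/ui/examples` path are taken from the commands in this README.

```python
from pathlib import Path
from typing import List, Sequence, Tuple


def docker_run_args(
    examples_dir: str,
    extra_volumes: Sequence[Tuple[str, str]] = (),
    host_port: int = 8000,
) -> List[str]:
    """Assemble the `docker run` argument list for a detached pgv container."""
    args = ["docker", "run", "-d", "--name", "pgv"]
    # Graph files: host folder -> path that the container serves.
    args += ["-v", f"{Path(examples_dir).resolve()}:/pgv/ui/examples"]
    # Hypothetical extra mounts, e.g. raw input data for building graphs.
    for host_path, container_path in extra_volumes:
        args += ["-v", f"{Path(host_path).resolve()}:{container_path}"]
    # The container listens on port 8000; only the host side is configurable here.
    args += ["-p", f"{host_port}:8000", "wlgao/pgv:latest"]
    return args


if __name__ == "__main__":
    import shlex

    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(docker_run_args("examples", [("data", "/data")])))
```

Passing the returned list to `subprocess.run(args, check=True)` would start the container; printing it first makes it easy to confirm the mounts before anything runs.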
Alternatively, you can pull the source code and build the project yourself. You need at least the following:
- Node.js >= 16
- Yarn < 2
- Python >= 3.8
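Before building, it can help to check the installed tools against these requirements. The sketch below is an illustration only: `parse_version` and `meets_requirements` are hypothetical helpers, and they assume you pass in the strings printed by `node --version`, `yarn --version`, and `python3 --version`.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the numeric components from strings like 'v16.14.2' or 'Python 3.10.4'."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_requirements(node: str, yarn: str, python: str) -> bool:
    """Check three version strings against the requirements listed above."""
    return (
        parse_version(node) >= (16,)        # Node.js >= 16
        and parse_version(yarn) < (2,)      # Yarn < 2
        and parse_version(python) >= (3, 8) # Python >= 3.8
    )


if __name__ == "__main__":
    print(meets_requirements("v16.14.2", "1.22.19", "Python 3.10.4"))
```

Versions are compared as integer tuples rather than as strings, so that, for example, 3.10 is correctly treated as newer than 3.8.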
To clone the repo:
git clone https://github.com/w-gao/pgv.git
cd pgv
Then, build the project:
# Run the prebuild script.
# Note: this script requires curl.
./prebuild.sh
# Build core package.
yarn core:build
# Build web package.
yarn web:build
# Start a preview HTTP server.
yarn web:preview
If successful, pgv should be running at:
http://localhost:8000
When you select a graph, you are presented with this interface:
This is split into three sections:
- The header for graph selection and navigation
- The sequenceTubeMap render of the graph
- The pgv render of the graph
- To navigate the graph, you can use the ← and → buttons in the header, or the A and D keys.
- To cycle between the paths, you can use the ↑ and ↓ buttons, or the up and down arrow keys.
- You can also move closer to or away from the graph using the W and S keys, and up and down using the R and F keys.
- (Controls such as movement speed are limited at the moment, but more can easily be added if needed.)
If you want to use your own data, you need to pre-process the files first. This can be done with the pgv CLI.
Currently supported file formats are: FASTA (.fa), VCF (.vcf, .vcf.gz), and GBWT (.gbwt).
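To catch unsupported inputs early, you could check file names against this list before invoking the CLI. This is a minimal sketch under the assumption that formats are recognized purely by the extensions listed above; `detect_format` is a hypothetical helper and does not reflect how cli.py itself validates its inputs.

```python
# Extensions copied from the list of supported formats above.
SUPPORTED_FORMATS = {
    ".fa": "FASTA",
    ".vcf": "VCF",
    ".vcf.gz": "VCF",
    ".gbwt": "GBWT",
}


def detect_format(filename: str) -> str:
    """Return the format name for a supported input file, or raise ValueError."""
    for extension, format_name in SUPPORTED_FORMATS.items():
        # The comparison is case-sensitive; that is an assumption of this sketch.
        if filename.endswith(extension):
            return format_name
    raise ValueError(f"unsupported input file: {filename}")


if __name__ == "__main__":
    for name in ["x.fa", "x.vcf.gz", "x.vg.gbwt"]:
        print(name, "->", detect_format(name))
```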
The following assumes that you have vg installed. You also need cli.py,
which is pre-installed inside the pgv container, or can be downloaded via:
curl -O https://raw.githubusercontent.com/w-gao/pgv/main/cli.py
For example, to construct a graph from a FASTA file (x.fa) and a VCF file (x.vcf.gz), run:
$ python3 cli.py add example \
        --reference x.fa \
        --vcf x.vcf.gz
This will pre-process the input files and output the results to the folder set by --dest, which is ./examples/ by
default. When running locally, the dev server symlinks to this default folder to grab the graph files. When running via
Docker, the /pgv/ui/examples path inside the container is statically served. If you decide to modify this path, make
sure to update the configuration for the respective places as well.
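If you have several graphs to pre-process, the `add` command can be generated for each one. The sketch below only plans the commands and rests on assumptions: the helper `plan_add_commands` is hypothetical, it pairs `<name>.fa` with `<name>.vcf.gz` by file name (a convention chosen for this example, not one pgv requires), and it assumes `--dest` takes a folder path as its value.

```python
import sys
from pathlib import Path
from typing import Dict, Iterable, List


def plan_add_commands(
    filenames: Iterable[str], dest: str = "./examples/"
) -> Dict[str, List[str]]:
    """Map each graph name to a `cli.py add` argument list.

    A FASTA file `<name>.fa` is used only if `<name>.vcf.gz` is also present.
    """
    names = set(filenames)
    commands: Dict[str, List[str]] = {}
    for fasta in sorted(n for n in names if n.endswith(".fa")):
        stem = fasta[: -len(".fa")]
        vcf = stem + ".vcf.gz"
        if vcf not in names:
            continue  # no matching VCF, so skip this reference
        graph_name = Path(stem).name
        commands[graph_name] = [
            sys.executable, "cli.py", "add", graph_name,
            "--reference", fasta,
            "--vcf", vcf,
            "--dest", dest,
        ]
    return commands


if __name__ == "__main__":
    for name, args in plan_add_commands(["x.fa", "x.vcf.gz", "y.fa"]).items():
        print(name, args)
```

Each argument list can then be passed to `subprocess.run(args, check=True)` from the folder that contains cli.py.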
Graph genomes are often large, so you can specify a node range to display only a small chunk of the graph. You
can also specify a GBWT file to include additional haplotypes for that range. For example, to limit the above
graph to the node range 1:100, with more haplotypes from x.vg.gbwt:
$ python3 cli.py add example \
        --reference x.fa \
        --vcf x.vcf.gz \
        --node-range 1:100 \
        --gbwt-name x.vg.gbwt
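A range string can be sanity-checked before it is handed to the CLI. The following sketch assumes the `start:end` form shown in the example, with positive integer node IDs and `start` no greater than `end`; `parse_node_range` is a hypothetical helper, and how cli.py itself parses or bounds the range is not specified here.

```python
from typing import Tuple


def parse_node_range(text: str) -> Tuple[int, int]:
    """Parse 'start:end' into a pair of positive node IDs with start <= end."""
    start_text, separator, end_text = text.partition(":")
    if not separator:
        raise ValueError(f"expected 'start:end', got {text!r}")
    # int() raises ValueError for non-numeric parts such as 'a' or '2:3'.
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid node range: {text!r}")
    return start, end


if __name__ == "__main__":
    print(parse_node_range("1:100"))
```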
If you have an existing graph but want to update the range, you can use the update subcommand so that the graph file
does not have to be re-constructed and re-indexed:
$ python3 cli.py update example \
        --node-range 1:100 \
        --gbwt-name x.vg.gbwt
For more information on the CLI, run:
$ python3 cli.py --help
Copyright (c) 2023 William Gao. MIT license.



