[UKB] Request and Access Data

Jochen Weile edited this page Jun 5, 2023 · 1 revision

Request new UKB traits

The UK Biobank team releases new traits regularly for researchers to access. To request new UKB traits, you need to:

  1. Check if we already have access to the traits. Open the Rothlab UKB Data Request spreadsheet and go to the second tab (Previously Requested Traits). You can search by trait ID (field ID) or name to determine if traits have been previously requested.
  2. Create a new data request bucket. Follow the instructions in this manual: https://www.ukbiobank.ac.uk/media/llvi2ygt/ams-user-guide-7-data-releases-rap-access.pdf
  3. Add the newly requested traits to the top of the "Previously Requested Traits" tab of the Rothlab UKB Data Request spreadsheet.
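
When checking many field IDs at once, step 1 can be scripted instead of searched by hand. The following is a minimal sketch, assuming the "Previously Requested Traits" tab has been exported to CSV and that its field-ID column is headed "Field ID". Both the export file name and the column header are assumptions, not taken from the actual spreadsheet, so adjust them to match.

```python
import csv


def split_by_request_status(rows, wanted_ids, id_column="Field ID"):
    """Split wanted_ids into (already_requested, still_needed).

    rows: an open CSV file or any iterable of CSV lines, header first.
    id_column: header of the field-ID column (assumed name; adjust as needed).
    """
    reader = csv.DictReader(rows)
    if id_column not in (reader.fieldnames or []):
        raise KeyError(f"column {id_column!r} not found in header")
    # Collect every non-empty field ID already listed in the tab.
    known = {(row.get(id_column) or "").strip() for row in reader}
    known.discard("")
    wanted = [str(i).strip() for i in wanted_ids]
    already = [i for i in wanted if i in known]
    needed = [i for i in wanted if i not in known]
    return already, needed


# Usage (hypothetical file name for the exported tab):
# with open("previously_requested_traits.csv", newline="") as handle:
#     already, needed = split_by_request_status(handle, [21001, 30000])
```

Only the IDs returned in the second list would then need to go into a new data request.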

Access UKB Data

UKB Research Analysis Platform (RAP)

All UKB data can be accessed via the UKB Research Analysis Platform (RAP), a cloud-based platform on which you can access UKB data and run analyses. Read more about UKB RAP here: https://www.ukbiobank.ac.uk/enable-your-research/research-analysis-platform

To use UKB RAP, you need an account. Each account comes with a £40 free trial credit, which you can spend on running analyses, downloading results from the RAP, and so on. Make sure to add yourself to the "Roth Lab" organization so that your usage can be billed to the lab's payment method once the free trial credit is used up.

You need to create a project into which UKB data will be dispensed. For an example, take a look at Kevin Kuang's UKB project on RAP: https://ukbiobank.dnanexus.com/panx/projects/G8QbJzQJzG5Pv4xz70ZgxvK0/. If you cannot access this project, please ask Jochen Weile or the current maintainer to add you to it.

Locally cached data

We have also cached some data from the UK Biobank. Due to UKB policy, we are not allowed to download raw sequences or unfiltered variant files from the RAP. Instead, we developed a pipeline that pre-processes and filters variants on the RAP, and we downloaded the processed variants to our GALEN cluster.

Take a look at this repository to learn more about the pre-processing pipeline: https://github.com/rothlab/ukb_rap_analysis

The cached data is stored on GALEN at the following paths:

  1. Filtered variants from the 450K UKB exomes: /home/rothlab/kkuang/ukb/ukb_data_450k
  2. ~14K traits for 450K UKB participants: /home/rothlab/kkuang/ukb/ukb_burden_test_full/data/phenotypes_all_450k.csv
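
With roughly 14K trait columns, the phenotype file is wide, so it can help to stream only the columns you need rather than load the whole table. The following is a minimal sketch, assuming the file has a header row and a participant ID column named "eid". The column names are assumptions, since the file's actual layout is not described here, so check the header first.

```python
import csv


def read_trait_columns(lines, columns, id_column="eid"):
    """Yield one dict per participant containing only the requested columns.

    lines: an open CSV file or any iterable of CSV lines, header first.
    columns: trait column names to keep (assumed to match the file header).
    id_column: participant ID column (assumed to be "eid"; adjust as needed).
    The header check runs when the first row is requested.
    """
    reader = csv.DictReader(lines)
    keep = [id_column, *columns]
    missing = [c for c in keep if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in header: {missing}")
    for row in reader:
        yield {c: row[c] for c in keep}


# Usage on GALEN (the trait column name "50" is a placeholder):
# path = "/home/rothlab/kkuang/ukb/ukb_burden_test_full/data/phenotypes_all_450k.csv"
# with open(path, newline="") as handle:
#     for record in read_trait_columns(handle, ["50"]):
#         ...
```

Because the function is a generator, rows are read one at a time and only the selected columns are kept for each participant.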