
[MaveQuest] Upload Curated Data

Jochen Weile edited this page Jun 5, 2023 · 1 revision

Update source data

Follow the instructions in MaveQuest: update source data to update the source data first. Make sure its outputs are accessible to the scripts described on this page.

Upload curated data

Working directory: mavequest-importer

Make sure the Google Cloud credential file maveQuest-datastore-operator.json is in the parent directory of the working directory (not within the working directory itself).

maveQuest-datastore-operator.json
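A quick pre-flight check can confirm that the credential file is where the scripts expect it before any upload starts. This is a minimal sketch that uses only Node's built-in fs and path modules; credentialPath and hasCredential are hypothetical helpers, not part of mavequest-importer, and the expected location mirrors the keyFilename '../maveQuest-datastore-operator.json' shown in utils.js below.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: resolve the credential path one level above the
// working directory (mavequest-importer), matching keyFilename in utils.js.
function credentialPath(workingDir) {
  return path.resolve(workingDir, '..', 'maveQuest-datastore-operator.json');
}

// Returns true when the credential file exists at the expected location.
function hasCredential(workingDir) {
  return fs.existsSync(credentialPath(workingDir));
}

// Only runs the check when invoked with --check from the working directory.
if (process.argv.includes('--check')) {
  const cwd = process.cwd();
  if (hasCredential(cwd)) {
    console.log(`Found credential at ${credentialPath(cwd)}`);
  } else {
    console.error(`Missing credential: expected ${credentialPath(cwd)}`);
    process.exitCode = 1;
  }
}
```

Saved as, say, checkCredential.js (a hypothetical file name) inside mavequest-importer, it would be run as `node checkCredential.js --check`.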

Prepare curated data

  1. Create a folder under input, named after the date of creation in YYYYMMDD format (for example, input/20220113).
  2. Copy the curated data files into the new folder. As of July 2022, the required data files are:

[Image: list of required data files]

  3. Open utils.js
    1. If you are updating gene names and gene information, increment the version numbers in the dbKinds constant. For example, as of July 2022 the database is at V6, so the next update changes it to V7.
// Before
const dbKinds = {
  'Gene': 'Gene_V6',
  'Assay': 'Assay_V6',
  'Phenotype': 'Phenotype_V6',
  'Interactome': 'Interactome_V6',
  'ClinInterests': 'Clinical_Interests_V6',
  'Stats': 'Stats_V6',
};

// After
const dbKinds = {
  'Gene': 'Gene_V7',
  'Assay': 'Assay_V7',
  'Phenotype': 'Phenotype_V7',
  'Interactome': 'Interactome_V7',
  'ClinInterests': 'Clinical_Interests_V7',
  'Stats': 'Stats_V7',
};
  2. If you are only updating certain databases without updating gene names, there is no need to increment the version number.
  3. If you need to change the path to the database credential, set the new path as keyFilename in the datastore variable:
const datastore = new Datastore({
  projectId: 'glass-ally-143617',
  keyFilename: '../maveQuest-datastore-operator.json', // Change path to credential
});
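The two manual edits in this section, naming the dated input folder and bumping every dbKinds suffix by one, can be sketched as pure functions. This is an illustration under assumptions: dateFolderName and bumpDbKinds are hypothetical helpers, and the sketch assumes every kind name ends in _V followed by a number, as in the example above.

```javascript
// Hypothetical helper: format a Date (local time) as YYYYMMDD for input/<folder>.
function dateFolderName(date = new Date()) {
  const yyyy = String(date.getFullYear());
  const mm = String(date.getMonth() + 1).padStart(2, '0');
  const dd = String(date.getDate()).padStart(2, '0');
  return `${yyyy}${mm}${dd}`;
}

// Hypothetical helper: return a copy of dbKinds with each _V<n> suffix
// incremented to _V<n+1>, e.g. 'Gene_V6' -> 'Gene_V7'.
function bumpDbKinds(dbKinds) {
  const bumped = {};
  for (const [key, kind] of Object.entries(dbKinds)) {
    const match = /^(.*_V)(\d+)$/.exec(kind);
    if (!match) {
      throw new Error(`Kind "${kind}" does not end in _V<number>`);
    }
    bumped[key] = `${match[1]}${Number(match[2]) + 1}`;
  }
  return bumped;
}

console.log(dateFolderName(new Date(2022, 0, 13)));
console.log(bumpDbKinds({ Gene: 'Gene_V6', ClinInterests: 'Clinical_Interests_V6' }));
```

The helpers only compute the new values; utils.js still has to be edited by hand with the printed kind names.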

Upload Gene Information

  • Script: addGeneInfo.js
  • Arguments:
Argument Required Description
input TRUE The gene info file (file name geneinfo_*.csv)
all FALSE If provided, the script will update all properties for all genes. This argument needs to be set if update-property is not set.
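update-property FALSE If provided with a property name (for example, gene_symbol), the script will only update that property for all genes. This argument needs to be set if all is not set.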
  • Run:
# Update all properties
node addGeneInfo.js --input input/20220113/geneinfo_20220110.csv --all

# Update only the gene symbols (gene_symbol property)
node addGeneInfo.js --input input/20220113/geneinfo_20220110.csv --update-property gene_symbol
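The argument rules in the table (input is required, and either all or update-property selects what to update) can be expressed as a small validator. This is a hypothetical sketch over an already-parsed options object; it does not reflect how addGeneInfo.js actually parses its command line, and rejecting both flags together is an assumption of the sketch.

```javascript
// Hypothetical validator mirroring the addGeneInfo.js argument table.
// `opts` is assumed to be already parsed, e.g.
// { input: 'input/20220113/geneinfo_20220110.csv', all: true }.
function describeGeneInfoUpdate(opts) {
  if (!opts.input) {
    throw new Error('--input is required');
  }
  const property = opts['update-property'];
  const hasProperty = typeof property === 'string' && property !== '';
  // Assumption: --all and --update-property are treated as mutually exclusive.
  if (opts.all && hasProperty) {
    throw new Error('Use only one of --all or --update-property');
  }
  if (opts.all) {
    return 'update all properties for all genes';
  }
  if (hasProperty) {
    return `update only "${property}" for all genes`;
  }
  throw new Error('Set either --all or --update-property <property>');
}
```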

Upload Ambry dataset

  • Script: addAmbry.js
  • Arguments:
Argument Required Description
input TRUE The Ambry data file (file name ambry_*.csv)
  • Run:
node addAmbry.js --input input/20220113/ambry_20220110.csv

Upload Cancer Census dataset

  • Script: addCancerCensus.js
  • Arguments:
Argument Required Description
input TRUE The Cancer Census data file (file name cancer_census_*.csv)
  • Run:
node addCancerCensus.js --input input/20220113/cancer_census_20220110.csv

Upload ClinVar dataset

  • Script: addClinvar.js
  • Arguments:
Argument Required Description
input TRUE The ClinVar dataset file (file name clinvar_*.csv)
  • Run:
node addClinvar.js --input input/20220113/clinvar_20220110.csv

Upload GeneDx dataset

  • Script: addGeneDx.js
  • Arguments:
Argument Required Description
input TRUE The GeneDx dataset file (file name genedx_*.csv)
  • Run:
node addGeneDx.js --input input/20220113/genedx_20220110.csv

Upload GenomeCRISPR dataset

  • Script: addGenomeCRISPR.js
  • Arguments:
Argument Required Description
input-entry TRUE The GenomeCRISPR dataset file (file name genomecrispr_*.csv)
input-summary TRUE The GenomeCRISPR stats file (file name genomecrispr_hitsSum_*.csv)
  • Run:
node addGenomeCRISPR.js --input-entry input/20220113/genomecrispr_20220110.csv --input-summary input/20220113/genomecrispr_hitsSum_20220110.csv

Upload GenomeRNAi dataset

  • Script: addGenomeRNAi.js
  • Arguments:
Argument Required Description
input TRUE The GenomeRNAi dataset file (file name genomernai_*.csv)
  • Run:
node addGenomeRNAi.js --input input/20220113/genomernai_20220110.tsv

Upload HuRI dataset

  • Script: addHuRI.js
  • Arguments:
Argument Required Description
input TRUE The HuRI dataset file (file name huri_*.csv)
  • Run:
node addHuRI.js --input input/20220113/huri_20220110.csv

Upload InterPro dataset

  • Script: addInterpro.js
  • Arguments:
Argument Required Description
input TRUE The InterPro dataset file (file name interpro_*.csv)
  • Run:
node addInterpro.js --input input/20220113/interpro_20220110.csv

Upload Invitae dataset

  • Script: addInvitae.js
  • Arguments:
Argument Required Description
input TRUE The Invitae dataset file (file name invitae_*.csv)
  • Run:
node addInvitae.js --input input/20220113/invitae_20220110.csv

Upload MaveDB dataset

  • Script: addMavedb.js
  • Arguments:
Argument Required Description
input TRUE The MaveDB dataset file (file name mavedb_*.csv)
  • Run:
node addMavedb.js --input input/20220113/mavedb_20220110.csv

Upload OGEE dataset

  • Script: addOGEE.js
  • Arguments:
Argument Required Description
input-entry TRUE The OGEE dataset file (file name ogee_gene_*.csv)
input-summary TRUE The OGEE stats file (file name ogee_study_*.csv)
  • Run:
node addOGEE.js --input-entry input/20220113/ogee_gene_20220110.csv --input-summary input/20220113/ogee_study_20220110.csv

Upload OMIM dataset

  • Script: addOMIM.js
  • Arguments:
Argument Required Description
input TRUE The OMIM dataset file (file name omim_*.csv)
  • Run:
node addOMIM.js --input input/20220113/omim_20220110.csv

Upload Orphanet dataset

  • Script: addOrphanet.js
  • Arguments:
Argument Required Description
input TRUE The Orphanet dataset file (file name orphanet_*.csv)
  • Run:
node addOrphanet.js --input input/20220113/orphanet_20220110.csv

Upload human homology datasets

  • Script: addOrthology.js
  • Arguments:
Argument Required Description
input TRUE The merged human homology datasets file (file name human_orthology_merged_*.csv)
  • Run:
node addOrthology.js --input input/20220113/human_orthology_merged_20220110.tsv

Upload over-expression datasets

  • Script: addOverexpression.js
  • Arguments:
Argument Required Description
input TRUE The human over-expression datasets file (file name human_overexpression_*.csv)
  • Run:
node addOverexpression.js --input input/20220113/human_overexpression_20220110.csv

Upload PharmGKB dataset

  • Script: addPharmGKB.js
  • Arguments:
Argument Required Description
input TRUE The PharmGKB dataset file (file name pharmgkb_*.csv)
  • Run:
node addPharmGKB.js --input input/20220113/pharmgkb_20220110.tsv

Upload secondary structure dataset

  • Script: addSecondaryStructure.js
  • Arguments:
Argument Required Description
input TRUE The secondary structure dataset file (file name secondary_structure_*.csv)
  • Run:
node addSecondaryStructure.js --input input/20220113/secondary_structure_20220110.csv

Upload priority genes

  • Script: addPriority.js
  • Arguments:
Argument Required Description
input-acmg TRUE The ACMG SF dataset file (file name acmg_list_*.csv)
input-dais TRUE The DAIS dataset file (file name dais_list_*.csv)
  • Run:
node addPriority.js --input-acmg input/20220727/acmg_list_20220721.csv --input-dais input/20220727/dais_list_20220721.csv

Upload BioGRID ORCS dataset

  • Script: addBioGrid.js
  • Arguments:
Argument Required Description
input-entry TRUE The BioGRID ORCS gene data file (file name biogrid_orcs_by_gene_*.csv)
input-summary TRUE The BioGRID ORCS screen data file (file name biogrid_orcs_screen_info_*.csv)
  • Run:
node addBioGrid.js --input-entry input/20220727/biogrid_orcs_by_genes_20220721.csv --input-summary input/20220727/biogrid_orcs_screen_info_20220721.csv

Add database statistics

  • Script: addStats.js
  • Arguments:
Argument Required Description
input-database-versions TRUE The database versions JSON file (file name databaseVersions.json). This file should be in the current working directory.
  • Run:
node addStats.js --input-database-versions databaseVersions.json
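Because the upload scripts are run one after another on files from the same dated folder, the sequence can be scripted. The sketch below is a hypothetical convenience wrapper, not part of mavequest-importer: it assumes every file for a release sits in one input/<folder> and shares one date suffix, keeps the file names and extensions used in the example commands on this page, and runs a full gene info update (--all). Each script is launched with Node's child_process.execFileSync from the mavequest-importer directory; the sequence only executes when called with --run.

```javascript
const { execFileSync } = require('child_process');

// Hypothetical list of upload steps, in the order used on this page.
function uploadSteps(folder, fileDate) {
  const f = (name) => `input/${folder}/${name}_${fileDate}`;
  return [
    ['addGeneInfo.js', '--input', `${f('geneinfo')}.csv`, '--all'],
    ['addAmbry.js', '--input', `${f('ambry')}.csv`],
    ['addCancerCensus.js', '--input', `${f('cancer_census')}.csv`],
    ['addClinvar.js', '--input', `${f('clinvar')}.csv`],
    ['addGeneDx.js', '--input', `${f('genedx')}.csv`],
    ['addGenomeCRISPR.js', '--input-entry', `${f('genomecrispr')}.csv`,
      '--input-summary', `${f('genomecrispr_hitsSum')}.csv`],
    ['addGenomeRNAi.js', '--input', `${f('genomernai')}.tsv`],
    ['addHuRI.js', '--input', `${f('huri')}.csv`],
    ['addInterpro.js', '--input', `${f('interpro')}.csv`],
    ['addInvitae.js', '--input', `${f('invitae')}.csv`],
    ['addMavedb.js', '--input', `${f('mavedb')}.csv`],
    ['addOGEE.js', '--input-entry', `${f('ogee_gene')}.csv`,
      '--input-summary', `${f('ogee_study')}.csv`],
    ['addOMIM.js', '--input', `${f('omim')}.csv`],
    ['addOrphanet.js', '--input', `${f('orphanet')}.csv`],
    ['addOrthology.js', '--input', `${f('human_orthology_merged')}.tsv`],
    ['addOverexpression.js', '--input', `${f('human_overexpression')}.csv`],
    ['addPharmGKB.js', '--input', `${f('pharmgkb')}.tsv`],
    ['addSecondaryStructure.js', '--input', `${f('secondary_structure')}.csv`],
    ['addPriority.js', '--input-acmg', `${f('acmg_list')}.csv`,
      '--input-dais', `${f('dais_list')}.csv`],
    ['addBioGrid.js', '--input-entry', `${f('biogrid_orcs_by_genes')}.csv`,
      '--input-summary', `${f('biogrid_orcs_screen_info')}.csv`],
    // Statistics go last, after every dataset has been uploaded.
    ['addStats.js', '--input-database-versions', 'databaseVersions.json'],
  ];
}

// execFileSync throws on a non-zero exit code, so the run stops at the
// first failing script instead of continuing with a partial upload.
function runAll(steps) {
  for (const args of steps) {
    console.log(`> node ${args.join(' ')}`);
    execFileSync('node', args, { stdio: 'inherit' });
  }
}

if (process.argv.includes('--run')) {
  runAll(uploadSteps('20220113', '20220110'));
}
```

Without --run, uploadSteps can be used on its own to review the exact commands before anything touches the datastore.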

Switch over database version in API

Working directory: mavequest-api

  1. Change the database version if it was incremented in Prepare curated data.
    1. Go to controllers/utils.js
    2. Update the dbKinds constant to the same version number as the utils.js in the mavequest-importer folder.
  2. Increment version number
    1. Go to package.json
    2. Increment the version number. For example, if the current version is 2.14.2, the next version is 2.14.3.
  3. Push API code
    1. Push code changes into the master branch on the mavequest-api GitHub repo
    2. Create a pull request to merge code from the master branch into the prod branch
    3. Monitor the GitHub Actions page (https://github.com/rothlab/mavequest-api/actions) to make sure the automatic deployment script successfully deploys the code to Google Cloud.
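The package.json change in step 2 is a patch-level bump. A minimal sketch with Node built-ins follows; bumpPatch and bumpPackageJson are hypothetical helpers and assume a plain MAJOR.MINOR.PATCH version with no pre-release tag. npm's own `npm version patch` performs the same bump, but inside a git repository it also creates a commit and a tag by default.

```javascript
const fs = require('fs');

// Hypothetical helper: increment the PATCH part of a MAJOR.MINOR.PATCH string.
function bumpPatch(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Unsupported version format: ${version}`);
  }
  return `${match[1]}.${match[2]}.${Number(match[3]) + 1}`;
}

// Rewrites package.json in place (2-space indent) and returns the new version.
function bumpPackageJson(file = 'package.json') {
  const pkg = JSON.parse(fs.readFileSync(file, 'utf8'));
  pkg.version = bumpPatch(pkg.version);
  fs.writeFileSync(file, JSON.stringify(pkg, null, 2) + '\n');
  return pkg.version;
}

// Only writes when invoked with --write from the mavequest-api directory.
if (process.argv.includes('--write')) {
  console.log(`package.json version is now ${bumpPackageJson()}`);
}
```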

Remove old database version

Working directory: mavequest-importer

I recommend keeping one previous version in case you need to roll back. For example, if you just updated the database to V6, keep V5.

  • Script: bulkDelete.js
  • Arguments:
Argument Required Description
kind TRUE The Datastore “kind” to delete from. Kinds are defined in the dbKinds constant in utils.js; see Prepare curated data above for examples.
delete-all TRUE Delete everything for a kind in the datastore
  • Run:
# Example: delete database V3 from Google Cloud Datastore
node bulkDelete.js --kind Gene_V3 --delete-all
node bulkDelete.js --kind Assay_V3 --delete-all
node bulkDelete.js --kind Interactome_V3 --delete-all
node bulkDelete.js --kind Phenotype_V3 --delete-all
node bulkDelete.js --kind Clinical_Interests_V3 --delete-all
node bulkDelete.js --kind Stats_V3 --delete-all
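Since deletion cannot be undone, it can help to generate the six commands for an old version and enforce the keep-one-previous-version rule before running anything. This is a hypothetical sketch: deleteCommands is not part of mavequest-importer, and the base names are taken from the dbKinds values shown in Prepare curated data. It only prints the commands for review.

```javascript
// Base kind names, mirroring the dbKinds values in utils.js.
const KIND_BASES = ['Gene', 'Assay', 'Phenotype', 'Interactome', 'Clinical_Interests', 'Stats'];

// Hypothetical helper: build bulkDelete.js argument lists for an old version.
function deleteCommands(version, currentVersion) {
  if (!Number.isInteger(version) || !Number.isInteger(currentVersion)) {
    throw new Error('Versions must be integers');
  }
  // Keep the current version and one previous version for rollback.
  if (version >= currentVersion - 1) {
    throw new Error(`Refusing to delete V${version}: keep V${currentVersion} and V${currentVersion - 1}`);
  }
  return KIND_BASES.map((base) => ['bulkDelete.js', '--kind', `${base}_V${version}`, '--delete-all']);
}

// Usage: --print <oldVersion> <currentVersion>, e.g. --print 3 6
const i = process.argv.indexOf('--print');
if (i !== -1) {
  const cmds = deleteCommands(Number(process.argv[i + 1]), Number(process.argv[i + 2]));
  for (const args of cmds) {
    console.log(`node ${args.join(' ')}`);
  }
}
```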

Generate Site Map

  1. The sitemap.xml file helps search engines (e.g. Google) crawl and index the website. Generate a new sitemap each time the database is updated.
    • Script: updateSitemap.js
    • Arguments: no arguments
    • Run:
node updateSitemap.js
  2. Copy the generated sitemap.xml file to the public folder in the front-end repository: mavequest-front-end/public
  3. Push the updated sitemap.xml to Google Cloud
    1. Push code changes into the v2 branch on the mavequest-front-end GitHub repo
    2. Create a pull request to merge code from the v2 branch into the prod branch
    3. Monitor the GitHub Actions page (https://github.com/rothlab/mavequest-api/actions) to make sure the automatic deployment script successfully deploys the code to Google Cloud.
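For reference, a sitemap follows the sitemaps.org XML protocol: a urlset root element in the http://www.sitemaps.org/schemas/sitemap/0.9 namespace, with one url element holding a loc child per page. The sketch below builds such a document from a list of absolute URLs. It is not updateSitemap.js, and the example.org base URL and /gene/ path are placeholders.

```javascript
// Escape the five XML special characters so URLs are safe inside <loc>.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build a sitemaps.org-style XML document from absolute URLs.
function buildSitemap(urls) {
  const entries = urls
    .map((url) => `  <url>\n    <loc>${escapeXml(url)}</loc>\n  </url>`)
    .join('\n');
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    '</urlset>',
    '',
  ].join('\n');
}

// Placeholder usage: example.org and the /gene/ path are hypothetical.
const xml = buildSitemap(['https://example.org/', 'https://example.org/gene/BRCA1']);
console.log(xml);
```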