[MaveQuest] Upload Curated Data

Update source data

See the "MaveQuest: update source data" page for instructions on updating source data, and make sure its outputs are accessible to the scripts described here.

Upload curated data

Working directory: mavequest-importer

Make sure the Google Cloud credential file maveQuest-datastore-operator.json is in the parent directory of the working directory (not inside the working directory itself).

Prepare curated data

Create a folder under input, named with its creation date in YYYYMMDD format.
Copy the curated data files into the created folder. As of July 2022, the required data files are:
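
The dated folder name can be derived rather than typed by hand. The snippet below is a minimal sketch; `inputFolderName` is a hypothetical helper and is not part of mavequest-importer.

```javascript
// Build the dated input folder path (input/YYYYMMDD) from a Date object.
// `inputFolderName` is a hypothetical helper, not part of mavequest-importer.
function inputFolderName(date) {
  const yyyy = String(date.getFullYear());
  const mm = String(date.getMonth() + 1).padStart(2, '0'); // months are 0-based
  const dd = String(date.getDate()).padStart(2, '0');
  return `input/${yyyy}${mm}${dd}`;
}

console.log(inputFolderName(new Date(2022, 6, 27))); // input/20220727
```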

Open utils.js.
1. If you are updating gene names and information, increment the version numbers in the dbKinds constant. For example, as of July 2022 the version is V6, so the next update changes it to V7.

// Before
const dbKinds = {
  'Gene': 'Gene_V6',
  'Assay': 'Assay_V6',
  'Phenotype': 'Phenotype_V6',
  'Interactome': 'Interactome_V6',
  'ClinInterests': 'Clinical_Interests_V6',
  'Stats': 'Stats_V6',
};

// After
const dbKinds = {
  'Gene': 'Gene_V7',
  'Assay': 'Assay_V7',
  'Phenotype': 'Phenotype_V7',
  'Interactome': 'Interactome_V7',
  'ClinInterests': 'Clinical_Interests_V7',
  'Stats': 'Stats_V7',
};
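
To double-check a hand edit like the one above, the bump can be expressed as a small function. This is only a sketch: `bumpKinds` is a hypothetical helper, and it assumes every kind name ends in `_V<number>`, as in the example.

```javascript
// Hypothetical helper: bump the trailing _V<number> suffix of every kind name.
function bumpKinds(kinds) {
  const bumped = {};
  for (const [key, name] of Object.entries(kinds)) {
    // Names without a _V<number> suffix are left unchanged.
    bumped[key] = name.replace(/_V(\d+)$/, (_, n) => `_V${Number(n) + 1}`);
  }
  return bumped;
}

const current = { Gene: 'Gene_V6', ClinInterests: 'Clinical_Interests_V6' };
console.log(bumpKinds(current));
```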

If you are only updating certain databases without updating gene names, there is no need to increment the version number.
If you need to change the path to the database credential, set the new path in the keyFilename option of the datastore variable:

const datastore = new Datastore({
  projectId: 'glass-ally-143617',
  keyFilename: '../maveQuest-datastore-operator.json', // Change path to credential
});
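
Before running any upload script, it may help to confirm that the credential file is where the relative path expects it. The sketch below uses only Node's built-in modules; `credentialPath` is a hypothetical helper, and the file name and parent-directory location are taken from this page.

```javascript
const fs = require('fs');
const path = require('path');

// Resolve the credential path relative to the working directory.
// `credentialPath` is a hypothetical helper, not part of utils.js.
function credentialPath(workingDir) {
  return path.resolve(workingDir, '..', 'maveQuest-datastore-operator.json');
}

// Warn early instead of failing in the middle of an upload.
const keyFile = credentialPath(process.cwd());
if (!fs.existsSync(keyFile)) {
  console.warn(`Credential not found at ${keyFile}`);
}
```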

Upload Gene Information

Script: addGeneinfo.js
Arguments:

Argument	Required	Description
input	TRUE	The gene info file (file name geneinfo_*.csv)
all	FALSE	If provided, the script will update all properties for all genes. This argument needs to be set if update-property is not set.
update-property	FALSE	If provided, the script will only update [property] for all genes.

Run:

# Update all properties
node addGeneInfo.js --input input/20220113/geneinfo_20220110.csv --all

# Update only the gene symbols (gene_symbol property)
node addGeneInfo.js --input input/20220113/geneinfo_20220110.csv --update-property gene_symbol
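
The rule in the argument table (input is required, and all must be set when update-property is not) can be sketched as a small check. This is an illustration only: `checkGeneInfoArgs` is hypothetical, and the actual script may validate its arguments differently.

```javascript
// Hypothetical check mirroring the argument table for the gene info script.
// Returns an error message, or null when the arguments are consistent.
function checkGeneInfoArgs({ input, all = false, updateProperty } = {}) {
  if (!input) {
    return 'input is required';
  }
  if (!all && !updateProperty) {
    return 'set either all or update-property';
  }
  return null;
}

console.log(checkGeneInfoArgs({ input: 'geneinfo.csv', all: true })); // null
console.log(checkGeneInfoArgs({ input: 'geneinfo.csv' })); // set either all or update-property
```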

Upload Ambry dataset

Script: addAmbry.js
Arguments:

Argument	Required	Description
input	TRUE	The Ambry data file (file name ambry_*.csv)

Run:

node addAmbry.js --input input/20220113/ambry_20220110.csv

Upload Cancer Census dataset

Script: addCancerCensus.js
Arguments:

Argument	Required	Description
input	TRUE	The Cancer Census data file (file name cancer_census_*.csv)

Run:

node addCancerCensus.js --input input/20220113/cancer_census_20220110.csv

Upload ClinVar dataset

Script: addClinvar.js
Arguments:

Argument	Required	Description
input	TRUE	The ClinVar dataset file (file name clinvar_*.csv)

Run:

node addClinvar.js --input input/20220113/clinvar_20220110.csv

Upload GeneDx dataset

Script: addGeneDx.js
Arguments:

Argument	Required	Description
input	TRUE	The GeneDx dataset file (file name genedx_*.csv)

Run:

node addGeneDx.js --input input/20220113/genedx_20220110.csv

Upload GenomeCRISPR dataset

Script: addGenomeCRISPR.js
Arguments:

Argument	Required	Description
input-entry	TRUE	The GenomeCRISPR dataset file (file name genomecrispr_*.csv)
input-summary	TRUE	The GenomeCRISPR stats file (file name genomecrispr_hitsSum_*.csv)

Run:

node addGenomeCRISPR.js --input-entry input/20220113/genomecrispr_20220110.csv --input-summary input/20220113/genomecrispr_hitsSum_20220110.csv

Upload GenomeRNAi dataset

Script: addGenomeRNAi.js
Arguments:

Argument	Required	Description
input	TRUE	The GenomeRNAi dataset file (file name genomernai_*.tsv)

Run:

node addGenomeRNAi.js --input input/20220113/genomernai_20220110.tsv

Upload HuRI dataset

Script: addHuRI.js
Arguments:

Argument	Required	Description
input	TRUE	The HuRI dataset file (file name huri_*.csv)

Run:

node addHuRI.js --input input/20220113/huri_20220110.csv

Upload InterPro dataset

Script: addInterpro.js
Arguments:

Argument	Required	Description
input	TRUE	The InterPro dataset file (file name interpro_*.csv)

Run:

node addInterpro.js --input input/20220113/interpro_20220110.csv

Upload Invitae dataset

Script: addInvitae.js
Arguments:

Argument	Required	Description
input	TRUE	The Invitae dataset file (file name invitae_*.csv)

Run:

node addInvitae.js --input input/20220113/invitae_20220110.csv

Upload MaveDB dataset

Script: addMavedb.js
Arguments:

Argument	Required	Description
input	TRUE	The MaveDB dataset file (file name mavedb_*.csv)

Run:

node addMavedb.js --input input/20220113/mavedb_20220110.csv

Upload OGEE dataset

Script: addOGEE.js
Arguments:

Argument	Required	Description
input-entry	TRUE	The OGEE dataset file (file name ogee_gene_*.csv)
input-summary	TRUE	The OGEE stats file (file name ogee_study_*.csv)

Run:

node addOGEE.js --input-entry input/20220113/ogee_gene_20220110.csv --input-summary input/20220113/ogee_study_20220110.csv

Upload OMIM dataset

Script: addOMIM.js
Arguments:

Argument	Required	Description
input	TRUE	The OMIM dataset file (file name omim_*.csv)

Run:

node addOMIM.js --input input/20220113/omim_20220110.csv

Upload Orphanet dataset

Script: addOrphanet.js
Arguments:

Argument	Required	Description
input	TRUE	The Orphanet dataset file (file name orphanet_*.csv)

Run:

node addOrphanet.js --input input/20220113/orphanet_20220110.csv

Upload human homology datasets

Script: addOrthology.js
Arguments:

Argument	Required	Description
input	TRUE	The merged human homology datasets file (file name human_orthology_merged_*.tsv)

Run:

node addOrthology.js --input input/20220113/human_orthology_merged_20220110.tsv

Upload over-expression datasets

Script: addOverexpression.js
Arguments:

Argument	Required	Description
input	TRUE	The human over-expression dataset file (file name human_overexpression_*.csv)

Run:

node addOverexpression.js --input input/20220113/human_overexpression_20220110.csv

Upload PharmGKB dataset

Script: addPharmGKB.js
Arguments:

Argument	Required	Description
input	TRUE	The PharmGKB dataset file (file name pharmgkb_*.tsv)

Run:

node addPharmGKB.js --input input/20220113/pharmgkb_20220110.tsv

Upload secondary structure dataset

Script: addSecondaryStructure.js
Arguments:

Argument	Required	Description
input	TRUE	The secondary structure dataset file (file name secondary_structure_*.csv)

Run:

node addSecondaryStructure.js --input input/20220113/secondary_structure_20220110.csv

Upload priority genes

Script: addPriority.js
Arguments:

Argument	Required	Description
input-acmg	TRUE	The ACMG SF dataset file (file name acmg_list_*.csv)
input-dais	TRUE	The DAIS dataset file (file name dais_list_*.csv)

Run:

node addPriority.js --input-acmg input/20220727/acmg_list_20220721.csv --input-dais input/20220727/dais_list_20220721.csv

Upload BioGRID ORCS dataset

Script: addBioGrid.js
Arguments:

Argument	Required	Description
input-entry	TRUE	The BioGRID ORCS gene data file (file name biogrid_orcs_by_genes_*.csv)
input-summary	TRUE	The BioGRID ORCS screen data file (file name biogrid_orcs_screen_info_*.csv)

Run:

node addBioGrid.js --input-entry input/20220727/biogrid_orcs_by_genes_20220721.csv --input-summary input/20220727/biogrid_orcs_screen_info_20220721.csv

Add database statistics

Script: addStats.js
Arguments:

Argument	Required	Description
input-database-versions	TRUE	The database versions JSON file (file name databaseVersions.json). This file should be in the current working directory.

Run:

node addStats.js --input-database-versions databaseVersions.json
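
Since most upload commands follow the same pattern (script name, one or two flags, and a dated file inside a dated folder), they can be generated from a table. The sketch below covers only three of the scripts as an illustration; `uploads` and `buildCommand` are hypothetical names, and the flags and file-name prefixes are copied from the sections above.

```javascript
// Illustrative subset of the upload scripts. Each entry maps a command-line
// flag to the [file-name prefix, extension] used on this page.
const uploads = [
  { script: 'addAmbry.js', files: { input: ['ambry', 'csv'] } },
  {
    script: 'addGenomeCRISPR.js',
    files: {
      'input-entry': ['genomecrispr', 'csv'],
      'input-summary': ['genomecrispr_hitsSum', 'csv'],
    },
  },
  {
    script: 'addOGEE.js',
    files: {
      'input-entry': ['ogee_gene', 'csv'],
      'input-summary': ['ogee_study', 'csv'],
    },
  },
];

// Hypothetical helper: build the argument list for one upload.
// `folder` is the dated input folder; `stamp` is the date in the file names.
function buildCommand(upload, folder, stamp) {
  const args = ['node', upload.script];
  for (const [flag, [prefix, ext]] of Object.entries(upload.files)) {
    args.push(`--${flag}`, `${folder}/${prefix}_${stamp}.${ext}`);
  }
  return args;
}

// Print the commands for review instead of running them.
for (const upload of uploads) {
  console.log(buildCommand(upload, 'input/20220113', '20220110').join(' '));
}
```

Printing the commands first makes it easy to compare them against the files actually present in the input folder. To execute them, the argument list could be handed to Node's `child_process.execFileSync`.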

Switch over database version in API

Working directory: mavequest-api

Change the database version if it was incremented in Prepare curated data.
1. Go to controllers/utils.js
2. Update the dbKinds constant to the same version number as in utils.js in the mavequest-importer folder.
Increment version number
1. Go to package.json
2. Increment the version number. For example, if the current version is 2.14.2, the next version is 2.14.3.
Push API code
1. Push code changes to the master branch of the mavequest-api GitHub repo
2. Create a pull request to merge the master branch into the prod branch
3. Monitor the GitHub Actions page (https://github.com/rothlab/mavequest-api/actions) to make sure the automatic deployment script successfully deploys the code to Google Cloud.
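
The package.json increment described above is a patch-level bump of a MAJOR.MINOR.PATCH version string. A minimal sketch, with `bumpPatch` as a hypothetical helper:

```javascript
// Hypothetical helper: increment the patch component of a MAJOR.MINOR.PATCH version.
function bumpPatch(version) {
  const parts = version.split('.').map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`Unexpected version string: ${version}`);
  }
  parts[2] += 1;
  return parts.join('.');
}

console.log(bumpPatch('2.14.2')); // 2.14.3
```

Running `npm version patch` in the repository performs the same bump on package.json; note that inside a git repository it also creates a commit and a tag by default.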

Remove old database version

Working directory: mavequest-importer

I recommend keeping one previous version in case you need to roll back to it. For example, if you just updated the database to V6, keep V5.

Script: bulkDelete.js
Arguments:

Argument	Required	Description
kind	TRUE	Specify the “kind” to delete from. Kinds are defined in `utils.js`. See Prepare curated data above for examples.
delete-all	TRUE	Delete everything for a kind in the datastore

Run:

# Example: delete database V3 from Google Cloud Datastore
node bulkDelete.js --kind Gene_V3 --delete-all
node bulkDelete.js --kind Assay_V3 --delete-all
node bulkDelete.js --kind Interactome_V3 --delete-all
node bulkDelete.js --kind Phenotype_V3 --delete-all
node bulkDelete.js --kind Clinical_Interests_V3 --delete-all
node bulkDelete.js --kind Stats_V3 --delete-all
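
The six commands differ only in the kind prefix, so they can be generated, with a guard that enforces the recommendation to keep one previous version. This is a sketch: `deleteCommands` is a hypothetical helper, and the prefixes are copied from the dbKinds constant shown in Prepare curated data.

```javascript
// Kind-name prefixes, copied from the dbKinds constant in utils.js.
const kindPrefixes = ['Gene', 'Assay', 'Phenotype', 'Interactome', 'Clinical_Interests', 'Stats'];

// Hypothetical helper: list the bulkDelete.js commands for one old version,
// refusing to touch the live version or the one kept for rollback.
function deleteCommands(versionToDelete, liveVersion) {
  if (versionToDelete >= liveVersion - 1) {
    throw new Error(
      `Keep V${liveVersion} and V${liveVersion - 1}; refusing to delete V${versionToDelete}`
    );
  }
  return kindPrefixes.map(
    (prefix) => `node bulkDelete.js --kind ${prefix}_V${versionToDelete} --delete-all`
  );
}

// Print the commands for review before running them by hand.
console.log(deleteCommands(3, 6).join('\n'));
```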

Generate Site Map

The sitemap.xml is a file that helps search engines (e.g. Google) to better crawl and index the website. We need to generate a new site map file each time after the database is updated.
- Script: updateSitemap.js
- Arguments: no arguments
- Run:

node updateSitemap.js

Copy the generated sitemap.xml file to the public folder in the front-end repository: mavequest-front-end/public
Push the updated sitemap.xml to Google Cloud
1. Push code changes to the v2 branch of the mavequest-front-end GitHub repo
2. Create a pull request to merge the v2 branch into the prod branch
3. Monitor the GitHub Actions page (https://github.com/rothlab/mavequest-api/actions) to make sure the automatic deployment script successfully deploys the code to Google Cloud.
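
For orientation when inspecting the generated file, a sitemap.xml in the standard sitemap protocol is a `urlset` element containing one `url`/`loc` entry per page. The sketch below renders such a file from a list of URLs. It is not the code of updateSitemap.js; `renderSitemap` and `escapeXml` are hypothetical helpers and the URLs are placeholders.

```javascript
// Escape the characters that are not allowed verbatim in XML text.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical sketch: render a minimal sitemap.xml from a list of page URLs.
function renderSitemap(urls) {
  const entries = urls.map((url) => `  <url><loc>${escapeXml(url)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n');
}

// Placeholder URLs, not real MaveQuest routes.
console.log(renderSitemap(['https://example.org/', 'https://example.org/gene?id=1&view=full']));
```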
