[MaveQuest] Upload Curated Data
Jochen Weile edited this page Jun 5, 2023 · 1 revision
To update the source data, refer to MaveQuest: update source data. Make sure its outputs are accessible to the scripts here.
Working directory: `mavequest-importer`
Make sure you have the Google Cloud credential `maveQuest-datastore-operator.json` (linked here) in the parent directory (not within the working directory).
- Create a folder under `input`, named with its creation date in YYYYMMDD format.
- Copy the curated data files into the created folder. As of July 2022, the required data files are:

- Open `utils.js`.
  - If updating gene names and info, increment the version numbers in the `dbKinds` constant. For example, as of July 2022, we are at `V6`, so for the next update you need to change it to `V7`.
// Before
const dbKinds = {
'Gene': 'Gene_V6',
'Assay': 'Assay_V6',
'Phenotype': 'Phenotype_V6',
'Interactome': 'Interactome_V6',
'ClinInterests': 'Clinical_Interests_V6',
'Stats': 'Stats_V6',
};
// After
const dbKinds = {
'Gene': 'Gene_V7',
'Assay': 'Assay_V7',
'Phenotype': 'Phenotype_V7',
'Interactome': 'Interactome_V7',
'ClinInterests': 'Clinical_Interests_V7',
'Stats': 'Stats_V7',
};
- If you are only updating certain databases without updating gene names, there is no need to increment the version number.
- If you need to change the path to the database credential, set the new path in the `datastore` variable:
const datastore = new Datastore({
projectId: 'glass-ally-143617',
keyFilename: '../maveQuest-datastore-operator.json', // Change path to credential
});
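Because every entry in `dbKinds` carries the same version suffix, one way to avoid missing an entry during an update is to derive all kind names from a single version number. The following is a minimal sketch, not the code in `utils.js`; `KIND_BASES` and `buildDbKinds` are hypothetical names introduced here for illustration.

```javascript
// Hypothetical helper: derive every kind name from one version number so the
// suffixes cannot drift apart. Base names are copied from the dbKinds example.
const KIND_BASES = {
  Gene: 'Gene',
  Assay: 'Assay',
  Phenotype: 'Phenotype',
  Interactome: 'Interactome',
  ClinInterests: 'Clinical_Interests',
  Stats: 'Stats',
};

function buildDbKinds(version) {
  if (!Number.isInteger(version) || version < 1) {
    throw new RangeError(`version must be a positive integer, got ${version}`);
  }
  const kinds = {};
  for (const [key, base] of Object.entries(KIND_BASES)) {
    kinds[key] = `${base}_V${version}`;
  }
  return kinds;
}

const dbKinds = buildDbKinds(7);
console.log(dbKinds.Gene); // Gene_V7
console.log(dbKinds.ClinInterests); // Clinical_Interests_V7
```

With this layout, moving from V6 to V7 would be a one-character change instead of six.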
- Script: `addGeneinfo.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input | TRUE | The gene info file (file name geneinfo_*.csv) |
| all | FALSE | If provided, the script will update all properties for all genes. This argument needs to be set if update-property is not set. |
| update-property | FALSE | If provided, the script will only update [property] for all genes. |
- Run:
# Update all properties
node addGeneInfo.js --input input/20220113/geneinfo_20220110.csv --all
# Update only the gene symbols (gene_symbol property)
node addGeneInfo.js --input input/20220113/geneinfo_20220110.csv --update-property gene_symbol
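The argument table above implies a rule: `--input` is always required, and one of `--all` or `--update-property` must be present. The sketch below shows how that rule could be checked with Node's built-in `util.parseArgs` (available in Node 18.3 and later). It is an illustration only; the real script may parse its arguments differently, `parseGeneInfoArgs` is a hypothetical name, and rejecting the case where both flags are passed is an assumption, since the table does not say what happens then.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical validator for the documented arguments of the gene info script.
function parseGeneInfoArgs(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      input: { type: 'string' },
      all: { type: 'boolean' },
      'update-property': { type: 'string' },
    },
  });

  if (!values.input) {
    throw new Error('--input is required');
  }
  const property = values['update-property'];
  if (!values.all && !property) {
    throw new Error('Pass either --all or --update-property <property>');
  }
  // Assumption: passing both flags is treated as ambiguous and rejected.
  if (values.all && property) {
    throw new Error('Pass only one of --all and --update-property');
  }
  return {
    input: values.input,
    mode: property ? 'property' : 'all',
    property: property ?? null,
  };
}

const parsed = parseGeneInfoArgs([
  '--input', 'input/20220113/geneinfo_20220110.csv',
  '--update-property', 'gene_symbol',
]);
console.log(parsed.mode, parsed.property); // property gene_symbol
```

In a script, the function would typically be called as `parseGeneInfoArgs(process.argv.slice(2))`.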
- Script: `addAmbry.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input | TRUE | The Ambry data file (file name ambry_*.csv) |
- Run:
node addAmbry.js --input input/20220113/ambry_20220110.csv
- Script: `addCancerCensus.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input | TRUE | The Cancer Census data file (file name cancer_census_*.csv) |
- Run:
node addCancerCensus.js --input input/20220113/cancer_census_20220110.csv
- Script: `addClinvar.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input | TRUE | The ClinVar dataset file (file name clinvar_*.csv) |
- Run:
node addClinvar.js --input input/20220113/clinvar_20220110.csv
- Script: `addGeneDx.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input | TRUE | The GeneDx dataset file (file name genedx_*.csv) |
- Run:
node addGeneDx.js --input input/20220113/genedx_20220110.csv
- Script: `addGenomeCRISPR.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input-entry | TRUE | The GenomeCRISPR dataset file (file name genomecrispr_*.csv) |
| input-summary | TRUE | The GenomeCRISPR stats file (file name genomecrispr_hitsSum_*.csv) |
- Run:
node addGenomeCRISPR.js --input-entry input/20220113/genomecrispr_20220110.csv --input-summary input/20220113/genomecrispr_hitsSum_20220110.csv
- Script: `addGenomeRNAi.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input | TRUE | The GenomeRNAi dataset file (file name genomernai_*.tsv) |
- Run:
node addGenomeRNAi.js --input input/20220113/genomernai_20220110.tsv
- Script: `addHuRI.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input | TRUE | The HuRI dataset file (file name huri_*.csv) |
- Run:
node addHuRI.js --input input/20220113/huri_20220110.csv
- Script: `addInterpro.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input | TRUE | The InterPro dataset file (file name interpro_*.csv) |
- Run:
node addInterpro.js --input input/20220113/interpro_20220110.csv
- Script: `addInvitae.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input | TRUE | The Invitae dataset file (file name invitae_*.csv) |
- Run:
node addInvitae.js --input input/20220113/invitae_20220110.csv
- Script: `addMavedb.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input | TRUE | The MaveDB dataset file (file name mavedb_*.csv) |
- Run:
node addMavedb.js --input input/20220113/mavedb_20220110.csv
- Script: `addOGEE.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input-entry | TRUE | The OGEE dataset file (file name ogee_gene_*.csv) |
| input-summary | TRUE | The OGEE stats file (file name ogee_study_*.csv) |
- Run:
node addOGEE.js --input-entry input/20220113/ogee_gene_20220110.csv --input-summary input/20220113/ogee_study_20220110.csv
- Script: `addOMIM.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input | TRUE | The OMIM dataset file (file name omim_*.csv) |
- Run:
node addOMIM.js --input input/20220113/omim_20220110.csv
- Script: `addOrphanet.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input | TRUE | The Orphanet dataset file (file name orphanet_*.csv) |
- Run:
node addOrphanet.js --input input/20220113/orphanet_20220110.csv
- Script: `addOrthology.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input | TRUE | The merged human homology datasets file (file name human_orthology_merged_*.tsv) |
- Run:
node addOrthology.js --input input/20220113/human_orthology_merged_20220110.tsv
- Script: `addOverexpression.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input | TRUE | The human overexpression dataset file (file name human_overexpression_*.csv) |
- Run:
node addOverexpression.js --input input/20220113/human_overexpression_20220110.csv
- Script: `addPharmGKB.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input | TRUE | The PharmGKB dataset file (file name pharmgkb_*.tsv) |
- Run:
node addPharmGKB.js --input input/20220113/pharmgkb_20220110.tsv
- Script: `addSecondaryStructure.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input | TRUE | The secondary structure dataset file (file name secondary_structure_*.csv) |
- Run:
node addSecondaryStructure.js --input input/20220113/secondary_structure_20220110.csv
- Script: `addPriority.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input-acmg | TRUE | The ACMG SF dataset file (file name acmg_list_*.csv) |
| input-dais | TRUE | The DAIS dataset file (file name dais_list_*.csv) |
- Run:
node addPriority.js --input-acmg input/20220727/acmg_list_20220721.csv --input-dais input/20220727/dais_list_20220721.csv
- Script: `addBioGrid.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input-entry | TRUE | The BioGRID ORCS gene data file (file name biogrid_orcs_by_genes_*.csv) |
| input-summary | TRUE | The BioGRID ORCS screen data file (file name biogrid_orcs_screen_info_*.csv) |
- Run:
node addBioGrid.js --input-entry input/20220727/biogrid_orcs_by_genes_20220721.csv --input-summary input/20220727/biogrid_orcs_screen_info_20220721.csv
- Script: `addStats.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| input-database-versions | TRUE | The database versions JSON file (file name databaseVersions.json). This file should be in the current working directory. |
- Run:
node addStats.js --input-database-versions databaseVersions.json
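The file-name hints in the argument tables above can be turned into a pre-flight check that reports which curated files are missing from an input folder before any import script runs. This is a rough sketch under stated assumptions: `REQUIRED_PREFIXES`, `PATTERN_OVERRIDES`, and `findMissingInputs` are hypothetical names, the date stamp is assumed to be eight digits as in the example commands, and both `.csv` and `.tsv` are accepted because the examples use both.

```javascript
// Prefixes taken from the "file name" hints in the argument tables.
const REQUIRED_PREFIXES = [
  'geneinfo', 'ambry', 'cancer_census', 'clinvar', 'genedx',
  'genomecrispr', 'genomecrispr_hitsSum', 'genomernai', 'huri', 'interpro',
  'invitae', 'mavedb', 'ogee_gene', 'ogee_study', 'omim', 'orphanet',
  'human_orthology_merged', 'human_overexpression', 'pharmgkb',
  'secondary_structure', 'acmg_list', 'dais_list',
  'biogrid_orcs_by_gene', 'biogrid_orcs_screen_info',
];

// The BioGRID gene file appears as both "by_gene" and "by_genes" in the
// instructions, so this sketch accepts either spelling.
const PATTERN_OVERRIDES = {
  biogrid_orcs_by_gene: 'biogrid_orcs_by_genes?',
};

// Returns the prefixes for which no matching file name was found.
function findMissingInputs(fileNames) {
  return REQUIRED_PREFIXES.filter((prefix) => {
    const fragment = PATTERN_OVERRIDES[prefix] ?? prefix;
    // Assumption: <prefix>_<YYYYMMDD>.csv or .tsv
    const pattern = new RegExp(`^${fragment}_\\d{8}\\.(csv|tsv)$`);
    return !fileNames.some((name) => pattern.test(name));
  });
}

// Example with an in-memory listing; in practice the listing could come from
// require('node:fs').readdirSync('input/20220727').
const missing = findMissingInputs([
  'geneinfo_20220110.csv',
  'genomecrispr_hitsSum_20220110.csv',
]);
console.log(missing.includes('geneinfo')); // false
console.log(missing.includes('genomecrispr')); // true
```

Anchoring the pattern on the date stamp keeps `genomecrispr_hitsSum_*` from being mistaken for the plain `genomecrispr_*` file.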
Working directory: `mavequest-api`
- Change the database version if it was incremented in Prepare curated data.
  - Go to `controllers/utils.js`.
  - Update the `dbKinds` constant to the same version number as the `utils.js` in the `mavequest-importer` folder.
- Increment the version number.
  - Go to `package.json`.
  - Increment the version number. For example, if the current version is `2.14.2`, the next version will be `2.14.3`.
- Push the API code.
  - Push code changes to the `master` branch of the `mavequest-api` GitHub repo.
  - Create a pull request to merge the `master` branch into the `prod` branch.
  - Monitor the GitHub Actions page (https://github.com/rothlab/mavequest-api/actions) to make sure the automatic deployment script successfully deploys the code to Google Cloud.
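The `package.json` step above is a patch-level bump (`2.14.2` to `2.14.3`). As a small illustration of that arithmetic, the sketch below increments the last component of a plain `MAJOR.MINOR.PATCH` string. `bumpPatch` is a hypothetical helper, not part of the repository, and it deliberately rejects anything that is not three dot-separated integers.

```javascript
// Hypothetical helper: increment the PATCH part of a MAJOR.MINOR.PATCH string.
function bumpPatch(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const [, major, minor, patch] = match;
  return `${major}.${minor}.${Number(patch) + 1}`;
}

console.log(bumpPatch('2.14.2')); // 2.14.3
```

The standard `npm version patch` command performs the same bump directly in `package.json`; note that in a Git repository it also creates a commit and tag by default.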
Working directory: `mavequest-importer`
I recommend keeping one previous version in case you need to roll back to it. For example, if you just updated the database to V6, you should keep V5.
- Script: `bulkDelete.js`
- Arguments:
| Argument | Required | Description |
|---|---|---|
| kind | TRUE | Specify the “kind” to delete from. Kinds are defined in utils.js. See Prepare curated data above for examples. |
| delete-all | TRUE | Delete everything for a kind in the datastore |
- Run:
# Example: delete database V3 from Google Cloud Datastore
node bulkDelete.js --kind Gene_V3 --delete-all
node bulkDelete.js --kind Assay_V3 --delete-all
node bulkDelete.js --kind Interactome_V3 --delete-all
node bulkDelete.js --kind Phenotype_V3 --delete-all
node bulkDelete.js --kind Clinical_Interests_V3 --delete-all
node bulkDelete.js --kind Stats_V3 --delete-all
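Since the same six kinds are deleted for every version, the command list can be generated, and the recommendation to keep one previous version can be enforced at the same time. The sketch below only builds command strings and does not touch the datastore; `DELETABLE_KIND_BASES` and `buildDeleteCommands` are hypothetical names, and the guard encodes the "keep the current and the previous version" advice as an assumption about what is safe to delete.

```javascript
// Base names match the kinds used in the delete example above.
const DELETABLE_KIND_BASES = [
  'Gene', 'Assay', 'Interactome', 'Phenotype', 'Clinical_Interests', 'Stats',
];

// Builds the bulkDelete.js commands for one database version.
// Refuses to target the current version or the one directly before it.
function buildDeleteCommands(versionToDelete, currentVersion) {
  if (!Number.isInteger(versionToDelete) || !Number.isInteger(currentVersion)) {
    throw new TypeError('versions must be integers');
  }
  if (versionToDelete < 1 || versionToDelete >= currentVersion - 1) {
    throw new RangeError(
      `V${versionToDelete} is not deletable while V${currentVersion} is current`
    );
  }
  return DELETABLE_KIND_BASES.map(
    (base) => `node bulkDelete.js --kind ${base}_V${versionToDelete} --delete-all`
  );
}

console.log(buildDeleteCommands(3, 6).join('\n'));
```

Printing the commands first, rather than running them, leaves room to review the list before anything is removed.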
- The `sitemap.xml` file helps search engines (e.g. Google) crawl and index the website. Generate a new sitemap file each time the database is updated.
- Script: `updateSitemap.js`
- Arguments: no arguments
- Run:
node updateSitemap.js
- Copy the generated `sitemap.xml` file to the `public` folder in the front-end repository: `mavequest-front-end/public`
- Push the updated `sitemap.xml` to Google Cloud.
  - Push code changes to the `v2` branch of the `mavequest-front-end` GitHub repo.
  - Create a pull request to merge the `v2` branch into the `prod` branch.
  - Monitor the GitHub Actions page (https://github.com/rothlab/mavequest-api/actions) to make sure the automatic deployment script successfully deploys the code to Google Cloud.
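For readers unfamiliar with the format, the sketch below shows roughly what a sitemap generator produces: a `urlset` element in the sitemaps.org namespace with one `url`/`loc` entry per page. It is not the sitemap script from the repository, whose internals are not described here; `buildSitemap` and `escapeXml` are hypothetical names and the URLs are placeholders.

```javascript
// Escape the five characters that are special in XML text and attributes.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build a minimal sitemap document from a list of absolute URLs.
function buildSitemap(urls) {
  const entries = urls.map((url) => `  <url><loc>${escapeXml(url)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
    '',
  ].join('\n');
}

// Placeholder URLs for illustration only.
console.log(buildSitemap([
  'https://example.org/',
  'https://example.org/detail?gene=A&organism=human',
]));
```

Escaping matters because query strings containing `&` would otherwise make the file invalid XML.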