[MaveQuest] Upload Curated Data
Jochen Weile edited this page Jun 5, 2023
·
1 revision
To update the source data, refer to MaveQuest: update source data. Make sure its outputs are accessible to the scripts here.
Working directory:
mavequest-importer
Make sure you have the Google Cloud credential `maveQuest-datastore-operator.json` (linked here) in the parent directory (not within the working directory).
- Create a folder under `input`, named with its creation date in YYYYMMDD format
- Copy the curated data files into the created folder. As of July 2022, the required data files are:
- Open `utils.js`
  - If updating gene names and info, increment the version numbers in the `dbKinds` constant. For example, as of July 2022, we are at V6, so for the next update you need to change it to V7.
// Before
const dbKinds = {
'Gene': 'Gene_V6',
'Assay': 'Assay_V6',
'Phenotype': 'Phenotype_V6',
'Interactome': 'Interactome_V6',
'ClinInterests': 'Clinical_Interests_V6',
'Stats': 'Stats_V6',
};
// After
const dbKinds = {
'Gene': 'Gene_V7',
'Assay': 'Assay_V7',
'Phenotype': 'Phenotype_V7',
'Interactome': 'Interactome_V7',
'ClinInterests': 'Clinical_Interests_V7',
'Stats': 'Stats_V7',
};
- If you are only updating certain databases without updating gene names, there is no need to increment the version number.
- If you need to change the path to the database credential, set the new path in the `datastore` variable:
const datastore = new Datastore({
  projectId: 'glass-ally-143617',
  keyFilename: '../maveQuest-datastore-operator.json', // Change path to credential
});
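The version bump described earlier in this section is mechanical, so it can be double-checked with a small helper. The following is only a minimal sketch, assuming every kind name ends in `_V<number>` as in the `dbKinds` examples; `bumpDbKinds` is a hypothetical name and is not part of `utils.js`.

```javascript
// Hypothetical helper (not part of utils.js): bump the trailing version
// number of every kind name. Assumes each value ends in "_V<number>".
function bumpDbKinds(dbKinds) {
  const bumped = {};
  for (const [key, kind] of Object.entries(dbKinds)) {
    const match = /^(.*_V)(\d+)$/.exec(kind);
    if (!match) {
      throw new Error(`Unexpected kind name: ${kind}`);
    }
    bumped[key] = `${match[1]}${Number(match[2]) + 1}`;
  }
  return bumped;
}

const currentKinds = {
  Gene: 'Gene_V6',
  Assay: 'Assay_V6',
  Phenotype: 'Phenotype_V6',
  Interactome: 'Interactome_V6',
  ClinInterests: 'Clinical_Interests_V6',
  Stats: 'Stats_V6',
};

// Print the bumped mapping so it can be compared with the edited utils.js.
console.log(bumpDbKinds(currentKinds));
```

Comparing the printed mapping with the edited `dbKinds` constant is a quick way to catch a kind that was left at the old version.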
- Script:
addGeneInfo.js
- Arguments:
Argument | Required | Description |
---|---|---|
input | TRUE | The gene info file (file name geneinfo_*.csv) |
all | FALSE | If provided, the script updates all properties for all genes. Must be set if update-property is not set. |
update-property | FALSE | If provided, the script updates only the given property for all genes. |
- Run:
# Update all properties
node addGeneInfo.js --input input/20220113/geneinfo_20220110.csv --all
# Update only the gene symbols (gene_symbol property)
node addGeneInfo.js --input input/20220113/geneinfo_20220110.csv --update-property gene_symbol
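The rule in the arguments table (`input` is required, and one of `all` or `update-property` must be given) can be expressed as a pre-flight check. This is a hedged sketch only: `geneInfoArgProblems` is a hypothetical function, and it does not claim to reproduce how `addGeneInfo.js` actually parses or validates its arguments.

```javascript
// Hypothetical pre-flight check for the addGeneInfo.js arguments.
// `args` is a plain object keyed by argument name, e.g. { input: '...', all: true }.
// Returns a list of problems; an empty list means the combination looks valid.
function geneInfoArgProblems(args) {
  const problems = [];
  if (!args.input) {
    problems.push('--input is required');
  }
  if (!args.all && !args['update-property']) {
    problems.push('set --all or --update-property');
  }
  return problems;
}

console.log(geneInfoArgProblems({ input: 'input/20220113/geneinfo_20220110.csv' }));
```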
- Script:
addAmbry.js
- Arguments:
Argument | Required | Description |
---|---|---|
input | TRUE | The Ambry data file (file name ambry_*.csv) |
- Run:
node addAmbry.js --input input/20220113/ambry_20220110.csv
- Script:
addCancerCensus.js
- Arguments:
Argument | Required | Description |
---|---|---|
input | TRUE | The Cancer Census data file (file name cancer_census_*.csv) |
- Run:
node addCancerCensus.js --input input/20220113/cancer_census_20220110.csv
- Script:
addClinvar.js
- Arguments:
Argument | Required | Description |
---|---|---|
input | TRUE | The ClinVar dataset file (file name clinvar_*.csv) |
- Run:
node addClinvar.js --input input/20220113/clinvar_20220110.csv
- Script:
addGeneDx.js
- Arguments:
Argument | Required | Description |
---|---|---|
input | TRUE | The GeneDx dataset file (file name genedx_*.csv) |
- Run:
node addGeneDx.js --input input/20220113/genedx_20220110.csv
- Script:
addGenomeCRISPR.js
- Arguments:
Argument | Required | Description |
---|---|---|
input-entry | TRUE | The GenomeCRISPR dataset file (file name genomecrispr_*.csv) |
input-summary | TRUE | The GenomeCRISPR stats file (file name genomecrispr_hitsSum_*.csv) |
- Run:
node addGenomeCRISPR.js --input-entry input/20220113/genomecrispr_20220110.csv --input-summary input/20220113/genomecrispr_hitsSum_20220110.csv
- Script:
addGenomeRNAi.js
- Arguments:
Argument | Required | Description |
---|---|---|
input | TRUE | The GenomeRNAi dataset file (file name genomernai_*.tsv) |
- Run:
node addGenomeRNAi.js --input input/20220113/genomernai_20220110.tsv
- Script:
addHuRI.js
- Arguments:
Argument | Required | Description |
---|---|---|
input | TRUE | The HuRI dataset file (file name huri_*.csv) |
- Run:
node addHuRI.js --input input/20220113/huri_20220110.csv
- Script:
addInterpro.js
- Arguments:
Argument | Required | Description |
---|---|---|
input | TRUE | The InterPro dataset file (file name interpro_*.csv) |
- Run:
node addInterpro.js --input input/20220113/interpro_20220110.csv
- Script:
addInvitae.js
- Arguments:
Argument | Required | Description |
---|---|---|
input | TRUE | The Invitae dataset file (file name invitae_*.csv) |
- Run:
node addInvitae.js --input input/20220113/invitae_20220110.csv
- Script:
addMavedb.js
- Arguments:
Argument | Required | Description |
---|---|---|
input | TRUE | The MaveDB dataset file (file name mavedb_*.csv) |
- Run:
node addMavedb.js --input input/20220113/mavedb_20220110.csv
- Script:
addOGEE.js
- Arguments:
Argument | Required | Description |
---|---|---|
input-entry | TRUE | The OGEE dataset file (file name ogee_gene_*.csv) |
input-summary | TRUE | The OGEE stats file (file name ogee_study_*.csv) |
- Run:
node addOGEE.js --input-entry input/20220113/ogee_gene_20220110.csv --input-summary input/20220113/ogee_study_20220110.csv
- Script:
addOMIM.js
- Arguments:
Argument | Required | Description |
---|---|---|
input | TRUE | The OMIM dataset file (file name omim_*.csv) |
- Run:
node addOMIM.js --input input/20220113/omim_20220110.csv
- Script:
addOrphanet.js
- Arguments:
Argument | Required | Description |
---|---|---|
input | TRUE | The Orphanet dataset file (file name orphanet_*.csv) |
- Run:
node addOrphanet.js --input input/20220113/orphanet_20220110.csv
- Script:
addOrthology.js
- Arguments:
Argument | Required | Description |
---|---|---|
input | TRUE | The merged human homology datasets file (file name human_orthology_merged_*.tsv) |
- Run:
node addOrthology.js --input input/20220113/human_orthology_merged_20220110.tsv
- Script:
addOverexpression.js
- Arguments:
Argument | Required | Description |
---|---|---|
input | TRUE | The human overexpression dataset file (file name human_overexpression_*.csv) |
- Run:
node addOverexpression.js --input input/20220113/human_overexpression_20220110.csv
- Script:
addPharmGKB.js
- Arguments:
Argument | Required | Description |
---|---|---|
input | TRUE | The PharmGKB dataset file (file name pharmgkb_*.tsv) |
- Run:
node addPharmGKB.js --input input/20220113/pharmgkb_20220110.tsv
- Script:
addSecondaryStructure.js
- Arguments:
Argument | Required | Description |
---|---|---|
input | TRUE | The secondary structure dataset file (file name secondary_structure_*.csv) |
- Run:
node addSecondaryStructure.js --input input/20220113/secondary_structure_20220110.csv
- Script:
addPriority.js
- Arguments:
Argument | Required | Description |
---|---|---|
input-acmg | TRUE | The ACMG SF dataset file (file name acmg_list_*.csv) |
input-dais | TRUE | The DAIS dataset file (file name dais_list_*.csv) |
- Run:
node addPriority.js --input-acmg input/20220727/acmg_list_20220721.csv --input-dais input/20220727/dais_list_20220721.csv
- Script:
addBioGrid.js
- Arguments:
Argument | Required | Description |
---|---|---|
input-entry | TRUE | The BioGRID ORCS gene data file (file name biogrid_orcs_by_genes_*.csv) |
input-summary | TRUE | The BioGRID ORCS screen data file (file name biogrid_orcs_screen_info_*.csv) |
- Run:
node addBioGrid.js --input-entry input/20220727/biogrid_orcs_by_genes_20220721.csv --input-summary input/20220727/biogrid_orcs_screen_info_20220721.csv
- Script:
addStats.js
- Arguments:
Argument | Required | Description |
---|---|---|
input-database-versions | TRUE | The database versions JSON file (file name databaseVersions.json). This file should be in the current working directory. |
- Run:
node addStats.js --input-database-versions databaseVersions.json
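The dated input folder and the upload commands above follow a regular pattern, which the sketch below makes explicit. It is illustrative only: `dateFolderName` and `buildCommand` are hypothetical helpers that do not exist in `mavequest-importer`, and the file names passed in must match whatever files were actually curated.

```javascript
const path = require('path');

// Hypothetical helper: format a date as YYYYMMDD (local time) for the input folder name.
function dateFolderName(date) {
  const year = date.getFullYear();
  const month = String(date.getMonth() + 1).padStart(2, '0');
  const day = String(date.getDate()).padStart(2, '0');
  return `${year}${month}${day}`;
}

// Hypothetical helper: compose an importer command line from a script name,
// the dated input folder, and a map of flag name -> file name.
function buildCommand(script, folder, files) {
  const flags = Object.entries(files).map(
    ([flag, file]) => `--${flag} ${path.posix.join('input', folder, file)}`
  );
  return ['node', script, ...flags].join(' ');
}

// Reproduces the addOGEE.js command shown above.
console.log(buildCommand('addOGEE.js', '20220113', {
  'input-entry': 'ogee_gene_20220110.csv',
  'input-summary': 'ogee_study_20220110.csv',
}));
```

Printing the commands before running them makes it easier to spot a mistyped folder or file name.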
Working directory:
mavequest-api
- Change the database version if it was incremented in Prepare curated data.
  - Go to `controllers/utils.js`
  - Update the `dbKinds` constant to the same version number as in the `utils.js` of the `mavequest-importer` folder.
- Increment the version number
  - Go to `package.json`
  - Increment the version number. For example, if the current version is `2.14.2`, the next version will be `2.14.3`.
- Push the API code
  - Push code changes to the `master` branch of the `mavequest-api` GitHub repo
  - Create a pull request to merge code from the `master` branch into the `prod` branch
  - Monitor the GitHub Actions page (https://github.com/rothlab/mavequest-api/actions) to make sure the automatic deployment script successfully deploys the code to Google Cloud.
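The `package.json` version increment in the steps above is a patch-level bump. A minimal sketch of that arithmetic, assuming the version always has the plain `MAJOR.MINOR.PATCH` form; `bumpPatch` is a hypothetical helper and not part of `mavequest-api`.

```javascript
// Hypothetical helper: increment the patch component of a MAJOR.MINOR.PATCH version.
function bumpPatch(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a MAJOR.MINOR.PATCH version: ${version}`);
  }
  const [, major, minor, patch] = match;
  return `${major}.${minor}.${Number(patch) + 1}`;
}

console.log(bumpPatch('2.14.2')); // prints "2.14.3"
```

If you prefer not to edit the file by hand, `npm version patch --no-git-tag-version` should apply the same bump to `package.json`.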
Working directory:
mavequest-importer
I recommend keeping one previous version in case you need to roll back to it. For example, if you just updated the database to V6, you should keep V5.
- Script:
bulkDelete.js
- Arguments:
Argument | Required | Description |
---|---|---|
kind | TRUE | Specify the “kind” to delete from. Kinds are defined in utils.js (see the examples in Prepare curated data above). |
delete-all | TRUE | Delete everything for a kind in the datastore |
- Run:
# Example: delete database V3 from Google Cloud Datastore
node bulkDelete.js --kind Gene_V3 --delete-all
node bulkDelete.js --kind Assay_V3 --delete-all
node bulkDelete.js --kind Interactome_V3 --delete-all
node bulkDelete.js --kind Phenotype_V3 --delete-all
node bulkDelete.js --kind Clinical_Interests_V3 --delete-all
node bulkDelete.js --kind Stats_V3 --delete-all
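The six commands above differ only in the kind prefix, so they can be generated, with a guard that follows the recommendation to keep one previous version. This is a sketch under stated assumptions: `bulkDeleteCommands` is a hypothetical helper, the prefixes are copied from the `dbKinds` example, and the guard encodes only the rollback advice given here.

```javascript
// Kind prefixes taken from the dbKinds example in utils.js.
const KIND_PREFIXES = [
  'Gene',
  'Assay',
  'Interactome',
  'Phenotype',
  'Clinical_Interests',
  'Stats',
];

// Hypothetical helper: list the bulkDelete.js commands for one database version,
// refusing to target the current version or the one kept for rollback.
function bulkDeleteCommands(version, currentVersion) {
  if (!Number.isInteger(version) || !Number.isInteger(currentVersion)) {
    throw new TypeError('versions must be integers');
  }
  if (version >= currentVersion - 1) {
    throw new Error(`V${version} is the current or rollback version; not deleting`);
  }
  return KIND_PREFIXES.map(
    (prefix) => `node bulkDelete.js --kind ${prefix}_V${version} --delete-all`
  );
}

// With V6 as the current version, V3 is safe to delete.
console.log(bulkDeleteCommands(3, 6).join('\n'));
```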
- The `sitemap.xml` file helps search engines (e.g. Google) crawl and index the website. We need to generate a new sitemap file each time the database is updated.
  - Script: `updateSitemap.js`
  - Arguments: no arguments
  - Run:
node updateSitemap.js
- Copy the generated `sitemap.xml` file to the `public` folder in the front-end repository: `mavequest-front-end/public`
- Push the updated `sitemap.xml` to Google Cloud
  - Push code changes to the `v2` branch of the `mavequest-front-end` GitHub repo
  - Create a pull request to merge code from the `v2` branch into the `prod` branch
  - Monitor the GitHub Actions page (https://github.com/rothlab/mavequest-api/actions) to make sure the automatic deployment script successfully deploys the code to Google Cloud.
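For readers unfamiliar with the format, a sitemap is an XML list of page URLs. The sketch below builds a minimal one following the sitemaps.org schema. It is not the logic of `updateSitemap.js`, which is not shown on this page; `buildSitemap`, `escapeXml`, and the `example.org` URLs are hypothetical.

```javascript
// Escape the five characters that are special in XML text and attributes.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: build a minimal sitemap document from a list of URLs.
function buildSitemap(urls) {
  const entries = urls.map((url) => `  <url><loc>${escapeXml(url)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n');
}

// Placeholder URLs; the real site map lists the site's own pages.
console.log(buildSitemap([
  'https://example.org/',
  'https://example.org/detail?gene=A&view=full',
]));
```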