I'm developing the 3.0 version of the monarch ui/website, and I've run into a limitation. @putmantime
{
"numFound": 177,
"docs": [
{
"id": "FlyBase:FBgn0029157",
"id_std": "FlyBase:FBgn0029157",
"id_eng": "FlyBase:FBgn0029157",
"id_kw": "FlyBase:FBgn0029157",
"prefix": "FlyBase",
"label": ["ssh"],
"label_std": ["ssh"],
"label_eng": ["ssh"],
"label_kw": ["ssh"],
"edges": 319,
"taxon": "NCBITaxon:7227",
"taxon_std": "NCBITaxon:7227",
"taxon_eng": "NCBITaxon:7227",
"taxon_kw": "NCBITaxon:7227",
"taxon_label": "Drosophila melanogaster",
"taxon_label_std": "Drosophila melanogaster",
"taxon_label_eng": "Drosophila melanogaster",
"taxon_label_kw": "Drosophila melanogaster",
"taxon_label_synonym": ["fruit fly", "Sophophora melanogaster"],
"taxon_label_synonym_std": ["fruit fly", "Sophophora melanogaster"],
"taxon_label_synonym_eng": ["fruit fly", "Sophophora melanogaster"],
"taxon_label_synonym_kw": ["fruit fly", "Sophophora melanogaster"],
"has_phenotype": false,
"category": ["gene", "sequence feature"],
"category_std": ["gene", "sequence feature"],
"category_eng": ["gene", "sequence feature"],
"category_kw": ["gene", "sequence feature"],
"synonym": [
"slingshot",
"Dmel\\CG6238",
"SSH",
"Ssh",
"MKP-like",
"Slingshot",
"CG6238-PA",
"Mkph",
"CG6238-PB",
"CG6238",
"MKP",
"CG6238-PC",
"CG6238-PD",
"ssh-PB",
"ssh-PA",
"ssh-PD",
"ssh-PC",
"l(3)01207",
"MAP-kinase-phosphatase"
],
"synonym_std": [
"slingshot",
"Dmel\\CG6238",
"SSH",
"Ssh",
"MKP-like",
"Slingshot",
"CG6238-PA",
"Mkph",
"CG6238-PB",
"CG6238",
"MKP",
"CG6238-PC",
"CG6238-PD",
"ssh-PB",
"ssh-PA",
"ssh-PD",
"ssh-PC",
"l(3)01207",
"MAP-kinase-phosphatase"
],
"synonym_eng": [
"slingshot",
"Dmel\\CG6238",
"SSH",
"Ssh",
"MKP-like",
"Slingshot",
"CG6238-PA",
"Mkph",
"CG6238-PB",
"CG6238",
"MKP",
"CG6238-PC",
"CG6238-PD",
"ssh-PB",
"ssh-PA",
"ssh-PD",
"ssh-PC",
"l(3)01207",
"MAP-kinase-phosphatase"
],
"synonym_kw": [
"slingshot",
"Dmel\\CG6238",
"SSH",
"Ssh",
"MKP-like",
"Slingshot",
"CG6238-PA",
"Mkph",
"CG6238-PB",
"CG6238",
"MKP",
"CG6238-PC",
"CG6238-PD",
"ssh-PB",
"ssh-PA",
"ssh-PD",
"ssh-PC",
"l(3)01207",
"MAP-kinase-phosphatase"
],
"equivalent_curie": [
"FB:FBgn0029157",
"NCBIGene:42986",
"NCBI-Gene:42986",
"NCBI.Gene:42986",
"Entrez:42986",
"Entrez.Gene:42986",
"EntrezGene:42986",
"Entrez-Gene:42986",
"Gene:42986",
"ENSEMBL:FBgn0029157"
],
"equivalent_curie_std": [
"FB:FBgn0029157",
"NCBIGene:42986",
"NCBI-Gene:42986",
"NCBI.Gene:42986",
"Entrez:42986",
"Entrez.Gene:42986",
"EntrezGene:42986",
"Entrez-Gene:42986",
"Gene:42986",
"ENSEMBL:FBgn0029157"
],
"equivalent_curie_eng": [
"FB:FBgn0029157",
"NCBIGene:42986",
"NCBI-Gene:42986",
"NCBI.Gene:42986",
"Entrez:42986",
"Entrez.Gene:42986",
"EntrezGene:42986",
"Entrez-Gene:42986",
"Gene:42986",
"ENSEMBL:FBgn0029157"
],
"equivalent_curie_kw": [
"FB:FBgn0029157",
"NCBIGene:42986",
"NCBI-Gene:42986",
"NCBI.Gene:42986",
"Entrez:42986",
"Entrez.Gene:42986",
"EntrezGene:42986",
"Entrez-Gene:42986",
"Gene:42986",
"ENSEMBL:FBgn0029157"
],
"leaf": true,
"_version_": 1696524917734899700,
"score": 117.35552
}
],
"facet_counts": {
"category": {
},
"taxon_label": {
"Sus scrofa": 25,
"Drosophila melanogaster": 21,
"Homo sapiens": 18,
"Mus musculus": 16,
"Bos taurus": 6,
"Saccharomyces cerevisiae S288C": 6,
"Xenopus tropicalis": 6,
"Danio rerio": 5,
"Gallus gallus": 4,
"Anolis carolinensis": 3,
"Canis lupus familiaris": 3,
"Felis catus": 3,
"Macaca mulatta": 3,
"Monodelphis domestica": 3,
"Ornithorhynchus anatinus": 3,
"Pan troglodytes": 3,
"Rattus norvegicus": 3,
"Takifugu rubripes": 3,
"Equus caballus": 2
}
},
"highlighting": {}
}
The JSON above is an example response from the `/search/entity/{term}` endpoint, searching "ssh".

Notice that `taxon_label` is being returned for facets, instead of `taxon` (id). This is nice for displaying a list of taxon facets, but not for actually filtering by them, because the endpoint only supports filtering by `taxon` (id), not `taxon_label`.

This requires the frontend to maintain a hard-coded label-to-id mapping for taxa. That duplicates information we already have in biolink, is brittle, and is likely to get out of sync.
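To make the workaround concrete, here is a minimal sketch of the kind of table the frontend would have to carry. The names `TAXON_LABEL_TO_ID` and `taxonFilterFor` are hypothetical (they are not part of biolink or the current UI), and the only entry shown is the one that can be read off the example response.

```typescript
// Hypothetical frontend-side table: facet label -> taxon id usable as a filter.
// Only the Drosophila entry comes from the example response; a real table would
// need one entry per species that can ever appear in the taxon_label facet.
const TAXON_LABEL_TO_ID: Record<string, string> = {
  "Drosophila melanogaster": "NCBITaxon:7227",
};

// Translate a clicked facet label into the id that the taxon filter expects.
// Returns undefined when the label is missing from the table, which is the
// failure mode once the backend data and this table drift apart.
function taxonFilterFor(label: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(TAXON_LABEL_TO_ID, label)
    ? TAXON_LABEL_TO_ID[label]
    : undefined;
}
```

Any facet label that is not in the table (for example "Sus scrofa" here) simply cannot be turned into a filter, even though the backend knows its id.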
And yes, I can look up `taxon` from `docs` by finding the corresponding `taxon_label` field. However, I would then need to make sure all results are in `docs` so that I have all the mappings, and that might go beyond the max `rows` [per page] param.

Possible solutions:
1. Support a `taxon_label` filter parameter (in addition to the `taxon` parameter) in the search endpoint. I guess this would be most useful as an exact match rather than a fuzzy match. If multiple taxon ids map to the same exact taxon label, then this option wouldn't be viable.
2. Return an additional `taxon` field in `facet_counts` with all the information I need: `id`, `label`, and `count`. This would leave the `taxon_label` facet untouched, so current applications using biolink don't suddenly break.
3. Have some kind of `taxon_map` field at the top level of the response so I can go from label to id easily. Though I think this is pretty ugly... I don't want to add a top-level field as a special exception for just one type of facet.
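
As a rough illustration of the option that adds a `taxon` facet next to `taxon_label`, this is how the frontend might consume it. The shape of the `taxon` entries, the array layout, and the names `TaxonFacet`, `SearchResponse`, `FacetOption` and `taxonFacetOptions` are all assumptions for the sake of the sketch; biolink does not return this field today.

```typescript
// Assumed shape of one entry in the proposed facet_counts.taxon field.
interface TaxonFacet {
  id: string; // e.g. "NCBITaxon:7227", usable directly as the taxon filter
  label: string; // e.g. "Drosophila melanogaster", shown to the user
  count: number;
}

// Only the parts of the search response that matter for this sketch.
interface SearchResponse {
  facet_counts: {
    taxon_label: Record<string, number>; // existing facet, left untouched
    taxon?: TaxonFacet[]; // proposed addition (hypothetical)
  };
}

// What the UI needs per facet checkbox: text to display, value to filter by.
interface FacetOption {
  text: string;
  filterValue: string;
}

// Build the facet options, most frequent taxon first. No label-to-id table
// is needed because each entry already carries its own id.
function taxonFacetOptions(response: SearchResponse): FacetOption[] {
  const facets = response.facet_counts.taxon ?? [];
  return facets
    .slice() // copy so the response object is not reordered in place
    .sort((a, b) => b.count - a.count)
    .map((f) => ({ text: `${f.label} (${f.count})`, filterValue: f.id }));
}
```

Because `taxon` is optional in this sketch, a client talking to a backend without the new field just gets an empty list and can fall back to the existing `taxon_label` facet.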