@dagendresen Great to see this being done. I'm curious about the relative merits of using measurementOrFact to store the link and sequence versus, say, the GGBN extensions. In one example DNA barcode dataset I uploaded, I used GGBN, so the sequences look like this (https://api.gbif.org/v1/occurrence/1502684137/fragment):
"extensions": {
"http://data.ggbn.org/schemas/ggbn/terms/Amplification": [
{
"consensusSequence": "CCTTTATCTAGTATTTGGTGCTTGAGCTGGAATAGTAGGCACAGCCTTAAGCCTTCTCATTCGAGCAGAACTAAGCCAACCTGGCGCACTCTTAGGAGACGACCAAATCTATAATGTTATTGTTACTGCACATGCCTTCGTAATGATTTTCTTTATAGTAATGCCAATTCTAATCGGGGGGTTTGGAAACTGATTAGTTCCTCTCATGCTTGGAGCCCCTGATATGGCATTCCCTCGTATGAACAACATAAGCTTCTGATTACTCCCTCCGTCATTCCTCCTTTTACTAGCTTCTTCCGGAGTTGAGGCCGGAGCCGGGACAGGTTGAACTGTCTACCCCCCACTGTCTGGTAATCTAGCCCATGCGGGAGCATCAGTAGATTTAACCATCTTCTCCCTGCACCTGGCAGGTATTTCATCAATCCTAGGAGCAATCAACTTTATCACTACCATCATCAACATAAAACCCCCCGCTATCTCTCAATACCAAACTCCTTTATTTGTTTGGGCTGTTCTAATTACTGCCGTTCTTCTACTCCTATCTCTCCCAGTCCTAGCTGCTGGCATTACTATGCTCCTGACCGACCGAAATCTTAATACTACCTTCTTCGATCCCGCAGGAGGAGGAGACCCAATTCTTTACCAACACCTC",
"geneticAccessionNumber": "KP194104",
"marker": "COI-5P"
}
]
},
I don't know whether GGBN will be widely adopted, nor how much data like this GBIF is likely to get. The data is also rather hidden in the current GBIF portal: it's not displayed in the HTML view, so you have to go through the API.
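For anyone who wants to pull the sequence out of the API response, here is a rough Python sketch. It assumes the fragment is JSON shaped like the snippet above; the function names are my own, and the sample sequence is deliberately cut short for illustration.

```python
import json
from urllib.request import urlopen

# Extension key exactly as it appears in the fragment shown above.
GGBN_AMPLIFICATION = "http://data.ggbn.org/schemas/ggbn/terms/Amplification"


def extract_amplifications(fragment):
    """Return the GGBN Amplification rows from a parsed fragment (empty list if none)."""
    return fragment.get("extensions", {}).get(GGBN_AMPLIFICATION, [])


def fetch_fragment(occurrence_key):
    """Fetch the raw fragment for one occurrence (needs network access)."""
    url = f"https://api.gbif.org/v1/occurrence/{occurrence_key}/fragment"
    with urlopen(url) as response:
        return json.load(response)


# Offline sample mirroring the structure above; the sequence is truncated here.
sample = {
    "extensions": {
        GGBN_AMPLIFICATION: [
            {
                "consensusSequence": "CCTTTATCTAGTATTTGG",
                "geneticAccessionNumber": "KP194104",
                "marker": "COI-5P",
            }
        ]
    }
}

rows = extract_amplifications(sample)
for row in rows:
    print(row["marker"], row["geneticAccessionNumber"], len(row["consensusSequence"]))
```

With network access, `extract_amplifications(fetch_fragment(1502684137))` should give the same kind of rows, provided the live response still has this shape.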
I guess measurementOrFact has the advantage that the portal supports it already, so people can actually see the sequences (this opens up all sorts of interesting possibilities, such as GBIF analysing sequence data).
The other issue is duplication. As GBIF ingests more and more BOLD sequences, existing records will be duplicated. What if we linked those duplicates? In other words, we would say not only that this GBIF occurrence from a museum has this DNA barcode, but also that the same DNA barcode is itself in GBIF as occurrence xxx.
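As a very rough sketch of what linking might look like, one could group occurrences that share an identifier. I am assuming here that the GenBank accession number is available on both records and is a reasonable join key, which may not hold in practice; the record layout, the function name, and the second occurrence key below are all made up for illustration.

```python
from collections import defaultdict


def link_by_accession(records):
    """Group occurrence keys by accession number; keep only groups with more than one key."""
    groups = defaultdict(list)
    for record in records:
        accession = record.get("geneticAccessionNumber")
        if accession:  # skip records without an accession number
            groups[accession].append(record["key"])
    return {acc: keys for acc, keys in groups.items() if len(keys) > 1}


# Hypothetical records: the first key is the museum occurrence mentioned above,
# the second key is an invented placeholder for a BOLD-derived duplicate.
records = [
    {"key": 1502684137, "geneticAccessionNumber": "KP194104"},
    {"key": 9999999999, "geneticAccessionNumber": "KP194104"},
    {"key": 1234, "geneticAccessionNumber": None},
]

links = link_by_accession(records)
print(links)
```

Each group could then be turned into reciprocal links between the occurrences, however GBIF chooses to represent them.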