Replies: 14 comments
-
|
I endorse the structure for the filesystem that you're laying out here. Network is probably correct as the higher-order folder for derived dataset formats: groups like GLATOS combine data in a single omnibus workbook, while other networks split them into separate files, all needing their own examples. |
-
|
@jdpye the first draft didn't respect the spacing of the outlined repo structure, so I put it into a code block. I also went into some sub-directories and shifted note to "v0.2". Is it still agreeable as it's now outlined? |
-
|
This looks great @mhpob, and the overall repo structure makes sense to me. The little telemetry-workflow repo I made to help students in the Cooke lab follows a similar structure to the one you've put forth here, letting the user find things by vendor or network, which I like. |
-
|
+1 for overall structure (structure around vendor/network). Our early scoping of a "characteristic" raw file set for GLATOS was rather daunting due to the number of possible combinations of receiver model, firmware, code map, transmitter options (e.g., various sensors), receiver options (e.g., internal transmitter settings, various receiver sensors), offload software, preference for small files, and desire to include files with errors/issues. Our next step (WIP) is to create a table/list of desired characteristics, then go out in search of files that meet each. |
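To give a feel for how quickly those combinations add up, here is a rough sketch of the table-then-search idea. Every option value in it is a placeholder assumption (only the VR2/VR3 names come from this thread), and `count_combinations` and `missing_combinations` are hypothetical helpers, not part of any existing package.

```python
from itertools import product
from math import prod

# Placeholder option lists (assumptions); real values would come from
# the desired-characteristics table once it exists.
characteristics = {
    "receiver_model": ["VR2", "VR3"],
    "firmware": ["fw_a", "fw_b"],
    "code_map": ["map_a", "map_b"],
    "transmitter_sensor": ["none", "temperature", "pressure"],
    "offload_software": ["sw_a", "sw_b"],
}


def count_combinations(options):
    """Number of distinct files needed to cover every combination once."""
    return prod(len(values) for values in options.values())


def missing_combinations(options, have):
    """Combinations not yet covered by the files described in `have`.

    `have` is a list of dicts, one per collected file, keyed like `options`.
    """
    keys = list(options)
    wanted = set(product(*(options[k] for k in keys)))
    covered = {tuple(f[k] for k in keys) for f in have}
    return sorted(wanted - covered)


print(count_combinations(characteristics))
```

Even with only two or three values per characteristic, the full cross is already dozens of files, which is one argument for listing the gaps and letting people fill them as needed.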
-
Great idea to include various iterations. Does seem rather daunting as it immediately makes this kind of project quite large... possibly beyond the capability of a GitHub repo. The benefit of an open repo would be the ability to crowd-source some of these files via pull requests while maintaining an open record of the transaction.
Currently the "searching" may be the bear that leans a lot on your time and energy. Might there be merit to putting the list out there via something like this and seeing what is submitted to you? |
-
|
GitHub recommends repos smaller than 1 GB with a max of 5 GB due to performance. May also then wind up requiring some hands-on management and git-fu: |
-
|
So I guess my question is which files, specifically, do you/we really want here? E.g., the GLATOS data system has >20,000 VRL files. So identifying an optimal set will require first identifying desired characteristics/features. |
-
|
That's a really critical question, and highlights that I completely left any transmitter examples off of that repo structure. My initial thought is that a complete data example library (individual files for receiver x transmitter x software options) is the holy grail, but the possibility of that is questionable at best.
What I think we're getting at is a metadata question -- can we design applicable metadata to not only log what we do have, but log its deficiencies? I.e., can we stand up something that's good-enough, but "not let the perfect be the enemy of the good"?
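As a strawman for what "log its deficiencies" could look like, here is a minimal sketch of a per-file metadata record. The class name, field names, and the `known_gaps` idea are all assumptions for discussion, not an agreed schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ExampleFileRecord:
    """One example file plus what we know is missing from it (hypothetical schema)."""

    path: str
    network: str
    vendor: str
    file_format: str
    # Free-text notes on what this example does NOT cover.
    known_gaps: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        return not self.known_gaps


def gaps_report(records: List[ExampleFileRecord]) -> Dict[str, List[str]]:
    """Map each file path to its recorded gaps, skipping files with none."""
    return {r.path: r.known_gaps for r in records if r.known_gaps}


def to_json(records: List[ExampleFileRecord]) -> str:
    """Serialize the records so they can be versioned alongside the files."""
    return json.dumps([asdict(r) for r in records], indent=2)
```

The point is only that a "good-enough" entry can sit in the repo with its shortcomings written down, rather than waiting for a perfect example.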
|
-
|
I swear I was authoring a reply that featured the phrase 'don't let the perfect be the enemy of the good' and I let it languish in a tab. I agree that we should take the files we're currently leaning on for testing glatos / surimi / remora / TelemetryWorkflow / etc... and then allow users to supply extra files that don't yet exist on a needs basis and roll them into the mix. If we outgrow a regular repo there's a Git LFS option or there are other WAF-ish things we could try. But for now this has the right mix of 'others can suggest updates' and 'we know how it versions things and generally how it works' to be a reasonable solve. |
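To know when we are getting close to outgrowing a regular repo, something like the sketch below could run locally or in CI. It uses the 1 GB recommended / 5 GB max figures quoted earlier in this thread; treating a GB as 1024**3 bytes is an assumption, the function names are made up, and it only counts working-tree files, so git history is not included.

```python
from pathlib import Path

# Thresholds quoted in this thread; 1 GB = 1024**3 bytes is an assumption.
RECOMMENDED_BYTES = 1 * 1024**3
MAX_BYTES = 5 * 1024**3


def repo_size_bytes(root):
    """Total size of files under `root`, skipping anything inside .git."""
    root = Path(root)
    return sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and ".git" not in p.relative_to(root).parts
    )


def size_status(n_bytes):
    """Classify a size against the recommended and maximum thresholds."""
    if n_bytes > MAX_BYTES:
        return "over max"
    if n_bytes > RECOMMENDED_BYTES:
        return "over recommended"
    return "ok"


if __name__ == "__main__":
    size = repo_size_bytes(".")
    print(f"{size / 1024**2:.1f} MiB: {size_status(size)}")
```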
-
|
@jdpye I know I've already stepped all over your toes here, but would you accept PRs following the guidance above to start fleshing this thing out? |
-
|
Also, since the OTNDC is basically one big metadata factory -- any views on what that structure should be? Tabular is human readable but maybe not the most efficient; XML is all over the place but it's not super approachable (at least to me); JSON might be a compromise that would also slot into an API; some CI/CD that takes one and creates the others? |
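On the "takes one and creates the others" idea, here is a small sketch of what such a CI step might do: keep JSON as the source of truth and derive a human-readable table from it. The function name and the record keys in the usage line are invented for illustration; it uses only the Python standard library.

```python
import csv
import io
import json


def json_to_csv(json_text):
    """Convert a JSON array of flat metadata records into CSV text.

    List values are joined with ';' so each record stays on one row.
    """
    records = json.loads(json_text)
    # Union of all keys, sorted so column order is stable between runs.
    fieldnames = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(
            {
                key: ";".join(map(str, value)) if isinstance(value, list) else value
                for key, value in record.items()
            }
        )
    return buffer.getvalue()


print(json_to_csv('[{"path": "a.vrl", "network": "glatos"}]'))
```

The same pattern would work with YAML or XML as the source; JSON is used here only because it needs no third-party parser.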
-
|
I don't mind a PR one bit, I just picked a poor week to take vacation. :) I'm also a big yaml fan. |
-
|
Possibly useful reference; R-centric: https://music.dataobservatory.eu/documents/open_music_europe/dataset-development/dataset-working-paper.html |
-
|
Re: building a package in another repo based on changes in this one |
-
After recent discussions with @jdpye, @chrisholbrook, @benjaminhlina, @franksmithxyz, and others regarding surimi (#1), glatos, VR2 VRL and binary formats, VR3 formats, and others, it seems like there is appetite to create some sort of open repository of characteristic types. As this repository is still in concept stage, I want to put forth a few ideas and discussion points related to its intent and scope to see if it can fill this space.
This should be viewed as a draft to be edited, added to, knocked back, or refuted outright -- nothing is critical and things noted here could already be in place elsewhere or merit separate, dedicated support. It will be clear to anyone reading that I am blatantly inserting my own hopes and dreams within the myopic view of my own experiences. I'll update everything below as comments come in if this becomes a worthwhile forum.
Statement of problem
Intended scope
drat/R-Universe repository (see Use as an external data repository? (R-specific) #1)
Data types to be included (high-level)
glatos
etn
Possible structure (v0.2, mirroring the above)
Is it better to organize according to network rather than data type?