_episodes/08_tag_metadata.md
Lines changed: 18 additions & 2 deletions
@@ -97,7 +97,8 @@ Often formatting errors occur in the information about the tag. Pay close attent
97
97
98
98
The metadata template [available here](https://members.oceantrack.org/data/data-collection) has a `Sample Data Row` as an example of properly-formatted metadata, along with the `Data Dictionary` sheet which contains detailed expectations for each column. Refer back to these often. We have also included some recommendations for filling in the tag metadata template on our [FAQ page](https://members.oceantrack.org/faq). Here are some guidelines:
99
99
100
-
- Animals with >1 associated tag (sensors, or double-tagging): add one line PER `TRANSMITTER ID` into the Tag Metadata form. The `ANIMAL_ID` column, or the `TAG_SERIAL_NUMBER` column **must** be the same between the rows in order to link those two (or more) records together.
100
+
- Animals with >1 associated tag (sensors, or double-tagging): add one line PER `TRANSMITTER ID` into the Tag Metadata form. The `ANIMAL_ID` column, or the `TAG_SERIAL_NUMBER` column, **must** be the same between the rows in order to link those 2 (or more) records together. Explanation: the `TAG_CODE_SPACE` (this is the "protocol", and is available from the tag specifications) can be formatted like "A69-1303" or "R64K" depending on the manufacturer. When a tag has sensors, it needs 1 line in the tag metadata per sensor. Each line should be nearly identical, but have a different `TAG_ID_CODE` (each associated with a sensor). Records with the same `TAG_SERIAL_NUMBER` and/or `ANIMAL_ID` will be recognized as 1 tag in our database.
101
+
101
102
- Animals with anchor tags (e.g., FLOY, spaghetti, streamer, dart, t-bar tags): ensure the `TAG_TYPE` column = `ANCHOR`. You may leave the following columns empty: `tag_manufacturer`, `tag_model`, `tag_id_code`, `tag_code_space` and `est_tag_life`.
102
103
- Animals with satellite tags: ensure the `TAG_TYPE` column = `SATELLITE`. You may leave the following columns empty: `tag_id_code` and `tag_code_space`.
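The linking rule for multi-sensor tags can be checked before submission. The following is a minimal sketch, not the Nodebook's own check: it assumes the sheet has already been read into a list of dictionaries keyed by the template column names, and every value shown is hypothetical.

```python
from collections import defaultdict

# Hypothetical rows; real values come from the researcher's metadata sheet.
rows = [
    {"ANIMAL_ID": "FISH-01", "TAG_SERIAL_NUMBER": "1111111",
     "TAG_CODE_SPACE": "A69-1303", "TAG_ID_CODE": "100"},
    {"ANIMAL_ID": "FISH-01", "TAG_SERIAL_NUMBER": "1111111",
     "TAG_CODE_SPACE": "A69-1303", "TAG_ID_CODE": "101"},
    {"ANIMAL_ID": "FISH-02", "TAG_SERIAL_NUMBER": "2222222",
     "TAG_CODE_SPACE": "A69-1303", "TAG_ID_CODE": "200"},
    {"ANIMAL_ID": "FISH-02", "TAG_SERIAL_NUMBER": "2222222",
     "TAG_CODE_SPACE": "A69-1303", "TAG_ID_CODE": "200"},
]


def id_codes_by_serial(rows):
    """Collect the TAG_ID_CODE of every row, grouped by TAG_SERIAL_NUMBER."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["TAG_SERIAL_NUMBER"]].append(row["TAG_ID_CODE"])
    return dict(grouped)


def serials_with_repeated_codes(rows):
    """Return serial numbers whose rows reuse a TAG_ID_CODE.

    Each sensor line is expected to carry its own TAG_ID_CODE, so a repeat
    within one serial number is worth a second look.
    """
    grouped = id_codes_by_serial(rows)
    return sorted(s for s, codes in grouped.items() if len(codes) != len(set(codes)))


print(serials_with_repeated_codes(rows))
```

Here the two rows for serial `1111111` link correctly (same serial, different ID codes), while serial `2222222` is flagged because both of its rows carry the same ID code.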
103
104
@@ -137,7 +138,7 @@ This cell will now complete the first round of Quality Control checks.
137
138
The output will have useful information:
138
139
- Is the sheet formatted correctly? Correct column names, datatypes in each column etc.
139
140
- Are either the `animal_id` or `tag_serial_number` columns completed?
140
-
- Are there any `harvest_date` values in the metadata? Are they all after the `utc_release_date_time`?
141
+
- Are there any `harvest_date` values in the metadata? Are they all after the `utc_release_date_time`? **Note: In our metadata we use the `harvest_date` column to indicate when the tag was removed from the first animal before being re-used.**
141
142
- Is the information about the animal formatted according to the Data Dictionary?
142
143
- Are there any tags which are used twice in the same sheet?
143
144
- Are there potential transcription errors in the `tag_code_space`? Ex: drag-and-drop errors from Excel
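Two of the checks in this list are simple enough to reproduce by hand when investigating a flagged sheet. This is a rough sketch under stated assumptions, not the Nodebook's implementation: rows are plain dictionaries, dates are ISO-formatted strings, and all sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime


def harvest_not_after_release(rows):
    """Return serial numbers whose harvest_date is not after the release time."""
    flagged = []
    for row in rows:
        harvest = row.get("harvest_date")
        if not harvest:
            continue  # no harvest_date recorded, nothing to compare
        released = datetime.fromisoformat(row["utc_release_date_time"])
        if datetime.fromisoformat(harvest) <= released:
            flagged.append(row["tag_serial_number"])
    return flagged


def tags_used_twice(rows):
    """Return (tag_code_space, tag_id_code) pairs that appear on more than one row."""
    counts = Counter((row["tag_code_space"], row["tag_id_code"]) for row in rows)
    return sorted(tag for tag, n in counts.items() if n > 1)


# Hypothetical sample rows.
rows = [
    {"tag_serial_number": "1111111", "tag_code_space": "A69-1303", "tag_id_code": "100",
     "utc_release_date_time": "2021-05-01T14:30:00", "harvest_date": "2021-04-20"},
    {"tag_serial_number": "2222222", "tag_code_space": "A69-1303", "tag_id_code": "200",
     "utc_release_date_time": "2021-05-01T14:30:00", "harvest_date": "2021-09-15"},
    {"tag_serial_number": "3333333", "tag_code_space": "A69-1303", "tag_id_code": "200",
     "utc_release_date_time": "2021-06-10T09:00:00", "harvest_date": ""},
]

print(harvest_not_after_release(rows))
print(tags_used_twice(rows))
```

A repeated tag is not always an error: a tag that was harvested and re-used legitimately appears twice, which is why the two checks are read together.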
@@ -228,6 +229,11 @@ True
228
229
~~~
229
230
{: .language-plaintext .example}
230
231
232
+
#### Find Raw Data Table in DB (`schema.c_tag_meta_YYYY_MM`)
233
+
- This table includes all the OTN compulsory columns for tag metadata as well as any the researcher includes, but only the OTN compulsory columns are QCed. An example query: `select * from schema.c_tag_meta_YYYY_MM where tag_serial_number ='xxxxxx'`
- These are the intermediate tables in the tag process
236
+
- These two tables take the necessary information from the raw table and split it between them: one is the animal cache, which contains all the information related to the tagged animals in this project for YYYY_MM; the other is the tag cache, which holds the tag information. Both tables contain release locations, release date, project code, institution, etc.
231
237
232
238
#### Task list checkpoint
233
239
@@ -446,6 +452,9 @@ The Nodebook will indicate the sheet had passed quality control by adding a ✔
446
452
447
453
If there are any errors, go into the database and fix the cache tables themselves, then re-run the cell.
448
454
455
+
#### Find Cache Tables in DB (`schema.animalcache_YYYY_MM` & `schema.tagcache_YYYY_MM`)
456
+
- These are the intermediate tables in the tag process
457
+
- These two tables take the necessary information from the raw table and split it between them: one is the animal cache, which contains all the information related to the tagged animals in this project for YYYY_MM; the other is the tag cache, which holds the tag information. Both tables contain release locations, release date, project code, institution, etc.
449
458
450
459
#### Task list checkpoint
451
460
@@ -496,6 +505,10 @@ The Nodebook will indicate the sheet had passed quality control by adding a ✔
496
505
497
506
If there are any errors, contact the researcher to scope potential data fixes, then open a DB-Fix Ticket, and use the Database Fix Notebooks to resolve the issues.
498
507
508
+
#### Find OTN Tables in DB (`schema.otn_animals` & `schema.otn_transmitters`)
509
+
- Similar to the cache tables, these two OTN tables contain all animal and tag records in this project across all time.
510
+
- An example query: `select * from schema.otn_transmitters ot where catalognumber = 'XXXXX'`
511
+
499
512
500
513
#### Task list checkpoint
501
514
@@ -518,3 +531,6 @@ Then, please email a copy of this file to the researcher who submitted it, so th
518
531
Finally, the Issue can be passed off to an OTN-analyst for final verification in the database.
@@ -99,7 +99,7 @@ The metadata template [available here](https://members.oceantrack.org/data/data-
99
99
- When more than one instrument is deployed, downloaded, or recovered at the same station, enter each one on a separate line using the same `OTN_ARRAY` and `STATION_NO`.
100
100
- When sentinel tags are co-deployed with receivers, their information can be added to `TRANSMITTER` and `TRANSMIT_MODEL` columns, on the same line as the receiver deployment.
101
101
- If a sentinel tag is deployed alone then a new line for that station, with as much information as possible, is added.
102
-
- When stations are moved to a new location, but the researcher wants to keep the same station names, we often recommend appending ‘_yyyy’ to the station name, but this change might be forgotten the next time they submit metadata. So, we need to manually compare between the database and the metadata for special cases like this. Researchers may also submit station names with special characters which have been previously corrected and loaded to the database We need to make sure those same changes are reflected in the new metadata.
102
+
- When stations are moved to a new location, but the researcher wants to keep the same station names, we often recommend appending ‘_yyyy’ to the station name. This change might be forgotten the next time they submit metadata, so we need to manually compare the database and the metadata for special cases like this. Researchers may also submit station names with special characters which have been previously corrected and loaded to the database. We need to make sure those same changes are reflected in the new metadata.
103
103
- When an instrument is deemed lost, a value of `l` or `lost` should be entered in the "recovered" field; if the instrument is found, this can be updated by changing the recovery field to `f` or `found` and resubmitting the metadata sheet.
104
104
- Every time an instrument is brought to the surface, enter `y` to indicate it was successfully recovered, even if only for downloading and redeployment. A new line for the redeployment is required.
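When checking `TRANSMITTER` values for sentinel tags or built-in transmitters, it can help to split the full code into its code space and ID. The sketch below assumes the value is written as a single string such as `A69-1303-12345`, with the ID after the last hyphen; `split_transmitter` is a hypothetical helper, not part of the Nodebooks.

```python
def split_transmitter(transmitter):
    """Split a full transmitter code into (code_space, id_code).

    Assumes the ID is everything after the last hyphen, e.g.
    'A69-1303-12345' -> ('A69-1303', '12345').
    """
    code_space, sep, id_code = transmitter.strip().rpartition("-")
    if not sep or not code_space or not id_code:
        raise ValueError(f"unexpected transmitter format: {transmitter!r}")
    return code_space, id_code


print(split_transmitter("A69-1303-12345"))
```

Values that raise `ValueError` here are candidates for a closer look at the source file before loading.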
105
105
@@ -180,7 +180,9 @@ The output will have useful information:
180
180
- Compared to the `stations` table in the database, are the station names correct? Have stations "moved" location? Are the reported bottom_depths significantly different (check for possible `ft` vs `m` vs `ftm` errors)?
181
181
- Are all recovery dates after the deployment dates?
182
182
- Are all the provided `ins_model_no` values present in the `obis.instrument_models` table? If not, please check the records in the `obis.instrument_models` and the source file to confirm there are no typos. If this is a new model which has never been used before, use the `add instrument_models` Nodebook to add the new instrument model.
183
-
- Do all transceivers/test tags have their transmitters provided? Do these match any manufacturer specifications we have in the database?
183
+
- Do all transceivers/test tags have their transmitters provided? Do these match any manufacturer specifications we have in the database? **Note: For VR2AR and VR2Tx type receivers, researchers can record internal transmitters in the ‘transmitter’ column (example format: A69-1303-12345). We will then associate these ‘detections’ with the receiver. Though this information is not compulsory, it helps us distinguish real detections from testing detections for your project, which can reduce the risk of mismatching.**
184
+
185
+
184
186
- Are there any overlapping deployments (one serial number deployed at multiple locations for a period of time)?
185
187
- Are all the deployments within the Bounding Box of the project? If the bounding box needs to be expanded to include the stations, you can use the `Square Draw Tool` to re-draw the bounding box until you are happy with it. Once all stations are drawn inside the bounding box, press the `Adjust Bounding Box` button to save the results.
186
188
- Are there possible gaps in the metadata, based on previously-loaded `detections` files? This will be investigated in the `Detections-3b` Nodebook if you need more details.
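The overlapping-deployment check in this list (one serial number deployed at multiple locations for the same period) can be sketched as an interval comparison. This is an illustration only: the field names `serial`, `station`, `deploy` and `recover` are hypothetical stand-ins for the sheet's columns, and a deployment without a recovery date is assumed to still be in the water.

```python
from collections import defaultdict
from datetime import datetime
from itertools import combinations


def overlapping_deployments(deployments):
    """Return (serial, station_a, station_b) for deployments that overlap in time."""
    by_serial = defaultdict(list)
    for dep in deployments:
        start = datetime.fromisoformat(dep["deploy"])
        # No recovery date: treat the deployment as still ongoing.
        end = datetime.fromisoformat(dep["recover"]) if dep["recover"] else datetime.max
        by_serial[dep["serial"]].append((start, end, dep["station"]))

    overlaps = []
    for serial in sorted(by_serial):
        for (s1, e1, st1), (s2, e2, st2) in combinations(by_serial[serial], 2):
            # Two intervals overlap when each starts before the other ends.
            if s1 < e2 and s2 < e1:
                overlaps.append((serial, st1, st2))
    return overlaps


# Hypothetical deployments.
deployments = [
    {"serial": "100", "station": "ST1", "deploy": "2020-01-01", "recover": "2020-06-01"},
    {"serial": "100", "station": "ST2", "deploy": "2020-05-01", "recover": "2020-09-01"},
    {"serial": "200", "station": "ST1", "deploy": "2020-01-01", "recover": "2020-03-01"},
    {"serial": "200", "station": "ST3", "deploy": "2020-03-01", "recover": None},
]

print(overlapping_deployments(deployments))
```

The comparison is strict, so a receiver recovered and redeployed at the same instant (serial `200` above) is not reported as overlapping.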
@@ -244,6 +246,11 @@ In GitLab, this task can be completed at this stage:
244
246
245
247
`- [ ] - NAME verify raw table ("deploy" notebook)`
246
248
249
+
#### Find Raw Data Table in DB (`schema.c_shortform_YYYY_MM`)
250
+
- This table includes all the OTN compulsory columns for receiver metadata as well as any the researcher includes, but only the OTN compulsory columns are QCed.
251
+
- An example query: `select * from schema.c_shortform_2020_04 cs where ins_model_no ilike '%CTD%'`
252
+
253
+
247
254
### Loading Stations Records
248
255
249
256
**STOP** - confirm there is no Push currently ongoing. If a Push is ongoing, you must wait for it to be completed before processing beyond this point
@@ -270,6 +277,7 @@ Added XX new stations to schema.moorings
270
277
If the `stations` and `moorings` tables are not in sync, you will need to compare the two tables for differences and possibly update one or the other.
271
278
272
279
280
+
273
281
#### Task List Checkpoint
274
282
275
283
In GitLab, this task can be completed at this stage:
@@ -300,6 +308,17 @@ In GitLab, this task can be completed at this stage:
300
308
301
309
`- [ ] - verify stations ("deploy" notebook)`
302
310
311
+
312
+
#### Find Station Tables in DB (`schema.stations` & `schema.rcvr_locations`)
313
+
- These Station tables are intermediate tables. They grab the necessary information from the raw table.
314
+
- `schema.stations` contains the information for all distinct deployment stations from that schema across all years. The columns `date`, `intended_lon` and `intended_lat` record the date and coordinates (lon, lat) with which this station was first added.
315
+
- Note: In the `schema.stations` table, every station has a distinct name and distinct coordinates. The notebook will show errors if the researcher gives different coordinates for the same station or the same coordinates for different stations. In that case we can use the corresponding DB fix tool to change the station names.
316
+
- An example query: `select * from schema.stations where station_name in ('A','B')`
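The two station conflicts described above (one name with several coordinates, or one coordinate pair shared by several names) can be spotted in a metadata sheet with a small grouping step. The sketch below is an assumption-laden illustration: records are hypothetical `(station_name, lon, lat)` tuples, and coordinates are compared exactly, whereas a real check would likely allow a distance tolerance.

```python
from collections import defaultdict


def station_conflicts(records):
    """Return (moved, shared) for a list of (station_name, lon, lat) tuples.

    moved:  station names reported with more than one coordinate pair
    shared: groups of station names reported at the same coordinate pair
    """
    coords_by_name = defaultdict(set)
    names_by_coord = defaultdict(set)
    for name, lon, lat in records:
        coords_by_name[name].add((lon, lat))
        names_by_coord[(lon, lat)].add(name)

    moved = sorted(n for n, coords in coords_by_name.items() if len(coords) > 1)
    shared = sorted(tuple(sorted(names)) for names in names_by_coord.values() if len(names) > 1)
    return moved, shared


# Hypothetical station records.
records = [
    ("A", -63.5, 44.6),
    ("A", -63.6, 44.6),
    ("B", -63.7, 44.7),
    ("C", -63.7, 44.7),
    ("D", -63.8, 44.8),
]

moved, shared = station_conflicts(records)
print(moved, shared)
```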
317
+
318
+

319
+
320
+
321
+
303
322
### Load to rcvr_locations
304
323
Once the `station` table is verified, the receiver deployment records can now be promoted to the "intermediate" `rcvr_locations` table.
305
324
@@ -355,6 +374,13 @@ In GitLab, this task can be completed at this stage:
- These tables are also intermediate tables which contain all deployment information for all receivers from that schema across all years. A station is treated as an area, so receivers can be deployed at the same station with different coordinates. The columns `deploy_date`, `deploy_lon` and `deploy_lat` show each receiver's deployment date and coordinates.
379
+
- Note: In the `schema.rcvr_locations` table, you may see the same station with different coordinates. However, the notebook will show errors if the same receiver has overlapping deployments. In that case we can look at this table for the information we need to change and use the corresponding DB fix tool.
380
+
- An example query: `select * from schema.rcvr_locations rl where rl.rcv_serial_no = 'XXXXX'`
381
+
382
+

383
+
358
384
### Load Transmitter Records to Moorings
359
385
360
386
The `transmitter` values associated with transceivers, co-deployed sentinel tags, or stand-alone test tags will be loaded to the `moorings` table in this section. Existing transmitter records will also be updated, if relevant.
@@ -409,6 +435,12 @@ The Nodebook will indicate the table has passed quality control by adding a ✔
409
435
410
436
If there are any errors with records that have already been promoted to the `moorings` table, you will need to create a db fix ticket in Gitlab to correct the records in the database. You may need to contact the researcher before resolving the error.
411
437
438
+
#### Find Mooring Tables in DB
439
+
- `schema.moorings` contains all receiver, transmitter, and event information from this project. Note: the notebook will show errors if the same transmitter_ID has been used in different receivers. We can query this table for more information on the transmitter_ID, and may need to use the corresponding DB fix tool to change the transmitter_ID.
440
+
- An example query: `select * from schema.moorings where basisofrecord = 'TRANSMITTER' and relationshiptype = 'STATION'`
441
+
442
+

443
+
412
444
413
445
#### Task List Checkpoint
414
446
@@ -425,3 +457,11 @@ First: you should access the Repository folder in your browser and add the clean
425
457
Finally, the GitLab ticket can be reassigned to an OTN analyst for final verification in the database.
_episodes/10_Detections.md
Lines changed: 44 additions & 7 deletions
@@ -255,7 +255,7 @@ Events-1 is responsible for loading receiver events files into raw tables. This
255
255
256
256
### Import cell
257
257
258
-
As in all Nodebooks run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
258
+
As in all Nodebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
259
259
260
260
### User Inputs
261
261
@@ -292,6 +292,10 @@ Reset(s): XX
292
292
{: .language-plaintext .example}
293
293
294
294
295
+
#### Find Raw Data Table in DB (`schema.c_events_YYYY_MM` & `schema.c_detections_YYYY_MM`)
296
+
- The events table contains environmental and receiver data for this project at this time, e.g., temperature, depth, and battery.
297
+
- The detections table contains detection information for this project at this time. Here you can see the tags detected by each receiver over time.
298
+
295
299
### Database Connection
296
300
297
301
You will have to edit one section: `engine = get_engine()`
Detections tables are only created on an as-needed basis. These cells will detect any tables you are missing and create them based on the years covered in the raw detection table (c_table). This will check all tables such as `detections_yyyy`, `sensor_match_yyyy` and `otn_detections_yyyy`.
@@ -557,16 +562,18 @@ In GitLab, this task can be completed at this stage:
557
562
558
563
`- [ ] - NAME verify detections_yyyy (looking for duplicates) ("detections-2" notebook)`
559
564
565
+
566
+
560
567
### Load sensors_match Tables by Year
561
568
562
569
For the last part of this Nodebook you will need to load to the `sensor_match_YYYY` tables. This loads detections with sensor information into a project's `sensor_match_yyyy` tables. Later, these tables will aid in matching vendor specifications to resolve sensor tag values.
563
570
564
571
Output will appear like this:
565
572
566
573
~~~
567
-
Inserting records from collectioncode.detections_2019 INTO sensor_match_2019... OK
574
+
Inserting records from collectioncode.detections_YYYY INTO sensor_match_YYYY... OK
568
575
Added XXX rows.
569
-
Inserting records from collectioncode.detections_2021 INTO sensor_match_2021... OK
576
+
Inserting records from collectioncode.detections_YYYY INTO sensor_match_YYYY... OK
570
577
Added XXX rows.
571
578
~~~
572
579
{: .language-plaintext .example}
@@ -682,7 +689,7 @@ Once you have added your information, you can run the cell. Successful login is
@@ -712,9 +719,9 @@ Once you are clear to continue loading you can run `create_detection_views`. Thi
712
719
Output will look like:
713
720
714
721
~~~
715
-
Creating view collectioncode.vw_detections_2020... OK
716
-
Creating view collectioncode.vw_sentinel_2020... OK
717
-
Creating view collectioncode.vw_detections_2021... OK
722
+
Creating view collectioncode.vw_detections_YYYY... OK
723
+
Creating view collectioncode.vw_sentinel_YYYY... OK
724
+
Creating view collectioncode.vw_detections_YYYY... OK
718
725
~~~
719
726
{: .language-plaintext .example}
720
727
@@ -973,6 +980,36 @@ In GitLab, this task can be completed at this stage:
973
980
974
981
This Nodebook will promote the events records from the intermediate `events` table to the final `moorings` records. Only use this Nodebook after adding the receiver records to the moorings table, as this process is dependent on receiver records.
- These are intermediate tables which contain all events in this project across all years and detections in certain years.
985
+
- For example, if a researcher wants to know all spatial temperature data for a certain type of receiver in their project schema, they could use the query:
- As another example, if we want to check how many distinct transmitters were detected by receiver 123456 in a project schema between 2023-11-10 and 2024-05-29, we could use the query:
996
+
```sql
SELECT DISTINCT transmitter
FROM schema.detections_2023
WHERE receiver ILIKE '%123456%'
AND datetime > '2023-11-10 00:00:00'
AND datetime < '2024-05-29 18:30:00'
UNION
SELECT DISTINCT transmitter
FROM schema.detections_2024
WHERE receiver ILIKE '%123456%'
AND datetime > '2023-11-10 00:00:00'
AND datetime < '2024-05-29 18:30:00'
```
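Because detections are stored in one table per year, a date range that crosses a year boundary needs one `SELECT` per yearly table, joined with `UNION`. The sketch below shows one way to assemble such a query text; `detection_tables` and `distinct_transmitter_query` are hypothetical helpers, and the `%(name)s` placeholders assume a psycopg2-style driver that fills in the values at execution time.

```python
from datetime import datetime


def detection_tables(schema, start, end):
    """List the yearly detections tables that a date range touches."""
    first_year = datetime.fromisoformat(start).year
    last_year = datetime.fromisoformat(end).year
    return [f"{schema}.detections_{year}" for year in range(first_year, last_year + 1)]


def distinct_transmitter_query(schema, start, end):
    """Build one SELECT per yearly table and join them with UNION.

    Only the table names are formatted into the text; receiver and dates stay
    as placeholders for the driver. The schema name is assumed to be trusted.
    """
    select = (
        "SELECT DISTINCT transmitter FROM {table} "
        "WHERE receiver ILIKE %(receiver)s "
        "AND datetime > %(start)s AND datetime < %(end)s"
    )
    tables = detection_tables(schema, start, end)
    return "\nUNION\n".join(select.format(table=table) for table in tables)


query = distinct_transmitter_query("schema", "2023-11-10 00:00:00", "2024-05-29 18:30:00")
print(query)
```

`UNION` (rather than `UNION ALL`) removes transmitters that appear in both years, so the result lists each transmitter once.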
1009
+
1010
+
1011
+
1012
+
976
1013
### Import cells and Database connections
977
1014
978
1015
As in all Nodebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.