
Commit 1133609

authored
Merge pull request #114 from ocean-tracking-network/JoyLiu
Adding in because it has been reviewed
2 parents ca5bb2b + 965bc4a commit 1133609

11 files changed

+209
-12
lines changed

_episodes/08_tag_metadata.md

Lines changed: 18 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -97,7 +97,8 @@ Often formatting errors occur in the information about the tag. Pay close attent
9797

9898
The metadata template [available here](https://members.oceantrack.org/data/data-collection) has a `Sample Data Row` as an example of properly-formatted metadata, along with the `Data Dictionary` sheet which contains detailed expectations for each column. Refer back to these often. We have also included some recommendations for filling in the tag metadata template on our [FAQ page](https://members.oceantrack.org/faq). Here are some guidelines:
9999

100-
- Animals with >1 associated tag (sensors, or double-tagging): add one line PER `TRANSMITTER ID` into the Tag Metadata form. The `ANIMAL_ID` column, or the `TAG_SERIAL_NUMBER` column **must** be the same between the rows in order to link those two (or more) records together.
100+
- Animals with >1 associated tag (sensors, or double-tagging): add one line PER `TRANSMITTER ID` into the Tag Metadata form. The `ANIMAL_ID` column, or the `TAG_SERIAL_NUMBER` column **must** be the same between the rows in order to link those 2 (or more) records together. Explanations: `TAG_CODE_SPACE` (this is the "protocol", and is available from tag specifications) can be formatted like "A69-1303" or "R64K" depending on the manufacturer. When a tag has sensors, it needs 1 line in the tag metadata per sensor. Each line should be nearly identical, but have a different `TAG_ID_CODE` (each associated with a sensor). Records with the same `TAG_SERIAL_NUMBER` and/or `ANIMAL_ID` will be recognized as 1 tag in our database.
101+
101102
- Animals with anchor tags (ie: FLOY, spaghetti, streamer, dart, t-bar tags): ensure the `TAG_TYPE` column = `ANCHOR`. You may leave the following columns empty: `tag_manufacturer`, `tag_model`, `tag_id_code`, `tag_code_space` and `est_tag_life`.
102103
- Animals with satellite tags: ensure the `TAG_TYPE` column = `SATELLITE`. You may leave the following columns empty: `tag_id_code` and `tag_code_space`.
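
As a rough illustration of the linking rule for multi-sensor tags, the sketch below groups metadata rows by `TAG_SERIAL_NUMBER`, falling back to `ANIMAL_ID`. The function name `group_tag_rows`, the fallback order, and the sample values are assumptions made for this example; this is not the logic used by the OTN database.

```python
from collections import defaultdict


def group_tag_rows(rows):
    """Group rows that share a TAG_SERIAL_NUMBER (or, failing that, an ANIMAL_ID).

    Hypothetical helper: the real linking logic lives in the database tools.
    """
    groups = defaultdict(list)
    for row in rows:
        key = row.get("TAG_SERIAL_NUMBER") or row.get("ANIMAL_ID")
        groups[key].append(row["TAG_ID_CODE"])
    return dict(groups)


# Made-up sample rows: one two-sensor tag (two lines) and one single tag.
rows = [
    {"ANIMAL_ID": "FISH-01", "TAG_SERIAL_NUMBER": "1234567",
     "TAG_CODE_SPACE": "A69-1303", "TAG_ID_CODE": "1001"},
    {"ANIMAL_ID": "FISH-01", "TAG_SERIAL_NUMBER": "1234567",
     "TAG_CODE_SPACE": "A69-1303", "TAG_ID_CODE": "1002"},
    {"ANIMAL_ID": "FISH-02", "TAG_SERIAL_NUMBER": "7654321",
     "TAG_CODE_SPACE": "A69-1303", "TAG_ID_CODE": "2001"},
]

linked = group_tag_rows(rows)
print(linked)
```

Rows that end up under the same key would be treated as one physical tag carrying several `TAG_ID_CODE` values.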
103104

@@ -137,7 +138,7 @@ This cell will now complete the first round of Quality Control checks.
137138
The output will have useful information:
138139
- Is the sheet formatted correctly? Correct column names, datatypes in each column etc.
139140
- Are either the `animal_id` or `tag_serial_number` columns completed?
140-
- Are there any `harvest_date` values in the metadata? Are they all after the `utc_release_date_time`?
141+
- Are there any `harvest_date` values in the metadata? Are they all after the `utc_release_date_time`? **Note: In our metadata, we use the `harvest_date` column to indicate when the tag was removed from the first animal before being re-used.**
141142
- Is the information about the animal formatted according to the Data Dictionary?
142143
- Are there any tags which are used twice in the same sheet?
143144
- Are there potential transcription errors in the `tag_code_space`? Ex: drag-and-drop errors from Excel
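
Two of the checks above (harvest dates and repeated tags) can be sketched in a few lines. This is only an illustration: `qc_tag_rows`, the dictionary-based row format, and the sample values are assumptions, and the Nodebook's own implementation may differ.

```python
from collections import Counter
from datetime import datetime


def qc_tag_rows(rows):
    """Return a list of human-readable problems found in tag metadata rows."""
    problems = []
    # Check 1: any harvest_date must fall after the release time.
    for i, row in enumerate(rows, start=1):
        harvest = row.get("harvest_date")
        if harvest:
            released = datetime.fromisoformat(row["utc_release_date_time"])
            if datetime.fromisoformat(harvest) <= released:
                problems.append(
                    f"row {i}: harvest_date is not after utc_release_date_time"
                )
    # Check 2: flag tags that appear more than once in the same sheet.
    counts = Counter((r["tag_code_space"], r["tag_id_code"]) for r in rows)
    for (space, code), n in sorted(counts.items()):
        if n > 1:
            problems.append(f"tag {space}-{code} appears {n} times")
    return problems


# Made-up sample rows.
rows = [
    {"tag_code_space": "A69-1303", "tag_id_code": "100",
     "utc_release_date_time": "2023-05-01 12:00:00",
     "harvest_date": "2023-06-01 00:00:00"},
    {"tag_code_space": "A69-1303", "tag_id_code": "101",
     "utc_release_date_time": "2023-05-02 09:30:00",
     "harvest_date": "2023-04-01 00:00:00"},
    {"tag_code_space": "A69-1303", "tag_id_code": "100",
     "utc_release_date_time": "2023-07-01 08:00:00",
     "harvest_date": None},
]

problems = qc_tag_rows(rows)
for p in problems:
    print(p)
```

A repeated tag is not necessarily an error (a tag may be re-used after harvest), so a flag like this is a prompt to look closer rather than a verdict.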
@@ -228,6 +229,11 @@ True
228229
~~~
229230
{: .language-plaintext .example}
230231

232+
#### Find Raw Data Table in DB (`schema.c_tag_meta_YYYY_MM`)
233+
- This table includes all the OTN compulsory columns for tag metadata as well as any the researcher includes, but only the OTN compulsory columns are QCed. An example query: `select * from schema.c_tag_meta_YYYY_MM where tag_serial_number = 'xxxxxx'`
234+
- Cache Tables (`schema.animalcache_YYYY_MM` & `schema.tagcache_YYYY_MM`)
235+
- These are the intermediate tables in the tag process
236+
- The necessary information is taken from the raw table and split into two intermediate tables. The animal cache contains all the information related to the tagged animals in this project for YYYY_MM, and the tag cache contains the tag information. Both tables contain release locations, release date, project code, institution, etc.
231237

232238
#### Task list checkpoint
233239

@@ -446,6 +452,9 @@ The Nodebook will indicate the sheet had passed quality control by adding a ✔
446452

447453
If there are any errors, go into the database and fix the cache tables themselves, then re-run the cell.
448454

455+
#### Find Cache Tables in DB (`schema.animalcache_YYYY_MM` & `schema.tagcache_YYYY_MM`)
456+
- These are the intermediate tables in the tag process
457+
- The necessary information is taken from the raw table and split into two intermediate tables. The animal cache contains all the information related to the tagged animals in this project for YYYY_MM, and the tag cache contains the tag information. Both tables contain release locations, release date, project code, institution, etc.
449458

450459
#### Task list checkpoint
451460

@@ -496,6 +505,10 @@ The Nodebook will indicate the sheet had passed quality control by adding a ✔
496505

497506
If there are any errors, contact the researcher to scope potential data fixes, then open a DB-Fix Ticket, and use the Database Fix Notebooks to resolve the issues.
498507

508+
#### Find OTN Tables in DB (`schema.otn_animals` & `schema.otn_transmitters`)
509+
- Similar to the Cache Tables, these two OTN tables contain all animal and tag records in this project across all time.
510+
- An example query: `select * from schema.otn_transmitters ot where catalognumber = 'XXXXX'`
511+
499512

500513
#### Task list checkpoint
501514

@@ -518,3 +531,6 @@ Then, please email a copy of this file to the researcher who submitted it, so th
518531
Finally, the Issue can be passed off to an OTN-analyst for final verification in the database.
519532

520533
{% include links.md %}
534+
535+
536+

_episodes/09_deploy_metadata.md

Lines changed: 43 additions & 3 deletions
@@ -21,7 +21,7 @@ flowchart LR
2121
style tag_start fill:#00FF00,stroke:#00FF00,stroke-width:4px
2222
get_meta --> gitlab(Create <br />Gitlab <br />issue)
2323
gitlab --> inspect(Visually <br />inspect)
24-
inspect --> nodebook(Process and verify <br />with nodebooks)
24+
inspect --> nodebook(Process and verify <br />with nodebooks)
2525
nodebook --> plone(Add metadata <br />to repository folder)
2626
plone --> otn(Pass to <br />OTN)
2727
otn --> end2(( ))
@@ -99,7 +99,7 @@ The metadata template [available here](https://members.oceantrack.org/data/data-
9999
- When more than one instrument is deployed, downloaded, or recovered at the same station, enter each one on a separate line using the same `OTN_ARRAY` and `STATION_NO`.
100100
- When sentinel tags are co-deployed with receivers, their information can be added to `TRANSMITTER` and `TRANSMIT_MODEL` columns, on the same line as the receiver deployment.
101101
- If a sentinel tag is deployed alone then a new line for that station, with as much information as possible, is added.
102-
- When stations are moved to a new location, but the researcher wants to keep the same station names, we often recommend appending ‘_yyyy’ to the station name, but this change might be forgotten the next time they submit metadata. So, we need to manually compare between the database and the metadata for special cases like this. Researchers may also submit station names with special characters which have been previously corrected and loaded to the database We need to make sure those same changes are reflected in the new metadata.
102+
- When stations are moved to a new location, but the researcher wants to keep the same station names, we often recommend appending ‘_yyyy’ to the station name, but this change might be forgotten the next time they submit metadata. So, we need to manually compare between the database and the metadata for special cases like this. Researchers may also submit station names with special characters which have been previously corrected and loaded to the database. We need to make sure those same changes are reflected in the new metadata.
103103
- When an instrument is deemed lost, a value of `l` or `lost` should be entered in the "recovered" field; if the instrument is found, this can be updated by changing the recovery field to `f` or `found` and resubmitting the metadata sheet.
104104
- Every time an instrument is brought to the surface, enter `y` to indicate it was successfully recovered, even if only for downloading and redeployment. A new line for the redeployment is required.
105105

@@ -180,7 +180,9 @@ The output will have useful information:
180180
- Compared to the `stations` table in the database, are the station names correct? Have stations "moved" location? Are the reported bottom_depths significantly different (check for possible `ft` vs `m` vs `ftm` errors)?
181181
- Are all recovery dates after the deployment dates?
182182
- Are all the provided `ins_model_no` values present in the `obis.instrument_models` table? If not, please check the records in the `obis.instrument_models` and the source file to confirm there are no typos. If this is a new model which has never been used before, use the `add instrument_models` Nodebook to add the new instrument model.
183-
- Do all transceivers/test tags have their transmitters provided? Do these match any manufacturer specifications we have in the database?
183+
- Do all transceivers/test tags have their transmitters provided? Do these match any manufacturer specifications we have in the database? **Note: For VR2AR and VR2Tx type receivers, researchers can record internal transmitters under the ‘transmitter’ column (example format: A69-1303-12345). We will then associate these ‘detections’ with the receiver. Although this information is not compulsory, it helps us distinguish real detections from test detections for your project, which can reduce the risk of mismatching.**
184+
185+
184186
- Are there any overlapping deployments (one serial number deployed at multiple locations for a period of time)?
185187
- Are all the deployments within the Bounding Box of the project? If the bounding box needs to be expanded to include the stations, you can use the `Square Draw Tool` to re-draw the bounding box until you are happy with it. Once all stations are drawn inside the bounding box, press the `Adjust Bounding Box` button to save the results.
186188
- Are there possible gaps in the metadata, based on previously-loaded `detections` files? This will be investigated in the `Detections-3b` Nodebook if you need more details.
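
The overlapping-deployment check can be pictured as a pairwise comparison of date ranges for each serial number. The sketch below is an assumption-laden illustration: `find_overlaps`, the dictionary keys, and the sample deployments are invented here, and an unrecovered instrument is treated as still deployed.

```python
from datetime import date
from itertools import combinations


def find_overlaps(deployments):
    """Return station pairs where one serial number is deployed twice at once."""
    overlaps = []
    for a, b in combinations(deployments, 2):
        if a["serial"] != b["serial"]:
            continue
        # A missing recovery date means the instrument is still in the water.
        a_end = a["recover"] or date.max
        b_end = b["recover"] or date.max
        # Two ranges overlap when each one starts before the other ends.
        if a["deploy"] < b_end and b["deploy"] < a_end:
            overlaps.append((a["station"], b["station"]))
    return overlaps


# Made-up sample deployments.
deployments = [
    {"serial": "111", "station": "A",
     "deploy": date(2023, 1, 1), "recover": date(2023, 6, 1)},
    {"serial": "111", "station": "B",
     "deploy": date(2023, 5, 1), "recover": None},
    {"serial": "111", "station": "C",
     "deploy": date(2022, 1, 1), "recover": date(2022, 12, 31)},
    {"serial": "222", "station": "A",
     "deploy": date(2023, 2, 1), "recover": date(2023, 3, 1)},
]

overlaps = find_overlaps(deployments)
print(overlaps)
```

Here serial `111` is reported at stations A and B because the deployment at B starts before the recovery at A.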
@@ -244,6 +246,11 @@ In GitLab, this task can be completed at this stage:
244246

245247
`- [ ] - NAME verify raw table ("deploy" notebook)`
246248

249+
#### Find Raw Data Table in DB (`schema.c_shortform_YYYY_MM`)
250+
- This table includes all the OTN compulsory columns for receiver metadata as well as any the researcher includes, but only the OTN compulsory columns are QCed.
251+
- An example query: `select * from schema.c_shortform_2020_04 cs where ins_model_no ilike '%CTD%'`
252+
253+
247254
### Loading Stations Records
248255

249256
**STOP** - confirm there is no Push currently ongoing. If a Push is ongoing, you must wait for it to be completed before processing beyond this point
@@ -270,6 +277,7 @@ Added XX new stations to schema.moorings
270277
If the `stations` and `moorings` tables are not in sync, you will need to compare the two tables for differences and possibly update one or the other.
271278

272279

280+
273281
#### Task List Checkpoint
274282

275283
In GitLab, this task can be completed at this stage:
@@ -300,6 +308,17 @@ In GitLab, this task can be completed at this stage:
300308

301309
`- [ ] - verify stations ("deploy" notebook)`
302310

311+
312+
#### Find Station Tables in DB (`schema.stations` & `schema.rcvr_locations`)
313+
- These Station tables are intermediate tables. They grab the necessary information from the raw table.
314+
- `schema.stations` contains information for all distinct deployment stations from that schema across all years. The `date`, `intended_lon` and `intended_lat` columns record the first time and the coordinates (lon, lat) at which this station was added.
315+
- Note: In the `schema.stations` table, every station has a distinct name and distinct coordinates. The notebook will show errors if the researcher gives different coordinates for the same location or the same coordinates for different locations. In that case we can use the corresponding DB fix tool to change station names.
316+
- An example query: `select * from schema.stations where station_name in ('A','B')`
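
The uniqueness rule described in the note can be sketched as two lookups: names that map to more than one coordinate, and coordinates that map to more than one name. `station_conflicts` and the sample rows are hypothetical, and exact coordinate equality is used only to keep the example short; a real check would more likely compare distances against a tolerance.

```python
from collections import defaultdict


def station_conflicts(rows):
    """Find names with several coordinates and coordinates with several names."""
    coords_by_name = defaultdict(set)
    names_by_coord = defaultdict(set)
    for name, lat, lon in rows:
        coords_by_name[name].add((lat, lon))
        names_by_coord[(lat, lon)].add(name)
    moved = sorted(n for n, c in coords_by_name.items() if len(c) > 1)
    shared = sorted(
        tuple(sorted(names)) for names in names_by_coord.values() if len(names) > 1
    )
    return moved, shared


# Made-up sample rows: (station_name, latitude, longitude).
rows = [
    ("A", 44.60, -63.50),
    ("A", 44.70, -63.50),  # same name, different coordinates
    ("B", 44.80, -63.60),
    ("C", 44.80, -63.60),  # different name, same coordinates
    ("D", 45.00, -64.00),
]

moved, shared = station_conflicts(rows)
print("Names with more than one coordinate:", moved)
print("Coordinates shared by several names:", shared)
```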
317+
318+
![OTN Database - path of data through the system](../fig/unique_station_names_edit.png)
319+
320+
321+
303322
### Load to rcvr_locations
304323
Once the `station` table is verified, the receiver deployment records can now be promoted to the "intermediate" `rcvr_locations` table.
305324

@@ -355,6 +374,13 @@ In GitLab, this task can be completed at this stage:
355374

356375
`- [ ] - verify rcvr_locations ("deploy" notebook)`
357376

377+
#### Find schema.rcvr_locations tables in DB
378+
- These tables are also intermediate tables, which contain all deployment information for all receivers from that schema across all years. A station is treated as an area, so receivers can be deployed at the same station with different coordinates. The `deploy_date`, `deploy_lon` and `deploy_lat` columns show each receiver's deployment date and coordinates.
379+
- Note: In the `schema.rcvr_locations` table, you may see the same station with different coordinates. However, the notebook will show errors if the same receiver has overlapping deployments. We can look at this table to find the information that needs to change, then use the corresponding DB fix tool.
380+
- An example query: `select * from schema.rcvr_locations rl where rl.rcv_serial_no = 'XXXXX'`
381+
382+
![OTN Database - path of data through the system](../fig/deployment_overlapping_edit.png)
383+
358384
### Load Transmitter Records to Moorings
359385

360386
The `transmitter` values associated with transceivers, co-deployed sentinel tags, or stand-alone test tags will be loaded to the `moorings` table in this section. Existing transmitter records will also be updated, if relevant.
@@ -409,6 +435,12 @@ The Nodebook will indicate the table has passed quality control by adding a ✔
409435

410436
If there are any errors with records that have already been promoted to the `moorings` table, you will need to create a db fix ticket in Gitlab to correct the records in the database. You may need to contact the researcher before resolving the error.
411437

438+
#### Find Mooring Tables in DB
439+
- `schema.moorings` contains all receiver, transmitter, and event information from this project. Note: the notebook will show errors if the same transmitter_ID has been used in different receivers. We can check this table for more information on the transmitter_ID, and may need to use the corresponding DB fix tool to change it.
440+
- An example query: `select * from schema.moorings where basisofrecord = 'TRANSMITTER' and relationshiptype = 'STATION'`
441+
442+
![OTN Database - path of data through the system](../fig/ovelapping_transceivers_edit.png)
443+
412444

413445
#### Task List Checkpoint
414446

@@ -425,3 +457,11 @@ First: you should access the Repository folder in your browser and add the clean
425457
Finally, the GitLab ticket can be reassigned to an OTN analyst for final verification in the database.
426458

427459
{% include links.md %}
460+
461+
462+
463+
464+
465+
466+
467+

_episodes/10_Detections.md

Lines changed: 44 additions & 7 deletions
@@ -255,7 +255,7 @@ Events-1 is responsible for loading receiver events files into raw tables. This
255255

256256
### Import cell
257257

258-
As in all Nodebooks run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
258+
As in all Nodebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
259259

260260
### User Inputs
261261

@@ -292,6 +292,10 @@ Reset(s): XX
292292
{: .language-plaintext .example}
293293

294294

295+
#### Find Raw Data Table in DB (`schema.c_events_YYYY_MM` & `schema.c_detections_YYYY_MM`)
296+
- The event table contains some environmental and receiver data for this project at this time, e.g., temperature, depth, and battery.
297+
- The detection table contains detection information for this project at this time. Here you can see the tags detected by each receiver over time.
298+
295299
### Database Connection
296300

297301
You will have to edit one section: `engine = get_engine()`
@@ -446,6 +450,7 @@ collectioncode.c_detections_yyyy_mm table found.
446450
{: .language-plaintext .example}
447451

448452

453+
449454
### Create Missing Tables
450455

451456
Detections tables are only created on an as-needed basis. These cells will detect any tables you are missing and create them based on the years covered in the raw detection table (c_table). This will check all tables such as `detections_yyyy`, `sensor_match_yyyy` and `otn_detections_yyyy`.
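
The idea of creating tables only for the years that need them can be sketched as a set difference between the tables implied by the detection years and the tables that already exist. `missing_tables` and the sample inputs are assumptions for illustration; the Nodebook reads both lists from the database.

```python
def missing_tables(detection_years, existing_tables):
    """List yearly tables that the detection years require but that do not exist."""
    wanted = [
        f"{prefix}_{year}"
        for year in sorted(set(detection_years))
        for prefix in ("detections", "sensor_match", "otn_detections")
    ]
    return [table for table in wanted if table not in existing_tables]


# Made-up inputs: years found in the raw detection table, and current tables.
years = [2019, 2021, 2019]
existing = {
    "detections_2019", "sensor_match_2019", "otn_detections_2019",
    "detections_2021",
}

to_create = missing_tables(years, existing)
print(to_create)
```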
@@ -557,16 +562,18 @@ In GitLab, this task can be completed at this stage:
557562

558563
`- [ ] - NAME verify detections_yyyy (looking for duplicates) ("detections-2" notebook)`
559564

565+
566+
560567
### Load sensors_match Tables by Year
561568

562569
For the last part of this Nodebook you will need to load the `sensor_match_YYYY` tables. This loads detections with sensor information into a project's `sensor_match_yyyy` tables. Later, these tables will aid in matching vendor specifications to resolve sensor tag values.
563570

564571
Output will appear like this:
565572

566573
~~~
567-
Inserting records from collectioncode.detections_2019 INTO sensor_match_2019... OK
574+
Inserting records from collectioncode.detections_YYYY INTO sensor_match_YYYY... OK
568575
Added XXX rows.
569-
Inserting records from collectioncode.detections_2021 INTO sensor_match_2021... OK
576+
Inserting records from collectioncode.detections_YYYY INTO sensor_match_YYYY... OK
570577
Added XXX rows.
571578
~~~
572579
{: .language-plaintext .example}
@@ -682,7 +689,7 @@ Once you have added your information, you can run the cell. Successful login is
682689
Auth password:········
683690
Connection Notes: None
684691
Database connection established
685-
Connection Type:postgresql Host:db.load.oceantrack.org Database:otnunit User:admin Node:OTN
692+
Connection Type:postgresql Host:db.for.your.org Database:your_db_name User:your_node_admin Node:Node
686693
~~~
687694
{: .language-plaintext .example}
688695

@@ -712,9 +719,9 @@ Once you are clear to continue loading you can run `create_detection_views`. Thi
712719
Output will look like:
713720

714721
~~~
715-
Creating view collectioncode.vw_detections_2020... OK
716-
Creating view collectioncode.vw_sentinel_2020... OK
717-
Creating view collectioncode.vw_detections_2021... OK
722+
Creating view collectioncode.vw_detections_YYYY... OK
723+
Creating view collectioncode.vw_sentinel_YYYY... OK
724+
Creating view collectioncode.vw_detections_YYYY... OK
718725
~~~
719726
{: .language-plaintext .example}
720727

@@ -973,6 +980,36 @@ In GitLab, this task can be completed at this stage:
973980

974981
This Nodebook will promote the events records from the intermediate `events` table to the final `moorings` records. Only use this Nodebook after adding the receiver records to the moorings table as this process is dependent on receiver records.
975982

983+
#### Find Event & Detection Tables in DB (`schema.events` & `schema.detections_YYYY`)
984+
- These are intermediate tables which contain all events in this project across all years and detections in certain years.
985+
- For example, if a researcher wants to see all spatial temperature data for a certain type of receiver in their project schema, they could use the query:
986+
```sql
987+
select
988+
rcv.otn_array, rcv.station_name, e.datetime as date, e.receiver, e."data", e.description, rcv.rcv_serial_no,
989+
rcv.deploy_date, rcv.recover_date, rcv.recover_ind, rcv.dep_lat, rcv.dep_long, rcv.the_geom, rcv.catalognumber
990+
from schema.events e
991+
left join schema.rcvr_locations rcv
992+
on model.f_end(e.receiver,'-') = model.f_end(rcv.rcv_serial_no) where
993+
strpos(e.receiver, 'VR4') = 1 and e.description = 'Temperature'
994+
```
995+
- As another example, if we want to check how many distinct transmitters were detected by receiver 123456 in a project schema between 2023-11-10 and 2024-05-29, we could use the query:
996+
```sql
997+
SELECT DISTINCT transmitter
998+
FROM schema.detections_2023
999+
WHERE receiver ILIKE '%123456%'
1000+
AND datetime > '2023-11-10 00:00:00'
1001+
AND datetime < '2024-05-29 18:30:00'
1002+
UNION
1003+
SELECT DISTINCT transmitter
1004+
FROM schema.detections_2024
1005+
WHERE receiver ILIKE '%123456%'
1006+
AND datetime > '2023-11-10 00:00:00'
1007+
AND datetime < '2024-05-29 18:30:00'
1008+
```
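
To try the shape of the second query without database access, the sketch below rebuilds it against an in-memory SQLite database. Everything here is an assumption made for the demonstration: the tables have no schema prefix, the rows are invented, and `LIKE` stands in for PostgreSQL's `ILIKE` (SQLite's `LIKE` is already case-insensitive for ASCII text).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two yearly tables with made-up detections.
conn.executescript("""
    CREATE TABLE detections_2023 (receiver TEXT, transmitter TEXT, datetime TEXT);
    CREATE TABLE detections_2024 (receiver TEXT, transmitter TEXT, datetime TEXT);
    INSERT INTO detections_2023 VALUES
        ('VR2W-123456', 'A69-1303-111', '2023-11-09 23:59:00'),
        ('VR2W-123456', 'A69-1303-222', '2023-12-01 10:00:00'),
        ('VR2W-999999', 'A69-1303-333', '2023-12-02 10:00:00');
    INSERT INTO detections_2024 VALUES
        ('VR2W-123456', 'A69-1303-222', '2024-01-15 08:00:00'),
        ('VR2W-123456', 'A69-1303-444', '2024-05-29 18:00:00'),
        ('VR2W-123456', 'A69-1303-555', '2024-06-01 00:00:00');
""")

# UNION removes duplicates across the two yearly tables.
query = """
    SELECT transmitter FROM detections_2023
    WHERE receiver LIKE :rcv AND datetime > :start AND datetime < :stop
    UNION
    SELECT transmitter FROM detections_2024
    WHERE receiver LIKE :rcv AND datetime > :start AND datetime < :stop
"""
params = {
    "rcv": "%123456%",
    "start": "2023-11-10 00:00:00",
    "stop": "2024-05-29 18:30:00",
}
transmitters = sorted(row[0] for row in conn.execute(query, params))
conn.close()

print(len(transmitters), transmitters)
```

Because the date range spans two years, both yearly tables have to be queried; the transmitter seen in both years is counted once.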
1009+
1010+
1011+
1012+
9761013
### Import cells and Database connections
9771014

9781015
As in all Nodebooks, run the import cell to get the packages and functions needed throughout the notebook. This cell can be run without any edits.
