-
Notifications
You must be signed in to change notification settings - Fork 0
Description
There seems to be a discrepancy in the data coming from gfdata::get_survey_sets() and gfdata::get_ll_hook_data(). From get_ll_hook_data(), it looks like count_target_species and count_non_target_species are not always correct. And I am unclear of where bait counts come from for these fishing_event_id values that correspond with these incongruent values.
Setup data to combine hook counts (baited, empty, total) with survey data
library(dplyr)
ye_survey <- gfdata::get_survey_sets(species = "yelloweye rockfish", ssid = c(39, 40))
ye_hook_dat <-
gfdata::get_ll_hook_data(species = "yelloweye rockfish", ssid = c(39, 40)) |>
mutate(total_hooks = count_target_species + count_non_target_species +
count_bait_only + count_empty_hooks - count_bent_broken)
dat <- left_join(ye_hook_dat, ye_survey)
Expected results
1. Number of rows from get_survey_sets() and get_ll_hook_data() should be equal
2. total_hooks should equal hook_count
3. count_target_species should equal catch_count
Current results
- More rows are returned from
get_ll_hook_data()thanget_survey_sets().
From what I can tell, I think that this is because of a difference in checking for usable sets in get_ll_hook_data.
nrow(ye_survey) - nrow(ye_hook_dat)
dat |> filter(is.na(catch_count))
db <- "GFBioSQL"
# check first 5 fishing_event_id values that are extra in ye_hook_dat
use_check <- gfdata::run_sql(db,
"SELECT FEC.TRIP_ID, FEC.FISHING_EVENT_ID, FEC.CATCH_ID,
C.SPECIES_CODE, C.CATCH_COUNT,
LS.LGLSP_HOOK_COUNT, LS.LGLSP_HOOKS_SET_COUNT, LS.LGLSP_HOOKS_LOST_COUNT, LS.USABILITY_CODE,
U.USABILITY_DESC
FROM FISHING_EVENT_CATCH FEC
INNER JOIN CATCH C ON
FEC.CATCH_ID = C.CATCH_ID
INNER JOIN LONGLINE_SPECS LS ON
FEC.FISHING_EVENT_ID = LS.FISHING_EVENT_ID
INNER JOIN USABILITY U ON
LS.USABILITY_CODE = U.USABILITY_CODE
LEFT JOIN HOOK_SPECS HS ON
FEC.FISHING_EVENT_ID = HS.FISHING_EVENT_ID
WHERE FEC.FISHING_EVENT_ID IN (309481, 309483, 309514, 309516, 309519, 309521);")
# Includes usability codes not in (0, 1, 2, 6)
glimpse(use_check)
**2. total_hooks does not equal hook_count (from get_survey_sets())in some cases. hook_count looks like the correct value. **
3. catch_count which comes from get_survey_sets() looks like the correct value. While catch information from get_ll_hook_data() looks wrong.
Some examples of 2 and 3:
fishing_event_id == 2193813: count of target species (count_non_target_species) is 0, but should be 3fishing_event_id == 309482: count of target species is 15 but should be 16fishing_event_id == 1722030: count of target species is 19 but should be 18fishing_event_id == 1722028: count of target species is 2 but should be 1fishing_event_id == 4590043: count of non_target is 0, but 141 dogfish were caught, which looks like whytotal_hooksis incorrect
count_diffs <- dat |>
drop_na(catch_count) |> # ignore the usability code issue
mutate(c_diff = count_target_species - catch_count,
h_diff = total_hooks - hook_count) |>
select(fishing_event_id, count_non_target_species, count_target_species,
count_bait_only, count_empty_hooks, count_bent_broken, total_hooks,
fishing_event_id, catch_count, hook_count,
c_diff, h_diff) |> # simplify table view
rename(non_target = 'count_non_target_species', target = 'count_target_species',
bait = 'count_bait_only', empty = 'count_empty_hooks',
bent = 'count_bent_broken') # simplify table view
count_diffs |>
arrange(-abs(c_diff), -abs(h_diff)) |> # show big diffs at top of table
slice(1:5) |>
View()
catch_dat <- gfdata::run_sql(db,
"SELECT FEC.TRIP_ID, FEC.FISHING_EVENT_ID, FEC.CATCH_ID,
C.SPECIES_CODE, C.CATCH_COUNT, C.CATCH_WEIGHT,
S.SPECIES_COMMON_NAME
FROM FISHING_EVENT_CATCH FEC
INNER JOIN CATCH C ON
FEC.CATCH_ID = C.CATCH_ID
LEFT OUTER JOIN SPECIES S ON
C.SPECIES_CODE = S.SPECIES_CODE
WHERE FEC.FISHING_EVENT_ID IN (2193813, 309482, 1722030, 1722028, 4590043)")
View(catch_dat)
4. When confirming that the bait and empty counts are correct, I cannot find the fishing_event_ids from the example above in HOOK_SPECS, so where is the count_bait_only coming from in get_ll_hook_data()? I might just be confused here. How do the other values in the FISHING_EVENT table come into play?
hs <- run_sql(db,
"SELECT *
FROM HOOK_SPECS
WHERE FISHING_EVENT_ID IN (2193813, 309482, 1722030, 1722028, 4590043);")
hs