Skip to content

discrepancy in data from get_ll_hook_data() and get_survey_sets() #12

@jdunic

Description

@jdunic

There seems to be a discrepancy in the data coming from gfdata::get_survey_sets() and gfdata::get_ll_hook_data(). From get_ll_hook_data(), it looks like count_target_species and count_non_target_species are not always correct. And I am unclear of where bait counts come from for these fishing_event_id values that correspond with these incongruent values.

Setup data to combine hook counts (baited, empty, total) with survey data

library(dplyr)

ye_survey <- gfdata::get_survey_sets(species = "yelloweye rockfish", ssid = c(39, 40))

ye_hook_dat <- 
  gfdata::get_ll_hook_data(species = "yelloweye rockfish", ssid = c(39, 40)) |>
    mutate(total_hooks = count_target_species + count_non_target_species +
            count_bait_only + count_empty_hooks - count_bent_broken)

dat <- left_join(ye_hook_dat, ye_survey)

Expected results

1. Number of rows from get_survey_sets() and get_ll_hook_data() should be equal

2. total_hooks should equal hook_count

3. count_target_species should equal catch_count

Current results

  1. More rows are returned from get_ll_hook_data() than get_survey_sets().

From what I can tell, I think that this is because of a difference in checking for usable sets in get_ll_hook_data.

nrow(ye_survey) - nrow(ye_hook_dat)
dat |> filter(is.na(catch_count))

db <- "GFBioSQL"

# check first 5 fishing_event_id values that are extra in ye_hook_dat
use_check <- gfdata::run_sql(db, 
  "SELECT FEC.TRIP_ID, FEC.FISHING_EVENT_ID, FEC.CATCH_ID, 
    C.SPECIES_CODE, C.CATCH_COUNT, 
    LS.LGLSP_HOOK_COUNT, LS.LGLSP_HOOKS_SET_COUNT, LS.LGLSP_HOOKS_LOST_COUNT, LS.USABILITY_CODE,
    U.USABILITY_DESC
   FROM FISHING_EVENT_CATCH FEC
       INNER JOIN CATCH C ON
       FEC.CATCH_ID = C.CATCH_ID
       INNER JOIN LONGLINE_SPECS LS ON
       FEC.FISHING_EVENT_ID = LS.FISHING_EVENT_ID
       INNER JOIN USABILITY U ON
       LS.USABILITY_CODE = U.USABILITY_CODE
       LEFT JOIN HOOK_SPECS HS ON
       FEC.FISHING_EVENT_ID = HS.FISHING_EVENT_ID
   WHERE FEC.FISHING_EVENT_ID IN (309481, 309483, 309514, 309516, 309519, 309521);")

# Includes usability codes not in (0, 1, 2, 6)
glimpse(use_check)

**2. total_hooks does not equal hook_count (from get_survey_sets())in some cases. hook_count looks like the correct value. **

3. catch_count which comes from get_survey_sets() looks like the correct value. While catch information from get_ll_hook_data() looks wrong.

Some examples of 2 and 3:

  • fishing_event_id == 2193813: count of target species (count_non_target_species) is 0, but should be 3
  • fishing_event_id == 309482: count of target species is 15 but should be 16
  • fishing_event_id == 1722030: count of target species is 19 but should be 18
  • fishing_event_id == 1722028: count of target species is 2 but should be 1
  • fishing_event_id == 4590043: count of non_target is 0, but 141 dogfish were caught, which looks like why total_hooks is incorrect
count_diffs <- dat |>
  drop_na(catch_count) |> # ignore the usability code issue
  mutate(c_diff = count_target_species - catch_count, 
         h_diff = total_hooks - hook_count) |>
  select(fishing_event_id, count_non_target_species, count_target_species, 
         count_bait_only, count_empty_hooks, count_bent_broken, total_hooks, 
         fishing_event_id, catch_count, hook_count, 
         c_diff, h_diff) |> # simplify table view
  rename(non_target = 'count_non_target_species', target = 'count_target_species', 
         bait = 'count_bait_only', empty = 'count_empty_hooks', 
         bent = 'count_bent_broken') # simplify table view

count_diffs |>
  arrange(-abs(c_diff), -abs(h_diff)) |> # show big diffs at top of table
  slice(1:5) |>
  View()

catch_dat <- gfdata::run_sql(db, 
"SELECT FEC.TRIP_ID, FEC.FISHING_EVENT_ID, FEC.CATCH_ID, 
    C.SPECIES_CODE, C.CATCH_COUNT, C.CATCH_WEIGHT, 
    S.SPECIES_COMMON_NAME
 FROM FISHING_EVENT_CATCH FEC
 INNER JOIN CATCH C ON
    FEC.CATCH_ID = C.CATCH_ID
 LEFT OUTER JOIN SPECIES S ON
    C.SPECIES_CODE = S.SPECIES_CODE
 WHERE FEC.FISHING_EVENT_ID IN (2193813, 309482, 1722030, 1722028, 4590043)")

View(catch_dat)

4. When confirming that the bait and empty counts are correct, I cannot find the fishing_event_ids from the example above in HOOK_SPECS, so where is the count_bait_only coming from in get_ll_hook_data()? I might just be confused here. How do the other values in the FISHING_EVENT table come into play?

hs <- run_sql(db, 
"SELECT *
 FROM HOOK_SPECS 
 WHERE FISHING_EVENT_ID IN (2193813, 309482, 1722030, 1722028, 4590043);")
hs

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions