Description
galah lets you find species lists and use them to filter queries. One use case is finding which species recorded in an area appear on a list of threatened species.
Here I've returned the taxa on the state-based NSW Threatened status species list. This list includes both species and subspecies names. For instance, there are 3 potoroo taxa listed: 1 species and 2 subspecies.
library(galah)
library(dplyr)
galah_config(email = "[email protected]", verbose = FALSE)
# NSW Threatened status species list
nsw_list <- search_all(lists, "dr650") |>
  show_values()
#> • Showing values for 'dr650'.
nsw_list
#> # A tibble: 1,064 × 6
#> id name commonName scientificName lsid dataResourceUid
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 6791272 Delma impar Striped L… Delma impar http… dr650
#> 2 6790725 Callocephalon fimbri… Gang-gang… Callocephalon… http… dr650
#> 3 6790769 Cacophis harriettae White-cro… Cacophis harr… http… dr650
#> 4 6791482 Litoria booroolongen… Booroolon… Litoria booro… http… dr650
#> 5 6790526 Anthochaera phrygia Regent Ho… Anthochaera (… http… dr650
#> 6 6791456 Calidris tenuirostris Great Knot Calidris (Cal… http… dr650
#> 7 6790500 Neochmia ruficauda Star Finch Neochmia (Neo… http… dr650
#> 8 6790752 Uvidicolus sphyrurus Border Th… Uvidicolus sp… http… dr650
#> 9 6791291 Amaurornis moluccana Pale-vent… Amaurornis mo… http… dr650
#> 10 6791135 Phascogale tapoatafa Brush-tai… Phascogale ta… http… dr650
#> # ℹ 1,054 more rows
# 1 species, 2 subspecies
nsw_list |>
  filter(stringr::str_detect(scientificName, "Potorous")) |>
  select(id, name, commonName)
#> # A tibble: 3 × 3
#> id name commonName
#> <int> <chr> <chr>
#> 1 6791441 Potorous longipes Long-footed Potoroo
#> 2 6791265 Potorous tridactylus tridactylus Long-nosed Potoroo
#> 3 6791277 Potorous tridactylus trisulcatus Long-nosed Potoroo
Now I would like to find which species on this threatened species list have been recorded in the Shoalhaven region in the last year (2024). atlas_species() seems like the obvious choice because it returns a list of species. However, although atlas_species() correctly reports that a potoroo has been seen, it returns the species name rather than the subspecies name.
match <- galah_call() |>
  galah_filter(
    cl11170 == "Shoalhaven",
    year == 2024,
    species_list_uid == dr650) |>
  atlas_species()
match
#> # A tibble: 92 × 11
#> taxon_concept_id species_name scientific_name_auth…¹ taxon_rank kingdom
#> <chr> <chr> <chr> <chr> <chr>
#> 1 https://biodiversity.… Potorous tr… (Kerr, 1792) species Animal…
#> 2 https://biodiversity.… Haematopus … Vieillot, 1817 species Animal…
#> 3 https://biodiversity.… Haliaeetus … (Gmelin, 1788) species Animal…
#> 4 https://biodiversity.… Haematopus … Gould, 1845 species Animal…
#> 5 https://biodiversity.… Sternula al… (Pallas, 1764) species Animal…
#> 6 https://biodiversity.… Numenius (N… (Linnaeus, 1766) species Animal…
#> 7 https://biodiversity.… Calyptorhyn… (Temminck, 1807) species Animal…
#> 8 https://biodiversity.… Callocephal… (Grant, 1803) species Animal…
#> 9 https://biodiversity.… Esacus magn… Vieillot, 1818 species Animal…
#> 10 https://biodiversity.… Tyto novaeh… (Stephens, 1826) species Animal…
#> # ℹ 82 more rows
#> # ℹ abbreviated name: ¹scientific_name_authorship
#> # ℹ 6 more variables: phylum <chr>, class <chr>, order <chr>, family <chr>,
#> # genus <chr>, vernacular_name <chr>
# only returns species name
match |>
  filter(stringr::str_detect(species_name, "Potorous")) |>
  select(species_name, taxon_rank)
#> # A tibble: 1 × 2
#> species_name taxon_rank
#> <chr> <chr>
#> 1 Potorous tridactylus species
We can confirm that atlas_species() is returning the species-level name, rather than the name attached to the records, by checking the scientificName of the occurrence records in Shoalhaven.
galah_call() |>
  identify("Potorous") |>
  galah_filter(
    cl11170 == "Shoalhaven",
    year == 2024,
    species_list_uid == dr650) |>
  group_by(scientificName) |>
  atlas_counts()
#> # A tibble: 1 × 2
#> scientificName count
#> <chr> <int>
#> 1 Potorous tridactylus trisulcatus 806
It is a little confusing that atlas_species() correctly reports that a listed taxon was seen, but not under the name used on the list. It also affects subsequent tasks. For example, if we have status information along with our species list nsw_list (which we can get by running show_values(all_fields = TRUE)), we can't join that status information to match without losing information, because the subspecies names on the list have no exact counterpart in match (the name matching is not 1:1).
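To make the join problem concrete, here is a minimal sketch that uses hand-typed copies of the Potorous rows shown above instead of a live query. The object names nsw_potoroos and species_match are hypothetical, and joining on the list's name column is an assumption for illustration only.

```r
library(dplyr)

# Toy copies of the three Potorous rows from nsw_list (typed by hand, not queried)
nsw_potoroos <- tibble(
  id = c(6791441L, 6791265L, 6791277L),
  name = c(
    "Potorous longipes",
    "Potorous tridactylus tridactylus",
    "Potorous tridactylus trisulcatus"
  )
)

# The single Potorous row that atlas_species() returned for Shoalhaven
species_match <- tibble(
  species_name = "Potorous tridactylus",
  taxon_rank = "species"
)

# An exact-name join finds nothing: the list holds the two subspecies,
# not the bare species name
joined <- inner_join(species_match, nsw_potoroos,
                     by = c("species_name" = "name"))

# anti_join() lists the returned names with no exact match on the list
unmatched <- anti_join(species_match, nsw_potoroos,
                       by = c("species_name" = "name"))
```

Here joined has no rows and unmatched holds the one species-level name, so any status columns attached to the list entries would be dropped for this taxon.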
The good news is that there seems to be a solution! Grouping occurrences by their taxonConceptID returns the species and subspecies names. Going back to our Potoroo example, we can see that the correct subspecies name is returned with this method.
x <- request_data() |>
  galah_filter(
    cl11170 == "Shoalhaven",
    year == 2024,
    species_list_uid == dr650
  ) |>
  group_by(taxonConceptID) |>
  collect()
x
#> # A tibble: 93 × 11
#> taxon_concept_id species_name scientific_name_auth…¹ taxon_rank kingdom
#> <chr> <chr> <chr> <chr> <chr>
#> 1 https://biodiversity.… Potorous tr… (McCoy, 1865) subspecies Animal…
#> 2 https://biodiversity.… Haematopus … Vieillot, 1817 species Animal…
#> 3 https://biodiversity.… Haliaeetus … (Gmelin, 1788) species Animal…
#> 4 https://biodiversity.… Haematopus … Gould, 1845 species Animal…
#> 5 https://biodiversity.… Sternula al… (Pallas, 1764) species Animal…
#> 6 https://biodiversity.… Numenius (N… (Linnaeus, 1766) species Animal…
#> 7 https://biodiversity.… Calyptorhyn… (Temminck, 1807) subspecies Animal…
#> 8 https://biodiversity.… Callocephal… (Grant, 1803) species Animal…
#> 9 https://biodiversity.… Esacus magn… Vieillot, 1818 species Animal…
#> 10 https://biodiversity.… Tyto novaeh… (Stephens, 1826) species Animal…
#> # ℹ 83 more rows
#> # ℹ abbreviated name: ¹scientific_name_authorship
#> # ℹ 6 more variables: phylum <chr>, class <chr>, order <chr>, family <chr>,
#> # genus <chr>, vernacular_name <chr>
x |>
  filter(stringr::str_detect(species_name, "Potorous")) |>
  select(species_name, taxon_rank)
#> # A tibble: 1 × 2
#> species_name taxon_rank
#> <chr> <chr>
#> 1 Potorous tridactylus trisulcatus subspecies
Created on 2025-07-14 with reprex v2.1.1
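With the name returned by grouping on taxonConceptID, the same kind of join back to the list does find its row. This is a sketch on hand-typed toy rows, not a live query. The names nsw_potoroos and taxon_match are hypothetical, and it assumes the list's name column is the right join key; other taxa may need a different column or some name cleaning.

```r
library(dplyr)

# Toy copies of the three Potorous rows from nsw_list (typed by hand, not queried)
nsw_potoroos <- tibble(
  id = c(6791441L, 6791265L, 6791277L),
  name = c(
    "Potorous longipes",
    "Potorous tridactylus tridactylus",
    "Potorous tridactylus trisulcatus"
  )
)

# The Potorous row returned when occurrences are grouped by taxonConceptID
taxon_match <- tibble(
  species_name = "Potorous tridactylus trisulcatus",
  taxon_rank = "subspecies"
)

# The subspecies name matches exactly one list entry, so list columns
# (here just id) carry over
with_list_id <- left_join(taxon_match, nsw_potoroos,
                          by = c("species_name" = "name"))
```

The same pattern would carry status columns across if nsw_list were retrieved with show_values(all_fields = TRUE).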
Is it possible to allow atlas_species() to match on taxonConceptID so that names and taxonomic information match a species list correctly, rather than using occurrences + group_by() to work around the problem? atlas_species() seems like the intuitive function for returning and matching species lists.