Use atlas_species() to match all taxonomic names on a species list #275

@daxkellie

galah lets you find species lists and use them to filter queries. One use case for this is finding which species identified in an area match a list of threatened species.

For example, here I've returned the taxa on the state-based NSW Threatened status species list (dr650). This list includes both species and subspecies names: there are 3 potoroo taxa listed, 1 species and 2 subspecies.

library(galah)
library(dplyr)

galah_config(email = "[email protected]", verbose = FALSE)

# NSW Threatened status species list
nsw_list <- search_all(lists, "dr650") |>
  show_values()
#> • Showing values for 'dr650'.

nsw_list
#> # A tibble: 1,064 × 6
#>         id name                  commonName scientificName lsid  dataResourceUid
#>      <int> <chr>                 <chr>      <chr>          <chr> <chr>          
#>  1 6791272 Delma impar           Striped L… Delma impar    http… dr650          
#>  2 6790725 Callocephalon fimbri… Gang-gang… Callocephalon… http… dr650          
#>  3 6790769 Cacophis harriettae   White-cro… Cacophis harr… http… dr650          
#>  4 6791482 Litoria booroolongen… Booroolon… Litoria booro… http… dr650          
#>  5 6790526 Anthochaera phrygia   Regent Ho… Anthochaera (… http… dr650          
#>  6 6791456 Calidris tenuirostris Great Knot Calidris (Cal… http… dr650          
#>  7 6790500 Neochmia ruficauda    Star Finch Neochmia (Neo… http… dr650          
#>  8 6790752 Uvidicolus sphyrurus  Border Th… Uvidicolus sp… http… dr650          
#>  9 6791291 Amaurornis moluccana  Pale-vent… Amaurornis mo… http… dr650          
#> 10 6791135 Phascogale tapoatafa  Brush-tai… Phascogale ta… http… dr650          
#> # ℹ 1,054 more rows

# 1 species, 2 subspecies
nsw_list |>
  filter(stringr::str_detect(scientificName, "Potorous")) |>
  select(id, name, commonName)
#> # A tibble: 3 × 3
#>        id name                             commonName         
#>     <int> <chr>                            <chr>              
#> 1 6791441 Potorous longipes                Long-footed Potoroo
#> 2 6791265 Potorous tridactylus tridactylus Long-nosed Potoroo 
#> 3 6791277 Potorous tridactylus trisulcatus Long-nosed Potoroo

Now I would like to return which species on this threatened species list were seen in the Shoalhaven region in 2024. atlas_species() seems like an obvious choice because it returns a list of species. However, although atlas_species() correctly reports that a potoroo has been seen, it returns only the species-level name rather than the subspecies name that appears on the list.

match <- galah_call() |>
  galah_filter(
    cl11170 == "Shoalhaven",
    year == 2024,
    species_list_uid == dr650) |>
  atlas_species()

match
#> # A tibble: 92 × 11
#>    taxon_concept_id       species_name scientific_name_auth…¹ taxon_rank kingdom
#>    <chr>                  <chr>        <chr>                  <chr>      <chr>  
#>  1 https://biodiversity.… Potorous tr… (Kerr, 1792)           species    Animal…
#>  2 https://biodiversity.… Haematopus … Vieillot, 1817         species    Animal…
#>  3 https://biodiversity.… Haliaeetus … (Gmelin, 1788)         species    Animal…
#>  4 https://biodiversity.… Haematopus … Gould, 1845            species    Animal…
#>  5 https://biodiversity.… Sternula al… (Pallas, 1764)         species    Animal…
#>  6 https://biodiversity.… Numenius (N… (Linnaeus, 1766)       species    Animal…
#>  7 https://biodiversity.… Calyptorhyn… (Temminck, 1807)       species    Animal…
#>  8 https://biodiversity.… Callocephal… (Grant, 1803)          species    Animal…
#>  9 https://biodiversity.… Esacus magn… Vieillot, 1818         species    Animal…
#> 10 https://biodiversity.… Tyto novaeh… (Stephens, 1826)       species    Animal…
#> # ℹ 82 more rows
#> # ℹ abbreviated name: ¹​scientific_name_authorship
#> # ℹ 6 more variables: phylum <chr>, class <chr>, order <chr>, family <chr>,
#> #   genus <chr>, vernacular_name <chr>

# only returns species name
match |>
  filter(stringr::str_detect(species_name, "Potorous")) |>
  select(species_name, taxon_rank)
#> # A tibble: 1 × 2
#>   species_name         taxon_rank
#>   <chr>                <chr>     
#> 1 Potorous tridactylus species

We can confirm that atlas_species() is reporting the species-level name, rather than the name attached to the records, by checking the scientificName of the occurrence records in Shoalhaven. The records themselves are identified to subspecies.

galah_call() |>
  identify("Potorous") |>
  galah_filter(
    cl11170 == "Shoalhaven",
    year == 2024,
    species_list_uid == dr650) |>
  group_by(scientificName) |>
  atlas_counts()
#> # A tibble: 1 × 2
#>   scientificName                   count
#>   <chr>                            <int>
#> 1 Potorous tridactylus trisulcatus   806

It is a little confusing that atlas_species() correctly detects that a taxon on the list was seen, but does not return it under the name used on the list. It also affects subsequent tasks we might wish to do. For example, if we have status information along with our species list nsw_list (which we can get by running show_values(all_fields = TRUE)), we can't join that status information from nsw_list to match without losing information, because the names do not match 1:1 for any taxon listed as a subspecies.
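To make the join problem concrete, here is a minimal sketch using toy data rather than a live query. The tibbles `toy_list` and `toy_match` are hypothetical stand-ins for nsw_list and match, and the `status` values are invented placeholders, not real conservation statuses.

```r
library(dplyr)

# Toy stand-in for nsw_list with a hypothetical `status` column
# (placeholder values, for illustration only)
toy_list <- tibble(
  name = c(
    "Potorous longipes",
    "Potorous tridactylus tridactylus",
    "Potorous tridactylus trisulcatus"
  ),
  status = c("status A", "status B", "status C")
)

# Toy stand-in for the atlas_species() result, which reports the
# species-level name only
toy_match <- tibble(
  species_name = "Potorous tridactylus",
  taxon_rank = "species"
)

# Joining on names finds no partner for the species-level name,
# so the status comes back as NA
joined <- toy_match |>
  left_join(toy_list, by = c("species_name" = "name"))

joined
```

Because "Potorous tridactylus" is not itself a name on the list, the joined row has a missing `status`, and there is no way to tell from the name alone which of the two listed subspecies it should inherit from.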

The good news is that there seems to be a solution! Grouping occurrences by their taxonConceptID returns the species and subspecies names. Going back to our Potoroo example, we can see that the correct subspecies name is returned with this method.

x <- request_data() |>
  galah_filter(
    cl11170 == "Shoalhaven",
    year == 2024,
    species_list_uid == dr650
    ) |>
  group_by(taxonConceptID) |>
  collect()

x
#> # A tibble: 93 × 11
#>    taxon_concept_id       species_name scientific_name_auth…¹ taxon_rank kingdom
#>    <chr>                  <chr>        <chr>                  <chr>      <chr>  
#>  1 https://biodiversity.… Potorous tr… (McCoy, 1865)          subspecies Animal…
#>  2 https://biodiversity.… Haematopus … Vieillot, 1817         species    Animal…
#>  3 https://biodiversity.… Haliaeetus … (Gmelin, 1788)         species    Animal…
#>  4 https://biodiversity.… Haematopus … Gould, 1845            species    Animal…
#>  5 https://biodiversity.… Sternula al… (Pallas, 1764)         species    Animal…
#>  6 https://biodiversity.… Numenius (N… (Linnaeus, 1766)       species    Animal…
#>  7 https://biodiversity.… Calyptorhyn… (Temminck, 1807)       subspecies Animal…
#>  8 https://biodiversity.… Callocephal… (Grant, 1803)          species    Animal…
#>  9 https://biodiversity.… Esacus magn… Vieillot, 1818         species    Animal…
#> 10 https://biodiversity.… Tyto novaeh… (Stephens, 1826)       species    Animal…
#> # ℹ 83 more rows
#> # ℹ abbreviated name: ¹​scientific_name_authorship
#> # ℹ 6 more variables: phylum <chr>, class <chr>, order <chr>, family <chr>,
#> #   genus <chr>, vernacular_name <chr>

x |>
  filter(stringr::str_detect(species_name, "Potorous")) |>
  select(species_name, taxon_rank)
#> # A tibble: 1 × 2
#>   species_name                     taxon_rank
#>   <chr>                            <chr>     
#> 1 Potorous tridactylus trisulcatus subspecies

Created on 2025-07-14 with reprex v2.1.1

Is it possible to allow atlas_species() to match on taxonConceptID so that names and taxonomic information match a species list correctly, rather than using occurrences + group_by() to work around the problem? atlas_species() seems like the intuitive function choice for returning and matching species lists.
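For context, this is roughly the downstream join that a taxonConceptID-based result would make possible. It is only a sketch on toy data, and it assumes that the `lsid` column of nsw_list holds the same identifiers as `taxon_concept_id`, which I haven't verified here. The identifiers below are placeholders, not real ones.

```r
library(dplyr)

# Toy stand-in for nsw_list; lsid values are placeholders
toy_list <- tibble(
  name = c(
    "Potorous longipes",
    "Potorous tridactylus tridactylus",
    "Potorous tridactylus trisulcatus"
  ),
  lsid = c("placeholder-id-1", "placeholder-id-2", "placeholder-id-3")
)

# Toy stand-in for the result grouped by taxonConceptID
toy_x <- tibble(
  taxon_concept_id = "placeholder-id-3",
  species_name = "Potorous tridactylus trisulcatus",
  taxon_rank = "subspecies"
)

# Join on the identifier instead of the name (assumes lsid and
# taxon_concept_id are comparable), keeping list entries that were seen
seen_on_list <- toy_list |>
  inner_join(toy_x, by = c("lsid" = "taxon_concept_id"))

seen_on_list
```

Joining on an identifier avoids relying on name strings agreeing between the list and the query result, so any extra columns on the list (such as status) would carry over to the matched subspecies.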
