`group_by()` not performing correctly when `taxonConceptID` and `species` are supplied

There appears to be a problem with `group_by()` in some instances. If we look at a non-authoritative species list (such as [this one](https://lists.ala.org.au/speciesListItem/list/dr29258)), we can use the 'View occurrence records' button to go to biocache. The [resulting URL](https://biocache.ala.org.au/occurrences/search?q=qid:1744761573646#tab_mapView) then contains a query ID that we can use in `galah`:

```
galah_call() |>
    filter(qid == "1744761573646") |>
    count() |>
    collect()
# A tibble: 1 × 1
   count
   <int>
1 510928
```

We might then add a `group_by()` statement to investigate which `taxonConceptID`s are associated with each `species` in the list:

```
result <- galah_call() |>
    filter(qid == "1744761573646") |>
    group_by(species, taxonConceptID) |>
    count() |>
    collect()
```

This has two problems. First, the second column name is parsed incorrectly:
```
> colnames(result)
[1] "species"                                 "taxonConceptID.https://biodiversity.org"
[3] "count"  
```
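
As a possible stopgap until the parsing is fixed, the column can be renamed after `collect()`. This is only a sketch: `result_example` is a hypothetical stand-in for `result` with made-up values, and it assumes that exactly one column name starts with `taxonConceptID`.

```r
library(dplyr)

# Hypothetical stand-in for `result`; values are invented for illustration
result_example <- tibble(
  species = c("Species a", "Species b"),
  `taxonConceptID.https://biodiversity.org` = c(
    "https://example.org/taxa/1",
    "https://example.org/taxa/2"
  ),
  count = c(10L, 5L)
)

# Rename the mis-parsed column back to the expected field name
# (assumes exactly one column starts with "taxonConceptID")
result_fixed <- result_example |>
  rename_with(~ "taxonConceptID", .cols = starts_with("taxonConceptID"))

colnames(result_fixed)
```

This only hides the symptom; it does not address whatever is splitting the name in the first place.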

Second, and more importantly, `taxonConceptID`s are repeated across species, which is not only incorrect but should be impossible:

```
> result |>
+     group_by(`taxonConceptID.https://biodiversity.org`) |>
+     summarize(count = n())
# A tibble: 15 × 2
   `taxonConceptID.https://biodiversity.org`                                 count
   <chr>                                                                     <int>
 1 https://biodiversity.org.au/afd/taxa/0480b9ae-ba82-46a1-902e-fdcf4bd8e7c7     8
 2 https://biodiversity.org.au/afd/taxa/23a8017a-3a2b-4a52-8ca6-d168bf52659c     8
 3 https://biodiversity.org.au/afd/taxa/428ea60d-7f8b-401e-b63b-83910f9ef8b8     8
 4 https://biodiversity.org.au/afd/taxa/617e069f-eb5c-40ec-a027-f5fd40e5145d     8
 5 https://biodiversity.org.au/afd/taxa/645b287c-e547-4602-9275-ad3f972328bb     8
 6 https://biodiversity.org.au/afd/taxa/715a2874-1942-4762-866c-1194990e7a91     8
 7 https://biodiversity.org.au/afd/taxa/8c178318-773f-4ddf-a4c1-01967698054c     8
 8 https://biodiversity.org.au/afd/taxa/a4ef7496-ba95-481c-b3a5-a6ed66f37394     8
```
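
To see which species share an identifier, rather than only how often each identifier occurs, something like the following may help. It is a minimal sketch on hypothetical data: `example`, its values, and the shortened column name `taxonConceptID` are assumptions for illustration, not output from the query above.

```r
library(dplyr)

# Hypothetical data shaped like `result`; values are invented for illustration
example <- tibble(
  species = c("Species a", "Species b", "Species c"),
  taxonConceptID = c("id-1", "id-1", "id-2"),
  count = c(3L, 4L, 5L)
)

# For each taxonConceptID, count distinct species and list their names,
# then keep only identifiers attached to more than one species
shared_ids <- example |>
  group_by(taxonConceptID) |>
  summarize(
    n_species = n_distinct(species),
    species_names = paste(sort(unique(species)), collapse = "; "),
    .groups = "drop"
  ) |>
  filter(n_species > 1)

shared_ids
```

If the behaviour were correct, `shared_ids` would have zero rows for the real data.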

Interestingly, if we use an authoritative list that is indexed in biocache, we get the former problem but not the latter:

```
> result2 <- galah_call() |>
+     filter(species_list_uid == "dr656") |>
+     group_by(species, taxonConceptID) |>
+     count() |>
+     collect()

> colnames(result2)
[1] "species"                                 "taxonConceptID.https://biodiversity.org"
[3] "count"

> table(result2[, 2]) |>
+     max()
[1] 1 # i.e. no duplicate taxonConceptIDs
```

It is currently unclear whether the two issues are related, or what is causing them.
