missing sample at NCBI #2

@sformel-usgs

@McAllister-NOAA It looks like the sample_metadata files include sample E272_2B_NO20 but NCBI does not. There are also some mismatches where NCBI has "1B" as the middle part of the sample name and sample_metadata has "2B".

Here is how I checked:

# Compare sample names in sample_metadata to NCBI

library(xml2)
library(dplyr)

# Load data

# NCBI sample names; XML downloaded by hand from NCBI.
NCBI <- read_xml(x = "documentation/PRJNA982176_biosample_result.xml") %>% 
  xml_find_all(xpath = "//Id[@db_label='Sample name']") %>% 
  as_list() %>%  
  unlist() %>% 
  sort()

sample_metadata <- read.table(file = "data/sample_metadata/sample_metadata_16S.txt",
                              sep = "\t",
                              header = TRUE) %>%
  pull(Sample) %>%
  sort()

# Compare sample names

NCBI
sample_metadata

# Indices of NCBI names with no exact match in sample_metadata
which(!NCBI %in% sample_metadata)

# Many of the mismatches differ only in the middle string (1B vs. 2B), so normalize 1B to 2B on both sides before comparing again

NCBI <- sub(pattern = "1B", replacement = "2B", x = NCBI)
sample_metadata <- sub(pattern = "1B", replacement = "2B", x = sample_metadata)

# After normalizing, everything in NCBI is in sample_metadata (nothing is left over here)
NCBI[!NCBI %in% sample_metadata] %>% sort()

# One sample is missing from NCBI (E272_2B_NO20)
sample_metadata[!sample_metadata %in% NCBI] %>% sort()
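For anyone who wants to repeat the check without the project files, here is a minimal base R sketch of the same comparison using `setdiff()`. It makes some assumptions: the two vectors are toy stand-ins, every sample name other than E272_2B_NO20 is hypothetical, and the underscore-delimited `_1B_` pattern is my guess at the name format based on that one sample. `normalize_middle()` is a helper invented here for illustration, not something in the repo.

```r
# Toy stand-ins for the real vectors; every name except E272_2B_NO20 is hypothetical.
ncbi_names <- c("E100_1B_NO20", "E101_2B_NO20")
metadata_names <- c("E100_2B_NO20", "E101_2B_NO20", "E272_2B_NO20")

# Hypothetical helper: rewrite only the middle field, assuming it is
# delimited by underscores, so a "1B" elsewhere in a name is left alone.
normalize_middle <- function(x) sub("_1B_", "_2B_", x, fixed = TRUE)

ncbi_norm <- normalize_middle(ncbi_names)
metadata_norm <- normalize_middle(metadata_names)

# Names present on one side only after normalizing.
only_in_ncbi <- setdiff(ncbi_norm, metadata_norm)
only_in_metadata <- setdiff(metadata_norm, ncbi_norm)

# NCBI names whose middle field had to change in order to match.
renamed <- ncbi_names[ncbi_names != ncbi_norm]

print(only_in_ncbi)
print(only_in_metadata)
print(renamed)
```

Keeping `renamed` around gives a list of the 1B/2B mismatches themselves, which may be useful if those names need to be corrected on one side rather than just ignored.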
