Figure out how to convert bcerror files to parquet #11

@jayhesselberth

Description

TL;DR: use parquet instead of CSV. At a minimum, compress bcerror CSVs with gzip before adding them to the repo.
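For the gzip fallback, a minimal base-R sketch might look like the following. The helper name `gzip_csv` and the demo columns (`pos`, `error_rate`) are made up for illustration and are not taken from the actual bcerror files; the original CSV is left in place.

```r
# Hypothetical helper: write a gzip-compressed copy of a CSV next to the original.
gzip_csv <- function(csv_file, gz_file = paste0(csv_file, ".gz")) {
  # Read the file as raw bytes so the contents are copied unchanged
  raw_bytes <- readBin(csv_file, what = "raw", n = file.size(csv_file))
  con <- gzfile(gz_file, open = "wb")
  on.exit(close(con))
  writeBin(raw_bytes, con)
  invisible(gz_file)
}

# Tiny demo with made-up data (column names are placeholders)
tmp_csv <- tempfile(fileext = ".csv")
write.csv(
  data.frame(pos = 1:3, error_rate = c(0.01, 0.02, 0.03)),
  tmp_csv,
  row.names = FALSE
)
gz <- gzip_csv(tmp_csv)

# Base R reads the gzipped file transparently
roundtrip <- read.csv(gz)
```

`readr::read_csv()` also reads `.gz` files directly, so downstream code should only need the new file name.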

Parquet files are typically much smaller on disk and faster to parse than CSV, since the format is columnar, compressed, and stores column types. Not high priority, but it would be useful to incorporate into remora pipelines where we just want per-base stats.

Might be as simple as:

library(readr)
library(nanoparquet)

# convert a CSV to parquet
write_parquet(read_csv("file.csv"), "file.parquet")

# then compare the on-disk sizes (in bytes)
file.info("file.csv")$size
file.info("file.parquet")$size

# reload file in subsequent analyses
read_parquet("file.parquet")

Could also combine multiple CSVs into a single parquet file, with a column for sample name.
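A rough sketch of the combined version, assuming every CSV has the same columns and that the sample name can be taken from the file name. The helper `combine_bcerror_csvs`, the `sample` column, and the demo columns (`pos`, `error_rate`) are hypothetical, not the real bcerror layout.

```r
library(readr)
library(nanoparquet)

# Hypothetical helper: stack per-sample CSVs and write one parquet file.
# Assumes all inputs share the same columns.
combine_bcerror_csvs <- function(csv_files, out_file) {
  tables <- lapply(csv_files, function(path) {
    df <- read_csv(path, show_col_types = FALSE)
    # Assumption: sample name is the file name without .csv / .csv.gz
    df$sample <- sub("\\.csv(\\.gz)?$", "", basename(path))
    df
  })
  combined <- do.call(rbind, tables)
  write_parquet(combined, out_file)
  invisible(combined)
}

# Tiny demo with made-up data (column names are placeholders)
tmp <- tempfile("bcerror_demo_")
dir.create(tmp)
write_csv(
  data.frame(pos = 1:2, error_rate = c(0.01, 0.02)),
  file.path(tmp, "sampleA.csv")
)
write_csv(
  data.frame(pos = 1:2, error_rate = c(0.03, 0.04)),
  file.path(tmp, "sampleB.csv")
)

csv_files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
out <- file.path(tmp, "bcerror.parquet")
combine_bcerror_csvs(csv_files, out)

# Reload and check that each row carries its sample name
combined <- read_parquet(out)
table(combined$sample)
```

Keeping the sample name as a regular column means downstream analyses can filter or group by sample after a single `read_parquet()` call.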
