
Add more data to recount3 #50

@lcolladotor


This is a recurring goal, since new data is deposited in the Sequence Read Archive nearly every day.

  • To add more data to recount3, we first need computing credits on a large computing cluster, for example through ACCESS (formerly called XSEDE) https://access-ci.org/.

  • Next, we have to run Monorail https://github.com/langmead-lab/monorail-external to process new data.

  • The outputs are then transferred to a local cluster where we can keep a backup of the data. In the recount3 paper, this is called the aggregation node; there, files across studies are aggregated.

  • The data is then uploaded to IDIES, the AWS Open Data Sponsorship Program https://aws.amazon.com/marketplace/pp/prodview-t3rflz3f557jq#resources, AnVIL, or any other active mirror. The uploaded files have to follow the data structure that the recount3 R package expects.
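
Since every mirror has to follow the layout the recount3 R package expects, a pre-upload check could help catch missing files. The following is only a minimal sketch in Python: the function name `find_missing_files` and the example paths are hypothetical placeholders, not the real recount3 directory structure, which would have to be taken from the recount3 documentation.

```python
from pathlib import Path
import tempfile


def find_missing_files(mirror_root, expected_relative_paths):
    """Return the expected paths that are not regular files under mirror_root, sorted."""
    root = Path(mirror_root)
    return sorted(p for p in expected_relative_paths if not (root / p).is_file())


if __name__ == "__main__":
    # Hypothetical placeholder paths, NOT the actual recount3 layout.
    expected = [
        "organism/data_sources/source/metadata/study.tsv.gz",
        "organism/data_sources/source/gene_sums/study.gz",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        # Create only the first file so the second one is reported as missing.
        present = Path(tmp) / expected[0]
        present.parent.mkdir(parents=True)
        present.write_text("placeholder")
        print(find_missing_files(tmp, expected))
```

In practice the expected list would be generated from the studies that were just processed, and the check would run on the aggregation node before any upload starts.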

There are additional steps that are part of the recount3 world such as:

This goal really falls outside the recount3 R package, though the R package is one of the most commonly used interfaces for the data. Accomplishing this goal will likely need its own support and/or coordination with Wilks et al. and/or Razi et al.
