Write an archiver for grabbing pre-2001 EIA 923/920/906/867/759 data #973

@e-belfer

Motivation and context:

Briefly describe the dataset. What is it, and why do we want to archive it regularly?
Include a link to the dataset webpage and any metadata documentation.

It's useful to have a longer time series of generation data, and EIA has a number of forms preceding EIA 923 (Forms 906, 920, 867 and 759) that we aren't currently integrating.

As a first step towards integration, we can archive these datasets in our existing EIA 923 Zenodo repository.

Requirements for archiving

To be archived on Zenodo, a dataset must be:

  • published under an open license that permits reuse and redistribution
  • less than 50 GB in size (when zipped)
  • relevant to energy modelling and research

Checklist for archive creation

Based on the README documentation on creating a new archive:

  • Add reference to EIA Forms 906, 920, 867 and 759 in the existing eia923 metadata in pudl.metadata.sources.py. See Define the dataset's metadata for a description of how to do this. This can be done at any point prior to making the final archive and will not block development.
  • Add two partitions to the existing downloaded files: `{'respondents': 'all', 'frequency': 'all'}`. See Implement archiver interface for more information about our archiving infrastructure.
  • Update the existing src.archivers.eia.eia923.py script to also grab the 906 files on the existing page. These files should have the partitions `{'year': [year], 'respondents': 'non-utility', 'frequency': 'all'}`.
  • Add the second link to the archiver, and use a regex to grab the relevant files. These files should have the partitions `{'year': [year], 'respondents': 'utility', 'frequency': [annual or monthly]}`, with the frequency matching the file source.
  • Test archiver locally
  • Test uploading to Zenodo
  • Manually review archive before publication.
  • Finalize archive (only core Catalyst developers can complete this step). Make sure to run the archiver with --refresh-metadata to capture the changes made to the source metadata above.
  • Automate archiving

Links to published archives:

Include a link to the published sandbox archive for review.
