@BBeltz1 (Collaborator) commented Jan 2, 2026

  • Refactor the get function to accommodate the new data file
  • Update indicator and build package
  • Re-knit comparison document

@BBeltz1 BBeltz1 self-assigned this Jan 2, 2026
@atyrell3 (Contributor) commented Jan 6, 2026

It looks like data-raw/raw_input_chl_pp.csv isn't actually in the repo -- is it possible to add that file?

@BBeltz1 (Collaborator, Author) commented Jan 7, 2026

No, it exceeds GitHub's file size limit.

@jcaracappa1 (Contributor) left a comment

This looks good. It will need to be updated with the final portion of the 2025 data before IR.

@jcaracappa1 (Contributor)

Quoting @atyrell3: "It looks like data-raw/raw_input_chl_pp.csv isn't actually in the repo -- is it possible to add that file?"

If we subset just the columns used in ecodata, the file is only 11 MB vs. 168 MB:
dplyr::select(PERIOD, VARIABLE, VALUE, SUBAREA, UNITS)

@BBeltz1 (Collaborator, Author) commented Jan 7, 2026

We could, but then the raw input would essentially duplicate the dataset. We would have to either manipulate the input file manually before uploading it to GitHub or write new code elsewhere in the package to subset the input file, which is what the get function already does. The only difference would be that the "input file" would be wide and ecodata::chl_pp would be long.

@BBeltz1 (Collaborator, Author) commented Jan 7, 2026

Actually, the files would not differ in being wide vs. long. The manipulated input file would keep the original column names rather than the standardized column names; otherwise, the two files would be identical.

@jcaracappa1 (Contributor)

OK, then it's probably fine to keep it out in that case.

@khyde commented Jan 7, 2026

I am running the remaining 2025 data now and will let you know when the run is complete. These data will still be considered preliminary because the final data may not be available until February. I don't expect the final data to change the narrative, but there will be another update.
Next year I will have this built into the workflow, and we should discuss what information to include in the final dataset to reduce its total size. The current version contains additional products and time periods that we aren't currently using, plus quite a bit of metadata; both can probably be slimmed down.

@jcaracappa1 merged commit ac3bb0e into pre-production on Jan 8, 2026
2 checks passed
@khyde commented Jan 8, 2026

@BBeltz1 The latest preliminary chl_pp data through 2025 are now available at:
/EDAB_Archive/nadata/PROJECTS/SOE_PHYTOPLANKTON/V2026/DATA_EXTRACTS/SOE_FORMAT

@BBeltz1 (Collaborator, Author) commented Jan 8, 2026

@khyde Just noting that I've seen your message. I will process the update as soon as I can. Thanks, Kim!


5 participants