Weekly Processing?

I'd like to run a dvc pipeline that maintains weekly state.


For example, the input is:
s3:week-1.df.gz
s3:week-2.df.gz


Intermediate Output:
dvc_data/intermediate/week-1-processed.df
dvc_data/intermediate/week-2-processed.df

Final Output:
dvc_data/final/combined-formatted.dataset


When a new s3:week-3.df.gz appears, dvc should run on just that file and produce:

dvc_data/intermediate/week-3-processed.df 

and then combine the weeks to produce an updated:

dvc_data/final/combined-formatted.dataset
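
To make the desired behaviour concrete, here is a minimal plain-Python sketch of it, independent of dvc. Everything in it is an assumption for illustration: the function names `week_number`, `process_week` and `run_incremental` are hypothetical, a local directory stands in for the S3 location, the weekly files are treated as gzip-compressed text, and the "processing" step is a placeholder that only decompresses.

```python
import gzip
from pathlib import Path


def week_number(path: Path) -> int:
    """Extract N from names like week-N.df.gz or week-N-processed.df."""
    return int(path.name.split("-")[1].split(".")[0])


def process_week(src: Path, dst: Path) -> None:
    """Placeholder transform (assumption): just decompress one weekly file."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(src, "rt") as fin, dst.open("w") as fout:
        for line in fin:
            fout.write(line)


def run_incremental(input_dir: Path, intermediate_dir: Path, final_path: Path) -> list[str]:
    """Process only weeks that have no intermediate output, then rebuild the final file.

    Returns the names of the weeks processed in this run.
    """
    newly_processed = []
    for src in sorted(input_dir.glob("week-*.df.gz"), key=week_number):
        week = src.name[: -len(".df.gz")]  # e.g. "week-3"
        dst = intermediate_dir / f"{week}-processed.df"
        if dst.exists():
            continue  # already handled by an earlier run
        process_week(src, dst)
        newly_processed.append(week)

    # Recombine all intermediate files, in week order, into the final dataset.
    final_path.parent.mkdir(parents=True, exist_ok=True)
    with final_path.open("w") as fout:
        for part in sorted(intermediate_dir.glob("week-*-processed.df"), key=week_number):
            fout.write(part.read_text())
    return newly_processed
```

With week-1 and week-2 already processed, a run after week-3 arrives would touch only week-3 and then rewrite the combined file from all three intermediates.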

Extra credit if the pipeline can read the previous version of dvc_data/final/combined-formatted.dataset and merge it with dvc_data/intermediate/week-3-processed.df
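
The "extra credit" variant could look roughly like the following, again as a plain-Python sketch rather than anything dvc provides. The function name `merge_into_previous` is hypothetical, and "merge" is assumed to mean simple concatenation of text files; a real merge would depend on the dataset format.

```python
from pathlib import Path


def merge_into_previous(final_path: Path, new_week_path: Path) -> None:
    """Read the existing combined dataset (if any) and append one newly processed week."""
    # Assumption: both files are plain text and merging means concatenation.
    previous = final_path.read_text() if final_path.exists() else ""
    final_path.parent.mkdir(parents=True, exist_ok=True)
    final_path.write_text(previous + new_week_path.read_text())
```

Note that this version is not idempotent: calling it twice with the same week appends that week twice, so the caller has to track which weeks have already been merged.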



Weekly Processing? #10650
