Closed
Description
I'd like to be able to run a DVC pipeline that maintains state on a weekly basis.
For example:
Input is:
s3:week-1.df.gz
s3:week-2.df.gz
Intermediate Output:
dvc_data/intermediate/week-1-processed.df
dvc_data/intermediate/week-2-processed.df
Final Output:
dvc_data/final/combined-formatted.dataset
When a new s3:week-3.df.gz appears, DVC should run only on that file and produce:
dvc_data/intermediate/week-3-processed.df
and then combine all the processed weeks to produce an updated:
dvc_data/final/combined-formatted.dataset
Extra credit if DVC can load the previous version of dvc_data/final/combined-formatted.dataset and merge it with dvc_data/intermediate/week-3-processed.df.
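To make the requested behaviour concrete, here is a minimal sketch of the incremental logic in plain Python. It is not DVC's API. It assumes a local directory stands in for the S3 location, and the function names (`run_incremental`, `process_week`, and so on) and the per-week transform are hypothetical placeholders.

```python
import gzip
from pathlib import Path


def week_number(path: Path) -> int:
    """Extract N from names such as week-N.df.gz or week-N-processed.df."""
    return int(path.name.split("-")[1].split(".")[0])


def processed_path(src: Path, intermediate_dir: Path) -> Path:
    """Map week-N.df.gz to <intermediate_dir>/week-N-processed.df."""
    stem = src.name[: -len(".df.gz")]
    return intermediate_dir / f"{stem}-processed.df"


def process_week(src: Path, dst: Path) -> None:
    """Placeholder transform (an assumption): decompress and tidy trailing whitespace."""
    with gzip.open(src, "rt") as fh:
        text = fh.read()
    dst.write_text(text.strip() + "\n")


def run_incremental(input_dir: Path, intermediate_dir: Path, final_path: Path) -> list[str]:
    """Process only weeks without an intermediate output, then rebuild the combined file."""
    intermediate_dir.mkdir(parents=True, exist_ok=True)
    final_path.parent.mkdir(parents=True, exist_ok=True)

    newly_processed = []
    for src in sorted(input_dir.glob("week-*.df.gz"), key=week_number):
        dst = processed_path(src, intermediate_dir)
        if dst.exists():
            continue  # this week was handled in an earlier run
        process_week(src, dst)
        newly_processed.append(src.name)

    # Recombine only when something changed or the final output is missing.
    if newly_processed or not final_path.exists():
        parts = sorted(intermediate_dir.glob("week-*-processed.df"), key=week_number)
        final_path.write_text("".join(p.read_text() for p in parts))
    return newly_processed
```

In this sketch the combined file is rebuilt from all intermediate files whenever a new week arrives. The "extra credit" variant would instead append the new week's output to the existing combined file.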