Closed
Labels
- enhancement: Enhances DVC
- p3-nice-to-have: It should be done this or next sprint
Description
Problem
DVC reproduces a command if its dependencies have changed. Today we support many general types of dependencies:
- Files in major cloud storages like S3, GCS, SSH, and others, like `dvc run -d azure://path/to/my blob train.py ...`.
- Local data files and code through dependencies: `dvc run -d train.py -d images/ train.py ...`
However, there are many non-general dependencies that DVC cannot validate.
Problem examples:
- Tables in a database. Usually, a custom query is needed to check whether data, tables, or objects were changed.
- A semantic check in a local data or code file. For example (see "option in dvc run to specify class or method within file as dependency", #1572): check whether a method `mycode()` was changed in class `MyClass` in a Python file `train.py`.
Possible solution
A custom plugin (code) could be executed to check for a dependency change. A plugin could be any command that returns 0 if repro is not needed.
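A minimal sketch of how such an exit-code convention could be interpreted by a caller. The function name `repro_needed` is hypothetical and not part of DVC; the only thing taken from the proposal is that exit code 0 means "repro is not needed". Treating every non-zero code as "repro needed" is an assumption of this sketch.

```python
import shlex
import subprocess


def repro_needed(plugin_cmd: str) -> bool:
    """Run a dependency-check plugin command and interpret its exit code.

    Assumed convention (from the proposal): exit code 0 means the
    dependency is unchanged, so no repro is needed. Any other exit
    code is treated here as "repro needed" (an assumption).
    """
    # Split the command string into arguments and run it without a shell.
    result = subprocess.run(shlex.split(plugin_cmd))
    return result.returncode != 0
```

With this convention, a call such as `repro_needed("sh check_db.sh")` would return `False` when the script exits with 0 and `True` otherwise.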
Solution examples:
- Run a script `check_db.sh` to validate whether a table was changed and then execute the DB dump script (if there was a change). Command example: `dvc -d db_dump.sh -p check_db.sh -o clients.csv run db_dump.sh clients.csv`. Note the new plugin option `-p`.
- `dvc -d train.py -p "python check_method_change.py MyClass.mycode change_timestamp" -d change_timestamp -o clients.csv run train.py`, where `check_method_change.py` checks for code changes and returns 0 if there was a change.
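The issue does not say how `check_method_change.py` would detect a change. One possible approach, sketched below under that caveat, is to fingerprint the method's AST with Python's standard `ast` module so that edits elsewhere in the file, comments, and whitespace do not count as changes. The function names `method_fingerprint` and `method_changed` are hypothetical; persisting the fingerprint (for example in the `change_timestamp` file) and choosing the exit code are left out.

```python
import ast
import hashlib
from typing import Optional


def method_fingerprint(source: str, qualified_name: str) -> Optional[str]:
    """Return a SHA-256 hex digest of the AST of `Class.method`.

    Returns None if the class or method is not found in `source`.
    """
    class_name, method_name = qualified_name.split(".", 1)
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for item in node.body:
                is_func = isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                if is_func and item.name == method_name:
                    # ast.dump omits line and column numbers by default,
                    # so moving the method within the file does not
                    # change the digest.
                    dumped = ast.dump(item).encode("utf-8")
                    return hashlib.sha256(dumped).hexdigest()
    return None


def method_changed(old_source: str, new_source: str, qualified_name: str) -> bool:
    """Compare the fingerprints of one method in two versions of a file."""
    old = method_fingerprint(old_source, qualified_name)
    new = method_fingerprint(new_source, qualified_name)
    return old != new
```

A script built on this could read `train.py`, compare the fingerprint of `MyClass.mycode` against a stored one, and signal the result to DVC. Note that a docstring edit inside the method still changes the fingerprint, because docstrings are part of the AST.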
UPDATE: Please note that the script `check_method_change.py` might still be our responsibility, and we should implement it (probably outside of DVC core).