Plugins for semantic changes tracking in dependencies #1577

@dmpetrov

Problem

DVC reproduces a command if its dependencies have changed. Today we support several general types of dependencies:

  1. Files in major cloud storages and remote locations such as S3, GCS, SSH, and others, for example: dvc run -d azure://path/to/my blob train.py ...
  2. Local data files and code, declared as dependencies: dvc run -d train.py -d images/ train.py ...

However, there are many less general dependencies that DVC cannot validate.

Problem examples:

  1. Tables in a database. Usually, a custom query is needed to check whether data, tables, or other objects were changed.
  2. A semantic check in a local data or code file. For example (see "option in dvc run to specify class or method within file as dependency", #1572): check whether a method mycode() in class MyClass in a Python file train.py was changed.
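As an illustration of the first example, a custom query that detects table changes could be sketched roughly as below. This is only a sketch under assumptions: it uses SQLite from the Python standard library as a stand-in for a real database, and table_fingerprint is a hypothetical helper name, not part of DVC.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table):
    """Return a SHA-256 digest over all rows of `table`, read in a stable order.

    Hypothetical helper for illustration; not part of DVC.
    """
    # Table names cannot be bound as SQL parameters, so accept plain identifiers only.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if not columns:
        raise ValueError(f"table not found: {table!r}")
    # Ordering by every column keeps the digest independent of physical row order.
    order_by = ", ".join(f'"{c}"' for c in columns)
    digest = hashlib.sha256()
    for row in conn.execute(f'SELECT * FROM "{table}" ORDER BY {order_by}'):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clients (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO clients VALUES (1, 'Ann')")
    before = table_fingerprint(conn, "clients")
    conn.execute("INSERT INTO clients VALUES (2, 'Bob')")
    after = table_fingerprint(conn, "clients")
    print("table changed:", before != after)
```

Hashing every row is expensive for large tables; a real check would more likely rely on something cheaper that the database already tracks, such as a row count or a last-modified column.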

Possible solution

A custom plugin (code) might be executed to check whether a dependency has changed. A plugin could be any command that returns exit code 0 if repro is not needed.
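The exit-code contract described above could be interpreted along these lines. This is a minimal sketch, not DVC code: repro_needed is a hypothetical name, and treating every non-zero exit code as "repro is needed" is an assumption the proposal does not spell out.

```python
import subprocess
import sys


def repro_needed(plugin_argv):
    """Run a plugin command given as an argument list.

    Exit code 0 means repro is not needed. Any other exit code is treated
    as "repro is needed" (an assumption; the proposal only defines 0).
    """
    result = subprocess.run(plugin_argv, check=False)
    return result.returncode != 0


if __name__ == "__main__":
    # Two stand-in plugins that simply exit with a fixed status.
    unchanged = [sys.executable, "-c", "raise SystemExit(0)"]
    changed = [sys.executable, "-c", "raise SystemExit(1)"]
    print("unchanged plugin -> repro needed:", repro_needed(unchanged))
    print("changed plugin   -> repro needed:", repro_needed(changed))
```

One open design question with this contract is that a plugin which crashes also exits non-zero, so a failure would be indistinguishable from a detected change unless specific exit codes are reserved for errors.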

Solution examples:

  1. Run a script check_db.sh to check whether a table was changed and, if it was, execute the DB dump script. Command example: dvc -d db_dump.sh -p check_db.sh -o clients.csv run db_dump.sh clients.csv. Note the new plugin option, -p.
  2. dvc -d train.py -p "python check_method_change.py MyClass.mycode change_timestamp" -d change_timestamp -o clients.csv run train.py where check_method_change.py checks the code for changes and returns 0 if the method was not changed, so repro is not needed (a non-zero code signals a change).
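A script like check_method_change.py could be sketched with Python's ast module, as below. This is an assumption-heavy illustration rather than a proposed implementation: it takes the source file path as an extra first argument (the command above does not pass one), it stores a hash in the state file instead of a timestamp, and it follows the convention from the "Possible solution" paragraph that exit code 0 means repro is not needed.

```python
import ast
import hashlib
import sys
from pathlib import Path


def method_fingerprint(source, qualified_name):
    """Hash the AST of `Class.method`; comments and line positions are ignored."""
    class_name, method_name = qualified_name.split(".")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for item in node.body:
                is_function = isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                if is_function and item.name == method_name:
                    # ast.dump omits line/column attributes by default.
                    return hashlib.sha256(ast.dump(item).encode("utf-8")).hexdigest()
    raise LookupError(f"{qualified_name} not found")


def main(argv):
    """Hypothetical CLI: check_method_change.py SOURCE.py Class.method STATE_FILE."""
    source_path, qualified_name, state_path = argv[1], argv[2], Path(argv[3])
    current = method_fingerprint(Path(source_path).read_text(), qualified_name)
    previous = state_path.read_text().strip() if state_path.exists() else None
    if current == previous:
        return 0  # unchanged: repro is not needed
    # Rewrite the state file only on change, so it can also serve as a -d dependency.
    state_path.write_text(current + "\n")
    return 1  # changed (or first run): repro is needed


# Only act when called with the three expected arguments.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```

Because the hash is computed over the parsed method, edits elsewhere in train.py and pure comment changes do not trigger repro, while any change to the method body, signature, or docstring does.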

UPDATE: Please note that the script check_method_change.py might still be our responsibility, and we should implement it (probably outside of DVC core).
