Contributor

@ilongin ilongin commented Jan 14, 2026

This PR fixes job linking (correctly setting the rerun_from_job_id field) when the same script is re-run multiple times from a local machine to Studio using datachain job run my_script.py.

This was fixed by:

  • adding a new boolean field, is_studio_copy, to the job.
  • copying the job newly created in Studio to the local DB with is_studio_copy set to True. The job name is set to the full script path (similar to what we do when running jobs locally via python). On re-run, we look for a local job whose name is the same full script path and that is a copy of a Studio job; if we find one, we also send the rerun_from_job_id field to Studio. Studio then connects the new job to the previous one correctly and can use checkpoints etc.
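
The flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: LocalJob, LocalJobStore, build_run_payload and record_studio_job are hypothetical names, the store is a plain in-memory list instead of the real metastore, and the payload keys are made up for the example. Only is_studio_copy, rerun_from_job_id and the use of the absolute script path as the job name come from the description.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from itertools import count


@dataclass
class LocalJob:
    """Hypothetical local job record (not DataChain's actual job model)."""

    id: str
    name: str  # full (absolute) script path, as described in the PR
    is_studio_copy: bool = False
    created_order: int = 0  # stand-in for a creation timestamp


class LocalJobStore:
    """In-memory stand-in for the local metastore (illustration only)."""

    def __init__(self) -> None:
        self._jobs: list[LocalJob] = []
        self._order = count()

    def add(self, job_id: str, name: str, is_studio_copy: bool = False) -> LocalJob:
        job = LocalJob(job_id, name, is_studio_copy, next(self._order))
        self._jobs.append(job)
        return job

    def last_studio_copy_by_name(self, name: str) -> LocalJob | None:
        # Only jobs copied from Studio qualify; purely local runs are ignored.
        matches = [j for j in self._jobs if j.name == name and j.is_studio_copy]
        return max(matches, key=lambda j: j.created_order, default=None)


def build_run_payload(store: LocalJobStore, query_file: str) -> dict:
    """Build the (hypothetical) request sent to Studio for a new run."""
    script_path = os.path.abspath(query_file)
    previous = store.last_studio_copy_by_name(script_path)
    return {
        "name": script_path,
        "rerun_from_job_id": previous.id if previous else None,
    }


def record_studio_job(store: LocalJobStore, query_file: str, studio_job_id: str) -> LocalJob:
    """Save a local copy of the job Studio created, flagged as a Studio copy."""
    return store.add(studio_job_id, os.path.abspath(query_file), is_studio_copy=True)
```

With this sketch, the first run of a script sends rerun_from_job_id as None; once record_studio_job has stored the Studio job locally, the next build_run_payload call for the same script path carries that job's id.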

@ilongin ilongin marked this pull request as draft January 14, 2026 15:28

cloudflare-workers-and-pages bot commented Jan 14, 2026

Deploying datachain with Cloudflare Pages

Latest commit: aa67b5f
Status: ✅  Deploy successful!
Preview URL: https://92104e91.datachain-2g6.pages.dev
Branch Preview URL: https://ilongin-12370-job-rerun-from-tiik.datachain-2g6.pages.dev


codecov bot commented Jan 15, 2026

Codecov Report

❌ Patch coverage is 86.95652% with 3 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/datachain/data_storage/metastore.py | 66.66% | 2 Missing and 1 partial ⚠️

@ilongin ilongin marked this pull request as ready for review January 15, 2026 13:03
parent_job_id: str | None = None,
rerun_from_job_id: str | None = None,
run_group_id: str | None = None,
is_studio_copy: bool = False,
Contributor

Are there any better options for this name?

  • is_remote_job
  • is_saas_job
  • etc

Contributor Author

I thought a lot about naming, and the reason I chose is_studio_copy is that it is False by default in both Studio and the CLI; it is only True in the rare cases in the CLI when we copy a job from Studio. It is also very expressive / verbose, and those two things give us the least mental overhead (and less migration as well).
is_remote_job and is_saas_job would both be True in Studio and False in the CLI by default, which is a bit of a minus IMO. Otherwise they are all pretty much similar.

Contributor

Does is_studio_copy mean the job exists only in Studio? Or would has_saas_copy or even has_studio_copy make more sense?

Contributor

But we do not copy the job from Studio, we are running it remotely in Studio; this is why I am against "copy" in the name.

Contributor

Yeah, that makes sense too. How about has_studio_run or is_in_saas?

Contributor Author

@ilongin ilongin Jan 16, 2026

@dreadatour yes, but the job is still created and run in Studio. Locally we just save a link / reference / copy to it.

A couple of other ideas:

  • is_studio_reference
  • is_executed_remotely

Maybe the second one, since it's similar to your suggestion is_remote_job but doesn't have the word "job" in it, which isn't needed.

Contributor

@dreadatour dreadatour Jan 16, 2026

You're right! "job" is redundant. What do you think about a simple is_remote flag? ("executed" is also not needed IMO)

Anyway, these options look much better to me personally 👍🙏

script_path = os.path.abspath(query_file)

rerun_from_job_id = None
rerun_from_job = catalog.metastore.get_last_job_by_name(
Contributor

are we fetching it from Studio here?

Contributor

or is it a local metastore / catalog in this case?


@skip_if_not_sqlite
def test_studio_run_connect_to_previous_job(capsys, mocker, tmp_dir, studio_token):
    """Test that job run sends rerun_from_job_id from local Studio job copy."""
Contributor

The description is not clear (AI generated). Please clean up: make the test name clear enough, remove the comment, and clean up other comments and noise in the test.

Contributor

@shcheklein shcheklein left a comment

We need a docs update.


5 participants