
Re-think the interface to start multiple runs #4

@gullmar

Description


Some background: when starting new runs with the orchestrator, each run is associated with a runName in addition to its usual ID. This is necessary to support some features, such as persistence.

The orchestrator allows starting multiple runs at the same time and automatically splitting the input among them.
Let's take the start methods as an example:

const orchestrator = new Orchestrator();
const client = await orchestrator.apifyClient();
const actor = client.actor('actor-id');

// Start a single run and get the run object
const run = await actor.start('my-run', { ...input }, { ...options });

// Start multiple runs and get the map [runName:runObject]
const runRecord = await actor.startRuns(
    {
        runName: 'my-run-1',
        input: { /* ... */ },
        options: { /* ... */ },
    },
    {
        runName: 'my-run-2',
        input: { /* ... */ },
        options: { /* ... */ },
    },
    // ...more runs
);

// Automatically split input, generate a sequence of run names,
// start multiple runs and get the map [runName:runObject]
const batchRecord = await actor.startBatch(
    'my-run', // will be used as a prefix
    [...urls], // some "sources"
    (urls) => ({ startUrls: urls }), // a function mapping sources to an input object
    { ...splitRules }, // rules for generating multiple inputs
    { ...options }, // actor options
);
// You will get something like: { 'my-run-1/2': run1, 'my-run-2/2': run2 }
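To make the naming scheme concrete, here is a minimal sketch of what a call like startBatch conceptually does, assuming the only split rule is a fixed maximum number of sources per run (the real splitRules may support more). The function splitIntoNamedInputs and its maxPerRun parameter are hypothetical and not part of the orchestrator's API.

```typescript
// Hypothetical sketch, not the orchestrator's implementation: the only
// split rule modeled here is a maximum number of sources per run.
function splitIntoNamedInputs<S, I>(
    prefix: string,
    sources: S[],
    toInput: (chunk: S[]) => I,
    maxPerRun: number,
): Record<string, I> {
    if (!Number.isInteger(maxPerRun) || maxPerRun < 1) {
        throw new RangeError('maxPerRun must be a positive integer');
    }
    // Split the sources into consecutive chunks of at most maxPerRun items.
    const chunks: S[][] = [];
    for (let i = 0; i < sources.length; i += maxPerRun) {
        chunks.push(sources.slice(i, i + maxPerRun));
    }
    // Name each chunk `${prefix}-${index}/${total}` and map it to an input.
    const record: Record<string, I> = {};
    chunks.forEach((chunk, index) => {
        record[`${prefix}-${index + 1}/${chunks.length}`] = toInput(chunk);
    });
    return record;
}

const inputs = splitIntoNamedInputs(
    'my-run',
    ['https://a.example', 'https://b.example', 'https://c.example'],
    (urls) => ({ startUrls: urls }),
    2,
);
console.log(Object.keys(inputs)); // [ 'my-run-1/2', 'my-run-2/2' ]
```

Including the total in each name (1/2, 2/2) means a run name alone tells you how many runs belong to the same batch.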

Even if it is a bit more complicated, using startBatch instead of start helps avoid API errors, such as those caused by an input that exceeds the maximum payload size. For example, instead of doing:

const run = await actor.start('my-run', { startUrls: urls });

Do:

const runRecord = await actor.startBatch(
    'my-run',
    urls,
    (urls) => ({ startUrls: urls }),
    { respectApifyMaxPayloadSize: true },
);
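The respectApifyMaxPayloadSize rule presumably works by splitting the sources so that every generated input stays under the API's payload limit. Below is a minimal greedy sketch of that idea, assuming the limit applies to the UTF-8 length of the JSON-serialized input; splitByPayloadSize and maxBytes are hypothetical names, and the actual limit value is not stated in this issue.

```typescript
// Hypothetical sketch: greedily pack sources into chunks so that the
// JSON-serialized input built from each chunk stays within maxBytes.
function splitByPayloadSize<S, I>(
    sources: S[],
    toInput: (chunk: S[]) => I,
    maxBytes: number,
): S[][] {
    const encoder = new TextEncoder();
    // Size of the input as UTF-8 encoded JSON (assumed to be what the limit measures).
    const sizeOf = (chunk: S[]): number =>
        encoder.encode(JSON.stringify(toInput(chunk))).length;

    const chunks: S[][] = [];
    let current: S[] = [];
    for (const source of sources) {
        if (sizeOf([source]) > maxBytes) {
            // No chunk can hold this source, so splitting cannot help.
            throw new Error('A single source exceeds maxBytes');
        }
        // Close the current chunk if adding this source would exceed the limit.
        if (current.length > 0 && sizeOf([...current, source]) > maxBytes) {
            chunks.push(current);
            current = [];
        }
        current.push(source);
    }
    if (current.length > 0) chunks.push(current);
    return chunks;
}

const urls = ['https://example.com/1', 'https://example.com/2', 'https://example.com/3'];
const chunks = splitByPayloadSize(urls, (u) => ({ startUrls: u }), 80);
console.log(chunks.length); // 2 chunks: [url 1, url 2] and [url 3]
```

Re-serializing the candidate chunk at every step keeps the sketch short but is quadratic in the number of sources; a real implementation would more likely track sizes incrementally.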

Notice that the orchestrator provides tools for working with a "run record" as if it were a single run, e.g., for reading dataset items.
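The issue does not show how these tools work internally. As an illustration, reading dataset items from a run record can be seen as reading each run's items and concatenating them. In this minimal sketch, RunRecord, collectItems, and the fetchItems callback are hypothetical stand-ins, not the orchestrator's API.

```typescript
// Hypothetical shape: a run record maps run names to run objects.
type RunRecord<R> = Record<string, R>;

// Reads each run's items with `fetchItems` (a placeholder for whatever
// reads one run's dataset) and concatenates them in the record's order,
// so the batch can be consumed like the output of a single run.
async function collectItems<R, T>(
    record: RunRecord<R>,
    fetchItems: (run: R) => Promise<T[]>,
): Promise<T[]> {
    const perRun = await Promise.all(Object.values(record).map((run) => fetchItems(run)));
    return ([] as T[]).concat(...perRun);
}

// Usage with fake runs whose items are already in memory.
const fakeRecord: RunRecord<{ items: string[] }> = {
    'my-run-1/2': { items: ['a', 'b'] },
    'my-run-2/2': { items: ['c'] },
};
collectItems(fakeRecord, async (run) => run.items).then((items) => {
    console.log(items); // [ 'a', 'b', 'c' ]
});
```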

I would like to simplify this interface and make it more versatile. For instance, the orchestrator could provide a separate function for splitting inputs, instead of embedding this functionality in the startBatch (and callBatch) methods. But then, how would the run names be generated?
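One possible reading of this proposal, sketched under assumptions: naming becomes a separate step that takes inputs from any splitter (or built by hand) and produces objects in the same { runName, input } shape that startRuns accepts above. RunRequest and withRunNames are hypothetical names.

```typescript
// Shape of one argument to startRuns in the example above (options omitted).
interface RunRequest<I> {
    runName: string;
    input: I;
}

// Hypothetical helper: attaches generated run names to inputs. It knows
// nothing about how the inputs were produced or split.
function withRunNames<I>(prefix: string, inputs: I[]): RunRequest<I>[] {
    return inputs.map((input, index) => ({
        runName: `${prefix}-${index + 1}/${inputs.length}`,
        input,
    }));
}

// The inputs could come from a standalone splitting function; here they are literal.
const requests = withRunNames('my-run', [
    { startUrls: ['https://a.example', 'https://b.example'] },
    { startUrls: ['https://c.example'] },
]);
console.log(requests.map((r) => r.runName)); // [ 'my-run-1/2', 'my-run-2/2' ]
```

With this split, a call like actor.startRuns(...requests) could cover what startBatch does today, and any splitting strategy could be plugged in without touching the naming logic.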

We could discuss how to achieve this.

Metadata


Assignees

No one assigned

    Labels

    enhancement (New feature or request)

