Tasks


Before configuring the Orchestrator component, make sure that all the components you want to orchestrate (data source connectors, transformations, and data destination connectors) are configured and ready.

To configure the Orchestrator component, create a new orchestration.

Add Tasks

The first step is to add orchestration tasks (the component configurations you want to run) by clicking Configure Tasks.

Then click New Task.

A list of configured components is shown.

Select a component to see a list of its configurations. Click the plus button to add the desired configuration to the orchestration.

Repeat this for all configurations you want to add to the orchestration.

Organize Tasks

Let’s assume you have the following configurations and wish to orchestrate them into a data pipeline:

Adform data source connector with the Campaigns configuration
Snowflake data source connector with the Email recipient index configuration
Transformations with the configurations Campaign Performance and Campaign Recipient
Mailchimp data destination connector with the New recipients configuration

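If you add the configurations as orchestration tasks in no particular order, chances are you will end up with something like this: the two transformations and the Mailchimp data destination connector in the first phase, the Adform data source connector in the second phase, and the Snowflake data source connector in the third phase.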

Here comes an important rule:

Orchestration phases execute sequentially; tasks within a phase execute in parallel.

This means that the order of phases matters and is preserved: the second phase starts only after the first phase has completely finished. The order of tasks within a phase, on the other hand, does not matter; they may execute in any order or in parallel. For a more in-depth explanation, see the notes about Job execution.
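To make the rule concrete, here is a minimal Python sketch that models an orchestration as a list of phases. It is an illustration under stated assumptions, not the Orchestrator's implementation: the task names are the ones used in this section, `run_task` is a hypothetical stand-in that only sleeps briefly, and the tasks are placed in phases in no particular order. Each phase gets its own thread pool, and the loop moves on only after every task in the current phase has returned.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical orchestration: a list of phases, each phase a list of task names.
# This models only the ordering rule, not the real Orchestrator.
PHASES = [
    ["Campaign Performance", "Campaign Recipient", "New recipients"],
    ["Campaigns"],
    ["Email recipient index"],
]


def run_task(name: str, log: list, lock: threading.Lock) -> None:
    """Hypothetical stand-in for running one component configuration."""
    with lock:
        log.append(("start", name))
    time.sleep(0.01)  # pretend to do some work
    with lock:
        log.append(("end", name))


def run_orchestration(phases):
    """Run phases one after another; tasks inside a phase run concurrently."""
    log = []
    lock = threading.Lock()
    for phase in phases:
        # Leaving the `with` block waits for every task in this phase,
        # so the next phase cannot start until the current one has finished.
        with ThreadPoolExecutor(max_workers=max(1, len(phase))) as pool:
            futures = [pool.submit(run_task, name, log, lock) for name in phase]
            for future in futures:
                future.result()  # re-raise any error from a task
    return log


if __name__ == "__main__":
    for event, name in run_orchestration(PHASES):
        print(event, name)
```

Within a phase, the order of the printed events may differ from run to run, but every task of one phase ends before any task of the next phase starts.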

Applied to the task arrangement above, this rule leads to the following execution sequence:

This means that both transformations and the Mailchimp data destination connector run in parallel, and only when they finish does the Adform data source connector run. When it is finished, the Snowflake data source connector runs. Clearly, this is not right: the data source connectors must run before the transformations, and the transformations must run before the data destination connector. Because this is a typical scenario, there is a feature for exactly this: Group tasks by component type.

The tasks are now reordered:

This arrangement leads to the following execution sequence:

First, the two data source connectors run in parallel, then both transformations run in parallel, and finally the data destination connector sends the results to the consumer (the Mailchimp service in this case). The configurations now execute in the order their dependencies require.
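A rough sketch of the phase layout that grouping by component type produces, assuming each task is tagged with one of three hypothetical type labels and that phases follow the order data source, transformation, data destination. How the actual feature works internally is not described here; this only reproduces the resulting layout for the example tasks.

```python
from collections import defaultdict

# Assumed ordering of component types: data is extracted first,
# then transformed, then written to the destination.
TYPE_ORDER = ["data source", "transformation", "data destination"]

# Hypothetical task list: (configuration name, component type), in no particular order.
TASKS = [
    ("Campaign Performance", "transformation"),
    ("New recipients", "data destination"),
    ("Campaigns", "data source"),
    ("Campaign Recipient", "transformation"),
    ("Email recipient index", "data source"),
]


def group_by_component_type(tasks):
    """Return one phase per component type, in TYPE_ORDER, skipping types with no tasks."""
    buckets = defaultdict(list)
    for name, component_type in tasks:
        if component_type not in TYPE_ORDER:
            raise ValueError(f"unknown component type: {component_type}")
        buckets[component_type].append(name)
    return [buckets[t] for t in TYPE_ORDER if t in buckets]


if __name__ == "__main__":
    print(group_by_component_type(TASKS))
    # [['Campaigns', 'Email recipient index'], ['Campaign Performance', 'Campaign Recipient'], ['New recipients']]
```

Tasks of the same type end up in one phase and therefore run in parallel; only the phases themselves are ordered.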

Handling Dependencies

What if the two transformations also depend on each other? Let’s say that Campaign Recipient depends on Campaign Performance and therefore must be executed after it. You can achieve this by moving Campaign Recipient to a new phase. Select the Campaign Recipient task, click Actions, and choose Move selected tasks between phases.

Type Second Transformation Phase to create a new orchestration phase:

The phase is created and contains the Campaign Recipient transformation. Now move the phase so that it executes after the phase containing Campaign Performance and before the phase containing the New recipients data destination connector.

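The result should be four phases, in this order: the two data source connectors, the Campaign Performance transformation, the Second Transformation Phase with the Campaign Recipient transformation, and the New recipients data destination connector.

This corresponds to the following execution sequence: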

This means that the Campaigns and Email recipient index configurations execute first. When they have both finished, the Campaign Performance transformation runs. When it finishes, the Campaign Recipient transformation runs. Finally, the New recipients data destination connector is executed.
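In this section the phases are arranged by hand. As a way to check such an arrangement, the sketch below takes a hypothetical map of which task must run after which and places each task in the earliest phase that follows all of its dependencies (a layered topological sort). The dependency map is an assumption for illustration: Campaign Performance is assumed to need both data source connectors, and New recipients is assumed to need Campaign Recipient.

```python
# Hypothetical dependency map: task -> set of tasks it must run after.
DEPENDS_ON = {
    "Campaigns": set(),
    "Email recipient index": set(),
    "Campaign Performance": {"Campaigns", "Email recipient index"},  # assumed
    "Campaign Recipient": {"Campaign Performance"},
    "New recipients": {"Campaign Recipient"},  # assumed
}


def phases_from_dependencies(depends_on):
    """Group tasks into sequential phases so every task runs after its dependencies.

    Each task lands in the earliest phase after all of its dependencies.
    Raises ValueError for unknown dependencies or a dependency cycle.
    """
    known = set(depends_on)
    for task, deps in depends_on.items():
        unknown = set(deps) - known
        if unknown:
            raise ValueError(f"{task} depends on unknown tasks: {sorted(unknown)}")

    remaining = {task: set(deps) for task, deps in depends_on.items()}
    done = set()
    phases = []
    while remaining:
        # Tasks whose dependencies have all finished can share the next phase.
        ready = sorted(t for t, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        phases.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return phases


if __name__ == "__main__":
    for number, phase in enumerate(phases_from_dependencies(DEPENDS_ON), start=1):
        print(number, phase)
```

Placing each task as early as possible keeps the number of phases small, so independent tasks still share a phase and run in parallel. Sorting within a phase only makes the output deterministic, since tasks in a phase are unordered.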

Another way to handle dependencies is to use nested orchestrations.