Before configuring the Orchestrator component, make sure that all the components you wish to work with (data source connectors, transformations, and data destination connectors) are configured and ready.
To configure the Orchestrator component, create a new Orchestration:
The first step is to add orchestration tasks (the component configurations you wish to run) by clicking Configure Tasks:
Continue with New Task:
A list of configured components is shown:
After selecting a component, a list of its configurations is shown. Clicking the plus button adds the desired configuration to the orchestration:
Repeat this for all configurations you want to add into the orchestration.
Let’s assume you have the following configurations and wish to orchestrate them into a data pipeline:
Campaigns configuration (data source connector)
Email recipient index configuration (data source connector)
Campaign Performance and Campaign Recipient (transformations)
New recipients configuration (data destination connector)
When you add the configurations as orchestration tasks in random order, chances are that you’ll end up with something similar to this:
Here comes an important rule:
Orchestration Phases execute sequentially; tasks within a Phase execute in parallel.
This means that the order of phases is important and is maintained: the second phase starts only when the first phase has completely finished. The order of tasks within a phase, on the other hand, is not important; they may execute in any order or in parallel. For a more in-depth explanation, see the notes about Job execution.
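The rule can be illustrated with a small simulation. This is only a sketch of the execution model, not how the Orchestrator is implemented: `run_task`, `run_orchestration`, and the phase and task names are hypothetical stand-ins, and a thread pool is assumed as the source of parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(name):
    # Hypothetical stand-in for running one component configuration.
    return f"{name} finished"


def run_orchestration(phases):
    """Run phases one after another; run the tasks of each phase concurrently."""
    results = []
    for phase_name, tasks in phases:
        with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
            # list() waits for every task of this phase before the loop moves on,
            # so the next phase cannot start early.
            outcomes = list(pool.map(run_task, tasks))
        results.append((phase_name, outcomes))
    return results


# Hypothetical phases: two tasks in the first phase, one in the second.
phases = [
    ("Phase 1", ["Task A", "Task B"]),
    ("Phase 2", ["Task C"]),
]
result = run_orchestration(phases)
```

Task A and Task B may run in any order relative to each other, but Task C never starts before both of them have returned.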
When this rule is applied to the above task configuration, it leads to the following sequence of execution:
That means both transformations and the Mailchimp data destination connector will run in parallel, and when they finish, the Adform data source connector will run. When it is finished, the Snowflake data source connector will run. Surely, this is not right. The data source connectors must run before the transformations, and the transformations must run before the data destination connector. Because this is a typical scenario, there is a feature for exactly this purpose called Group tasks by component type:
The tasks are now reordered:
The above will lead to the following execution sequence:
First, the two data source connectors are run in parallel, then both transformations are run in parallel, and finally the data destination connector sends the results to the consumer (the Mailchimp service in this case). The configurations will be executed in the order in which they depend on each other.
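The effect of grouping by component type can be sketched as follows. This is an illustration only, under the assumption that each task is a (configuration name, component type) pair; `TYPE_ORDER` and `group_by_component_type` are hypothetical names and not part of the Orchestrator.

```python
# Assumed phase order: sources first, then transformations, then destinations.
TYPE_ORDER = [
    "data source connector",
    "transformation",
    "data destination connector",
]


def group_by_component_type(tasks):
    """Put tasks into one phase per component type, in TYPE_ORDER."""
    phases = {component_type: [] for component_type in TYPE_ORDER}
    for name, component_type in tasks:
        phases[component_type].append(name)
    # Skip component types that have no tasks.
    return [(ctype, names) for ctype, names in phases.items() if names]


# Tasks in the order in which they were added at random.
tasks = [
    ("Campaign Performance", "transformation"),
    ("Campaign Recipient", "transformation"),
    ("New recipients", "data destination connector"),
    ("Campaigns", "data source connector"),
    ("Email recipient index", "data source connector"),
]
grouped = group_by_component_type(tasks)
```

Here `grouped` holds three phases: the two data source connector configurations, then the two transformations, then the data destination connector configuration.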
What if the two transformations are also dependent? Let’s say that Campaign Recipient depends on Campaign Performance and therefore must be executed after it. This can be done by moving Campaign Recipient to a new phase. Select the Campaign Recipient task, click Actions, and choose Move selected tasks between phases:
Type Second Transformation Phase to create a new orchestration phase:
The phase is created and contains the Campaign Recipient transformation. Now move the phase so that it executes after the phase containing Campaign Performance and before the phase containing the New recipients data destination connector:
The result should be this:
This corresponds to the following execution sequence:
That means that the Campaigns and Email recipient index configurations will execute first. When they both finish, the transformation Campaign Performance will run. When it finishes, the transformation Campaign Recipient will run. Lastly, the New recipients data destination connector will be executed.
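One way to reason about which tasks belong in which phase is to layer them by their dependencies. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The Orchestrator does not derive phases this way; you arrange them manually as described above. The dependency edges in `depends_on` and the helper `phases_from_dependencies` are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def phases_from_dependencies(depends_on):
    """Split tasks into phases so that each task runs after everything it depends on."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    phases = []
    while sorter.is_active():
        # All tasks whose dependencies are already done can share one phase.
        ready = sorted(sorter.get_ready())
        phases.append(ready)
        sorter.done(*ready)
    return phases


# Assumed dependencies: each key must run after all tasks in its set.
depends_on = {
    "Campaign Performance": {"Campaigns", "Email recipient index"},
    "Campaign Recipient": {"Campaign Performance"},
    "New recipients": {"Campaign Recipient"},
}
layered = phases_from_dependencies(depends_on)
```

Under these assumed dependencies, `layered` contains four phases that match the sequence described above: both data source connector configurations, then Campaign Performance, then Campaign Recipient, then New recipients.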
Another way of handling dependencies is using nested orchestrations.