Before configuring the Orchestrator component, have all the components (extractors, transformations, and writers) you wish to work with configured and ready.
To configure the Orchestrator component, create a new Orchestration:
The first step is to add orchestration tasks — the component configurations you wish to run — by clicking on Configure Tasks:
Continue with New Task:
A list of configured components is shown:
After selecting a component, a list of its configurations is shown. Clicking the plus button adds the desired configuration to the orchestration:
Repeat this for all configurations you want to add into the orchestration.
Let’s assume you have the following configurations and wish to orchestrate them into a data pipeline:
Email Recipient Index configuration
When you randomly add the configurations as orchestration tasks, chances are that you’ll end up with something similar to this:
Here comes an important rule:
Orchestration Phases execute sequentially, tasks within a Phase execute in parallel.
This means that the order of phases is important and is maintained: a second phase will start only when the first phase is completely finished. On the other hand, the order of tasks within a phase is not important; they may execute in any order or in parallel. For a more in-depth explanation, see the notes about Job execution.
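The rule above can be illustrated with a minimal Python sketch. It is only a model of the behavior described here, not the Orchestrator's actual implementation; the function `run_orchestration` and the `run_task` callback are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_orchestration(phases, run_task):
    """Run phases one after another; run the tasks of each phase concurrently.

    `phases` is a list of lists of tasks, `run_task` is any callable
    that executes a single task (both are assumptions of this sketch).
    """
    results = []
    for phase in phases:
        # One worker per task, so all tasks of the phase may run in parallel.
        with ThreadPoolExecutor(max_workers=max(1, len(phase))) as pool:
            # map() returns results in input order, whatever the finish order.
            phase_results = list(pool.map(run_task, phase))
        # Leaving the with-block waits for every task of the phase,
        # so the next phase starts only after this one is finished.
        results.append(phase_results)
    return results
```

Calling `run_orchestration([["a", "b"], ["c"]], str.upper)` runs `a` and `b` together and starts `c` only after both are done.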
When this rule is applied to the above task configuration, it leads to the following sequence of execution:
That means both transformations and the Mailchimp writer will run in parallel; when they finish, the Adform extractor will run, and when it finishes, the Snowflake extractor will run. Surely, this is not right: the extractors must run before the transformations, and the transformations must run before the writer. Because this is a typical scenario, there is a feature to do just this, Group tasks by component type:
The tasks are now reordered:
The above will lead to the following execution sequence:
First, the two extractors are run in parallel, then both transformations are run in parallel, and last the writer sends the results to the consumer (Mailchimp service in this case). The configurations will be executed in the order in which they depend on each other.
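Conceptually, grouping by component type amounts to bucketing the tasks and emitting one phase per type in extract, transform, write order. The sketch below shows that idea in Python; the `TYPE_ORDER` list, the `group_by_component_type` function, and the task labels are assumptions made for illustration and do not reflect how the feature is actually implemented.

```python
# Hypothetical ordering of component types used by this sketch.
TYPE_ORDER = ["extractor", "transformation", "writer"]


def group_by_component_type(tasks):
    """Return one phase per component type, in TYPE_ORDER.

    `tasks` is a list of (label, component_type) pairs. A type that is
    not listed in TYPE_ORDER raises KeyError in this simplified version.
    """
    buckets = {component_type: [] for component_type in TYPE_ORDER}
    for label, component_type in tasks:
        buckets[component_type].append(label)
    # Skip component types that have no tasks at all.
    return [buckets[t] for t in TYPE_ORDER if buckets[t]]


# Tasks in the "random" order described earlier (labels are illustrative).
tasks = [
    ("Campaign Performance", "transformation"),
    ("Campaign Recipients", "transformation"),
    ("Mailchimp writer", "writer"),
    ("Adform extractor", "extractor"),
    ("Snowflake extractor", "extractor"),
]
phases = group_by_component_type(tasks)
```

Here `phases` holds three lists: the two extractors, then the two transformations, then the writer.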
What if the two transformations are also dependent? Let’s say that Campaign Recipients depends on Campaign Performance; therefore, it must be executed after it. This can be done by moving Campaign Recipients to a new phase. Select the Campaign Recipients task, click Actions and Move selected tasks between phases:
Enter Second Transformation Phase to create a new orchestration phase:
The phase is created and it contains the Campaign Recipients transformation. Now move the phase so that it executes after the phase containing Campaign Performance and before the phase containing the New recipients writer:
The result should be this:
This corresponds to the following execution sequence:
That means that both extractor configurations, including Email Recipient Index, will execute first. When they both finish, Campaign Performance will run. When it finishes, the Campaign Recipients transformation will run. Lastly, the New recipients writer will be executed.
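The same phase layout can be derived mechanically from the dependencies between configurations. The following sketch uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+) to split tasks into phases. The dependency map is an assumption made for illustration (in particular, "Adform extractor" is a placeholder label and the links from the extractors to Campaign Performance are assumed), and `phases_from_dependencies` is a hypothetical helper, not part of the Orchestrator.

```python
from graphlib import TopologicalSorter


def phases_from_dependencies(depends_on):
    """Split tasks into phases so that every task runs after its dependencies.

    `depends_on` maps each task to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    phases = []
    while sorter.is_active():
        # Everything that is ready now can share one phase (run in parallel).
        ready = sorted(sorter.get_ready())
        phases.append(ready)
        sorter.done(*ready)
    return phases


# Assumed dependencies for the example pipeline.
depends_on = {
    "Adform extractor": set(),
    "Email Recipient Index": set(),
    "Campaign Performance": {"Adform extractor", "Email Recipient Index"},
    "Campaign Recipients": {"Campaign Performance"},
    "New recipients": {"Campaign Recipients"},
}
phases = phases_from_dependencies(depends_on)
```

With these assumed dependencies, `phases` contains four phases in the order described above: both extractor configurations, then Campaign Performance, then Campaign Recipients, then the New recipients writer.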
Another way of handling dependencies is using nested orchestrations.