When you have finished configuring the orchestration tasks, you will probably want to test them. You can run the orchestration manually:
Before running the orchestration, you can review what tasks (configured components) will be executed. You can also deselect individual tasks to exclude them from the run:
Each execution of the orchestration is shown in the Last Runs section of the orchestration:
Clicking on the job row takes you to the orchestration job details:
Here you can see which jobs were executed and which jobs failed. Notice that because the tasks within a phase run in parallel, the Snowflake - Email recipient Index task was started at the same time as the Adform - Campaigns task (though it is displayed after it in the list). However, the orchestration did not continue to the second phase, because the first phase failed. You can retry the failed orchestration jobs directly by clicking Job Retry:
Only failed tasks, and phases that failed or were not executed, will be checked by default. Additional properties control the execution of tasks within an orchestration:
warning state). This feature is useful when a data source becomes temporarily unstable and you still want to make a best effort to extract data from it.
run, except for GoodData Writer.
When you are happy with the orchestration settings, it is time to automate its execution. This is done simply by setting the orchestration Schedule:
The orchestration schedule is set in Coordinated Universal Time (UTC) so that the orchestration always runs at a single, unambiguous point in time. For clarity, the schedule is displayed in your local time. Keep in mind that other users may see different schedules, and these may even shift during the year because of daylight saving time (DST).
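To illustrate the effect of DST on a UTC-based schedule, the following sketch (plain Python, not a Keboola API call; the time zone is chosen arbitrarily) shows how a fixed 06:00 UTC schedule maps to two different local wall-clock times in winter and in summer:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

NY = ZoneInfo("America/New_York")

# The same 06:00 UTC schedule on a winter date and a summer date.
winter_run = datetime(2024, 1, 15, 6, 0, tzinfo=timezone.utc)
summer_run = datetime(2024, 7, 15, 6, 0, tzinfo=timezone.utc)

# Converted to local (New York) time, the wall-clock hour differs by one
# because of daylight saving time, even though the UTC time is identical.
print(winter_run.astimezone(NY).strftime("%H:%M"))  # 01:00 (EST, UTC-5)
print(summer_run.astimezone(NY).strftime("%H:%M"))  # 02:00 (EDT, UTC-4)
```

So a schedule that looks stable in UTC appears to move by an hour twice a year when viewed in a DST-observing local time zone.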
Before scheduling an orchestration, be sure to run it first and assess how long it takes so you can pick a reasonable schedule. An orchestration itself is considered a component configuration, which means its jobs will not run in parallel. When you trigger an orchestration job while a previous orchestration job is still running (some of the configured tasks have not finished), the newly created orchestration job will wait until the previous one finishes. So if you have an orchestration that runs for one hour and you schedule it to run every 30 minutes, your tables will still be updated only every hour. Plus, you will clog the project with waiting jobs.
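The arithmetic behind this can be shown with a small back-of-the-envelope simulation (plain Python, only a model of the queueing behavior described above): each new job starts at its scheduled time or when the previous job finishes, whichever is later.

```python
def finish_times(schedule_minutes, run_minutes, runs):
    """Simulate serialized orchestration jobs: a job starts at its
    scheduled time or when the previous job finishes, whichever is later."""
    finishes = []
    prev_finish = 0
    for i in range(runs):
        scheduled = i * schedule_minutes
        start = max(scheduled, prev_finish)  # a waiting job queues up here
        prev_finish = start + run_minutes
        finishes.append(prev_finish)
    return finishes

# Scheduled every 30 minutes, but each run takes 60 minutes:
print(finish_times(30, 60, 4))  # [60, 120, 180, 240] -> tables refresh only hourly
```

Despite the 30-minute schedule, consecutive jobs finish a full 60 minutes apart, matching the run time rather than the schedule.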
An orchestration is designed to run unattended. That means that a new API Token is created automatically when you create an orchestration.
When you run an orchestration manually, the notifications are sent only to you (the user who triggered the orchestration) — the notification setting is ignored. When an orchestration runs unattended by the defined schedule, it runs as if the specified orchestration token triggered the execution. In that case, the notifications settings are honored. In either case, all the jobs created by the orchestration (extractors, writers, …) are run using the orchestration token. That is true even if you trigger the orchestration manually. There is no need to know or manually use the orchestration token.
Important: Do not delete, refresh, or otherwise modify the orchestration token; there is a dedicated API for that.
If you need to trigger an orchestration programmatically, create a new token just for this purpose. The token needs only permissions for the Orchestrator component:
Because the actual orchestration runs with the token stored within that orchestration, the trigger token needs no access to any buckets or tables.
Note: For historical reasons, specifying the Orchestrator component in the component permissions is optional. That is, the token will work even if it has access to no components.
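As a sketch of such a programmatic trigger, the snippet below composes (but does not send) the HTTP request. The `syrup.keboola.com/orchestrator` host, the job-creation path, and the `X-StorageApi-Token` header are assumptions based on the public Orchestrator API; verify them against the current API documentation for your stack before use. The orchestration ID and token are placeholders.

```python
import urllib.request

def trigger_orchestration(orchestration_id: int, trigger_token: str):
    """Build a request that creates a new job for the given orchestration.

    Assumes the classic Syrup Orchestrator API endpoint; adjust the host
    for your Keboola stack if it differs.
    """
    url = (f"https://syrup.keboola.com/orchestrator"
           f"/orchestrations/{orchestration_id}/jobs")
    return urllib.request.Request(
        url,
        data=b"{}",
        method="POST",
        headers={
            # The trigger token only needs Orchestrator component permissions;
            # the orchestration's own stored token performs the actual work.
            "X-StorageApi-Token": trigger_token,
            "Content-Type": "application/json",
        },
    )

# Example with placeholder values; urllib.request.urlopen(request) would send it.
request = trigger_orchestration(123456, "your-trigger-token")
print(request.full_url)
```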
Running things in the KBC platform is designed around the concept of background jobs. One of the key properties is that the same configuration of the same component cannot run in parallel. This is primarily a safety measure to maintain the consistency of the output data produced by that configuration. More technically, we can say that jobs running the same configuration are serialized.
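This serialization guarantee can be pictured as a lock held per (component, configuration) pair. The sketch below is only an illustrative model of the observable behavior, not Keboola's actual implementation, and the component and configuration names are made up:

```python
import threading

# One lock per (component, configuration) pair; created up front here
# to keep the sketch simple (a real scheduler would manage this registry).
config_locks = {("ex-db-mysql", "config-1"): threading.Lock()}
log = []

def run_job(component, config_id):
    # Jobs of the same configuration queue behind the shared lock,
    # so their start/end events can never interleave.
    with config_locks[(component, config_id)]:
        log.append("start")
        log.append("end")

threads = [threading.Thread(target=run_job, args=("ex-db-mysql", "config-1"))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(log)  # ['start', 'end', 'start', 'end'] -- never interleaved
```

Jobs of two different configurations would use two different locks and could therefore overlap freely.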
The above can be added to the basic rule of orchestrations:
Phases execute sequentially, tasks within phases execute in parallel.
This means that we use the word parallel in its usual meaning of not serialized. Task execution is queue-based and non-deterministic, and depends on other things happening in the project. When you run two jobs of different configurations at the same time, they will run in parallel. There is no certainty that they will start executing at the same time, no certainty that a shorter job will finish before a longer job, and, even if it does, no certainty that it will happen every time.
This means that you must never rely on coincidental timing or synchronization of jobs, even if it works sometimes. When jobs execute in parallel, there is no certainty that they will execute simultaneously or in the same order, or that shorter jobs will finish before longer jobs (the jobs may not start immediately; for example, the shorter job may already have been run manually).
If one task relies on the results of another task, it must always be put in a later phase (i.e., serialized). For example, it is incorrect to build an orchestration on the assumption that a 5-minute task will finish well before a 2-hour task, so that the 2-hour task can use the result of the 5-minute task during its execution. While this will work 99% of the time, there is no guarantee that the result of the short job will become available during the long job.
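The basic rule (phases execute sequentially, tasks within a phase in parallel) can be sketched as follows; the phase layout and task names are hypothetical, chosen to show that a dependent task belongs in a later phase:

```python
from concurrent.futures import ThreadPoolExecutor

def run_orchestration(phases):
    """Run each phase to completion before the next one starts;
    tasks inside a phase are submitted concurrently."""
    results = []
    for phase in phases:
        with ThreadPoolExecutor() as pool:
            # All tasks of the phase run in parallel; leaving the `with`
            # block waits for every one of them to finish.
            futures = [pool.submit(task) for task in phase]
            results.append([f.result() for f in futures])
    return results

# Hypothetical layout: extractions run in parallel in phase 1,
# and the task that depends on their results goes in phase 2.
phase_1 = [lambda: "extract A", lambda: "extract B"]
phase_2 = [lambda: "transform using A and B"]
print(run_orchestration([phase_1, phase_2]))
```

Because the transformation sits in its own phase, it is guaranteed to start only after both extractions have finished, with no reliance on their relative speed.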
That being said, we do our best to execute jobs as quickly as possible and to utilize the maximum allowed number of parallel jobs.