In the Keboola Connection platform, most of the data processing functions are implemented in Components. Components are divided into three main categories: extractors, writers, and applications.
All components, regardless of their category, behave the same way. To use a component, you have to create a configuration first. A configuration sets the necessary parameters for the component (e.g., credentials and other specifications of what to do). Once configured, it can be run: a Job is created and does the actual work.
To create a new component configuration, select Components from the top navigation and then select one of the component categories (extractor, writer, or application):
The following page shows a list of the currently existing configurations (extractors in this example) in the project. To create a new configuration of a new component, use the Directory or the Add New Extractor button.
Use the search field to find the component you want to use and click the component tile to add a new component (Currency Rates extractor in this case):
The following page describes in detail what the component does and allows you to create a new configuration using the New Configuration button.
In the dialog, enter a name for the configuration. If a component has a single configuration in your project, the name is not that important. However, for components with multiple configurations (e.g., configured with different credentials) the names should meaningfully distinguish one configuration from another.
The next page shows a form for configuring the component; the form varies considerably between components. Component configuration can range from trivial (as in the case of the Currency Rates extractor) to very complex (e.g., the Adwords extractor). The configuration complexity badge shown in the component list gives you a rough idea of what to expect.
Once you set the parameters and Save them, you can run the component using the Run Component button:
When you run a component, a Job is created and subsequently executed. The right panel shows the last executed jobs with an indication of their status:
You can click each job to view its details, including the tables it read from your project and the tables it produced in your project. When you run the configuration, its active version (the one with the green tick-mark) is used.
Configuration parameters can be changed at any time. Every change to a configuration is recorded in the history of versions. You can also change the name and description of the configuration at any time. The configuration description supports rich text formatting using Markdown.
The bottom right panel shows a list of the configuration versions. To see the full list, use the links. The version list is complete; use it to compare adjacent versions, roll back to any previous version, or create a copy of the configuration.
Important: Component configurations do not count towards your project quota.
The version list is unlimited. Configuration versions are also created when the configurations are manipulated programmatically using the API. In other words, there is no way to modify a configuration without the changes being recorded.
You can compare adjacent versions using the compare button:
When you compare two versions, a difference of the raw JSON configurations is shown.
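To give an idea of what such a difference looks like, the following minimal Python sketch diffs two raw JSON configurations using only the standard library. The configuration contents (`currency`, `days`) are made-up placeholders rather than the real schema of any component, and the sketch is not a description of how the platform itself computes the difference.

```python
import difflib
import json


def diff_configurations(old, new):
    """Return a unified diff (list of lines) of two configurations as JSON."""
    # Sort keys and indent so the diff reflects content, not key order.
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="version 1",
            tofile="version 2",
            lineterm="",
        )
    )


# Hypothetical configuration contents, used only for illustration.
version_1 = {"parameters": {"currency": "USD", "days": 30}}
version_2 = {"parameters": {"currency": "EUR", "days": 30}}

diff_lines = diff_configurations(version_1, version_2)
print("\n".join(diff_lines))
```

Lines prefixed with `-` appear only in the older version and lines prefixed with `+` only in the newer one; two identical configurations produce an empty diff.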
When you roll back a configuration, a new version is created. This means that you never lose any version of a configuration and there is always an option to get back to it.
If you need to return to an older version of the configuration, you can roll back to it (the other option is to make a copy of it). Rolling back a configuration version means that a new configuration version is created (and marked as active) with the contents of the selected version. Rollback is therefore quite a safe operation.
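The append-only behavior of a rollback can be sketched with a small toy model in Python. The `Configuration` class, its method names, and the assumption that the most recent version is always the active one are all hypothetical and serve only to illustrate the idea; they do not mirror the platform's data model or API.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Configuration:
    """Toy model of a versioned configuration (illustrative only)."""

    versions: list = field(default_factory=list)  # oldest first

    def save(self, parameters):
        """Record a change as a new version and return its number."""
        self.versions.append(copy.deepcopy(parameters))
        return len(self.versions)

    @property
    def active(self):
        """Assumption of this sketch: the latest version is the active one."""
        return self.versions[-1]

    def rollback(self, version_number):
        """Create a NEW version with the contents of an older one."""
        return self.save(self.versions[version_number - 1])


config = Configuration()
config.save({"currency": "USD"})  # version 1
config.save({"currency": "EUR"})  # version 2
new_version = config.rollback(1)  # version 3, same contents as version 1
```

After the rollback, version 2 is still in the history, which is why no version is ever lost.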
Click the Rollback button next to the version you want to return to:
Confirm the rollback and see the result:
You can also use the version list to create a Copy of the configuration:
You can customize the name of the configuration copy:
The copy of the configuration is created as a new, isolated configuration: there is no link between the original configuration and the copy, and changes to one have no effect on the other. You may modify or delete either of them without affecting the other.
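As a loose analogy, the independence of a copy resembles a deep copy of a nested data structure. The short Python sketch below uses an invented configuration dictionary (the `name` and `parameters` keys are placeholders) and is not meant to describe how the platform stores copies.

```python
import copy

# Hypothetical configuration, used only for illustration.
original = {"name": "Daily rates", "parameters": {"currency": "USD"}}

# A deep copy shares no nested objects with the original.
duplicate = copy.deepcopy(original)
duplicate["name"] = "Copy of Daily rates"
duplicate["parameters"]["currency"] = "EUR"

print(original["parameters"]["currency"])   # the original is unchanged
print(duplicate["parameters"]["currency"])
```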
To delete a configuration, click the bin icon in the configuration list or the Move to Trash button in the configuration detail. They both move the configuration to Trash.
Each configuration moved to Trash acts as deleted: it is removed from orchestrations, cannot be run, and is not displayed. You can undo the deletion immediately afterwards, or restore the configuration from Trash, which is accessible from the main menu.
There you can restore a deleted configuration, or permanently remove it. Once deleted from Trash, no configuration can be recovered. If your Trash is filled with a large number of configurations and you want to quickly find the one you need to restore or permanently remove, use the filter and search options in the upper part of the page.
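The life cycle of a deleted configuration can be summarized in a small Python sketch. The `Project` class and its method names are hypothetical and exist only to illustrate the three operations (move to Trash, restore, permanent removal); they are not part of any platform API.

```python
class Project:
    """Toy model of a project's configurations and its Trash (illustrative only)."""

    def __init__(self):
        self.active = {}  # configuration name -> list of version notes
        self.trash = {}

    def move_to_trash(self, name):
        """The configuration acts as deleted but can still be restored."""
        self.trash[name] = self.active.pop(name)

    def restore(self, name):
        """Restoring brings the configuration back and records a new version."""
        versions = self.trash.pop(name)
        versions.append("restored from Trash")
        self.active[name] = versions

    def delete_permanently(self, name):
        """Once removed from Trash, the configuration cannot be recovered."""
        del self.trash[name]


project = Project()
project.active["Daily rates"] = ["created"]
project.move_to_trash("Daily rates")
project.restore("Daily rates")
```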
When you restore a configuration, a new version of it is created. You can therefore see the complete history in the configuration versions:
For technical reasons, configurations of the Orchestrator component cannot be restored when deleted. They will still be shown in Trash, but cannot be restored:
Some components support the concept of multiple configurations sharing some parameters in common. Typical examples are database extractors, where multiple tables are extracted and they all share the same database credentials. In such components, the configuration itself contains only the credentials, and the tables are stored in configuration rows. The configuration acts as an envelope for the configuration rows.
The following example shows a configuration of the AWS S3 extractor with three configuration rows:
Each row can be modified or deleted individually. You can also disable a row, which means that the row is skipped when the entire configuration is run. You can also run a single row explicitly; jobs that run only a single row are labeled partial. You can therefore, for example, create a configuration that extracts all the enabled tables on a scheduled run and also contains some tables that share the same credentials but are updated manually (or in a different orchestration).
You can add as many rows as you like; the list of configuration rows is fully searchable. You can also change the order of the rows. The order is maintained during processing, so you can use it, for example, to extract the large tables first. Otherwise, the order of rows has no effect on your project, because a Job is finished only after every row has been processed.
Changes to configuration rows are part of the configuration versioning. The following image shows that the version list on the configuration page includes changes to the configuration rows, both the addition of a table and its modification.
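The row selection rules described above can be sketched in a few lines of Python. The `Row` class and the `rows_to_process` function are hypothetical names introduced for illustration; in particular, the sketch assumes that a row named explicitly is run even when it is disabled, which is one reading of the manual-update scenario above rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """Toy model of a configuration row (illustrative only)."""

    name: str
    enabled: bool = True


def rows_to_process(rows, only=None):
    """Return the names of the rows a job would process, in stored order.

    With `only` set, the job is a partial run of that single row
    (assumption: an explicitly named row runs even if it is disabled).
    """
    if only is not None:
        return [row.name for row in rows if row.name == only]
    # A full run keeps the stored order and skips disabled rows.
    return [row.name for row in rows if row.enabled]


rows = [Row("large_table"), Row("manual_table", enabled=False), Row("small_table")]
full_run = rows_to_process(rows)
partial_run = rows_to_process(rows, only="manual_table")
```

Here the full run processes `large_table` first and skips `manual_table`, while the partial run processes only `manual_table`.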
That means that each configuration version contains a complete set of its rows. This is important when copying or rolling back a version – you can do these operations safely without worrying about rows.
When you edit a configuration row, there is also a list of row versions. Row versions show changes only to that single row. You can roll back a row to a previous version without interacting with the other rows.
Note that you cannot copy a configuration row to a new configuration. You always have to copy the entire configuration.
Many services support authorization using the OAuth protocol. For you (as the end user) it means that the service does not require entering credentials (username, password, token, etc.). Instead you are redirected to the service itself where you authorize the Keboola Connection component. Then you are redirected back to Keboola Connection and you can set other parameters and run the configuration.
The OAuth authorization process begins with the Authorize button (the Google Calendar extractor is shown in this example):
In the next step, you can choose the authorization method:
OAuth authorization is a very secure authorization method in which you don’t have to hand over the credentials to your account. The consumer – Keboola Connection component – obtains only the minimal required access. The authorization is only valid for the configuration in which it was created and for its copies.
Instant authorization requires that you can actually log in to the service you want to use. With instant authorization, you have to enter a name describing the account you are going to use:
Next, you are taken to the actual authorization screen. This is provided by the service itself and varies a lot. In the example below, the authorization for Google calendar is shown.
An external account can be used in cases where you do not have direct access to the service and the service account owner cannot share the credentials with you. For example, the account may belong to a different company or department, or sharing the credentials would leak some permissions. In such cases, you can ask the account owner to authorize the component configuration for you.
The first step is to generate the authorization link:
Then you can copy the link and send it to the account owner. The link is valid for 48 hours; if it expires, you have to regenerate it and resend it to the account owner.
When the account owner clicks the link, they are taken to a special page. There they can confirm what they are authorizing and who requested the action, and then click the Authorize Account button to proceed:
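If you track such links yourself (for example, to remind the account owner before a link lapses), the 48-hour window can be checked as in the following Python sketch. The function name and the treatment of the exact 48-hour boundary as already expired are assumptions of this sketch, not documented platform behavior.

```python
from datetime import datetime, timedelta, timezone

# Validity period stated in the text above.
LINK_VALIDITY = timedelta(hours=48)


def link_is_valid(generated_at, now):
    """Return True while the link is inside its 48-hour validity window."""
    # Assumption: the link counts as expired exactly at the 48-hour mark.
    return now - generated_at < LINK_VALIDITY


# Example timestamp, chosen arbitrarily for illustration.
generated = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)

print(link_is_valid(generated, generated + timedelta(hours=47)))
print(link_is_valid(generated, generated + timedelta(hours=49)))
```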
In the next step, the account owner has to enter the account name:
Then they are taken to the actual authorization screen. This is provided by the service itself and varies a lot. In the example below, the authorization for Google calendar is shown.
You can reset an existing authorization using the Reset button.
This invalidates the authorization obtained from the service and allows you to reauthorize the configuration. Note that if you reset the authorization of a configuration that was copied, all the copies lose the authorization as well. The authorization can also be revoked from within the service; in that case, the configuration stops working and you have to reset and reauthorize it.
For more features, switch the configuration of each table to the Power User Mode by clicking the JSON editor link. By editing the full JSON configuration, you can set up the component (all options are described in the component repository) and also the processors (to learn more about processors, see the Developers Docs).
Changing the JSON configuration may render the visual form unable to represent the configuration, in which case switching back may be disabled. Reverting such changes re-enables the visual form. Whenever possible, however, the JSON configuration is translated back to the visual form and vice versa.