If you operate Keboola Connection in Bring Your Own Database (BYODB) mode atop your own data warehouse, the data residing in the warehouse will not automatically be visible or accessible from inside Keboola Connection. We address this by providing the External Buckets feature.
The exact implementation of external buckets depends on the database backend you use; it could be a database, schema, or another concept. Unless stated otherwise, we will describe the implementation for Snowflake.
The external buckets feature is available only in BYODB mode. To enable it, please contact support.
Public Beta Warning:
This feature is currently in public beta. Please provide feedback using the feedback button in your project.
Storage in Keboola Connection is organized into buckets. An external bucket is a special type of bucket whose contents are not managed by Keboola. It can be located anywhere in the storage backend used by your Keboola Connection project (Snowflake or BigQuery); it is a virtual bucket created on top of a Snowflake schema or a BigQuery dataset, respectively. All table-like objects (such as tables, views, and external tables) inside the schema are mapped to tables in the bucket. Access to the bucket is read-only; you cannot write to it from Keboola Connection. A single schema can be registered in multiple Keboola Connection projects simultaneously.
An external bucket can be created in the Storage section of the project.
Click on the Register External Data button. The dialog will differ based on the backend you are using.
For Snowflake, fill in the name of the new bucket, the database name, and the schema name.
Then continue to the next step, where we will provide you with a guide on how to correctly grant Keboola access to the schema in Snowflake.
Note: This set of permissions grants the Keboola service account read-only access to the data.
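For illustration only, the grants typically look something like the following sketch. The role name KEBOOLA_STORAGE is a placeholder (the dialog shows the exact role created for your project), and REPORTING and sales_emea stand in for your own database and schema; always use the statements generated in the dialog.
-- Sketch only; copy the exact statements and role name from the dialog.
GRANT USAGE ON DATABASE "REPORTING" TO ROLE KEBOOLA_STORAGE;
GRANT USAGE ON SCHEMA "REPORTING"."sales_emea" TO ROLE KEBOOLA_STORAGE;
GRANT SELECT ON ALL TABLES IN SCHEMA "REPORTING"."sales_emea" TO ROLE KEBOOLA_STORAGE;
GRANT SELECT ON ALL VIEWS IN SCHEMA "REPORTING"."sales_emea" TO ROLE KEBOOLA_STORAGE;
GRANT SELECT ON FUTURE TABLES IN SCHEMA "REPORTING"."sales_emea" TO ROLE KEBOOLA_STORAGE;
GRANT SELECT ON FUTURE VIEWS IN SCHEMA "REPORTING"."sales_emea" TO ROLE KEBOOLA_STORAGE;
None of these grants allow writing to the schema, which is what keeps the access read-only.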
Once you are done, click Register Bucket, and you can start using it.
For BigQuery, fill in the name of the new bucket and the dataset name.
Then continue to the next step, where we will provide you with a guide on how to correctly grant Keboola access to the dataset in BigQuery.
Note: By adding the Keboola service account as a subscriber, you enable read-only access to the data.
External tables are not supported, and if the shared dataset contains such tables, they will be ignored.
Once you are done, click Register Bucket, and you can start using it.
When you register an external bucket, we analyze the metadata of objects in it and register all tables and views as tables in the bucket in Keboola Connection. If you later add additional tables or views, you will need to manually refresh the external bucket using the Refresh action in the bucket detail to make them visible in Keboola Connection.
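For example, suppose you add a new view to a registered Snowflake schema (this sketch reuses the REPORTING.sales_emea schema from the example below; the active column is hypothetical):
-- A view added to the schema after the bucket was registered
CREATE VIEW "REPORTING"."sales_emea"."active_users" AS
SELECT * FROM "REPORTING"."sales_emea"."users"
WHERE "active" = TRUE;
The active_users view will not appear as a table in the external bucket until you trigger the Refresh action in the bucket detail.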
External buckets are not part of a normal input mapping. They are accessible via the read-only input mapping. Keep in mind that external buckets cannot be used in an output mapping as they are not writable.
Because external buckets are not part of a normal input mapping, they are not copied into your transformation workspace. You need to reference them in your transformation using a fully-qualified name.
In the following example, you’ve created an external bucket called users-reporting that references the sales_emea schema in the database REPORTING. The schema contains a table called users. You want to create a new table MQL_USERS that contains only users sourced from marketing-qualified leads. You can do that using the following SQL:
CREATE TABLE "MQL_USERS" AS SELECT *
FROM "REPORTING"."sales_emea"."users"
WHERE "source" = 'mql';
Note: The query uses the fully-qualified name of the table in the FROM clause.
In BigQuery, the external bucket is mapped to an actual dataset, users-reporting (the name you filled in the dialog), in your project; in this case, the project is sapi-9752.
You can reference the contents of the dataset in your SQL transformation using a fully-qualified name.
CREATE TABLE `MQL_USERS` AS SELECT *
FROM `users-reporting`.`users`
WHERE `source` = "mql";
The dataset name is the name of the bucket you provided in the dialog (users-reporting), not the name of the original dataset created in BigQuery (sales_emea). There is no technical limitation here, though; the two can have the same name.
Removing an external bucket works the same way as removing any other bucket: delete it in the UI or via the API. The bucket will be removed from the project, but the schema in the database will remain untouched. If you wish to remove the schema, you will need to do so manually in your warehouse. You may also want to revoke the grants given to Keboola Connection to ensure that Keboola no longer has access to it.
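If you want to revoke access on the Snowflake side, the revokes mirror the grants made during registration. A minimal sketch, again assuming the placeholder role KEBOOLA_STORAGE and the example REPORTING.sales_emea schema:
-- Sketch only; use the same role and object names you used when granting access.
REVOKE SELECT ON FUTURE VIEWS IN SCHEMA "REPORTING"."sales_emea" FROM ROLE KEBOOLA_STORAGE;
REVOKE SELECT ON FUTURE TABLES IN SCHEMA "REPORTING"."sales_emea" FROM ROLE KEBOOLA_STORAGE;
REVOKE SELECT ON ALL VIEWS IN SCHEMA "REPORTING"."sales_emea" FROM ROLE KEBOOLA_STORAGE;
REVOKE SELECT ON ALL TABLES IN SCHEMA "REPORTING"."sales_emea" FROM ROLE KEBOOLA_STORAGE;
REVOKE USAGE ON SCHEMA "REPORTING"."sales_emea" FROM ROLE KEBOOLA_STORAGE;
REVOKE USAGE ON DATABASE "REPORTING" FROM ROLE KEBOOLA_STORAGE;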