To configure input and output mappings in the process of creating a transformation, go to our Getting Started tutorial.
When manipulating your data, there is no need to worry about accidentally modifying the wrong tables. Mapping separates the source data from your transformation script. It creates a secure staging area with data copied from specified tables and brings only specified tables and files back to Storage. While the concept of mapping is most important for transformations, it actually applies to every single component.
There are two types of mapping that have to be set up before running a transformation:
Table names referenced by mappings are automatically quoted by Keboola Connection. This is especially important for Snowflake, which is case sensitive.
The concept of mapping helps to make the transformations repeatable and protect the project Storage. You can always rely on the following:
While this makes some operations seemingly unnecessarily complicated, it ensures that transformations are repeatable and you can’t inadvertently overwrite data in project Storage. For ad-hoc operations, we recommend you use Workspaces. For bulk operations, consider taking advantage of Variables and programmatic automation.
Depending on the transformation backend, the table input mapping process can do the following:
Depending on the transformation types you can either build your transformations working with database tables or with CSV files. Furthermore, the CSV files can be placed locally with the transformation script or they can be placed on a remote storage such as Amazon S3 or Azure Blob Storage. The supported database types are Snowflake, Redshift, and Synapse.
See an example of setting up Input Mapping in our tutorial.
.csvextension. For Database Staging, the table name is automatically quoted, which means you can use an arbitrary name. It also means that they may be case sensitive, for instance, if the destination is in Snowflake staging.
If the terms source and destination look confusing, consult the image below. We always use them in a specific context. That means input mapping destination is the same file as the source for a transformation script.
The Data types option allows you to configure settings of data types for the destination table. Data types are applicable only for destinations in database staging. Select User defined to configure data types for individual columns. The types are pre-configured with data types stored in the table metadata. The Type, Length and Nullable options define the destination table structure in the staging database.
The option Convert empty values to null adds additional processing which is particularly important for non-string columns.
All data in Storage tables are stored in string columns, and empty values (be it true
NULL values, or empty strings) are stored
as empty strings. When converting such values back to, for example,
TIMESTAMP, they have to be converted to true
NULL values to be successfully inserted.
The pre-set data types are only suggestions, you can change them to your liking. You can also use the three dots menu to bulk set
the types. Beware, however, that if a column contains values not matching the type, the entire load (and transformation) will
fail. In such case, it’s reasonable to revert to
VARCHAR types – for example, by setting Data types to None.
When working with large tables, it may become important to understand how the tables are loaded into a Snowflake staging
database. We use two loading options
CLONE. The clone copy type is a highly optimized operation which
allows loading an arbitrary large table in near-constant time. There is a limitation, however, that the clone type can’t be used
together with any filters or data type configurations.
This might present a dilemma when loading huge tables. A logical approach when trying to speed up loading a large table would
be setting data types and adding filters to copy only the necessary ones. You might find out, however, that at some point, it’s
actually faster to remove the filters and data types, take advantage of the
CLONE loading type and apply the filters
inside the transformation. Also, when you need more complex filters (filtering by multiple columns or ranges), it’s best to
remove the filter completely from input mapping, take advantage of the clone loading and do the filtering inside of the
You can verify the table loading type in the events — copy table:
CLONE mapping will execute almost instantly for a table of any size (typically under 10 seconds)
as it physically does not move any data.
A table loaded using
CLONE will contain all columns of the original table plus a new
This column is used internally by Keboola Connection for comparison with the value of the Changed in last filter.
The value in the column contains a unix timestamp of the last change of the row. You can use this column to set up incremental processing, i.e., to replace the role of the Changed in Last filter in input mapping (which you can’t use with clone mapping).
Important: The _timestamp column cannot be imported back to storage.
When you attempt to do so, you’ll get the following error:
Failed to process output mapping: Failed to load table "out.c-test.opportunity": Invalid columns: _timestamp: Only alphanumeric characters and underscores are allowed in column name. Underscore is not allowed on the beginning.
If you are not using the
_timestamp column in your transformation, you have to drop it - for example:
`ALTER TABLE "my-table" DROP COLUMN "_timestamp";`
_timestamp column is not present on tables loaded using the copy method.
The file input mapping process exports the selected files from File Storage and copies them to the staging storage. Depending on the transformation, this can be either a drive local to the transformation or a remote storage such as Amazon S3 or Azure Blob Storage. File input mapping cannot operate with Storage tables, but it comes in handy for processing files that cannot be converted to tables (binary files, malformed files, etc.) or, for example, to work with pre-trained models that you can use in a transformation.
The output mapping takes results (tables and files) from your transformation and stores them back in Storage.
It can create, overwrite, and append any table.
These tables are typically derived from the tables/files in the input mapping. In SQL transformations,
you can use any
DELETE queries to create the desired result.
The result of output mapping can be both Storage Tables and Storage Files. Table output mapping is the common first choice. The rule is that any tables or files not specified in the output mapping are considered as temporary and are deleted when the transformation job ends. You can use this to your advantage and simplify the transformation script implementation (no need to worry about cleanup or intermediary steps).
Keep in mind that every table or file specified in output mapping must be physically present in the staging area. A missing source table for the output mapping is an error. This is important when the results of a transformation are empty — you have to ensure that an empty table or an empty file (with a header or a manifest) is created.
Depending on the transformation backend, the table output mapping process can do the following:
See an example of setting up Output Mapping in our tutorial.
Important: Multiple Output Mappings can be created for your transformation. Each source table can be used only once however.
File output mapping can be useful when your transformation produces, for example, trained models that are to be used in another transformation.
Only files stored directly in the
out/files/ directory can be mapped, subdirectories are not supported
out/files/file.txt will work,