Some components, especially data source connectors, store metadata about the table columns. For example, when a DB data source connector
loads a table from a source database, it also records the physical column types from that table.
These are stored with each table column and can be used later when working with the table. For
instance, transformation COPY
mapping allows you to set data types for the tables inside
the transformations. Also, some data destination connectors, e.g., the Snowflake data destination connector, use
the table metadata to pre-fill the table columns configuration for you.
Even if a data type is available for a column, Storage always creates all table columns internally as non-nullable text; incoming null values are converted to empty strings (except for Exasol, where everything is null). Keep this in mind especially in transformations, where the output is always cast to text. This behavior can sometimes be changed with the native data types feature. The non-text column type is used only during the execution of a component (a transformation or a data destination connector). The basic idea behind this is that a text type has the best interoperability, so it averts many issues (e.g., some date values stored in a MySQL database might not be accepted by a Snowflake database and vice versa).
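Because non-typed Storage columns are text and non-nullable, former null values come back as empty strings when a transformation reads the table. A minimal Snowflake sketch of restoring them before casting (the `orders` table and its columns are made up for illustration):

```sql
-- Non-typed Storage columns arrive as VARCHAR NOT NULL, so former NULLs are ''.
-- NULLIF turns the empty string back into NULL before the cast.
CREATE TABLE "orders_cleaned" AS
SELECT
    NULLIF("amount", '')::NUMBER(10,2) AS "amount",
    NULLIF("created_at", '')::TIMESTAMP_NTZ AS "created_at"
FROM "orders";
```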
Source data types are mapped to a destination using a base type. The current base types are `STRING`, `INTEGER`, `NUMERIC`, `FLOAT`, `BOOLEAN`, `DATE`, and `TIMESTAMP`. This means that, for example, a MySQL data source connector may store a column of type `BIGINT`; that type maps to the `INTEGER` base type. When the Snowflake data destination connector consumes this value, it reads the base type `INTEGER` and chooses a corresponding Snowflake type, which happens to also be `INTEGER`. This ensures high interoperability between the components. See the conversion tables below.
View the extracted data types in the storage table detail:
You can also override the data type:
When you use the table (e.g., in the Snowflake data destination connector), you’ll see the data type you have configured:
Note that the column type setting is, in all cases, only a metadata setting. It does not affect the actual stored data. The data is converted only when writing or copying (e.g., to a transformation or a data destination connector). That means you can extract an integer column, mark it as a timestamp in Storage, and still write it as an integer into a target database (though you’ll be offered the option to write it as a timestamp).
You access both the source and base type metadata through the corresponding API.
As described above, the source data type is converted to a base data type stored in the metadata storage. The base type is then converted to the target data type. The following tables show the mappings for each base type. The mapping may cause a loss of information (e.g., assigning `SMALLINT` to `INTEGER`). To minimize this, we also keep track of the data type size and transfer it where possible. For example, a `SMALLINT` column would be stored as base type `INTEGER` with size 2. If the target database supports integer sizes, you will be offered to set the type in the target database as `INTEGER(2)`.
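Storage records this information as column metadata. As an illustration, the entries for the `SMALLINT` example above might look roughly like this (the `KBC.datatype.*` keys are the ones Keboola uses for type metadata; the exact shape of the API response may differ):

```json
[
    {"key": "KBC.datatype.type", "value": "SMALLINT"},
    {"key": "KBC.datatype.basetype", "value": "INTEGER"},
    {"key": "KBC.datatype.length", "value": "2"}
]
```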
Base type `STRING` represents any textual type; both `CHARACTER VARYING` (or `VARCHAR`) and `TEXT` types are included. The string base type is also used for any otherwise unrecognized type on input. This means the source type column in the following table is not an exhaustive list: it lists the common string types that are converted to a string, and all other unknown types are converted to a string as well.
| Source | Source Type | Base Type | Target Type | Target |
|---|---|---|---|---|
| Generic | char, character varying, text, varchar | STRING | N/A | |
| | | | STRING | Hive |
| | | | STRING | Impala |
| | | | TEXT | MS SQL Server |
| MySQL | CHAR, TEXT, VARCHAR | STRING | VARCHAR | MySQL |
| | | | VARCHAR2 | Oracle |
| | | | VARCHAR | PostgreSQL |
| Redshift | BPCHAR, CHAR, CHARACTER, CHARACTER VARYING, NCHAR, NVARCHAR, TEXT, VARCHAR | STRING | VARCHAR | Redshift |
| | | | VARCHAR | SiSense |
| Snowflake | BINARY, CHAR, CHARACTER, STRING, TEXT, VARBINARY, VARCHAR | STRING | VARCHAR | Snowflake |
| Synapse | BINARY, CHAR, NCHAR, NVARCHAR, VARBINARY, VARCHAR | STRING | NVARCHAR | Synapse |
| | | | VARCHAR | Thoughtspot |
The `INTEGER` base type represents data types for whole numbers.
| Source | Source Type | Base Type | Target Type | Target |
|---|---|---|---|---|
| Generic | bigint, bigserial, mediumint, smallint, int, int2, int4, int64, int8, integer, serial8, tinyint | INTEGER | | |
| | | | INT | Hive |
| | | | INT | Impala |
| | | | BIGINT | MS SQL Server |
| MySQL | BIGINT, INT, INTEGER, MEDIUMINT, SMALLINT, TINYINT | INTEGER | INTEGER | MySQL |
| | | | N/A | Oracle |
| | | | INTEGER | PostgreSQL |
| Redshift | BIGINT, INT, INT2, INT4, INT8, INTEGER, SMALLINT | INTEGER | INTEGER | Redshift |
| | | | BIGINT | SiSense |
| Snowflake | BIGINT, BYTEINT, INT, INTEGER, SMALLINT, TINYINT | INTEGER | INTEGER | Snowflake |
| Synapse | BIGINT, INT, SMALLINT, TINYINT | INTEGER | INT | Synapse |
| | | | INT | Thoughtspot |
The `NUMERIC` base type represents fixed-point fractional numbers (`real`, `numeric`, or `decimal` data types).
| Source | Source Type | Base Type | Target Type | Target |
|---|---|---|---|---|
| Generic | dec, decimal, fixed, money, number, numeric, smallmoney | NUMERIC | | |
| | | | DECIMAL | Hive |
| | | | DECIMAL | Impala |
| | | | DECIMAL | MS SQL Server |
| MySQL | DEC, DECIMAL, FIXED, NUMERIC | NUMERIC | NUMERIC | MySQL |
| | | | NUMBER | Oracle |
| | | | NUMERIC | PostgreSQL |
| Redshift | DECIMAL, NUMERIC | NUMERIC | NUMERIC | Redshift |
| | | | DECIMAL | SiSense |
| Snowflake | DECIMAL, NUMBER, NUMERIC | NUMERIC | NUMBER | Snowflake |
| Synapse | DECIMAL, NUMERIC | NUMERIC | NUMERIC | Synapse |
| | | | N/A | Thoughtspot |
The `FLOAT` base type represents floating-point fractional numbers (`float` or `double` data types).
| Source | Source Type | Base Type | Target Type | Target |
|---|---|---|---|---|
| Generic | binary_double, binary_float, double, double precision, d_float, float, float4, float8, quad, real | FLOAT | | |
| | | | FLOAT | Hive |
| | | | FLOAT | Impala |
| | | | FLOAT | MS SQL Server |
| MySQL | DOUBLE, DOUBLE PRECISION, FLOAT, REAL | FLOAT | FLOAT | MySQL |
| | | | N/A | Oracle |
| | | | REAL | PostgreSQL |
| Redshift | DOUBLE PRECISION, FLOAT, FLOAT4, FLOAT8, REAL | FLOAT | FLOAT | Redshift |
| | | | FLOAT | SiSense |
| Snowflake | DOUBLE, DOUBLE PRECISION, FLOAT, FLOAT4, FLOAT8, REAL | FLOAT | FLOAT | Snowflake |
| Synapse | FLOAT, REAL | FLOAT | FLOAT | Synapse |
| | | | FLOAT | Thoughtspot |
The `BOOLEAN` base type represents a true or false value.
| Source | Source Type | Base Type | Target Type | Target |
|---|---|---|---|---|
| Generic | bool, boolean | BOOLEAN | | |
| | | | BOOLEAN | Hive |
| | | | BOOLEAN | Impala |
| | | | BIT | MS SQL Server |
| | | | N/A | MySQL |
| | | | N/A | Oracle |
| | | | BOOLEAN | PostgreSQL |
| Redshift | BOOL, BOOLEAN | BOOLEAN | BOOLEAN | Redshift |
| | | | BIT | SiSense |
| Snowflake | BOOLEAN | BOOLEAN | BOOLEAN | Snowflake |
| Synapse | BIT | BOOLEAN | BIT | Synapse |
| | | | BOOL | Thoughtspot |
The `DATE` base type represents a date value without a time portion.
| Source | Source Type | Base Type | Target Type | Target |
|---|---|---|---|---|
| Generic | date | DATE | DATE | |
| | | | N/A | Hive |
| | | | N/A | Impala |
| | | | DATE | MS SQL Server |
| MySQL | DATE | DATE | DATE | MySQL |
| | | | DATE | Oracle |
| | | | DATE | PostgreSQL |
| Redshift | DATE | DATE | DATE | Redshift |
| | | | DATE | SiSense |
| Snowflake | DATE | DATE | DATE | Snowflake |
| Synapse | DATE | DATE | DATE | Synapse |
| | | | DATE | Thoughtspot |
The `TIMESTAMP` base type represents a date value with a time portion.
| Source | Source Type | Base Type | Target Type | Target |
|---|---|---|---|---|
| Generic | datetime, datetime2, datetimeoffset, smalldatetime, timestamp, timestamptz, timestamp_LTZ, timestamp_NTZ, timestamp_TZ, timestamp with local time zone, timestamp with time zone, timestamp without time zone | TIMESTAMP | | |
| | | | TIMESTAMP | Hive |
| | | | TIMESTAMP | Impala |
| | | | DATETIME2 | MS SQL Server |
| MySQL | DATETIME, TIMESTAMP | TIMESTAMP | TIMESTAMP | MySQL |
| | | | TIMESTAMP | Oracle |
| | | | TIMESTAMP | PostgreSQL |
| Redshift | TIMESTAMP, TIMESTAMPTZ, TIMESTAMP WITH TIME ZONE, TIMESTAMP WITHOUT TIME ZONE | TIMESTAMP | TIMESTAMP | Redshift |
| | | | N/A | SiSense |
| Snowflake | DATETIME, TIMESTAMP, TIMESTAMP_LTZ, TIMESTAMP_NTZ, TIMESTAMP_TZ | TIMESTAMP | TIMESTAMP | Snowflake |
| Synapse | DATETIME, DATETIME2, DATETIMEOFFSET, SMALLDATETIME, TIME | TIMESTAMP | DATETIMEOFFSET | Synapse |
| | | | TIMESTAMP | Thoughtspot |
Specific behavior depends on the backend of your project. We’ll be using the Snowflake backend as an example.
As mentioned above, Keboola stores data in Storage as text (`VARCHAR NOT NULL`) by default. With native types, data is stored in columns with an actual data type (`DATETIME`, `BOOLEAN`, `DOUBLE`, etc.) based on Keboola metadata.
Tables with native data types are labeled in the user interface with a badge:
A table with a type definition is created using the tables-definition endpoint, and data is loaded into it. Data types used in this endpoint must correspond to the storage backend your project uses. Alternatively, you can use base types.
A component may provide information about column data types in its data manifest. Database data source connectors and transformations matching the storage backend (e.g., Snowflake SQL transformation on the Snowflake storage backend) will create storage tables with the same types. The database data source connectors and transformations that do NOT match the backend will create storage tables using base types.
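As an illustration, a component might hint column types in its output manifest roughly like this (a sketch only; the `in.c-main.orders` destination and the column name are made up, and the exact manifest layout is described in the developer documentation):

```json
{
    "destination": "in.c-main.orders",
    "column_metadata": {
        "amount": [
            {"key": "KBC.datatype.basetype", "value": "NUMERIC"},
            {"key": "KBC.datatype.length", "value": "10,2"}
        ]
    }
}
```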
Note: When a table is created from base types, it defaults to the lengths and precisions specific to each Storage backend. For instance, in Snowflake, the NUMERIC base type is created as NUMBER(38,9), which may be unexpected if the source database column is NUMBER(10,2).
To avoid this limitation, you can manually create the table in advance using the Table Definition API with the correct precision. When subsequent jobs write data to this table, they will respect your definition as long as it matches. Keep this in mind when dropping and recreating tables: if a job creates the table itself, the columns fall back to the defaults derived from the base types.
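For instance, a sketch of such a pre-created definition, reusing the column definition format shown later in this section (the table and column names are illustrative; consult the API reference for the full payload of the tables-definition endpoint):

```json
{
    "name": "orders",
    "primaryKeysNames": ["id"],
    "columns": [
        {
            "name": "amount",
            "definition": {
                "type": "NUMBER",
                "length": "10,2",
                "nullable": false
            }
        }
    ]
}
```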
For example, here’s how to create typed tables in a Snowflake SQL transformation, ensuring they are imported to Storage as typed tables:
```sql
-- create a table with datatypes
CREATE OR REPLACE TABLE "typed_table" (
    "id" NUMBER,
    "name" VARCHAR(255),
    "created_at" TIMESTAMP_NTZ
);

-- insert some data
INSERT INTO "typed_table"
VALUES
    (1, '75', '2020-01-01 00:00:00');

-- create another table with datatypes based on an existing table
CREATE OR REPLACE TABLE "typed_table_2" AS
SELECT
    "id"::varchar AS "string_id",
    "name"::number AS "numeric_name",
    "created_at"::date AS "typed_date"
FROM
    "typed_table";
```
Note: The data type hinting is the components’ responsibility, so components must be updated by their respective authors to support this. The database data source connectors that are maintained by Keboola already provide data types. There is no list of components that support this feature. You may check the component’s documentation to see if it supports native data types.
For example, in the case of Snowflake, you can create a column of type `TIMESTAMP_NTZ` or `DECIMAL(20,2)`. This allows you to specify all the data type details, including, for instance, precision and scale. However, it is tied to the specific storage backend and is thus not portable.
An example of such a column definition in a table-definition API endpoint call is as follows:
```json
{
    "name": "id",
    "definition": {
        "type": "DECIMAL",
        "length": "20,2",
        "nullable": false,
        "default": "999"
    }
}
```
Specifying native types using base types is ideal for component-provided types as they are storage backend agnostic. However, they can be used for the table-definition API endpoint as well. The definition is as follows:
```json
{
    "name": "id",
    "basetype": "NUMERIC"
}
```
You can’t change the type of a column of a typed table once it has been created. There are multiple ways to work around this.

First, if the table is loaded using a full load, you can drop the table, create a new one with the correct types, and load the data there.
If the table is loaded incrementally, you must create a new column and copy the data from the old one. For example, suppose you have a column `date` of type `VARCHAR` in a typed table, and you want to change it to `TIMESTAMP`:

1. Add a new column `date_timestamp` of type `TIMESTAMP` to the table.
2. Copy the data from `date` to `date_timestamp` in the existing rows (see the SQL sketch below).
3. Update every configuration where `date` is used to use `date_timestamp` instead.

In both cases, check all the other configurations using the table to avoid any schema mismatch. This is especially important for data destination connectors, where the table already exists in the destination.
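The copy in step 2 is essentially a cast. In Snowflake SQL, the expression could look like this (a sketch assuming the hypothetical column names above and a parseable date format):

```sql
-- Illustrative cast for step 2: derive "date_timestamp" from the textual "date".
-- TRY_TO_TIMESTAMP returns NULL for values that do not parse, instead of failing.
SELECT
    "date",
    TRY_TO_TIMESTAMP("date", 'YYYY-MM-DD HH24:MI:SS') AS "date_timestamp"
FROM "typed_table";
```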
When you load data incrementally, there is a difference between typed and non-typed tables. Typed tables only compare the columns of the table’s primary key, while non-typed tables compare the whole row, updating rows where any value changes. This is described in detail in our documentation on incremental loading.
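Conceptually, an incremental load into a typed table behaves like a merge on the primary key; this simplified Snowflake sketch only illustrates the matching logic and is not the actual implementation:

```sql
-- Typed tables match incoming rows on the primary key only ("id" here);
-- non-typed tables would instead compare entire rows.
MERGE INTO "typed_table" AS t
USING "increment" AS i
    ON t."id" = i."id"
WHEN MATCHED THEN UPDATE SET t."name" = i."name"
WHEN NOT MATCHED THEN INSERT ("id", "name") VALUES (i."id", i."name");
```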
Columns without native types are always `VARCHAR NOT NULL`, so you don’t need to worry about any specific NULL behavior. This changes with typed columns. In most databases, NULL does not equal NULL (`NULL = NULL` evaluates to `NULL`, not `TRUE`). This breaks the incremental loading flow, where columns are compared against each other.
For this reason, please make sure that your primary key columns are not nullable. This is most relevant in CTAS queries, where columns are nullable by default. To work around this, specify the columns as part of the CTAS expression. For example:
CREATE TABLE "ctas_table" (
"id" NUMBER NOT NULL,
"name" VARCHAR(255) NOT NULL,
"created_at" TIMESTAMP_NTZ NOT NULL
) AS SELECT * FROM "typed_table";
For example, we have a non-typed table `non_typed_table` with the following definition:
And with the following data:
To create a typed table based on `non_typed_table`, create a new transformation, choose `non_typed_table` as the table input mapping (or rely on the read-only input mapping), and choose `typed_table` as the table output mapping. The output table must not already exist; otherwise, it will not be created as a typed table.
In the queries section, add an SQL query transforming the column types. In this step, you should provide proper casting for your data. The following example shows custom formatting of the date.
CREATE TABLE "typed_table" AS
SELECT
CAST(ntt."id" AS VARCHAR(64)) AS "id",
CAST(ntt."id_profile" AS INTEGER) AS "id_profile",
TO_TIMESTAMP(ntt."date", 'DD.MM.YYYY"T"HH24:MI:SS') AS "date",
CAST(ntt."amount" AS INTEGER) AS "amount"
FROM "non_typed_table" AS ntt;
Run the transformation and wait until it finishes.
The newly created table `typed_table` schema should look like this:

You can see the *NATIVE TYPES* label after the table name, which means that the table is typed. The table columns should have the same data types as in the transformation query.
If the destination table already exists and you want to keep the same name, you must first rename the original table (e.g., to `non_typed_table_bkp`). Then create a new table using the transformation described above.
Note that incremental loading cannot be used in this case.