Some components (especially extractors) store metadata about the table columns. For example, when a DB extractor
loads a table from the source database, it also records the physical column types from that table.
These are stored with each table column and can be used later on when working with the table. For
example, the transformation COPY
mapping allows you to set data types for the tables inside
the transformations. Also, some writers, e.g., the Snowflake writer use
the table metadata to pre-fill the table columns configuration for you.
Even if a data type is available for a column, storage always creates internally all columns for table as text not null and null-able values are converted to empty strings (except for Exasol where everything is null). Keep this in mind especially in Transformations, where the output is always cast to text. This behavior can be changed in certain cases with feature native datatypes. The non-text column type is used only during a component (transformation or writer) execution. The basic idea behind this is that a text type has the best interoperability, so this averts many issues (e.g., some date values stored in a MySQL database might not be accepted by a Snowflake database and vice-versa).
Data types from a source are mapped to a destination using a Base Type. The current base types are
STRING
, INTEGER
, NUMERIC
, FLOAT
, BOOLEAN
,
DATE
, and TIMESTAMP
. This means that, for example, a MySQL extractor
may store the value BIGINT
as a type of a column; that type maps to the INTEGER
general type. When the Snowflake writer consumes this value, it will
read the general type INTEGER
and choose a corresponding type for Snowflake, which happens to be also INTEGER
.
This logic is again designed to ensure high interoperability between the components. See the conversion table below.
You view the extracted data types in the Storage Table detail:
You can also override the data type:
When you use the table (e.g. in Snowflake writer), you’ll see the data type you configured:
Note that the column type setting is in all cases only a metadata setting. It has no effect on the actual stored data. The data is converted only at the point of writing/copying (e.g to transformation or writer). That means that you can extract an integer column, mark it as timestamp in Storage and write it as integer into a target database (though you’ll be offered to write it as timestamp).
Through the corresponding API you access both the source and base type metadata.
As described above, the source data type is converted to a base data type which is stored in metadata storage. The base type is then converted to the target data type. The following tables show mappings for each base type. The mapping
causes possible information loss (e.g. assigning SMALLINT
to INTEGER
). To minimize this, we also keep track of the data type
size and transfer that if possible. For example that a SMALLINT
column would be stored as base type INTEGER
with size 2
. If the target database supports integer sizes, you will be offered to set the type in the target database as INTEGER(2)
.
Base type STRING
represents any textual type - both CHARACTER VARYING
(or VARCHAR
) and TEXT
types are included.
Also the string base type is used for any other unrecognized type on input. That means in the following table the
source type column is not an exhaustive list. It’s a list of reasonable string types which are converted to string, all
other unknown types are converted to string as well.
Source | Source Type | Base Type | Target Type | Target |
---|---|---|---|---|
Generic | char | STRING | N/A | |
character varying | ||||
text | ||||
varchar | ||||
STRING | Hive | |||
STRING | Impala | |||
TEXT | MS SQL Server | |||
MySQL | CHAR | VARCHAR | MySQL | |
TEXT | ||||
VARCHAR | ||||
VARCHAR2 | Oracle | |||
VARCHAR | PostgreSQL | |||
Redshift | BPCHAR | VARCHAR | Redshift | |
CHAR | ||||
CHARACTER | ||||
CHARACTER VARYING | ||||
NCHAR | ||||
NVARCHAR | ||||
TEXT | ||||
VARCHAR | ||||
VARCHAR | SiSense | |||
Snowflake | BINARY | VARCHAR | Snowflake | |
CHAR | ||||
CHARACTER | ||||
STRING | ||||
TEXT | ||||
VARBINARY | ||||
VARCHAR | ||||
Synapse | BINARY | NVARCHAR | Synapse | |
CHAR | ||||
NCHAR | ||||
NVARCHAR | ||||
VARBINARY | ||||
VARCHAR | ||||
VARCHAR | Thoughtspot | |||
Source | Source Type | Base Type | Target Type | Target |
The INTEGER
base type represents data types for whole numbers.
Source | Source Type | Base Type | Target Type | Target |
---|---|---|---|---|
Generic | bigint | INTEGER | ||
bigserial | ||||
mediumint | ||||
smallint | ||||
int | ||||
int2 | ||||
int4 | ||||
int64 | ||||
int8 | ||||
integer | ||||
serial8 | ||||
tinyint | ||||
INT | Hive | |||
INT | Impala | |||
BIGINT | MS SQL | |||
MySQL | BIGINT | INTEGER | MySQL | |
INT | ||||
INTEGER | ||||
MEDIUMINT | ||||
SMALLINT | ||||
TINYINT | ||||
N/A | Oracle | |||
INTEGER | Postgres | |||
Redshift | BIGINT | INTEGER | Redshift | |
INT | ||||
INT2 | ||||
INT4 | ||||
INT8 | ||||
INTEGER | ||||
SMALLINT | ||||
BIGINT | SiSense | |||
Snowflake | BIGINT | INTEGER | Snowflake | |
BYTEINT | ||||
INT | ||||
INTEGER | ||||
SMALLINT | ||||
TINYINT | ||||
Synapse | BIGINT | INT | Synapse | |
INT | ||||
SMALLINT | ||||
TINYINT | ||||
INT | Thoughtspot | |||
Source | Source Type | Base Type | Target Type | Target |
The NUMERIC
base type represents fixed-point fractional numbers
(real
, numeric
or decimal
data types).
Source | Source Type | Base Type | Target Type | Target |
---|---|---|---|---|
Generic | dec | NUMERIC | ||
decimal | ||||
fixed | ||||
money | ||||
number | ||||
numeric | ||||
smallmoney | ||||
DECIMAL | Hive | |||
DECIMAL | Impala | |||
DECIMAL | MS SQL Server | |||
MySQL | DEC | NUMERIC | MySQL | |
DECIMAL | ||||
FIXED | ||||
NUMERIC | ||||
NUMBER | Oracle | |||
NUMERIC | PostgreSQL | |||
Redshift | DECIMAL | NUMERIC | Redshift | |
NUMERIC | ||||
DECIMAL | SiSense | |||
Snowflake | DECIMAL | NUMBER | Snowflake | |
NUMBER | ||||
NUMERIC | ||||
Synapse | NUMERIC | NUMERIC | Synapse | |
DECIMAL | ||||
N/A | Thoughtspot | |||
Source | Source Type | Base Type | Target Type | Target |
The FLOAT
base type represents floating-point fractional numbers
(float
or double
data types).
Source | Source Type | Base Type | Target Type | Target |
---|---|---|---|---|
Generic | binary_double | FLOAT | ||
binary_float | ||||
double | ||||
double precision | ||||
d_float | ||||
float | ||||
float4 | ||||
float8 | ||||
quad | ||||
real | ||||
FLOAT | Hive | |||
FLOAT | Impala | |||
FLOAT | MS SQL Server | |||
MySQL | DOUBLE | FLOAT | MySQL | |
DOUBLE PRECISION | ||||
FLOAT | ||||
REAL | ||||
N/A | Oracle | |||
REAL | PostgreSQL | |||
Redshift | DOUBLE PRECISION | FLOAT | Redshift | |
FLOAT | ||||
FLOAT4 | ||||
FLOAT8 | ||||
REAL | ||||
FLOAT | SiSense | |||
Snowflake | FLOAT | DOUBLE | Snowflake | |
DOUBLE PRECISION | ||||
FLOAT | ||||
FLOAT4 | ||||
FLOAT8 | ||||
REAL | ||||
Synapse | FLOAT | FLOAT | Synapse | |
REAL | ||||
FLOAT | Thoughtspot | |||
Source | Source Type | Base Type | Target Type | Target |
The BOOLEAN
base type represents a true/false values.
Source | Source Type | Base Type | Target Type | Target |
---|---|---|---|---|
Generic | bool | BOOLEAN | ||
boolean | ||||
BOOLEAN | Hive | |||
BOOLEAN | Impala | |||
BIT | MS SQL Server | |||
N/A | MySQL | |||
N/A | Oracle | |||
BOOLEAN | PostgreSQL | |||
Redshift | BOOL | BOOLEAN | Redshift | |
BOOLEAN | ||||
BIT | SiSense | |||
Snowflake | BOOLEAN | BOOLEAN | Snowflake | |
Synapse | BIT | BIT | Synapse | |
BOOL | Thoughtspot | |||
Source | Source Type | Base Type | Target Type | Target |
The DATE
base type represents a date value without a time portion.
Source | Source Type | Base Type | Target Type | Target | |
---|---|---|---|---|---|
Generic | date | DATE | DATE | ||
N/A | Hive | ||||
N/A | Impala | ||||
DATE | MS SQL Server | ||||
MySQL | DATE | DATE | MySQL | ||
DATE | Oracle | ||||
DATE | PostgreSQL | ||||
Redshift | DATE | DATE | Redshift | ||
DATE | SiSense | ||||
Snowflake | DATE | DATE | Snowflake | ||
Synapse | DATE | DATE | Synapse | ||
DATE | Thoughtspot | ||||
Source | Source Type | Base Type | Target Type | Target |
The TIMESTAMP
base type represents a date value with a time portion.
Source | Source Type | Base Type | Target Type | Target |
---|---|---|---|---|
Generic | datetime | TIMESTAMP | ||
datetime2 | ||||
datetimeoffset | ||||
smalldatetime | ||||
timestamp | ||||
timestamptz | ||||
timestamp_LTZ | ||||
timestamp_NTZ | ||||
TIMESTAMP_TZ | ||||
timestamp with local time zone | ||||
timestamp with time zone | ||||
timestamp without time zone | ||||
TIMESTAMP | Hive | |||
TIMESTAMP | Impala | |||
DATETIME2 | MS SQL Server | |||
MySQL | DATETIME | TIMESTAMP | MySQL | |
TIMESTAMP | ||||
TIMESTAMP | Oracle | |||
TIMESTAMP | PostgreSQL | |||
Redshift | TIMESTAMP | TIMESTAMP | Redshift | |
TIMESTAMPTZ | ||||
TIMESTAMP WITH TIME ZONE | ||||
TIMESTAMP WITHOUT TIME ZONE | ||||
N/A | SiSense | |||
Snowflake | DATETIME | TIMESTAMP | Snowflake | |
TIMESTAMP | ||||
TIMESTAMP_NTZ | ||||
TIMESTAMP_LTZ | ||||
TIMESTAMP_TZ | ||||
Synapse | DATETIMEOFFSET | DATETIMEOFFSET | Synapse | |
DATETIME | ||||
DATETIME2 | ||||
SMALLDATETIME | ||||
TIME | ||||
TIMESTAMP | Thoughtspot | |||
Source | Source Type | Base Type | Target Type | Target |
As mentioned above, Keboola stores data in storage as text by default. Native datatypes feature breaks this paradigm. If this feature is enabled, data in storage are stored in columns which have datatypes as described in Keboola metadata.
There are two options how typed tables can be created
NOTE: Not all database extractors and transformations can produce basetypes.
NOTE: Some components may produce basetypes for columns and thus produce typed tables
Table with native datatypes is labeled in UI with a badge: