Data Types

Some components (especially extractors) store metadata about the table columns. For example, when a DB extractor loads a table from the source database, it also records the physical column types from that table. These are stored with each table column and can be used later on when working with the table. For example, the transformation COPY mapping allows you to set data types for the tables inside the transformations. Also, some writers, e.g., the Snowflake writer use the table metadata to pre-fill the table columns configuration for you.

Even if a data type is available for a column, that column is always stored as text — keep this in mind especially in Transformations, where the output is always cast to text. The non-text column type is used only during a component (transformation or writer) execution. The basic idea behind this is that a text type has the best interoperability, so this averts many issues (e.g., some date values stored in a MySQL database might not be accepted by a Snowflake database and vice-versa).

Base Types

Data types from a source are mapped to a destination using a Base Type. The current base types are STRING, INTEGER, NUMERIC, FLOAT, BOOLEAN, DATE, and TIMESTAMP. This means that, for example, a MySQL extractor may store the value BIGINT as a type of a column; that type maps to the INTEGER general type. When the Snowflake writer consumes this value, it will read the general type INTEGER and choose a corresponding type for Snowflake, which happens to be also INTEGER. This logic is again designed to ensure high interoperability between the components. See the conversion table below.

You view the extracted data types in the Storage Table detail:

Screenshot - View Column Data Type

You can also override the data type:

Screenshot - Set Column Data Type

When you use the table (e.g. in Snowflake writer), you’ll see the data type you configured:

Screenshot - Set Column Data Type

Note that the column type setting is in all cases only a metadata setting. It has no effect on the actual stored data. The data is converted only at the point of writing/copying (e.g to transformation or writer). That means that you can extract an integer column, mark it as timestamp in Storage and write it as integer into a target database (though you’ll be offered to write it as timestamp).

Through the corresponding API you access both the source and base type metadata.

Data Type Conversions

As described above, the source data type is converted to a base data type which is stored in metadata storage. The base type is then converted to the target data type. The following tables show mappings for each base type. The mapping causes possible information loss (e.g. assigning SMALLINT to INTEGER). To minimize this, we also keep track of the data type size and transfer that if possible. For example that a SMALLINT column would be stored as base type INTEGER with size 2. If the target database supports integer sizes, you will be offered to set the type in the target database as INTEGER(2).

STRING

Base type STRING represents any textual type - both CHARACTER VARYING (or VARCHAR) and TEXT types are included. Also the string base type is used for any other unrecognized type on input. That means in the following table the source type column is not an exhaustive list. It’s a list of reasonable string types which are converted to string, all other unknown types are converted to string as well.

Source Source Type Base Type Target Type Target
Generic char STRING N/A
character varying
text
varchar
STRING Hive
STRING Impala
TEXT MS SQL Server
MySQL CHAR VARCHAR MySQL
TEXT
VARCHAR
VARCHAR2 Oracle
VARCHAR PostgreSQL
Redshift BPCHAR VARCHAR Redshift
CHAR
CHARACTER
CHARACTER VARYING
NCHAR
NVARCHAR
TEXT
VARCHAR
VARCHAR SiSense
Snowflake BINARY VARCHAR Snowflake
CHAR
CHARACTER
STRING
TEXT
VARBINARY
VARCHAR
Synapse BINARY VARCHAR Synapse
CHAR
NCHAR
NVARCHAR
VARBINARY
VARCHAR
VARCHAR Thoughtspot
Source Source Type Base Type Target Type Target

INTEGER

The INTEGER base type represents data types for whole numbers.

Source Source Type Base Type Target Type Target
Generic bigint INTEGER
bigserial
mediumint
smallint
int
int2
int4
int64
int8
integer
serial8
tinyint
INT Hive
INT Impala
BIGINT MS SQL
MySQL BIGINT INTEGER MySQL
INT
INTEGER
MEDIUMINT
SMALLINT
TINYINT
N/A Oracle
INTEGER Postgres
Redshift BIGINT INTEGER Redshift
INT
INT2
INT4
INT8
INTEGER
SMALLINT
BIGINT SiSense
Snowflake BIGINT INTEGER Snowflake
BYTEINT
INT
INTEGER
SMALLINT
TINYINT
Synapse BIGINT N/A Synapse
INT
SMALLINT
TINYINT
INT ThoughtSpot
Source Source Type Base Type Target Type Target

NUMERIC

The NUMERIC base type represents fixed-point fractional numbers (real, numeric or decimal data types).

Source Source Type Base Type Target Type Target
Generic dec NUMERIC
decimal
fixed
money
number
numeric
smallmoney
DECIMAL Hive
DECIMAL Impala
DECIMAL MS SQL Server
MySQL DEC NUMERIC MySQL
DECIMAL
FIXED
NUMERIC
NUMBER Oracle
NUMERIC PostgreSQL
Redshift DECIMAL NUMERIC Redshift
NUMERIC
DECIMAL SiSense
Snowflake DECIMAL NUMBER Snowflake
NUMBER
NUMERIC
N/A Thoughtspot
Source Source Type Base Type Target Type Target

FLOAT

The FLOAT base type represents floating-point fractional numbers (float or double data types).

Source Source Type Base Type Target Type Target
Generic binary_double FLOAT
binary_float
double
double precision
d_float
float
float4
float8
quad
real
FLOAT Hive
FLOAT Impala
FLOAT MS SQL Server
MySQL DOUBLE FLOAT MySQL
DOUBLE PRECISION
FLOAT
REAL
N/A Oracle
REAL PostgreSQL
Redshift DOUBLE PRECISION FLOAT Redshift
FLOAT
FLOAT4
FLOAT8
REAL
FLOAT SiSense
Snowflake FLOAT DOUBLE Snowflake
DOUBLE PRECISION
FLOAT
FLOAT4
FLOAT8
REAL
FLOAT Thoughtspot
Source Source Type Base Type Target Type Target

BOOLEAN

The BOOLEAN base type represents a true/false values.

Source Source Type Base Type Target Type Target
Generic bool BOOLEAN
boolean
BOOLEAN Hive
BOOLEAN Impala
BIT MS SQL Server
N/A MySQL
N/A Oracle
BOOLEAN PostgreSQL
Redshift BOOL BOOLEAN Redshift
BOOLEAN
BIT SiSense
Snowflake BOOLEAN BOOLEAN Snowflake
BOOL Thoughtspot
Source Source Type Base Type Target Type Target

DATE

The DATE base type represents a date value without a time portion.

Source Source Type Base Type Target Type Target
Generic date DATE DATE
N/A Hive
N/A Impala
DATE MS SQL Server
MySQL DATE DATE MySQL
DATE Oracle
DATE PostgreSQL
Redshift DATE DATE Redshift
DATE SiSense
Snowflake DATE DATE Snowflake
DATE Thoughtspot
Source Source Type Base Type Target Type Target

TIMESTAMP

The TIMESTAMP base type represents a date value with a time portion.

Source Source Type Base Type Target Type Target
Generic datetime TIMESTAMP
datetime2
datetimeoffset
smalldatetime
timestamp
timestamptz
timestamp_LTZ
timestamp_NTZ
TIMESTAMP_TZ
timestamp with local time zone
timestamp with time zone
timestamp without time zone
TIMESTAMP Hive
TIMESTAMP Impala
DATETIME2 MS SQL Server
MySQL DATETIME TIMESTAMP MySQL
TIMESTAMP
TIMESTAMP Oracle
TIMESTAMP PostgreSQL
Redshift TIMESTAMP TIMESTAMP Redshift
TIMESTAMPTZ
TIMESTAMP WITH TIME ZONE
TIMESTAMP WITHOUT TIME ZONE
N/A SiSense
Snowflake DATETIME TIMESTAMP Snowflake
TIMESTAMP
TIMESTAMP_NTZ
TIMESTAMP_LTZ
TIMESTAMP_TZ
TIMESTAMP Thoughtspot
Source Source Type Base Type Target Type Target