Snowflake Transformation

Snowflake has many advantages:

  • No database administration
  • No indexes, sort keys, distribution styles, or column compressions
  • Easy scaling
  • Simple data types
  • Amazing processing power and data throughput

Limits

  • Snowflake queries are limited to 900 seconds by default.
  • Queries containing comments longer than 8,192 characters will segfault.
  • Constraints (like PRIMARY KEY or UNIQUE) are defined but not enforced.

Load Type

There are two types of loading tables into your workspace. You can select either Copy Table or Clone Table.

Load Type

Copy Table is the default option, and it physically copies the table from our Storage to your workspace. Using Copy Table allows you to refine the input mapping using various filters.

Clone Table avoids physical transfer of the data and clones the table from Storage without any processing.

Clone Table

By switching Load Type to Clone Table the input mapping will utilize the Snowflake CLONE command.

As the CLONE command has no further options, all other input mapping options will be disabled (except for Source and Destination of course).

Clone Table is useful when

  • your table is very large and the Copy Table load type is slow.
  • you need more complex input mapping filters (e.g. filtering using a range).

Performance

The CLONE command will execute the input mapping almost instantly for a table of any size (typically under 10 seconds) as it physically does not move any data.

_timestamp system column

A table loaded using Clone Table will contain all columns of the original table plus a new _timestamp column. This column is used internally by Keboola Connection for comparison with the value of the Changed in last filter.

The value in the column contains a unix timestamp of the last change of the row, which is

  • when the row was added to the table, or
  • when any of the cells was modified using an incremental load.

You can use this column to set up incremental processing.

Aborting Transformation Execution

This feature is currently in public beta and is enabled for each project separately. To enable this feature, contact us using the support button in your project.

In some cases, you may need to abort the transformation execution and exit with an error message. To abort the execution, set the ABORT_TRANSFORMATION variable to any nonempty value.

SET ABORT_TRANSFORMATION = (
  SELECT 
      CASE
          WHEN COUNT = 0 THEN ''
          ELSE 'Integrity check failed'
      END
  FROM (
    SELECT COUNT(*) AS COUNT FROM INTEGRITY_CHECK WHERE RESULT = 'failed'
  )   
);

This example will set the ABORT_TRANSFORMATION variable value to 'Integrity check failed' if the INTEGRITY_CHECK table contains one or more records with the RESULT column equal to the value 'failed'.

The transformation engine checks the ABORT_TRANSFORMATION after each successfully executed query and returns the value of the variable as a user error, Transformation aborted: Integrity check failed. in this case.

Transformation aborted

Best Practices

Case Sensitivity

Unlike Redshift, Snowflake is case sensitive. All unquoted table/column names are converted to upper case while quoted names keep their case.

So if you want to create the following table,

-- creates table FOOTABLE
CREATE TABLE footable (...);

all of these commands will work

SELECT * FROM FOOTABLE;
SELECT * FROM "FOOTABLE";
SELECT * FROM footable;

while this one will not:

-- table footable not found!
SELECT * FROM "footable";

Be especially careful when setting up input and output mappings.

When writing your transformation script, quoting all table and column names is required. Snowflake converts all unquoted table/column identifiers to uppercase, which won’t match table/column identifiers created by Keboola Connection.

SELECT "barcolumn" FROM "footable";

Timestamp Columns

By default, Snowflake uses the YYYY-MM-DD HH24:MI:SS.FF3 format when converting the timestamp column to a character string.

This means that if you create a table in a transformation which uses a timestamp column,

CREATE TABLE "ts_test" AS (SELECT CURRENT_TIMESTAMP AS "ts");

the table value will come out as 2018-04-09 06:43:57.866 -0700 in Storage. If you want to output it in a different format, you have to cast the column to a string first, for example:

CREATE TABLE "out" AS
    (SELECT TO_CHAR("ts", 'YYYY-MM-DD HH:MI:SS') AS "ts" FROM "ts_test");

Do not use ALTER SESSION queries to modify the default timestamp format, as the loading and unloading sessions are separate from your transformation/sandbox session and the format may change unexpectedly.

Important: In the default US KBC region (connection.keboola.com), the following Snowflake default parameters are overridden:

Important: Snowflake works with time zones (and Daylight Savings Time), requiring you to distinguish between various conversion functions:

SELECT
    -- yields 2013-03-10 02:12:00.000 +0000
    TO_TIMESTAMP_NTZ('10.3.2013 2:12', 'DD.MM.YYYY HH:MI'),
    -- yields 2013-03-10 03:12:00.000 -0700
    TO_TIMESTAMP_TZ('10.3.2013 2:12', 'DD.MM.YYYY HH:MI'),
    -- yields 2013-03-10 03:12:00.000 -0700
    TO_TIMESTAMP('10.3.2013 2:12', 'DD.MM.YYYY HH:MI');