Part 2 - Data Manipulation

At this point, you already know how to quickly load data into KBC, and your in.c-tutorial table bucket contains four new tables: account, opportunity, level and user. In this part of the tutorial, we will show you how to manipulate data in Storage using Transformations. Let’s create a denormalized table from the input tables and do some minor modifications to it.

Creating Transformation

To start, go to the KBC Transformations section:

Screenshot - Transformations Console

Like tables, Transformations are organized into buckets. Each transformation bucket can contain any number of individual transformations. It should represent a logical set (container) of operations you want to perform together. Before you start with transformations, create a bucket and call it Opportunity.

Screenshot - Create a Transformation Bucket

Then click on Add Transformation in the upper right corner to create an individual transformation. Make sure to enter its Name and select Backend. A backend is the engine running the transformation script; it is either a database server (MySQL, Redshift, Snowflake) or a language interpreter (R, Python). Name your transformation Denormalize opportunities, and choose the Snowflake backend.

Screenshot - Create a Transformation

When you create a transformation, you need to set up

  1. Input Mapping — what tables will be used in your transformation; tables not mentioned in Input Mapping cannot be used in the transformation.
  2. Output Mapping — what tables will be written into Storage; tables not mentioned in Output Mapping will never be modified nor permanently stored (i.e. they are temporary).
  3. Transformation Script — SQL queries defining what will happen with the data; it takes the tables from Input Mapping, modifies them and produces the tables referenced in Output Mapping.

The concept of mapping is an important safeguard when you are manipulating your data. Thanks to it, there is no way to modify the wrong tables by accident. The only tables which are modified by your transformation are those explicitly specified in Output Mapping.

Input Mapping

Let’s start with setting Input Mapping by clicking the Add Input button.

Screenshot - Add input mapping

The Source field in the input mapping refers to Storage. Select in.c-tutorial.account as the source table. You can do a full text search in the select field; typing acc will give you the table as well. In the Destination field, the table name account is automatically filled for you. This is the name of the source table inside the transformation. Create the input mapping.

Add the remaining three tables: opportunity, user and level. You will get the following configuration:

Screenshot - Input mapping result

See additional information about Input Mapping (all available options, etc.).

Output Mapping

Continue with setting up Output Mapping by clicking on the Add Output button.

Screenshot - Add output mapping

Enter opportunity_denorm into the Source field in the output mapping; the Source field refers to the transformation. This table does not exist yet. We will create it in the transformation.

The Destination field refers to the name of the output table in Storage. Start typing in and find our tutorial output bucket and add the source table name to it: out.c-tutorial.opportunity_denorm. It will create the opportunity_denorm table in the tutorial bucket in the output stage in Storage. This table does not exist either, but it will be created once the transformation runs.

After you finish Output Mapping, you will see this:

Screenshot - Output mapping result

The size of the opportunity_denorm table shows as N/A because the table does not exist yet.

See additional information about Output Mapping (all available options, etc.).

Transformation Script

To produce that table from tables account, opportunity and user, write the transformation script. To save you some time, we have already prepared the necessary SQL queries for you:

CREATE TABLE "tmp_level" AS 
    SELECT "Name", CASE 
        WHEN "Level" = 'S' THEN 'Senior'
        WHEN "Level" = 'M' THEN 'Intermediate' 
        WHEN "Level" = 'J' THEN 'Junior' END AS "Level" 
    FROM "level"; 

CREATE TABLE "tmp_opportunity" AS 
        WHEN "Probability" < 50 THEN 'Poor' 
        WHEN "Probability" < 70 THEN 'Good' 
        ELSE 'Excellent' END AS "ProbabilityClass" 
    FROM "opportunity";

CREATE TABLE "opportunity_denorm" AS 
    SELECT "tmp_opportunity".*, 
        "user"."Name" AS "UserName", "user"."Sales_Market" AS "UserSalesMarket", 
        "user"."Global_Market" AS "UserGlobalMarket", 
        "account"."Name" AS "AccountName", "account"."Region" AS "AccountRegion", 
        "account"."Status" AS "AccountStatus", "account"."FirstOrder" AS "AccountFirstOrder" 
    FROM "tmp_opportunity" 
        JOIN "user" ON "tmp_opportunity"."OwnerId" = "user"."Id" 
        JOIN "account" ON "tmp_opportunity"."AccountId" = "account"."Id" 
        JOIN "tmp_level" ON "user"."Name" = "tmp_level"."Name";

Screenshot - Transformation Queries

In the first query, we change the user level descriptions into something more clear. In the second query, we compute the quality level for each deal opportunity based on the estimated probability of closing the deal. In the third query, we denormalize all four tables into a single one. We have prepared the single table so that it will load nicely into Tableau.

Screenshot - Run Transformation

Running Transformation

Save the queries and then click on Run Transformation. This will create a background job which will

  • get the specified tables from Storage,
  • put them in a transformation database,
  • execute the queries/script, and
  • store the result in Storage again.

To see if the transformation job was successful, go to Jobs, or click on the small Transformations job has been scheduled window that pops up after a transformation starts running.

Having learned to set up a transformation, you can now