Each extractor from an SQL database allows you to extract data from selected tables, or results from arbitrary SQL queries.
The extractors for supported SQL databases (Cloudera Impala, Firebird, IBM DB2, Microsoft SQL Server, MySQL, Oracle, PostgreSQL, Teradata) are configured in the same way and have an advanced mode. All notable differences are listed in the section Server Specific Notes.
Before you start configuring your SQL extractor, consider securing your connection to your internal database to avoid exposing your database server to the internet by setting up an SSH Tunnel.
Note: Quick introduction to extracting data from the Snowflake Database Server is also part of our tutorial.
Find the extractor for the database of your choice in the list of extractors and create a new configuration. Name it (you can change the name any time).
The first step is to configure database credentials using the Setup database credentials button:
Fill in the credentials to the database. See section Server Specific Notes for description of non-standard fields. After testing the credentials, Save them:
After saving the credentials, the extractor will automatically fetch the list of database tables accessible by the provided credentials. Select the tables you want to extract and click the Create button:
The selected tables are configured. You can use the Run Extraction to load data from the server into Storage. You can also modify the configured tables by clicking on the appropriate row, or add new tables via the New Table button. Each table may be also extracted individually, or it may be disabled so that it is not extracted when the entire configuration is run. Existing credentials can be changed using the Database Credentials link.
If you want to modify table extraction setup, click on the corresponding row. You’ll get to the table detail view:
Here you can modify the source table, limit the extraction to specific columns or change the destination table name in Storage. The table details also allows you to define Primary Key and Incremental Loading. We highly recommend you define a primary key where possible. Primary keys substantially speed up both the data loads and further processing of the table. Also use incremental loading when possible — again that speeds up the data loads considerably. Both options require knowledge of the source table, so don’t turn them on blindly.
The table detail also allows you to switch to Advanced mode:
In advanced mode, you can write an arbitrary
SELECT query, the result of that query will be imported to a
Storage table. The SQL query is executed on the source server without any processing, that means that
you have to follow the SQL dialect of the particular server you’re working with.
Please keep the following in mind when using the advanced mode:
Avoid doing complex joins and aggregations in SQL queries. Remember that these queries are executed on the database server you are extracting from. This database system might not be designed or optimized for complex SELECT queries. Complex queries may result in timeouts, or they might produce unnecessary loads on your internal systems. Instead, import raw data, and then use KBC tools to give it the shape you want.
The MySQL database server also supports encrypting the whole database communication using SSL Certificates. See the official guide for instructions on setting it up.
The SQL Server export uses the BCP utility to export data.
For this reason, if you are writing advanced mode queries you have to quote the values of non-numeric columns (text, datetime, etc.) — so that the selected
"some text" instead of
some text. This can be done by e.g.
SELECT char(34) + [my_text_column] + char(34).
CHAR function with argument
the double quote character
When the extracted text itself may contain quotes, you need to escape them by replacing
"". Full example:
The extractor will still work if you don’t do these things, but the BCP will fail and the backup, much slower method
will be used. In that case the message
BCP command failed: ... Attempting export using pdo_sqlsrv will be logged in the extraction
An SQL Server instance hosted on Azure will normally have a host name such as
If the hostname is provided as IP address e.g.
22.214.171.124 the username needs to have the suffix
@[srvName] as in, for example,
When extracting data from a Snowflake database, the permissions must be set to allow the specified user to use the specified warehouse.
The following SQL code creates the role and user
KEBOOLA_SNOWFLAKE_EXTRACTOR and grants them access
MY_DATA database, and
Note that with the above setup, the user will not have access to newly created tables. You will either have to use a more permissive role or reset the permissions by calling: