Database Extractors

Extractors import data from external sources and integrate it to the Keboola Connection environment.

There are a number of extractors for SQL databases:

They are all configured in the same way and have an advanced mode. (Their basic configuration is also part of our tutorial.)

On the other hand, the extractor for MongoDB, a NoSQL database, requires a different configuration. Configuration of the BigQuery extractor is also covered in another tutorial of ours.

Connecting to Database

The connection to your internal database must be well secured. If you, or your system administrator, want to avoid exposing your database server to the internet, we highly recommend setting up an SSH Tunnel for the connection.

Schema - SSH tunnel

A Secure Shell (SSH) tunnel consists of an encrypted connection created through an SSH protocol connection. You can set up this tunnel to connect to your database server located in your private network that you do not want to be accessed directly from the internet. The SSH connection is encrypted and uses a public-private key pair for user authorization.

Find detailed instructions for setting up an SSH tunnel in the developer documentation. While setting up an SSH tunnel requires some work, it is the most reliable and secure option for connecting to your database server.

Incremental Fetching

Some database extractors support the feature of Incremental Fetching. Incremental fetching allows to dramatically reduce the amount of data loaded from the source database server if the nature of the database permits. Incremental fetching can be used when the source table contains an ordering (ordinal) column and no rows are deleted (or the deletions are not important). There are two typical scenarios:

  • A table to which rows are only added and the rows have numeric and raising ID.
  • A table in which rows are added and modified and it contains a column with last modification time.

To enable incremental fetching, you have to specify the ordering column. When incremental fetching is enabled, the rows in the table are extracted in the order specified by the ordering column. The last fetched value is recorded in the configuration state. When the extraction runs again, only new rows are fetched from the sourced database. To configure incremental fetching, go to table details:

Screenshot - Configuration Detail

Then select the ordering column:

Screenshot - Incremental Fetching

The last fetched value is displayed in the configuration:

Screenshot - Incremental Fetching

The rows are fetched from the source table including the last fetched value. Therefore it is ideal to have the ordering column set as a primary key so that you don’t receive duplicated rows in the Storage table. In case you need to fetch the entire table, you can clear the stored value.

This incremental fetching feature is related to Incremental Loading. While not required, it is recommended to turn on incremental loading when fetching data incrementally, otherwise the table in Storage will contain only newly added rows. This may sound like a good idea when you want to process only the newly added rows. In that case, however, you should do so using Incremental Processing. The advantage of using incremental processing over having only newly added rows in a Storage table is that the table contains all loaded data and it is not necessary to synchronize extraction and processing.