IMAP Email Contents and Attachments

This extractor allows you to automatically retrieve email contents and/or attachments via the IMAP protocol. It supports incremental loads and IMAP queries for defining specific search criteria.

The IMAP protocol provides several advantages:

  • Emails stay intact in your original inbox.
  • Emails can be queried using a standardized query syntax.
  • It works with almost any provider, including Gmail, Outlook and others.

Features

Feature                    | Note
Generic UI form            | Dynamic UI form
Row based configuration    | Allows execution of each row in parallel.
Incremental loading        | Allows fetching data in new increments.
IMAP query syntax          | Filter emails using a standard IMAP query.
Download email contents    | Full body of the email is downloaded into a Storage column.
Download email attachments | All attachments are downloaded into the File Storage by default.
Filter email attachments   | Download only attachments matching the specified regular expression.
Processors support         | Use processors to modify the outputs before saving them to Storage, e.g. process attachments so that they are stored in the Table Storage.

Getting started

Make sure the IMAP service is enabled on your email account. You will need the IMAP credentials (username and password) and the hostname and port of the IMAP server.

Please refer to your email provider for more information.

Example - Using a Gmail account

  • Enable App Passwords and create one that is specific to your integration. Name it, for instance, Keboola Extractor.
  • Fill in your email address in the Username field.
  • Fill in your generated App Password in the Password field.
  • Fill in the Gmail IMAP address, imap.gmail.com, in the IMAP host field.
  • Use port 993.
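
Before saving the credentials, it can help to verify them outside the extractor. The following is a minimal sketch using Python's standard imaplib module; the function name check_imap_login and its connect parameter are illustrative and not part of the extractor.

```python
import imaplib


def check_imap_login(username, password, host="imap.gmail.com", port=993,
                     connect=imaplib.IMAP4_SSL):
    """Log in over IMAP with TLS and return the number of messages in INBOX.

    `connect` is a hypothetical hook that makes the function easy to try
    with a stand-in client; by default it is the standard library client.
    """
    conn = connect(host, port)
    try:
        # Raises imaplib.IMAP4.error if the server rejects the credentials.
        conn.login(username, password)
        # readonly=True opens the mailbox without changing any message flags.
        status, data = conn.select("INBOX", readonly=True)
        if status != "OK":
            raise RuntimeError(f"Could not open INBOX: {data!r}")
        return int(data[0])
    finally:
        conn.logout()
```

Calling check_imap_login("you@gmail.com", "your-app-password") should return the INBOX message count if the App Password and IMAP access are set up correctly.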

Configuration

IMAP Settings

Fill in the Username, Password, Hostname and Port of your provider's IMAP server. See the Gmail example for inspiration.

Screenshot - Auth configuration

Row Configuration

Click the Add Row button and name the row accordingly.

Search query

Fill in a Search query to select only the emails you want. By default, all emails are downloaded. The most common use case is to filter emails by subject and sender, e.g. (FROM "sender-email@example.com" SUBJECT "the subject"). You can create much more complex queries if needed. Refer to the query syntax for more examples.
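
As an illustration of how such a query string is put together, here is a small Python sketch. The helper build_search_query is hypothetical (the extractor only needs the final string); it shows that criteria placed side by side are combined with AND and that string values are double-quoted.

```python
def build_search_query(sender=None, subject=None):
    """Combine optional criteria into a parenthesised IMAP search string."""

    def quoted(value):
        # IMAP quoted strings escape the backslash and the double quote.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    parts = []
    if sender:
        parts.append(f"FROM {quoted(sender)}")
    if subject:
        parts.append(f"SUBJECT {quoted(subject)}")
    # With no criteria, "ALL" matches every message in the mailbox.
    return "(" + " ".join(parts) + ")" if parts else "ALL"


query = build_search_query("sender-email@example.com", "the subject")
print(query)
```

The resulting string can be pasted into the Search query field, or tested directly against a mailbox with imaplib via conn.search(None, query).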

Screenshot - Row configuration

Period from date

Use this field to fetch only emails received since the specified date. It supports fixed dates in the YYYY-MM-DD format as well as relative periods, e.g. yesterday, 1 month ago, 2 days ago, etc. We recommend choosing a value that covers a safety interval, for example 2 days ago when the configuration is scheduled to run every day. The data is always upserted incrementally, so there won’t be any duplicates in the resulting table.
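
To make the relative values concrete, the sketch below resolves a few of them into calendar dates and formats the result as an IMAP SINCE criterion. It is only an approximation under stated assumptions: the function names are hypothetical, only the forms "yesterday", "N days/weeks/months ago" and YYYY-MM-DD are handled, and the extractor's own date parser may accept more forms or resolve them differently.

```python
import calendar
import re
from datetime import date, timedelta

_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
_RELATIVE = re.compile(r"(\d+)\s+(day|week|month)s?\s+ago")


def resolve_period_from(value, today=None):
    """Turn 'YYYY-MM-DD', 'yesterday' or 'N days/weeks/months ago' into a date."""
    today = today or date.today()
    text = value.strip().lower()
    if text == "yesterday":
        return today - timedelta(days=1)
    match = _RELATIVE.fullmatch(text)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        if unit == "day":
            return today - timedelta(days=amount)
        if unit == "week":
            return today - timedelta(weeks=amount)
        # Months vary in length: step back whole months and clamp the day.
        index = today.year * 12 + (today.month - 1) - amount
        year, month = divmod(index, 12)
        month += 1
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, min(today.day, last_day))
    # Anything else must be a fixed date; raises ValueError otherwise.
    return date.fromisoformat(text)


def imap_since(day):
    """Format a date as an IMAP SINCE criterion such as SINCE 08-Mar-2024."""
    return f"SINCE {day.day:02d}-{_MONTHS[day.month - 1]}-{day.year}"
```

With a daily schedule and the value 2 days ago, each run re-reads the previous day as well, which is the safety overlap recommended above.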

Download Content

Check this option to download email content.

Download attachments

When enabled, the attachments are downloaded as well. You may use a regex pattern to download only the attachments that match it.

For example, to match only PDF files you can use the .+\.pdf pattern. If left empty, all attachments are downloaded.
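
A pattern can be tried out locally before it is used in the configuration. The sketch below applies it to a list of file names with Python's re module. Two things are assumptions here: the helper name filter_attachments, and the choice to require the whole file name to match (re.fullmatch); the extractor's exact matching mode is not stated in this document.

```python
import re


def filter_attachments(filenames, pattern=""):
    """Return the file names accepted by the pattern; an empty pattern keeps all."""
    if not pattern:
        return list(filenames)
    compiled = re.compile(pattern)
    # Assumption: the whole file name has to match, not just a prefix of it.
    return [name for name in filenames if compiled.fullmatch(name)]


names = ["report.pdf", "data.csv", "scan.PDF", "notes.pdf.zip"]
print(filter_attachments(names, r".+\.pdf"))
```

Note that the pattern is case-sensitive, so scan.PDF is skipped in this sketch; prefixing the pattern with (?i) makes it case-insensitive in Python's regex syntax.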

By default, the files are downloaded into the File Storage. Use processors to control the behaviour.

Example - Processing CSV attachments

If your attachments are in CSV format, you can use this combination of processors to store them in the Table Storage:

  • The folder parameter of the first processor matches the resulting table name.
  • The second processor specifies that the result always replaces the destination table and that the CSV file is expected to contain a header.
  • Note that in this setup all attachments are stored in the same table, so they must share the same structure.
{
  "before": [],
  "after": [
    {
      "definition": {
        "component": "keboola.processor-move-files"
      },
      "parameters": {
        "direction": "tables",
        "folder": "result_table"
      }
    },
    {
      "definition": {
        "component": "keboola.processor-create-manifest"
      },
      "parameters": {
        "delimiter": ",",
        "enclosure": "\"",
        "incremental": false,
        "primary_key": [],
        "columns_from": "header"
      }
    }
  ]
}

Output

Table

A single table named emails contains the email contents.

The results are always inserted incrementally to avoid duplicates.

Columns: ['pk', 'uid', 'mail_box', 'date', 'from', 'to', 'body', 'headers', 'number_of_attachments', 'size']
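
The reason overlapping runs do not create duplicates can be illustrated with a toy model of an incremental load. This is a conceptual sketch only, assuming the pk column acts as the table's primary key; it does not reflect how Storage is implemented, and upsert_rows is a hypothetical name.

```python
def upsert_rows(table, new_rows, key="pk"):
    """Merge new_rows into table (a dict keyed by primary key); newest wins."""
    for row in new_rows:
        # A row with an already known key overwrites the stored one
        # instead of being appended as a duplicate.
        table[row[key]] = row
    return table


emails = {}
# First run fetches messages a and b.
upsert_rows(emails, [{"pk": "a", "size": 1}, {"pk": "b", "size": 2}])
# Second run overlaps with the first one: b is fetched again, c is new.
upsert_rows(emails, [{"pk": "b", "size": 2}, {"pk": "c", "size": 3}])
print(sorted(emails))
```

After both runs the table holds three rows, one per message, even though message b was fetched twice.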

Attachments

Attachments are stored by default in the File Storage, with file names prefixed by the generated message pk, e.g. bb41793268d4a8710fb5ebd94eaed6bc_some_file.pdf.
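
When processing the stored files downstream, the prefix can be used to link an attachment back to its row in the emails table. A minimal sketch, assuming the pk itself never contains an underscore (as in the example name above); split_stored_name is a hypothetical helper.

```python
def split_stored_name(stored_name):
    """Split '<pk>_<original file name>' into its two parts."""
    # Split on the first underscore only, so underscores that belong
    # to the original file name are preserved.
    pk, separator, original = stored_name.partition("_")
    if not separator or not pk or not original:
        raise ValueError(f"Unexpected attachment name: {stored_name!r}")
    return pk, original


pk, original = split_stored_name("bb41793268d4a8710fb5ebd94eaed6bc_some_file.pdf")
print(pk, original)
```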

The files will contain additional tags to distinguish the source:

Screenshot - Tags

Additional tags can be specified using the Create File Manifest processor. The files can also be further processed and stored in the Table Storage by other processors.