This extractor loads one or more CSV files from AWS S3 and stores them in a table in Storage. After creating a new configuration, select the files you want to extract from S3 and decide how to save them to KBC Storage. You also need to set up the proper permissions on AWS.
Find AWS S3 Extractor in the list of extractors and create a new configuration. Name it.
In the first part of the configuration, specify the AWS S3 bucket and the filename (key). The bucket can be in any AWS region, and the key must point to a single file unless you select the Wildcard checkbox.
All files stored in Glacier will be ignored.
If the Wildcard option is turned on, all files in S3 whose key starts with the defined Key prefix will be downloaded. All of them must have the same header. Subfolders matching the prefix will be ignored.
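The selection rules above (prefix match, Glacier files skipped, subfolders skipped) can be sketched in a few lines of Python. This is an illustration only, not the extractor's actual code: the function name `select_keys`, the sample keys, and the reading of "subfolder" as "any key with a further `/` after the prefix" are assumptions made for the example. The `Key` and `StorageClass` fields mirror what an S3 object listing returns.

```python
def select_keys(objects, prefix):
    """Return the keys a prefix (wildcard) match would pick up.

    `objects` mimics entries of an S3 listing: dicts with "Key" and
    "StorageClass". Hypothetical helper, for illustration only.
    """
    selected = []
    for obj in objects:
        key = obj["Key"]
        if not key.startswith(prefix):
            continue  # outside the requested prefix
        if obj.get("StorageClass") == "GLACIER":
            continue  # files stored in Glacier are ignored
        if "/" in key[len(prefix):]:
            continue  # assumed meaning of "subfolder": a deeper path level
        selected.append(key)
    return selected


# Made-up listing used only to exercise the rules.
listing = [
    {"Key": "exports/orders-2023.csv", "StorageClass": "STANDARD"},
    {"Key": "exports/orders-2024.csv", "StorageClass": "STANDARD"},
    {"Key": "exports/orders-2019.csv", "StorageClass": "GLACIER"},
    {"Key": "exports/orders-archive/old.csv", "StorageClass": "STANDARD"},
    {"Key": "exports/customers.csv", "StorageClass": "STANDARD"},
]

print(select_keys(listing, "exports/orders"))
# ['exports/orders-2023.csv', 'exports/orders-2024.csv']
```

With the prefix `exports/orders`, the Glacier file and the file inside `orders-archive/` are both left out, and `customers.csv` never matches the prefix.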
The table names are predefined, but you can modify them or replace them entirely. You can also select an already existing table.
Incremental Load allows you to add new data to a table without truncating it. The files extracted from S3 stay the same. The only thing that changes is how the data is loaded into Storage.
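The difference between the two load modes can be pictured with a toy in-memory table. This is a deliberately simplified sketch: the `load` function and the sample rows are hypothetical, a table is modelled as a plain list of rows, and any effect of primary keys on the load is not modelled.

```python
def load(table, new_rows, incremental):
    """Return the table contents after loading `new_rows`.

    Hypothetical helper: a table is just a list of row dicts here.
    """
    if incremental:
        # Incremental load: keep the existing rows and add the new ones.
        return table + new_rows
    # Full load: the table is truncated, only the new rows remain.
    return list(new_rows)


existing = [{"id": "1", "name": "Alice"}]
incoming = [{"id": "2", "name": "Bob"}]

print(load(existing, incoming, incremental=False))
# [{'id': '2', 'name': 'Bob'}]
print(load(existing, incoming, incremental=True))
# [{'id': '1', 'name': 'Alice'}, {'id': '2', 'name': 'Bob'}]
```

In both calls the incoming rows are identical; only what happens to the rows already in the table differs.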
The primary key of an existing table cannot be modified; only new tables can set their primary keys. To change the primary key of an existing table, go to the table detail in Storage.
Use an AWS Access Key ID and Secret Access Key with read permissions for the desired bucket and file(s). Make sure that this AWS Access Key ID has the following permissions:
s3:GetObject for the given key/wildcard
s3:ListBucket to access all wildcard files
s3:GetBucketLocation to determine the bucket region
You can add the following Policy Document as an Inline Policy to an AWS user:
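As a rough guide, a policy granting the three permissions listed above could be generated as in the sketch below. The bucket name `my-bucket`, the key prefix `exports/`, and the helper name `read_only_policy` are placeholders chosen for this example; adjust the resources to your own bucket and key before using the output.

```python
import json


def read_only_policy(bucket, key_prefix):
    """Build an IAM policy document with the three read permissions.

    Hypothetical helper; `bucket` and `key_prefix` are placeholders.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level permission: read the file(s) under the prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{key_prefix}*"],
            },
            {
                # Bucket-level permissions: list keys and look up the region.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
        ],
    }


print(json.dumps(read_only_policy("my-bucket", "exports/"), indent=2))
```

The object-level action is scoped to keys under the prefix, while the two bucket-level actions apply to the bucket ARN itself.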
Alternatively, you can use our AWS CloudFormation template to create a new S3 bucket and a pair of users, one with write permissions and the other with read-only permissions. Give the write permissions to the application that stores the CSV files and the read-only permissions to the S3 extractor.
In this section, you can modify the delimiter and enclosure of the CSV file. The default values are based on RFC 4180, which uses a comma as the delimiter and double quotes as the enclosure.
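To check which delimiter and enclosure a file needs before configuring the extractor, you can parse a sample locally. The sketch below uses Python's standard `csv` module; the `parse_csv` helper and the sample data are made up for illustration, and the enclosure is passed as the module's `quotechar`.

```python
import csv
import io


def parse_csv(text, delimiter=",", enclosure='"'):
    """Parse CSV text into a list of rows (hypothetical helper)."""
    # newline="" leaves line endings untouched so the csv module handles them.
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,
        quotechar=enclosure,
    )
    return list(reader)


# RFC 4180 style: comma delimiter, double-quote enclosure, "" as escaped quote.
rfc_style = 'id,comment\r\n1,"said ""hi"", then left"\r\n'
print(parse_csv(rfc_style))
# [['id', 'comment'], ['1', 'said "hi", then left']]

# A file using a semicolon delimiter and single-quote enclosure instead.
custom = "id;comment\n1;'semi;colon inside'\n"
print(parse_csv(custom, delimiter=";", enclosure="'"))
# [['id', 'comment'], ['1', 'semi;colon inside']]
```

If the rows come back with the wrong number of columns, the delimiter or enclosure probably does not match the file.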