This extractor loads one or more CSV files from one or more AWS S3 buckets and stores them in one or more tables in Keboola Connection (KBC) Storage. Compared to the Simple AWS S3 extractor, it offers extensive CSV postprocessing, and its UI gives you more flexibility.
After creating a new configuration, select the files you want to extract from AWS S3 and decide how to save them to KBC Storage. You also need to set up the proper permissions in AWS.
Find the AWS S3 extractor in the list of extractors, create a new configuration, and give it a name.
To access the files in S3, you need to set up AWS credentials.
Use an AWS Access Key ID and Secret Access Key with read permissions to the desired S3 bucket(s) and file(s). Make sure this AWS Access Key ID has the following permissions:
s3:GetObject for the given key/wildcard
s3:ListBucket to access all wildcard files
s3:GetBucketLocation to determine the region of the S3 bucket(s)
You can add the following policy document as an inline policy to an AWS user:
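A minimal inline policy covering the three permissions above could look like the one built by the Python sketch below, which uses only the standard `json` module. The bucket name `mybucket` and the prefix `folder/` are hypothetical placeholders. Note that `s3:GetObject` is granted on object ARNs, while `s3:ListBucket` and `s3:GetBucketLocation` are granted on the bucket ARN itself. Adjust the `Resource` values to your own buckets before attaching the printed JSON to a user.

```python
import json


def build_read_policy(bucket: str, prefix: str = "") -> dict:
    """Build a minimal IAM policy with the three permissions the extractor needs.

    `bucket` and `prefix` are placeholders; substitute your own values.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read the objects themselves: every key starting with the prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                # List the bucket (needed for wildcards) and look up its region.
                # These actions apply to the bucket ARN, not to object ARNs.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and folder names.
    print(json.dumps(build_read_policy("mybucket", "folder/"), indent=4))
```

Splitting the policy into two statements keeps object-level and bucket-level actions on the resource types they apply to; granting `s3:ListBucket` on an object ARN would have no effect.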
To create a new table, click the New Table button and give the table a name. The name is used to create the destination table name in Storage, which can be modified.
The configuration can extract as many tables as you wish. The list is fully searchable, and you can delete or disable each table. You can also run the extraction of a single table on its own and change the order in which the tables are extracted.
Each table has its own settings (key, load type, etc.), but all tables share the same AWS credentials.
For each table, you have to specify an AWS S3 Bucket and a Search Key. The Search Key can be a path to a single file or a prefix matching multiple files (for a prefix, omit the wildcard character and check the Wildcard checkbox instead).
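The difference between the two Search Key modes can be modelled with a small Python sketch. `select_keys` is a hypothetical helper, not part of the extractor, and it deliberately ignores details such as subfolder handling; it only shows that an exact key selects one object, while a prefix combined with the Wildcard checkbox selects every key that starts with it.

```python
def select_keys(keys, search_key, wildcard=False):
    """Hypothetical helper: return the object keys one table would download.

    Simplified model: without the wildcard, the Search Key must equal a key
    exactly; with the wildcard, every key starting with the Search Key matches.
    Subfolder handling and other extractor options are ignored here.
    """
    if wildcard:
        # Prefix match, so no "*" is needed in the Search Key itself.
        return [key for key in keys if key.startswith(search_key)]
    # Exact match: a path to a single file.
    return [key for key in keys if key == search_key]


if __name__ == "__main__":
    keys = [
        "exports/2024/orders_01.csv",
        "exports/2024/orders_02.csv",
        "exports/readme.txt",
    ]
    # Both order files match the prefix; readme.txt does not.
    print(select_keys(keys, "exports/2024/orders_", wildcard=True))
    # A full path selects exactly one file.
    print(select_keys(keys, "exports/readme.txt"))
```

Without the Wildcard checkbox, a prefix such as `exports/2024/orders_` matches nothing, because no object has exactly that key.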
The additional source settings section allows you to set up the following:
Filename column: s3_filename is added to the table and contains the original filename, including the path relative to the Search Key.
Row number column: s3_row_number is added to the table and contains the row number within each of the downloaded files.
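The effect of the two added columns on a single downloaded file can be sketched in Python with the standard `csv` module. `add_source_columns` is a hypothetical helper, and the sketch assumes that each file has a header row and that data rows are numbered from 1; the extractor's actual numbering and header handling may differ.

```python
import csv
import io


def add_source_columns(csv_text, relative_path):
    """Hypothetical helper: append s3_filename and s3_row_number to one CSV file.

    Assumptions: the first row is a header, and data rows are numbered from 1.
    `relative_path` is the file's path relative to the Search Key.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Extend the header with the two new column names.
    header = next(reader)
    writer.writerow(header + ["s3_filename", "s3_row_number"])
    # Every data row gets the same filename and its own row number.
    for row_number, row in enumerate(reader, start=1):
        writer.writerow(row + [relative_path, row_number])
    return out.getvalue()


if __name__ == "__main__":
    print(add_source_columns("id,amount\n1,10\n2,20\n", "2024/orders_01.csv"))
```

Because the row number restarts in each file, the combination of `s3_filename` and `s3_row_number` identifies every source row when several files are loaded into the same table.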
For more features, switch the configuration of each table to Power User Mode by clicking the Open JSON editor link. By editing the full JSON configuration, you can set up the component itself (all options are described in its GitHub repository) as well as the processors (to learn more about processors, see the Developers Docs).
Changing the JSON configuration may leave the visual form unable to represent the configuration, in which case switching back to it may be disabled. Reverting such changes re-enables the visual form. Whenever possible, however, the JSON configuration translates back to the visual form and vice versa.
All files stored in AWS Glacier are ignored.
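One way to picture this rule: the entries returned when listing a bucket (the `Contents` of an S3 `ListObjectsV2` response) carry a `StorageClass` field, and entries in a Glacier class are dropped before download. The sketch below assumes that `GLACIER` and `DEEP_ARCHIVE` are the classes meant by "AWS Glacier" (objects in these classes must be restored before they can be read); `skip_glacier` is a hypothetical helper, not the extractor's code.

```python
# Assumption for this sketch: the storage classes treated as "AWS Glacier".
# The extractor's exact list may differ.
GLACIER_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def skip_glacier(objects):
    """Hypothetical helper: drop listing entries stored in a Glacier class.

    `objects` mimics the "Contents" entries of a ListObjectsV2 response,
    i.e. dicts with at least "Key" and, usually, "StorageClass".
    Entries without a StorageClass are treated as STANDARD.
    """
    return [
        obj
        for obj in objects
        if obj.get("StorageClass", "STANDARD") not in GLACIER_CLASSES
    ]


if __name__ == "__main__":
    listing = [
        {"Key": "data/a.csv", "StorageClass": "STANDARD"},
        {"Key": "data/b.csv", "StorageClass": "GLACIER"},
        {"Key": "data/c.csv", "StorageClass": "DEEP_ARCHIVE"},
        {"Key": "data/d.csv"},
    ]
    # Only the non-Glacier keys remain.
    print([obj["Key"] for obj in skip_glacier(listing)])
```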