This data source connector loads one or more CSV files from one or more AWS S3 buckets and stores them in tables in Keboola Storage.
After creating a new configuration, select the files you want to extract from AWS S3 and specify how to save them to Keboola Storage. You also need to set up the proper permissions on AWS.
Create a new configuration of the AWS S3 connector.
In order to access the files in S3, you need to set up AWS credentials or create an AWS role.
Select Credentials as the Login Type. Use the AWS Access Key ID and the Secret Access Key with read permissions to the desired S3 bucket(s) and file(s).
Make sure this AWS Access Key ID has the correct permissions:
- s3:GetObject for the given key/wildcard
- s3:ListBucket to access all wildcard files
- s3:GetBucketLocation to determine the region of the S3 bucket(s)

You can add the following policy document as an inline policy to an AWS user:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::mybucket/folder/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::mybucket"
        }
    ]
}
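If you manage such policies in code, the document above can also be generated programmatically. The following is a minimal sketch; `make_read_policy`, the bucket name `mybucket`, and the prefix `folder/` are illustrative placeholders, not part of the connector:

```python
import json

def make_read_policy(bucket: str, prefix: str = "") -> str:
    """Build a minimal read-only S3 policy like the one shown above.

    `bucket` and `prefix` are placeholders; substitute your own values.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Object-level access for the given key/wildcard
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                "Effect": "Allow",
                # Bucket-level access for wildcard listing and region lookup
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }
    return json.dumps(policy, indent=4)

print(make_read_policy("mybucket", "folder/"))
```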
Select Role as the Login Type, and create a role in your AWS account as follows.

Set the trusted account ID to one of the following Keboola AWS accounts:

- 147946154733 for the stacks connection.keboola.com, connection.eu-central-1.keboola.com, and connection.north-europe.azure.keboola.com.
- 206948715642 for all other stacks.

Make sure the role grants the following permissions:

- s3:GetObject for the given key/wildcard
- s3:ListBucket to access all wildcard files
- s3:GetBucketLocation to determine the region of the S3 bucket(s)

You can attach the following policy document to the role:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject"
],
"Resource": "arn:aws:s3:::mybucket/*"
},
{
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Effect": "Allow",
"Resource": "arn:aws:s3:::mybucket"
}
]
}
In your project, fill in your Account ID and Role Name.
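The account ID to trust depends only on which Keboola stack your project runs on, so the choice can be expressed as a small lookup. A sketch, using the stack hostnames and account IDs listed above (the "other stack" hostname in the usage example is illustrative):

```python
# Stacks that trust Keboola AWS account 147946154733 (from the list above);
# every other stack uses account 206948715642.
STACKS_USING_147946154733 = {
    "connection.keboola.com",
    "connection.eu-central-1.keboola.com",
    "connection.north-europe.azure.keboola.com",
}

def keboola_account_id(stack_hostname: str) -> str:
    """Return the Keboola AWS account ID to set as the role's trusted account."""
    if stack_hostname in STACKS_USING_147946154733:
        return "147946154733"
    return "206948715642"

print(keboola_account_id("connection.keboola.com"))      # → 147946154733
print(keboola_account_id("connection.example.keboola.com"))  # → 206948715642
```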
To create a new table, click the New Table button and assign it a name. The name is used to derive the destination table name in Storage and can be modified.
Configured tables are stored as configuration rows. Each table has different settings (key, load type, etc.) but they all share the same AWS credentials.
For each table you have to specify an AWS S3 Bucket and a Search Key. The Search Key can be a path to a single file or a prefix to multiple files (omit the wildcard character and use the Wildcard checkbox instead).
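To illustrate the Search Key semantics described above: without the Wildcard option the key must identify a single object, while with it the key acts as a prefix. A sketch over hypothetical object keys (the connector's actual matching happens server-side when listing the bucket):

```python
def matching_keys(keys, search_key, wildcard=False):
    """Select object keys the way a Search Key would.

    With wildcard=False, the Search Key must match a single object exactly;
    with wildcard=True, it is treated as a prefix (no '*' character needed).
    """
    if wildcard:
        return [k for k in keys if k.startswith(search_key)]
    return [k for k in keys if k == search_key]

objects = ["folder/a.csv", "folder/b.csv", "folder/sub/c.csv", "other/d.csv"]
print(matching_keys(objects, "folder/a.csv"))
# → ['folder/a.csv']
print(matching_keys(objects, "folder/", wildcard=True))
# → ['folder/a.csv', 'folder/b.csv', 'folder/sub/c.csv']
```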
The additional source settings section allows you to set up the following:

- Column names can be given a col_ prefix.
- s3_filename is added to the table and will contain the original filename, including the relative path to the Search Key.
- s3_row_filename is added to the table and will contain the row number in each of the downloaded files.

The data source connector also supports an Advanced mode; all supported parameters are described in the GitHub repository.
All files stored in AWS Glacier are ignored.
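The Glacier behavior can be pictured as a filter over the listed objects. A sketch with a hypothetical listing; the `downloadable` helper and the sample keys are illustrative, and `StorageClass` here stands for the storage class S3 reports for each object:

```python
def downloadable(objects):
    """Skip objects archived in AWS Glacier, which the connector ignores."""
    return [o for o in objects if o.get("StorageClass") != "GLACIER"]

listing = [
    {"Key": "folder/a.csv", "StorageClass": "STANDARD"},
    {"Key": "folder/old.csv", "StorageClass": "GLACIER"},
]
print([o["Key"] for o in downloadable(listing)])
# → ['folder/a.csv']
```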