Integrate using S3 Bucket
Kochava-Hosted Amazon S3 Instance
This guide outlines the steps for securely setting up data ingestion using an S3 bucket provisioned and hosted by Kochava. Follow the instructions below to successfully establish this integration.
Prerequisites:
- AWS Account: You will need access to your AWS account to manage IAM users and policies if required.
- AWS IAM Role: You will need to create an AWS IAM role and share its ARN with us. We will grant that role permission to assume a Kochava-owned role, which in turn allows you to upload your data into our bucket.
- Access to AWS S3: Ensure your system is capable of uploading files to an S3 bucket using the appropriate credentials, either via an application or the AWS CLI.
Integration Guide:
- Kochava Provides the S3 Bucket.
Kochava will create and provide you with the details of an S3 bucket dedicated to your data uploads. This will include:
- S3 bucket name.
- S3 bucket ARN.
- AWS IAM role that your AWS IAM role will be able to assume.
- AWS region.
- Configure an IAM Role.
- You will need to create an IAM role with permission to assume the role shared by Kochava. This grants your role the required access to the S3 bucket provided by Kochava.
- The role will have permission to upload files (s3:PutObject) and list files (s3:ListBucket) within the designated directory. Kochava will establish a trust relationship between our AWS account and yours to allow this access. We can assist you in setting up an AWS STS profile, which can be used to interact with the S3 bucket via the AWS CLI.
- For optimal security, we recommend using the role directly rather than long-lived credentials or access keys; if a user must be involved, assign the role to a single user in your organization. Kochava will configure the role so that only one external role can assume it.
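For reference, an AWS CLI profile that assumes the shared role might look like the following. The role ARN, account ID, and profile name here are placeholders; use the exact values exchanged during setup.

```ini
# ~/.aws/config -- illustrative only; replace the ARN below with the
# role Kochava shares with you.
[profile aws-kochava-profile]
role_arn = arn:aws:iam::111111111111:role/kochava-ingest-role
source_profile = default
region = eu-west-1
```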
- Upload Data to the Provided S3 Bucket.
Once you have configured the IAM role and received the S3 bucket details, you can begin uploading data. Here's an example of how to upload files using the AWS CLI:

```bash
aws s3 cp your-data-file.json s3://kochava-s3-bucket-name/directory/ --profile aws-kochava-profile
```

Alternatively, you can use an AWS SDK to trigger the upload.
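If you script the upload, it can help to assemble the CLI invocation programmatically so the bucket, directory, and profile stay in one place. A minimal sketch in Python; the bucket, directory, and profile defaults are the same placeholders as in the CLI example:

```python
import shlex

def build_upload_command(local_file: str,
                         bucket: str = "kochava-s3-bucket-name",
                         directory: str = "directory",
                         profile: str = "aws-kochava-profile") -> str:
    """Build the `aws s3 cp` invocation for a given local file.

    The bucket, directory, and profile defaults are placeholders;
    substitute the values provided by Kochava.
    """
    return (f"aws s3 cp {shlex.quote(local_file)} "
            f"s3://{bucket}/{directory}/ --profile {profile}")

print(build_upload_command("your-data-file.json"))
# aws s3 cp your-data-file.json s3://kochava-s3-bucket-name/directory/ --profile aws-kochava-profile
```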
- Follow the File Naming and Schema Guidelines.
To ensure smooth data ingestion, please adhere to the following guidelines:
- File Format: Ensure that your files follow the format (e.g., JSON, CSV) specified in the schema document provided by Kochava.
- Naming Convention: Use the agreed-upon naming conventions when saving files to the S3 bucket.
- Directory Structure: Organize and upload the files to the correct directory as specified by Kochava. This could include organizing by date, campaign, or other relevant criteria.
- Ensure Schema Validation.
Before uploading your data, ensure it adheres to the schema provided by Kochava. Using the correct schema format is crucial for successful data processing and ingestion.
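A lightweight pre-upload check can catch schema problems before they reach ingestion. The sketch below validates JSON-lines records against a required-field list; the field names are illustrative, since the authoritative list comes from the schema document Kochava provides.

```python
import json

# Illustrative required fields -- the authoritative list comes from
# the schema document provided by Kochava.
REQUIRED_FIELDS = {"event_name", "timestamp", "device_id"}

def validate_records(lines):
    """Return (line_number, missing_fields) pairs for records that fail."""
    errors = []
    for i, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            errors.append((i, {"<invalid JSON>"}))
            continue
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            errors.append((i, missing))
    return errors

good = '{"event_name": "install", "timestamp": "2025-03-01T00:00:00Z", "device_id": "abc"}'
bad = '{"event_name": "install"}'
print(validate_records([good, bad]))  # one error: line 2 is missing fields
```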
Additional Considerations:
- Permissions: Ensure that your IAM user has the necessary permissions to assume the role shared by Kochava. You can also generate temporary credentials using `aws sts assume-role`, which can be used in scripts and existing systems if you prefer not to use the CLI.
- File Size: Be mindful of file size to ensure efficient data transfer and ingestion. Kochava may impose restrictions on the size of objects uploaded, depending on the bucket.
- File Type: Kochava might restrict specific file types to ensure they align with the file naming and schema specifications.
- AWS Region: Ensure that you upload data to the correct AWS region. Kochava AIM currently stores its data in eu-west-1 for optimal performance.
Support and Questions:
We understand that setting up this integration may require coordination and testing. If you encounter any issues or have questions, please don’t hesitate to contact us at any stage of the process.
Self-Hosted Amazon S3 Instance
This guide provides a detailed approach to setting up S3 bucket access for data ingestion. Our process is designed for both security and reliability. Below is an outline of the steps you need to follow to establish this integration.
Prerequisites:
- Amazon S3 Bucket: You should already have an S3 bucket ready for use.
- AWS Account: Confirm that you have sufficient permissions to create and modify IAM roles and policies.
Integration Guide:
- Provide Your S3 Bucket ARN.
The first step is to provide us with the Amazon Resource Name (ARN) of your S3 bucket. The ARN is a unique identifier for your bucket and is required to grant access.
How to find the S3 Bucket ARN:
- Navigate to the AWS Management Console.
- Go to S3 > Your Bucket > Properties.
- Alternatively, use the AWS CLI. The following command returns the bucket's region:

```bash
aws s3api get-bucket-location --bucket your-bucket-name
```

The bucket ARN itself follows the fixed format arn:aws:s3:::your-bucket-name.
- We Create an IAM Role.
Once you provide the S3 bucket ARN, our team will create an IAM role in our AWS account. This IAM role will securely access your S3 bucket and will have the permissions required for data ingestion.
- Share the IAM Role ARN with You.
After creating the IAM role, we will share the IAM role ARN with you. You will need to grant this IAM role permission to access your S3 bucket.
Minimum Permission Required:
The policy should include at least the following permissions:
- s3:GetObject
- s3:ListBucket
- Granting Access to the IAM Role.
Once you receive the IAM role ARN from us, follow these steps to grant access:
How to add a bucket policy:
- Go to your S3 bucket in the AWS Management Console.
- Navigate to Permissions > Bucket Policy.
- Add a policy similar to the one below, replacing the placeholders with your actual bucket ARN and the IAM role ARN we provide:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<IAM_ROLE_ARN>"
      },
      "Action": "s3:ListBucket",
      "Resource": "<BUCKET_ARN>"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<IAM_ROLE_ARN>"
      },
      "Action": "s3:GetObject",
      "Resource": "<BUCKET_ARN>/*"
    }
  ]
}
```
Required Details:
- BUCKET_ARN: ARN of your S3 bucket (provided by you).
- IAM_ROLE_ARN: ARN of the IAM role (provided by us).
- Region: Ensure that you provide the AWS Region where your S3 bucket is located (e.g., us-east-1).
- File Path: Specify the directory path and the file name that we need to access in your S3 bucket.
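If you template the bucket policy rather than paste it by hand, a small helper keeps the two statements consistent. The sketch below renders the same two-statement policy shown above; the ARNs passed in are placeholders for the values exchanged during setup.

```python
import json

def build_bucket_policy(bucket_arn: str, role_arn: str) -> str:
    """Render the two-statement read policy (ListBucket + GetObject)."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
            {
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:GetObject",
                # GetObject applies to objects, hence the /* suffix
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }
    return json.dumps(policy, indent=2)

# Placeholder ARNs -- substitute your bucket ARN and the role ARN we share.
print(build_bucket_policy("arn:aws:s3:::example-bucket",
                          "arn:aws:iam::111111111111:role/example-ingest-role"))
```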
- Validate Schema.
Ensure that your data follows the schema document that we provide. Adhering to this schema is critical for accurate data processing.
Additional Considerations:
- File Format: Please confirm that the files follow the agreed-upon format and naming conventions.
- Permissions: Make sure the IAM role has the correct permissions and ensure that your bucket policy is set up correctly.
- AWS Region: Verify that the bucket ARN and IAM role are within the correct AWS region.
Support and Questions:
We understand that setting up this integration may require coordination and testing. If you encounter any issues or have questions, please don’t hesitate to contact us at any stage of the process.
File and Folder Structure
Summary:
| Aspect | Method 1 | Method 2 |
|---|---|---|
| Structure Type | Nested folders by date | Flat folder |
| Scalability | High | Moderate |
| Ease of navigation | High | Moderate |
| File naming importance | Informative, but not critical | Critical (must include date/index) |
| File naming clarity | Sequential & readable (e.g., 1.csv) | Encoded with date/index |
Method 1:
Hierarchical by Year-Month Folder and Date Subfolder
This structure organizes files in a clear hierarchy based on year, month, and day. It is especially useful when dealing with larger volumes of files or when chronological clarity is essential.
File Naming Rule —
In this approach, the exact filename does not technically matter, but:
- Files should be named sequentially (e.g., 1.csv, 2.csv…) to indicate order.
- Names should be easily identifiable for human readability.
Folder Structure Example
Root Folder/
├── 2024-11/
│ └── 2024-11-22/
│ ├── 1.csv
│ └── 2.csv
├── 2024-12/
│ └── 2024-12-01/
│ ├── 1.csv
│ └── 2.csv
├── 2025-01/
│ └── 2025-01-29/
│ ├── 1.csv
│ └── 2.csv
├── 2025-02/
│ ├── 2025-02-17/
│ │ ├── 1.csv
│ │ └── 2.csv
│ └── 2025-02-18/
│ ├── 1.csv
│ └── 2.csv
├── 2025-03/
│ └── 2025-03-01/
│ ├── 1.csv
│ └── 2.csv
Benefits —
- Easy to navigate by date
- Scales well over time
- Cleaner separation between days
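The Method 1 layout can be generated mechanically from a date and a sequence number. A minimal sketch, mirroring the tree above:

```python
from datetime import date

def method1_key(day: date, index: int, ext: str = "csv") -> str:
    """Nested key: <YYYY-MM>/<YYYY-MM-DD>/<index>.<ext>"""
    return f"{day:%Y-%m}/{day:%Y-%m-%d}/{index}.{ext}"

print(method1_key(date(2025, 2, 18), 1))  # 2025-02/2025-02-18/1.csv
```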
Method 2:
Flat Structure with Year-Month Folder and Timestamp Filenames
This structure flattens each month's files into a single Year-Month folder, with filenames that encode the date and a sequence index.
File Naming Rule —
In this approach, the file name matters — it must include both the date and sequence index (e.g., 2025_05_18_2.csv) to ensure uniqueness and clarity.
Root Folder/
├── 2024-11/
│ ├── 2024_11_22_0.csv
│ └── 2024_11_22_1.csv
├── 2024-12/
│ ├── 2024_12_01_0.csv
│ └── 2024_12_01_1.csv
├── 2025-01/
│ ├── 2025_01_29_0.csv
│ └── 2025_01_29_1.csv
├── 2025-02/
│ ├── 2025_02_17_0.csv
│ ├── 2025_02_17_1.csv
│ ├── 2025_02_18_0.csv
│ ├── 2025_02_18_1.csv
│ └── 2025_02_18_2.csv
├── 2025-03/
│ ├── 2025_03_01_0.csv
│ └── 2025_03_01_1.csv
Benefits —
- Simpler directory structure
- Useful when only a small number of files are expected
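Likewise, Method 2 names can be derived from the date and index. A minimal sketch, mirroring the layout above:

```python
from datetime import date

def method2_key(day: date, index: int, ext: str = "csv") -> str:
    """Flat key: <YYYY-MM>/<YYYY_MM_DD>_<index>.<ext>"""
    return f"{day:%Y-%m}/{day:%Y_%m_%d}_{index}.{ext}"

print(method2_key(date(2025, 2, 18), 2))  # 2025-02/2025_02_18_2.csv
```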