S3 Storage
Kapa provides an integration that pulls files from AWS S3 and other S3-compatible storage as data sources. It is mainly useful as a general-purpose data source when you need to give Kapa access to files from a source that isn't officially supported, or when you want to manually control or preprocess the content you ingest into Kapa.
Prerequisites
- An S3-compatible storage bucket containing documentation files
- Access credentials with appropriate permissions
- Files in supported formats
Data ingested
When you connect Kapa to S3 Storage, the following data is ingested:
- Full text content of supported file types
- URL mappings from index.json (if provided)
Permissions required
The following permissions are required for your S3 credentials:
Permission | Purpose | Security considerations |
---|---|---|
List object permissions | Allows Kapa to discover files in your bucket | Read-only access to file listings |
Read object permissions | Enables Kapa to read file content | Read-only access to file content |
We recommend creating dedicated credentials with only these specific permissions for the bucket you want to connect.
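On AWS, the two permissions above usually correspond to the `s3:ListBucket` and `s3:GetObject` IAM actions. The following is a minimal sketch of building a read-only policy document for a dedicated IAM user. The function name, the bucket name `my-docs-bucket`, and the optional prefix restriction are illustrative assumptions, and other S3-compatible providers may name these permissions differently.

```python
import json


def build_read_only_policy(bucket, prefix=""):
    """Return an IAM policy document granting list and read access to one bucket.

    Assumption: AWS IAM action names. `bucket` and `prefix` are placeholders.
    """
    list_statement = {
        "Sid": "ListBucketObjects",
        "Effect": "Allow",
        "Action": ["s3:ListBucket"],
        # Listing is granted on the bucket itself, not on its objects.
        "Resource": [f"arn:aws:s3:::{bucket}"],
    }
    if prefix:
        # Optionally restrict listing to keys under the prefix.
        list_statement["Condition"] = {"StringLike": {"s3:prefix": [f"{prefix}*"]}}
    read_statement = {
        "Sid": "ReadObjects",
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        # Reading is granted on the objects inside the bucket.
        "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
    }
    return {"Version": "2012-10-17", "Statement": [list_statement, read_statement]}


if __name__ == "__main__":
    print(json.dumps(build_read_only_policy("my-docs-bucket", "docs/"), indent=2))
```

The printed JSON can be pasted into the IAM policy editor. Leaving `prefix` empty grants read access to the whole bucket.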
Supported file formats
The S3 Storage integration currently supports:
- Markdown: .md files
- Text: .txt files
- Word: .docx files
Files in other formats are ignored.
Setup
Step 1: Create credentials for Kapa
- In your S3 provider's account management:
  - For AWS: Create a new IAM user or use an existing one
  - For other S3 providers: Create an API key or access credential
- Ensure the credentials have the following permissions for your bucket:
  - List object permissions
  - Read object permissions
- Generate and securely store the Access key ID and Secret access key
Step 2: Configure the Kapa platform
- Go to the Sources tab in the Kapa platform
- Click Add new source
- Select S3 Storage as the source type
- Enter your bucket details and S3 credentials
- Optionally configure a bucket prefix to limit the file paths to ingest
- Click Save to begin the ingestion process
Configuration options
The following configuration options are available for the S3 Storage integration:
Option | Description | Default | Required |
---|---|---|---|
Bucket name | The name of your S3 bucket | None | Yes |
Access key ID | Your S3 access key | None | Yes |
Secret access key | Your S3 secret key | None | Yes |
Bucket prefix | Optional path prefix to limit which files are ingested | None | No |
File types | Select which file types to include | All supported types | No |
Best practices
File organization
There are no strict requirements on file structure in your S3 bucket. Kapa will:
- Start looking for files at the root of the bucket, or at the specified bucket prefix
- Discover all supported files, including those in subdirectories
- Process each file according to its file extension
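To preview which objects are likely to be picked up, the discovery rules above can be approximated locally. This is a rough sketch rather than Kapa's actual implementation: the function name is hypothetical, and matching extensions case-insensitively is an assumption that the documentation does not confirm.

```python
import posixpath

# Extensions listed under "Supported file formats".
SUPPORTED_EXTENSIONS = {".md", ".txt", ".docx"}


def select_ingestible_keys(keys, prefix=""):
    """Return the object keys that fall under `prefix` and have a supported extension."""
    selected = []
    for key in keys:
        # An empty prefix matches every key, i.e. discovery starts at the bucket root.
        if not key.startswith(prefix):
            continue
        # Assumption: extension matching ignores case.
        extension = posixpath.splitext(key)[1].lower()
        if extension in SUPPORTED_EXTENSIONS:
            selected.append(key)
    return selected


if __name__ == "__main__":
    example_keys = ["README.md", "docs/guide.md", "docs/api/notes.txt", "docs/logo.png"]
    print(select_ingestible_keys(example_keys, prefix="docs/"))
```

Running this against a listing of your bucket's keys gives a quick estimate of what a given bucket prefix would include before you save the source.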
URL mapping
To link files in your bucket to URLs (which Kapa can reference in responses):
- Create an index.json file in your bucket with the following format:

    [
      {
        "object_key": "example_file_1.md",
        "source_url": "https://docs.example.com/example_file_1.md"
      },
      {
        "object_key": "example_file_2.md",
        "source_url": "https://docs.example.com/example_file_2.md"
      }
    ]

  Note: Use absolute file paths in object_key. If you've configured a bucket prefix, the object_key path must include the full path, including the prefix.
- Place this file at the root of your bucket or directly under the bucket prefix
- When files are successfully mapped to URLs, Kapa displays the URL in its citations and when you review conversations on the Kapa platform.

The mapping file is optional, and not all files have to be represented in index.json.
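If your object keys mirror the paths of your published pages, the mapping file can be generated instead of written by hand. The sketch below assumes that each source URL is the base URL followed by the key with the bucket prefix removed. The function name, the base URL, and that URL scheme are illustrative assumptions, so adjust them to how your site is actually structured.

```python
import json


def build_index(object_keys, base_url, prefix=""):
    """Build index.json entries that map each object key to a source URL."""
    base = base_url.rstrip("/")
    entries = []
    for key in object_keys:
        # object_key keeps the full path, including any bucket prefix.
        relative = key[len(prefix):] if prefix and key.startswith(prefix) else key
        entries.append(
            {"object_key": key, "source_url": f"{base}/{relative.lstrip('/')}"}
        )
    return entries


if __name__ == "__main__":
    keys = ["docs/example_file_1.md", "docs/example_file_2.md"]
    index = build_index(keys, "https://docs.example.com", prefix="docs/")
    # Write the result locally, then upload it next to your files.
    with open("index.json", "w", encoding="utf-8") as handle:
        json.dump(index, handle, indent=2)
```

Upload the resulting index.json to the root of the bucket or directly under the bucket prefix.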
Compatible storage services
While AWS S3 is the most common implementation, this integration works with any S3-compatible storage service, including:
- Backblaze B2
- DigitalOcean Spaces
- Google Cloud Storage
- IBM Cloud Object Storage
- Linode Object Storage
- MinIO
- Oracle Cloud Infrastructure Object Storage
- Scaleway Object Storage
- Wasabi
And others that implement the S3 protocol.
Troubleshooting
- Permission denied errors: Verify your credentials have the correct permissions for the S3 bucket
- Files not appearing: Check that your files are in supported formats and located within the specified bucket/prefix
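- URL mapping not working: Ensure your index.json is valid JSON, is located at the root of the bucket or directly under the bucket prefix, and uses the full path (including any prefix) in each object_key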
- Format conversion issues: If your files are in unsupported formats, you'll need to convert them before ingestion
- Connection issues: If using a non-AWS S3 provider, you may need to provide additional configuration (contact Kapa support)
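When URL mapping is not working, it can help to check the mapping file locally before uploading it. The following is a minimal sketch that checks only the format described on this page: a JSON array of objects with object_key and source_url, where each object_key includes the bucket prefix. The function name is hypothetical, and Kapa may apply additional validation that is not covered here.

```python
import json


def find_index_problems(index_text, prefix=""):
    """Return a list of human-readable problems found in an index.json payload."""
    try:
        entries = json.loads(index_text)
    except json.JSONDecodeError as error:
        return [f"not valid JSON: {error}"]
    if not isinstance(entries, list):
        return ["top-level value must be a JSON array"]
    problems = []
    for position, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {position}: must be an object")
            continue
        # Both fields must be present and be non-empty strings.
        for field in ("object_key", "source_url"):
            if not isinstance(entry.get(field), str) or not entry[field]:
                problems.append(f"entry {position}: missing or empty {field}")
        # With a bucket prefix configured, object_key must contain the full path.
        key = entry.get("object_key")
        if isinstance(key, str) and prefix and not key.startswith(prefix):
            problems.append(
                f"entry {position}: object_key does not include prefix {prefix!r}"
            )
    return problems


if __name__ == "__main__":
    with open("index.json", encoding="utf-8") as handle:
        for problem in find_index_problems(handle.read(), prefix="docs/"):
            print(problem)
```

An empty result means the file matches the documented format. It does not confirm that the referenced objects exist in the bucket.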