S3 Storage

Kapa provides an integration that pulls files from AWS S3 and other S3-compatible storage as data sources. It is mainly useful as a general-purpose data source when you need to give Kapa access to files from a source that isn't officially supported, or when you want to manually control or preprocess content before ingesting it into Kapa.

Prerequisites

An S3-compatible storage bucket containing documentation files
Access credentials with appropriate permissions
Files in supported formats

Data ingested

When you connect Kapa to S3 Storage, the following data is ingested:

Full text content of supported file types
URL mappings from index.json (if provided)

Permissions required

The following permissions are required for your S3 credentials:

Permission	Purpose	Security considerations
List object permissions	Allows Kapa to discover files in your bucket	Read-only access to file listings
Read object permissions	Enables Kapa to read file content	Read-only access to file content

We recommend creating dedicated credentials with only these specific permissions for the bucket you want to connect.
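For AWS, a dedicated read-only policy could look roughly like the sketch below. It assumes that "list object permissions" and "read object permissions" correspond to the IAM actions s3:ListBucket and s3:GetObject; the table above does not name provider-specific actions, so treat that mapping as an assumption. The bucket name my-docs-bucket and the function name are hypothetical.

```python
import json


def build_read_only_policy(bucket_name: str) -> dict:
    """Build an IAM policy document that allows listing and reading one bucket.

    Assumption: s3:ListBucket covers "list object permissions" and
    s3:GetObject covers "read object permissions".
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself.
                "Sid": "ListBucketObjects",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading is granted on the objects inside the bucket.
                "Sid": "ReadBucketObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-docs-bucket" is a placeholder bucket name.
    print(json.dumps(build_read_only_policy("my-docs-bucket"), indent=2))
```

The printed JSON can be attached to the IAM user created for Kapa. Other S3-compatible providers typically expose equivalent read-only scopes under different names.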

Supported file formats

The S3 Storage integration currently supports:

Markdown: .md files
Text: .txt files
Word: .docx files
PDF: .pdf files

Files in other formats are ignored. Note that PDF processing takes longer than other document types.
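Before uploading, it can help to check which files would be ignored. The sketch below splits a list of object keys by the four extensions listed above. The function name is hypothetical, and the case-insensitive comparison is an assumption: this page does not say whether an extension such as .MD is accepted.

```python
from pathlib import PurePosixPath

# Extensions taken from the supported formats list above.
SUPPORTED_EXTENSIONS = {".md", ".txt", ".docx", ".pdf"}


def split_supported(object_keys):
    """Split object keys into (supported, ignored) lists by file extension."""
    supported, ignored = [], []
    for key in object_keys:
        # Assumption: extensions are compared case-insensitively.
        suffix = PurePosixPath(key).suffix.lower()
        if suffix in SUPPORTED_EXTENSIONS:
            supported.append(key)
        else:
            ignored.append(key)
    return supported, ignored


if __name__ == "__main__":
    keys = ["docs/guide.md", "notes.txt", "img/logo.png", "spec.pdf", "README"]
    supported, ignored = split_supported(keys)
    print("supported:", supported)
    print("ignored:", ignored)
```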

Setup

Step 1: Create credentials for Kapa

1. In your S3 provider's account management:
   - For AWS: Create a new IAM user or use an existing one
   - For other S3 providers: Create an API key or access credential
2. Ensure the credentials have the following permissions for your bucket:
   - List object permissions
   - Read object permissions
3. Generate and securely store the Access key ID and Secret access key

Step 2: Configure the Kapa platform

1. Go to the Sources tab in the Kapa platform
2. Click Add new source
3. Select S3 Storage as the source type
4. Enter your bucket details and S3 credentials
5. Optionally configure a bucket prefix to limit the file paths to ingest
6. Click Save to begin the ingestion process

Configuration options

The following configuration options are available for the S3 Storage integration:

Option	Description	Default	Required
Bucket name	The name of your S3 bucket	None	Yes
Access key ID	Your S3 access key	None	Yes
Secret access key	Your S3 secret key	None	Yes
Bucket prefix	Optional path prefix to limit which files are ingested	None	No
File types	Select which file types to include	All supported types	No

Best practices

File organization

There are no strict requirements on file structure in your S3 bucket. Kapa will:

Start looking for files at the root of the bucket, or at the specified bucket prefix
Discover all supported files, including those in subdirectories
Process each file according to its file extension
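To preview which objects sit under a given prefix, you can list them yourself with the credentials you created for Kapa. The sketch below is one way to do that with the AWS SDK for Python (boto3), which is assumed to be installed; it is a local sanity check, not a description of how Kapa performs discovery. The function name, bucket name, and prefix are hypothetical.

```python
def list_keys(s3_client, bucket, prefix=""):
    """Return every object key in the bucket that starts with the prefix.

    s3_client is expected to be a boto3 S3 client. The list_objects_v2
    paginator follows continuation tokens, so large buckets are fully listed.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty listing carry no "Contents" entry.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


# Example usage (requires boto3 and valid credentials):
#
#   import boto3
#   client = boto3.client("s3")
#   for key in list_keys(client, "my-docs-bucket", "docs/"):
#       print(key)
```

S3 keys are flat strings, so "subdirectories" are simply keys that share a longer prefix; listing with the prefix docs/ also returns keys such as docs/guides/setup.md.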

URL mapping

To link files in your bucket to URLs (which Kapa can reference in responses):

Create an index.json file in your bucket with the following format:

[
  {
    "object_key": "example_file_1.md",
    "source_url": "https://docs.example.com/example_file_1.md"
  },
  {
    "object_key": "example_file_2.md",
    "source_url": "https://docs.example.com/example_file_2.md"
  }
]

Note: Use absolute file paths in object_key. If you've configured a bucket prefix, each object_key must be the full path, including the prefix.

Place this file at the root of your bucket or directly under the bucket prefix
When files are successfully mapped to URLs, Kapa displays the URL in its citations and when you review conversations on the Kapa platform.

The mapping file is optional, and not all files have to be represented in the index.json.
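For buckets with many files, the mapping file can be generated rather than written by hand. The sketch below builds entries in the format shown above, assuming each file's public URL is simply a base URL followed by its object key. That URL scheme, the function name, and the base URL are assumptions for illustration; adapt the mapping to how your documentation site is actually laid out.

```python
import json
from urllib.parse import quote


def build_index(object_keys, base_url):
    """Build index.json entries that map each object key to a source URL."""
    base = base_url.rstrip("/")
    return [
        {
            # Keep the full object key, including any bucket prefix.
            "object_key": key,
            # Percent-encode characters such as spaces; "/" is left as-is.
            "source_url": f"{base}/{quote(key)}",
        }
        for key in object_keys
    ]


if __name__ == "__main__":
    entries = build_index(
        ["example_file_1.md", "example_file_2.md"],
        "https://docs.example.com",
    )
    with open("index.json", "w", encoding="utf-8") as handle:
        json.dump(entries, handle, indent=2)
```

Running the script writes an index.json file in the current directory, which you can then upload to the root of the bucket or directly under the bucket prefix.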

Markdown formatting

Markdown files uploaded to S3 should be properly formatted so that Kapa can interpret them well. See "How should I format markdown files for AI ingestion?" for formatting requirements and examples.

Compatible storage services

While AWS S3 is the most common implementation, this integration works with any S3-compatible storage service, including:

Backblaze B2
DigitalOcean Spaces
Google Cloud Storage
IBM Cloud Object Storage
Linode Object Storage
MinIO
Oracle Cloud Infrastructure Object Storage
Scaleway Object Storage
Wasabi

And others that implement the S3 protocol.

Troubleshooting

Permission denied errors: Verify your credentials have the correct permissions for the S3 bucket
Files not appearing: Check that your files are in supported formats and located within the specified bucket/prefix
URL mapping not working: Ensure your index.json is properly formatted and located at the root of the bucket or directly under the bucket prefix
Format conversion issues: If your files are in unsupported formats, you'll need to convert them before ingestion
Connection issues: If using a non-AWS S3 provider, you may need to provide additional configuration (contact Kapa support)
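When URL mapping does not work, a quick local check of index.json can rule out formatting mistakes. The sketch below verifies only what this page states: the file is a JSON array of objects with object_key and source_url, and each object_key includes the bucket prefix. The function name is hypothetical, and Kapa may apply checks that are not covered here.

```python
import json


def validate_index(text, bucket_prefix=""):
    """Return a list of problems found in the contents of an index.json file."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, list):
        return ["top level must be a JSON array"]

    problems = []
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: expected an object")
            continue
        # Both fields from the documented format must be non-empty strings.
        for field in ("object_key", "source_url"):
            if not isinstance(entry.get(field), str) or not entry[field]:
                problems.append(f"entry {i}: missing or empty {field}")
        # object_key must be the full path, including any bucket prefix.
        key = entry.get("object_key")
        if isinstance(key, str) and not key.startswith(bucket_prefix):
            problems.append(
                f"entry {i}: object_key does not start with prefix {bucket_prefix!r}"
            )
    return problems


if __name__ == "__main__":
    with open("index.json", encoding="utf-8") as handle:
        for problem in validate_index(handle.read(), bucket_prefix=""):
            print(problem)
```

An empty result means the file matches the documented shape; it does not confirm that the listed objects exist in the bucket.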
