S3 Storage

Kapa provides an integration that pulls files from AWS S3 and other S3-compatible storage as data sources. It is mainly useful as a general-purpose data source when you need to give Kapa access to files from a source that isn't officially supported, or when you want to manually control or preprocess the content you ingest into Kapa.

Prerequisites

Data ingested

When you connect Kapa to S3 Storage, the following data is ingested:

Permissions required

The following permissions are required for your S3 credentials:

| Permission | Purpose | Security considerations |
| --- | --- | --- |
| List object permissions | Allows Kapa to discover files in your bucket | Read-only access to file listings |
| Read object permissions | Enables Kapa to read file content | Read-only access to file content |

We recommend creating dedicated credentials with only these specific permissions for the bucket you want to connect.
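For AWS specifically, the two permissions above can be expressed as an IAM policy. The sketch below builds such a policy document in Python. It assumes that "list object permissions" corresponds to the `s3:ListBucket` action and "read object permissions" to `s3:GetObject`, a mapping this page does not spell out; the function name and the bucket name are placeholders, and other S3 providers have their own permission models.

```python
import json


def build_read_only_policy(bucket):
    """Return an IAM policy document granting list and read access to one bucket.

    Assumption: s3:ListBucket and s3:GetObject are the AWS actions behind the
    list and read permissions described above.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself.
                "Sid": "ListBucketObjects",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading is granted on the objects inside the bucket.
                "Sid": "ReadBucketObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


def policy_as_json(bucket):
    """Serialize the policy so it can be pasted into the IAM console."""
    return json.dumps(build_read_only_policy(bucket), indent=2)
```

Calling `policy_as_json("example-docs-bucket")` produces JSON that could be attached to the dedicated IAM user. No write or delete actions are included, in line with the read-only recommendation above.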

Supported file formats

The S3 Storage integration currently supports:

  • Markdown: .md files
  • Text: .txt files
  • Word: .docx files

Files in other formats are ignored.

Setup

Step 1: Create credentials for Kapa

  1. In your S3 provider's account management:
    • For AWS: Create a new IAM user or use an existing one
    • For other S3 providers: Create an API key or access credential
  2. Ensure the credentials have the following permissions for your bucket:
    • List object permissions
    • Read object permissions
  3. Generate and securely store the Access key ID and Secret access key
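Before entering the credentials in Kapa, it can help to confirm that they really allow listing and reading. The following is a minimal sketch, not part of Kapa: `check_read_access` is a hypothetical helper that expects an S3 client exposing `list_objects_v2` and `get_object` (for example a boto3 client), and the bucket name and prefix in the comments are placeholders.

```python
def check_read_access(client, bucket, prefix=""):
    """Return the first key found under prefix after confirming it can be read.

    Returns None if listing succeeds but no objects exist under prefix.
    A permission problem surfaces as the exception raised by the client.
    """
    # Exercises the list permission.
    listing = client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    contents = listing.get("Contents", [])
    if not contents:
        return None
    key = contents[0]["Key"]
    # Exercises the read permission; only one byte is read.
    client.get_object(Bucket=bucket, Key=key)["Body"].read(1)
    return key


# Example wiring with boto3 (third-party, not imported here):
#   import boto3
#   s3 = boto3.client(
#       "s3",
#       aws_access_key_id="YOUR_ACCESS_KEY_ID",
#       aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
#   )
#   check_read_access(s3, "example-docs-bucket", "docs/")
```

For a non-AWS provider, the boto3 client would typically also need an `endpoint_url` argument pointing at that provider's S3 endpoint.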

Step 2: Configure the Kapa platform

  1. Go to the Sources tab in the Kapa platform
  2. Click Add new source
  3. Select S3 Storage as the source type
  4. Enter your bucket details and S3 credentials
  5. Optionally configure a bucket prefix to limit the file paths to ingest
  6. Click Save to begin the ingestion process

Configuration options

The following configuration options are available for the S3 Storage integration:

| Option | Description | Default | Required |
| --- | --- | --- | --- |
| Bucket name | The name of your S3 bucket | None | Yes |
| Access key ID | Your S3 access key | None | Yes |
| Secret access key | Your S3 secret key | None | Yes |
| Bucket prefix | Optional path prefix to limit which files are ingested | None | No |
| File types | Select which file types to include | All supported types | No |

Best practices

File organization

There are no strict requirements on file structure in your S3 bucket. Kapa will:

  • Start looking for files at the root of the bucket, or at the specified bucket prefix
  • Discover all supported files, including those in subdirectories
  • Process each file according to its file extension
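The discovery rules above can be approximated locally to preview which objects are likely to be picked up. This is a rough sketch rather than Kapa's actual logic: `select_ingestible_keys` is a hypothetical helper, the prefix is treated as a plain string prefix of the object key, and matching extensions case-insensitively is an assumption this page does not confirm.

```python
import posixpath

# Extensions listed under "Supported file formats".
SUPPORTED_EXTENSIONS = {".md", ".txt", ".docx"}


def select_ingestible_keys(keys, prefix=""):
    """Return the keys under prefix whose extension is a supported format."""
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the configured bucket prefix
        # Assumption: extension matching ignores case.
        extension = posixpath.splitext(key)[1].lower()
        if extension in SUPPORTED_EXTENSIONS:
            selected.append(key)
    return selected
```

Given the keys `docs/a.md`, `docs/guides/b.txt`, and `docs/img/logo.png` with the prefix `docs/`, the helper keeps the first two, including the one in a subdirectory, and drops the image.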

URL mapping

To link files in your bucket to URLs (which Kapa can reference in responses):

  1. Create an index.json file in your bucket with the following format:

    [
      {
        "object_key": "example_file_1.md",
        "source_url": "https://docs.example.com/example_file_1.md"
      },
      {
        "object_key": "example_file_2.md",
        "source_url": "https://docs.example.com/example_file_2.md"
      }
    ]

    Note: Use absolute file paths in object_key. If you've configured a bucket prefix, each object_key must contain the full path, including the prefix.

  2. Place this file at the root of your bucket or directly under the bucket prefix

  3. When files are successfully mapped to URLs, Kapa displays the URL in its citations and when you review conversations on the Kapa platform.

The mapping file is optional, and not all files have to be represented in the index.json.
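For buckets with many files, the index.json can be generated instead of written by hand. The sketch below is one way to do that, under the assumption that each public URL is the base URL followed by the object's path below the bucket prefix. That URL scheme, the function names, and the base URL are illustrative and will differ per site.

```python
import json


def build_url_index(object_keys, base_url, prefix=""):
    """Build index.json entries mapping full object keys to source URLs.

    Assumption: the URL is base_url plus the key's path below prefix.
    """
    entries = []
    for key in object_keys:
        # Strip the bucket prefix from the URL path only, never from object_key.
        relative = key[len(prefix):] if prefix and key.startswith(prefix) else key
        entries.append({
            "object_key": key,  # full key, prefix included
            "source_url": base_url.rstrip("/") + "/" + relative.lstrip("/"),
        })
    return entries


def dump_index(entries):
    """Serialize the entries in the format expected for index.json."""
    return json.dumps(entries, indent=2)
```

With the key `docs/example_file_1.md`, the base URL `https://docs.example.com/`, and the prefix `docs/`, the entry keeps `docs/example_file_1.md` as its `object_key` and gets `https://docs.example.com/example_file_1.md` as its `source_url`. The output of `dump_index` would then be uploaded as index.json to the bucket root or directly under the prefix.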

Compatible storage services

While AWS S3 is the most common implementation, this integration works with any S3-compatible storage service, including:

  • Backblaze B2
  • DigitalOcean Spaces
  • Google Cloud Storage
  • IBM Cloud Object Storage
  • Linode Object Storage
  • MinIO
  • Oracle Cloud Infrastructure Object Storage
  • Scaleway Object Storage
  • Wasabi

Other services that implement the S3 protocol are also supported.

Troubleshooting

  • Permission denied errors: Verify your credentials have the correct permissions for the S3 bucket
  • Files not appearing: Check that your files are in supported formats and located within the specified bucket/prefix
  • URL mapping not working: Ensure your index.json is valid JSON in the format shown above and is located at the root of the bucket or directly under the bucket prefix
  • Format conversion issues: If your files are in unsupported formats, you'll need to convert them before ingestion
  • Connection issues: If using a non-AWS S3 provider, you may need to provide additional configuration (contact Kapa support)