GitHub Files

Using GitHub Files as a source lets Kapa pull files from your GitHub repository. Kapa can then draw directly on project documentation, READMEs, and code examples, so it can include relevant information about the project's current state.

Prerequisites

  • A GitHub repository containing documentation files
  • Repository owner and name information
  • For private repositories, a personal access token with appropriate permissions

Data ingested

When you connect Kapa to GitHub Files, the following data is ingested:

  • File URLs
  • Full content of supported file types
  • For Jupyter notebooks, both markdown content and code (excluding outputs like print statements and plots)
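To make the notebook behavior above concrete, here is a minimal sketch of what "markdown content and code, excluding outputs" could look like for a standard notebook JSON file. It is an illustration only, not Kapa's actual implementation; the function name `extract_notebook_text` and the choice to skip raw cells are assumptions.

```python
import json


def extract_notebook_text(notebook_json: str) -> str:
    """Return markdown and code cell sources from a notebook, skipping outputs.

    Hypothetical helper for illustration; not part of any Kapa API.
    """
    notebook = json.loads(notebook_json)
    parts = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") not in ("markdown", "code"):
            continue  # assumption: raw cells are ignored in this sketch
        source = cell.get("source", "")
        # The notebook format stores source as one string or a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        if text.strip():
            parts.append(text)
        # The cell's "outputs" field (prints, plots) is deliberately never read.
    return "\n\n".join(parts)


example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Quickstart\n", "Install the client."]},
        {
            "cell_type": "code",
            "source": "print('hello')",
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hello\n"]}],
        },
    ]
})
print(extract_notebook_text(example))
```

Running the sketch prints the markdown heading and sentence followed by the code line; the stream output stored with the code cell does not appear.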

Supported formats

Currently the following GitHub file types are supported:

  • Markdown Files (.md)
  • Jupyter Notebooks (.ipynb)

Permissions required

The following permissions are required when using a personal access token for private repositories:

Permission          | Purpose                                       | Security considerations
Contents: read-only | Allows Kapa to read file content and metadata | Kapa cannot write contents to the repository

We recommend using a fine-grained access token limited to only the repositories you want to connect to Kapa.
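Before pasting a token into Kapa, it can help to confirm that the token can actually read the repository. The sketch below builds (but does not send) a request against GitHub's REST endpoint for repository contents. The helper name `build_contents_request`, the placeholder owner, repository, and token values, and the default path `README.md` are assumptions for illustration; this is a manual check you could run yourself, not something Kapa requires.

```python
import urllib.request
from typing import Optional


def build_contents_request(
    owner: str,
    name: str,
    token: Optional[str] = None,
    path: str = "README.md",
) -> urllib.request.Request:
    """Build a GET request for a file via GitHub's repository contents endpoint.

    Hypothetical helper: it only constructs the request, it does not send it.
    """
    url = f"https://api.github.com/repos/{owner}/{name}/contents/{path.lstrip('/')}"
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Private repositories need a token with Contents: read-only access.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


# Placeholder values; replace with your own owner, repository, and token.
request = build_contents_request("example-org", "example-repo", token="YOUR_TOKEN")
print(request.full_url)
```

To run the check, pass the request to `urllib.request.urlopen`. A successful response suggests the token can read file contents; an `HTTPError` with status 401 typically points to an invalid or expired token, and 404 to a repository the token cannot see.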

Setup

Step 1: Connect your repository

  1. Go to the Sources tab on the Kapa platform and click on Add new source
  2. Enter a name for the source, select GitHub Files, and click Continue
  3. Specify the GitHub repository to use by filling in the Owner and Name fields
  4. If it's a private repository, enter a personal access token for authentication
  5. Upon successful connection, a purple text box appears, providing you with the repository description

Step 2: Configure your GitHub Files

Once you've set up your repository, configure which files to include:

  1. Select File Types to choose whether to include Markdown files, Jupyter notebooks, or both
  2. Optionally set File Include Regex to only include files matching specific patterns
  3. Optionally set File Exclude Regex to exclude files matching specific patterns
  4. Click Save to begin the ingestion process

Configuration options

The following configuration options are available for the GitHub Files integration:

Option                | Description                                                       | Default             | Required
Owner                 | GitHub username or organization that owns the repository          | None                | Yes
Name                  | Name of the GitHub repository                                     | None                | Yes
Personal access token | Token for authenticating to GitHub (for private repositories)     | None                | For private repositories
File Types            | Choose which file types to include (Markdown, Jupyter, or both)   | Both                | No
File Include Regex    | Only include files matching this regular expression pattern       | All supported files | No
File Exclude Regex    | Exclude files matching this regular expression pattern            | None                | No

Best practices

  • Ingest selectively: Adding too many files may add more noise than signal to Kapa. Be selective when configuring which files to include.
  • Focus on documentation: Prioritize files that contain explanatory content rather than code-heavy files.
  • Use regex patterns effectively: Consider patterns like:
    • Include only /docs/.*\.md$ to focus on the documentation directory
    • Exclude /deprecated/.* to avoid outdated content
  • Consider file organization: Well-structured repositories with clear documentation folders work best with this integration
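The include and exclude patterns above can be tried locally before saving a source. The following sketch previews which paths would survive a pair of filters. It rests on several assumptions that are not confirmed by this page: that paths carry a leading slash, that a pattern matches when it is found anywhere in the path (search semantics), and that exclusion is applied after inclusion. The function name `select_files` is hypothetical.

```python
import re
from typing import Iterable, List, Optional

SUPPORTED_EXTENSIONS = (".md", ".ipynb")


def select_files(
    paths: Iterable[str],
    include: Optional[str] = None,
    exclude: Optional[str] = None,
) -> List[str]:
    """Preview which paths pass the filters (assumed semantics, see lead-in)."""
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None
    selected = []
    for path in paths:
        if not path.endswith(SUPPORTED_EXTENSIONS):
            continue  # only Markdown and Jupyter files are supported
        if include_re and not include_re.search(path):
            continue  # an include pattern is set and this path does not match
        if exclude_re and exclude_re.search(path):
            continue  # the path matches the exclude pattern
        selected.append(path)
    return selected


paths = [
    "/README.md",
    "/docs/intro.md",
    "/docs/deprecated/v1.md",
    "/docs/tutorial.ipynb",
    "/src/main.py",
]
print(select_files(paths, include=r"/docs/.*\.md$", exclude=r"/deprecated/.*"))
# → ['/docs/intro.md']
```

With no patterns set, the same call returns every supported file in the list, which mirrors the defaults in the configuration options. If Kapa shows fewer or more files than this preview, the matching semantics assumed here likely differ from the real ones.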

Troubleshooting

  • Authentication failures: Verify your personal access token has the required permissions and hasn't expired
  • No files appearing: Check that your repository contains files of the supported types that match your filter criteria