GitHub Files
Adding GitHub Files as a source lets Kapa pull files directly from your GitHub repository. Kapa can then draw on the project's documentation, READMEs, and code examples, so its answers can reflect the project's current state.
Prerequisites
- A GitHub repository containing documentation files
- The repository's owner and name
- For private repositories, a personal access token with the permissions listed under Permissions required
Data ingested
When you connect Kapa to GitHub Files, the following data is ingested:
- File URLs
- Full content of supported file types
- For Jupyter notebooks, both markdown content and code (cell outputs such as printed text and plots are excluded)
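To illustrate the notebook rule, the Python sketch below pulls the text of markdown and code cells out of an `.ipynb` file and ignores each cell's `outputs`. It illustrates the rule above and is not Kapa's implementation. The function name `extract_notebook_text` is hypothetical, and skipping `raw` cells is an assumption.

```python
import json


def extract_notebook_text(ipynb_json: str) -> str:
    """Hypothetical helper: keep markdown and code cell sources, drop outputs."""
    notebook = json.loads(ipynb_json)
    parts = []
    for cell in notebook.get("cells", []):
        # Assumption: only markdown and code cells count; raw cells are skipped
        if cell.get("cell_type") not in ("markdown", "code"):
            continue
        source = cell.get("source", "")
        # nbformat v4 stores source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        parts.append(source)
        # cell["outputs"] (printed text, plots, ...) is deliberately never read
    return "\n\n".join(parts)


notebook = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n", "Intro text"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": ["print(6 * 7)"],
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["42\n"]}]},
        {"cell_type": "raw", "metadata": {}, "source": "raw text"},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
})

print(extract_notebook_text(notebook))
```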
Supported formats
Currently the following GitHub file types are supported:
- Markdown files (`.md`)
- Jupyter Notebooks (`.ipynb`)
Permissions required
The following permissions are required when using a personal access token for private repositories:
Permission | Purpose | Security considerations |
---|---|---|
Contents: read-only | Allows Kapa to read file content and metadata | Kapa cannot write contents to the repository |
We recommend using a fine-grained personal access token scoped to only the repositories you want to connect to Kapa.
Setup
Step 1: Connect your repository
- Go to the Sources tab on the Kapa platform and click on Add new source
- Enter a name for the source, select GitHub Files, and click Continue
- Specify the GitHub repository to use by filling in the Owner and Name fields
- If it's a private repository, enter a personal access token for authentication
- When the connection succeeds, a purple text box appears showing the repository description
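If the connection fails, or you want to confirm the owner, name, and token before entering them, you can query GitHub's REST API directly. The sketch below assumes GitHub's public `GET /repos/{owner}/{repo}` endpoint, whose JSON response includes a `description` field. That the purple box shows this same field is an assumption. This is a standalone check, not how Kapa connects, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

API_ROOT = "https://api.github.com"


def build_repo_request(owner: str, name: str, token: Optional[str] = None) -> urllib.request.Request:
    """Hypothetical helper: request for the repository metadata endpoint."""
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:  # only needed for private repositories
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(f"{API_ROOT}/repos/{owner}/{name}", headers=headers)


def description_from_json(payload: bytes) -> Optional[str]:
    """Extract the description field; GitHub returns null when none is set."""
    return json.loads(payload).get("description")


def fetch_description(owner: str, name: str, token: Optional[str] = None) -> Optional[str]:
    """Network call: raises urllib.error.HTTPError on 4xx/5xx responses."""
    request = build_repo_request(owner, name, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return description_from_json(response.read())
```

For a private repository, pass the same token you plan to give Kapa, for example `fetch_description("my-org", "my-repo", token="...")`. An invalid token or an inaccessible repository surfaces as `urllib.error.HTTPError`.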
Step 2: Configure your GitHub Files
Once the repository is connected, configure which files to include:
- Select File Types to choose whether to include Markdown files, Jupyter notebooks, or both
- Optionally set File Include Regex to only include files matching specific patterns
- Optionally set File Exclude Regex to exclude files matching specific patterns
- Click Save to begin the ingestion process
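The three settings can be read as successive filters. The sketch below shows one plausible reading under stated assumptions: paths are relative to the repository root, patterns are applied with Python's `re.search` (a match anywhere in the path), and the exclude pattern wins when both patterns match. Kapa's actual matching rules may differ, and `select_files` and its parameters are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Sequence

# File Types setting -> extensions of the two supported formats
EXTENSIONS = {"markdown": (".md",), "jupyter": (".ipynb",)}


def select_files(
    paths: Iterable[str],
    file_types: Sequence[str] = ("markdown", "jupyter"),
    include: Optional[str] = None,
    exclude: Optional[str] = None,
) -> List[str]:
    """Hypothetical filter: file type first, then include regex, then exclude regex."""
    allowed = tuple(ext for t in file_types for ext in EXTENSIONS[t])
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None
    selected = []
    for path in paths:
        if not path.endswith(allowed):
            continue  # not one of the selected file types
        if include_re is not None and not include_re.search(path):
            continue  # an include pattern is set and this path does not match it
        if exclude_re is not None and exclude_re.search(path):
            continue  # assumed: exclude wins even if the include pattern matched
        selected.append(path)
    return selected


repo = ["README.md", "docs/intro.md", "docs/old/notes.md", "examples/demo.ipynb", "src/app.py"]
print(select_files(repo))                                     # → ['README.md', 'docs/intro.md', 'docs/old/notes.md', 'examples/demo.ipynb']
print(select_files(repo, file_types=("jupyter",)))            # → ['examples/demo.ipynb']
print(select_files(repo, include=r"^docs/", exclude=r"/old/"))  # → ['docs/intro.md']
```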
Configuration options
The following configuration options are available for the GitHub Files integration:
Option | Description | Default | Required |
---|---|---|---|
Owner | GitHub username or organization that owns the repository | None | Yes |
Name | Name of the GitHub repository | None | Yes |
Personal access token | Token for authenticating to GitHub (for private repositories) | None | For private repositories |
File Types | Choose which file types to include (Markdown, Jupyter, or both) | Both | No |
File Include Regex | Only include files matching this regular expression pattern | All supported files | No |
File Exclude Regex | Exclude files matching this regular expression pattern | None | No |
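If you script or document your setup, the options above map naturally onto a small typed record. The dataclass below is a hypothetical representation, not a Kapa API. The field names are invented, while the defaults and required fields follow the table. It compiles both regexes up front so a typo in a pattern is caught early. Python's `re` syntax may not match the regex dialect Kapa uses.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

SUPPORTED_TYPES = {"markdown", "jupyter"}


@dataclass
class GitHubFilesConfig:
    """Hypothetical mirror of the GitHub Files options table (not a Kapa API)."""
    owner: str                                              # Owner: required
    name: str                                               # Name: required
    token: Optional[str] = None                             # only for private repositories
    file_types: Tuple[str, ...] = ("markdown", "jupyter")   # File Types: default is both
    include_regex: Optional[str] = None                     # None: all supported files
    exclude_regex: Optional[str] = None                     # None: nothing excluded

    def __post_init__(self) -> None:
        if not self.owner or not self.name:
            raise ValueError("Owner and Name are required")
        if not self.file_types or not set(self.file_types) <= SUPPORTED_TYPES:
            raise ValueError(f"file_types must be a non-empty subset of {sorted(SUPPORTED_TYPES)}")
        # Compile both patterns so an invalid regex fails here with re.error
        for pattern in (self.include_regex, self.exclude_regex):
            if pattern is not None:
                re.compile(pattern)


cfg = GitHubFilesConfig(owner="my-org", name="my-repo", include_regex=r"/docs/.*\.md$")
print(cfg)
```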
Best practices
- Be selective: Ingesting too many files can add more noise than signal, so choose carefully which files to include.
- Focus on documentation: Prioritize files that contain explanatory content rather than code-heavy files.
- Use regex patterns effectively: Consider patterns such as:
  - Include only `/docs/.*\.md$` to focus on the documentation directory
  - Exclude `/deprecated/.*` to avoid outdated content
- Consider file organization: Well-structured repositories with clear documentation folders work best with this integration.
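Before saving, it can help to test the example patterns against a few sample paths. The sketch uses Python's `re.search` and keeps a file when it matches the include pattern and not the exclude pattern. Both the matching semantics and whether Kapa's paths start with `/` are assumptions, so the samples cover both cases. `FLEXIBLE_INCLUDE` is a hypothetical variant, not a documented pattern.

```python
import re

# Example patterns from the list above
INCLUDE = r"/docs/.*\.md$"
EXCLUDE = r"/deprecated/.*"
# Hypothetical variant that also matches paths without a leading slash
FLEXIBLE_INCLUDE = r"(^|/)docs/.*\.md$"


def keep(path: str, include: str = INCLUDE, exclude: str = EXCLUDE) -> bool:
    """Assumed semantics: kept if include matches and exclude does not."""
    return re.search(include, path) is not None and re.search(exclude, path) is None


samples = [
    "/docs/guide.md",           # Markdown under docs/
    "/docs/deprecated/old.md",  # matches include, but also exclude
    "/docs/tutorial.ipynb",     # not a .md file
    "/src/notes.md",            # outside docs/
    "docs/guide.md",            # no leading slash
]

for path in samples:
    print(f"{path:26} default={keep(path)!s:5}  flexible={keep(path, include=FLEXIBLE_INCLUDE)}")
```

With the original include pattern, `docs/guide.md` is not kept because the pattern requires a `/` before `docs`. If your paths have no leading slash, a pattern like `FLEXIBLE_INCLUDE` covers both forms.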
Troubleshooting
- Authentication failures: Verify your personal access token has the required permissions and hasn't expired
- No files appearing: Check that your repository contains files of the supported types that also match your include and exclude patterns
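For authentication failures, checking the token directly against GitHub's REST API can narrow down the cause. The sketch requests the repository's root directory listing (`GET /repos/{owner}/{repo}/contents/`), which requires read access to repository contents, and maps common status codes to likely causes. The mapping reflects GitHub's usual behavior, such as returning 404 rather than 403 for a private repository the token cannot see, but it is a heuristic. The helper names are hypothetical.

```python
import urllib.error
import urllib.request


def diagnose_status(status: int) -> str:
    """Heuristic mapping from an HTTP status code to a likely cause."""
    if status == 200:
        return "ok: the token can read repository contents"
    if status == 401:
        return "token rejected: check that it is valid and has not expired"
    if status == 403:
        return "forbidden: the token may lack Contents read access, or a rate limit was hit"
    if status == 404:
        return "not found: check Owner and Name, or the token has no access to this private repository"
    return f"unexpected status {status}"


def check_contents_access(owner: str, name: str, token: str) -> str:
    """Network call against the GitHub contents endpoint for the repository root."""
    request = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{name}/contents/",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/vnd.github+json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return diagnose_status(response.status)
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise HTTPError
        return diagnose_status(err.code)
```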