GitHub Issues

The Kapa platform provides an integration to ingest GitHub issues as a data source. Developers who cannot find a solution in the official documentation often turn to the project's GitHub issues next. GitHub issues contain workarounds, explanations, open discussions, and other information that complements your documentation. We therefore always recommend connecting your GitHub issues as a data source, but there are some caveats to be aware of.

Prerequisites

  • A GitHub repository containing issues
  • Repository owner and name information
  • For private repositories, a personal access token with appropriate permissions

Data ingested

When you connect Kapa to GitHub Issues, the following data is ingested:

  • Issue URLs
  • Issue titles and body content
  • Comments and discussion threads on issues
  • Issue status
  • User information (anonymized)

Permissions required

The following permissions are required when using a personal access token for private repositories:

Permission | Purpose | Security considerations
---------- | ------- | -----------------------
Issues: read-only | Enables access to issues, comments, and labels | Kapa cannot create or modify issues

We recommend using a fine-grained access token limited to only the repositories you want to connect to Kapa.

Setup

Step 1: Connect your repository

  1. Go to the Sources tab on the Kapa platform and click on Add new source
  2. Enter a name for the source, select GitHub Issues, and click Continue
  3. Specify the GitHub repository to use by filling in the Owner and Name fields
  4. If it's a private repository, enter a personal access token for authentication
  5. Upon successful connection, a purple text box appears, providing you with the repository description

Step 2: Filter your issues

Configure the following optional parameters to only select a subset of your issues to include:

  1. Choose Issue state to include open issues, closed issues, or both
  2. Set an Issue age filter to only include recently updated issues
  3. Select Include issue labels to target specific issue categories
  4. Select Exclude issue labels to filter out irrelevant issues
  5. Click Save to begin the ingestion process

Configuration options

The following configuration options are available for the GitHub Issues integration:

Option | Description | Default | Required
------ | ----------- | ------- | --------
Owner | GitHub username or organization that owns the repository | None | Yes
Name | Name of the GitHub repository | None | Yes
Personal access token | Token for authenticating to GitHub (for private repositories) | None | For private repos
Issue state | Filter issues by state (open, closed, or both) | Both | No
Issue age | Only include issues updated within the specified time range | All time | No
Include issue labels | Only include issues with one of the specified labels | All labels | No
Exclude issue labels | Exclude issues with any of the specified labels | None | No
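
To make the filter semantics in the table concrete, here is a minimal Python sketch of how the four filters could combine. It is not Kapa's implementation: the `IssueFilter` class, the `matches` function, and the issue dictionaries are hypothetical, and the sketch assumes that all filters must pass at once and that label names are compared exactly.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class IssueFilter:
    """Hypothetical stand-in for the filter options in the table above."""
    state: str = "both"                                # "open", "closed", or "both"
    max_age: Optional[timedelta] = None                # None means "All time"
    include_labels: set = field(default_factory=set)   # empty means "All labels"
    exclude_labels: set = field(default_factory=set)   # empty means nothing is excluded


def matches(issue: dict, f: IssueFilter, now: datetime) -> bool:
    """Return True if the issue passes every configured filter (assumed AND logic)."""
    labels = set(issue["labels"])
    if f.state != "both" and issue["state"] != f.state:
        return False
    # Issue age is measured from the last update, not from creation.
    if f.max_age is not None and now - issue["updated_at"] > f.max_age:
        return False
    # Include: the issue needs at least one of the listed labels.
    if f.include_labels and not (labels & f.include_labels):
        return False
    # Exclude: any single listed label removes the issue.
    if labels & f.exclude_labels:
        return False
    return True


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
issues = [
    {"id": 1, "state": "open", "labels": ["bug"],
     "updated_at": now - timedelta(days=10)},
    {"id": 2, "state": "closed", "labels": ["bug", "exclude-kapa"],
     "updated_at": now - timedelta(days=5)},
    {"id": 3, "state": "open", "labels": ["question"],
     "updated_at": now - timedelta(days=900)},
]

f = IssueFilter(max_age=timedelta(days=365), exclude_labels={"exclude-kapa"})
selected = [i["id"] for i in issues if matches(i, f, now)]
print(selected)
```

With these sample issues, only issue 1 is selected: issue 2 carries an excluded label and issue 3 was last updated more than a year ago.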

Best practices

Experiment with filtering

It's difficult to recommend a general set of filters, as what works well is highly project-dependent. Issues with certain labels might be irrelevant or misleading, issues of a certain age might be outdated and contain false information, or closed issues might generally be irrelevant for users of your project.

By experimenting with different filter settings and reviewing Kapa's conversations in production, you can determine what the right level is for your repository.

Use a dedicated label to exclude outdated issues

A common problem is that outdated issues are hard to exclude with a general Issue age filter. An issue from three years ago might still be highly relevant, while a three-month-old issue might already contain false information because of a new release. A manual but effective solution is to create a dedicated label for outdated issues (e.g., exclude-kapa) and exclude it via the Exclude issue labels filter.

Review your conversations in the Kapa platform periodically. When you notice Kapa giving a false or undesired answer because of an outdated GitHub issue, manually apply the exclude-kapa label to that issue. Through this manual workflow, you can weed out bad content from your GitHub issues without losing good content by applying too broad a filter.
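
If you flag issues regularly, the labeling step can be scripted instead of done in the GitHub UI. The sketch below builds a request against GitHub's REST endpoint for adding labels to an issue (`POST /repos/{owner}/{repo}/issues/{issue_number}/labels`). The helper name `build_add_label_request` and the owner, repository, issue number, and token values are placeholders. Note that the read-only Issues permission recommended for Kapa's own token is not enough to add labels, so this assumes a separate token with write access to issues.

```python
import json
import urllib.request


def build_add_label_request(owner, repo, issue_number, token, label="exclude-kapa"):
    """Build (but do not send) a request that adds `label` to one issue."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{issue_number}/labels"
    body = json.dumps({"labels": [label]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; replace with your own repository, issue, and token.
req = build_add_label_request("example-org", "example-repo", 123, token="<token>")
print(req.get_method(), req.full_url)

# To actually apply the label, send the request:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

The labeled issue should then be filtered out by the Exclude issue labels setting the next time the source is ingested.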

Potentially split your GitHub issues into multiple sources

It might be too difficult to select an effective subset of GitHub issues using a single source. For example, issues with the label Bug are relevant as long as they are open, regardless of their age, but once they are closed it no longer makes sense to include them. Issues with the label Discussion, on the other hand, are usually relevant if they are no older than two years, regardless of their state. In these scenarios, it makes sense to create multiple sources, each with its own filters.
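
The Bug and Discussion example can be sketched as two separate filter sets, one per source. Everything here is illustrative: the source names, the dictionary layout, and the `source_accepts` helper are hypothetical and only model the settings you would enter in the Kapa platform for each source.

```python
from datetime import datetime, timedelta, timezone

# One entry per Kapa source; each source gets its own filter settings.
SOURCES = {
    "issues-open-bugs": {
        "state": "open",                   # closed bugs are dropped
        "max_age": None,                   # no age limit
        "include_labels": {"Bug"},
    },
    "issues-recent-discussions": {
        "state": "both",                   # state does not matter
        "max_age": timedelta(days=730),    # roughly two years
        "include_labels": {"Discussion"},
    },
}


def source_accepts(issue: dict, cfg: dict, now: datetime) -> bool:
    """Return True if the issue passes one source's filters."""
    if cfg["state"] != "both" and issue["state"] != cfg["state"]:
        return False
    if cfg["max_age"] is not None and now - issue["updated_at"] > cfg["max_age"]:
        return False
    return bool(set(issue["labels"]) & cfg["include_labels"])


def sources_for(issue: dict, now: datetime) -> list:
    """List the names of all sources that would ingest this issue."""
    return [name for name, cfg in SOURCES.items() if source_accepts(issue, cfg, now)]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
old_open_bug = {"state": "open", "labels": ["Bug"],
                "updated_at": now - timedelta(days=1500)}
closed_bug = {"state": "closed", "labels": ["Bug"],
              "updated_at": now - timedelta(days=3)}
closed_discussion = {"state": "closed", "labels": ["Discussion"],
                     "updated_at": now - timedelta(days=365)}

print(sources_for(old_open_bug, now))
print(sources_for(closed_bug, now))
print(sources_for(closed_discussion, now))
```

A single source cannot express both rules at once, because its Issue state and Issue age settings apply to every label it includes.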

Troubleshooting

  • Authentication failures: Verify your personal access token has the required permissions and hasn't expired
  • No issues appearing: Check that your repository contains issues matching your filter criteria
  • Outdated information: Use the dedicated label approach to exclude specific outdated issues
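
When no issues appear, it can help to check outside of Kapa whether the repository has any issues that match your filters. The sketch below builds a URL for GitHub's REST list-issues endpoint (`GET /repos/{owner}/{repo}/issues`) using its `state`, `labels`, and `since` query parameters. The helper name `build_issue_check_url` and the repository values are placeholders, and the mapping from Kapa's filter options to these parameters is an approximation rather than a description of how Kapa queries GitHub.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode


def build_issue_check_url(
    owner: str,
    repo: str,
    state: str = "both",
    label: Optional[str] = None,
    updated_since: Optional[datetime] = None,
) -> str:
    """Build a GitHub REST URL that lists a few issues matching the filters."""
    # GitHub uses "all" where the Kapa option says "both".
    params = {"state": "all" if state == "both" else state, "per_page": 5}
    if label:
        # Check one label per request: GitHub treats several comma-separated
        # labels as "must have all", unlike an "one of" include filter.
        params["labels"] = label
    if updated_since:
        # Expects a timezone-aware datetime; GitHub takes ISO 8601 in UTC.
        utc = updated_since.astimezone(timezone.utc)
        params["since"] = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"https://api.github.com/repos/{owner}/{repo}/issues?{urlencode(params)}"


url = build_issue_check_url(
    "example-org",
    "example-repo",
    state="open",
    label="bug",
    updated_since=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(url)
```

Request the URL with the same personal access token you gave Kapa (for private repositories). An authentication error points to the token, while an empty list suggests the filters are too strict. Keep in mind that this endpoint also returns pull requests, which carry a `pull_request` key in the response.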