
PII protection

Kapa protects Personally Identifiable Information (PII) by removing sensitive data entirely or substituting it with anonymized labels. Two complementary features provide this protection:

  • User message PII protection
  • Knowledge source PII protection

These features ensure that sensitive personal data is permanently removed from content and neither stored in Kapa's systems nor included in chatbot responses.

User message PII protection

If PII protection is enabled and PII is detected in a user message, the chatbot does not generate an answer. Instead, it notifies the user that PII exists in their message and suggests they try again.

[Image: PII input protection]

How user message data is processed

When PII protection is enabled for user messages:

  1. Data is encrypted during transport.
  2. When data is received, the PII filter runs first. No data is stored persistently before the PII filter has run.
  3. If any PII is detected, sensitive fields in the question are redacted before the question is processed.
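
The processing order above can be sketched in Python. This is a minimal illustration, not Kapa's implementation: the two regex detectors, the <label> redaction format, the notice wording, and the run_pii_filter, handle_message, and generate_answer helpers are all hypothetical, and persisting the redacted question is an assumption. It shows the filter running before anything is stored and no answer being generated when PII is found.

```python
import re

# Illustrative detectors only (assumed); Kapa's real PII detectors are not shown here.
DETECTORS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}
# Hypothetical wording of the notice shown to the user.
NOTICE = "Your message appears to contain personal information. Please remove it and try again."


def run_pii_filter(message):
    """Redact every detector match and return (redacted_text, labels_found)."""
    found = []
    for label, pattern in DETECTORS.items():
        message, count = pattern.subn(f"<{label}>", message)
        if count:
            found.append(label)
    return message, found


def generate_answer(question):
    # Hypothetical stand-in for the chatbot's answer generation.
    return f"(answer to: {question})"


def handle_message(message, store):
    # 1. The PII filter runs first; nothing has been stored yet.
    redacted, found = run_pii_filter(message)
    # 2. Assumed: only the redacted text is persisted.
    store.append(redacted)
    # 3. If PII was found, no answer is generated and the user is asked to retry.
    if found:
        return NOTICE
    return generate_answer(redacted)


store = []
print(handle_message("Contact me at jane@example.com", store))  # prints NOTICE
print(store)  # ['Contact me at <email>']
```

Redacting before the refusal means that even the stored copy of a rejected question carries no raw PII.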

Enabling user message PII protection

PII protection is not enabled by default. To enable it:

  1. Open the Kapa platform.

  2. Click your user avatar to open the profile menu.

  3. Select 📁 Projects from the dropdown.

  4. Click the Edit button on the project for which you want to enable PII protection in user messages.

  5. Select the PII types that you want to deidentify.

    If you're adding a custom PII entity, enter the entity type in the Name field and a regular expression that matches the entity in the Regex pattern field. Each matching string is replaced with <name>, where name is the value of the Name field.

    Kapa supports Python-flavored regex.

  6. Save your changes.

[Image: PII config for user question]

Knowledge source PII protection

If a document that Kapa crawls contains PII, you can protect that information by removing it entirely or substituting it with anonymized labels. The original PII data is permanently discarded and not stored anywhere in Kapa's systems.

[Image: PII source protection]

How knowledge source data is processed

When PII protection is enabled for knowledge sources:

  1. During the crawling process, documents are scanned for PII.
  2. Detected PII is removed and replaced with anonymized labels.
  3. Only the sanitized version of the content is stored in Kapa's system.

As a result, sensitive data is never visible and is never directly mentioned in answers generated by Kapa.
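
The crawl-time flow can be illustrated with a short Python sketch. The detectors, the <EMAIL> and <PHONE> label format, and the sanitize and ingest helpers are assumptions made for illustration; Kapa's actual detectors and labels may differ. What the sketch shows is that the index only ever receives sanitized text.

```python
import re

# Hypothetical detectors and label format; Kapa's real ones may differ.
SOURCE_DETECTORS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    # Crude: 9+ characters of digits, spaces, or dashes, starting and ending with a digit.
    ("PHONE", re.compile(r"\+?\d[\d -]{7,}\d")),
]


def sanitize(text):
    """Replace every detected PII span with an anonymized <LABEL>."""
    for label, pattern in SOURCE_DETECTORS:
        text = pattern.sub(f"<{label}>", text)
    return text


def ingest(crawled, index):
    """Store only the sanitized version of each crawled document in the index."""
    for url, raw_text in crawled.items():
        index[url] = sanitize(raw_text)  # raw_text itself is never written to the index


index = {}
ingest({"https://docs.example.com/page": "Email jane@example.com or call +1 555 010 9999."}, index)
print(index["https://docs.example.com/page"])
# Email <EMAIL> or call <PHONE>.
```

Because sanitization happens at ingestion, changing the PII settings only affects stored content once the source is crawled again.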

Enabling knowledge source PII protection

To enable PII protection for a specific source:

  1. Navigate to the Sources screen on the Kapa platform.

  2. Click the three-dot menu to the right of the source, and then click Configure.

  3. Select the PII types that you want to protect.

    If you're adding a custom PII entity, enter the entity type in the Name field and a regular expression that matches the entity in the Regex pattern field. Each matching string is replaced with <name>, where name is the value of the Name field.

    Kapa supports Python-flavored regex.

  4. Save your changes.

  5. Refresh the source for the changes to take effect.

Supported PII types

You can enable or disable each type of PII detection individually. Protection for the following PII types is available in data sources, user queries, or both:

  • Phone numbers
  • Names
  • Email addresses
  • Credit card numbers
  • IBAN codes
  • IP addresses (only for user queries)
  • Custom PII entities
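
A pattern alone tends to over-match numbers such as order IDs. A common way to reduce false positives for card numbers and IBANs is to verify their built-in checksums: the Luhn algorithm for payment cards and the ISO 13616 mod-97 check for IBANs. The Python sketch below shows both; this page does not say whether Kapa's detectors use these checks, so treat it as background rather than a description of Kapa's behavior. The 12 to 19 digit bound for cards is an assumed heuristic.

```python
def luhn_valid(number):
    """Luhn checksum used by payment card numbers. Non-digit characters
    (e.g. spaces, dashes) are ignored."""
    digits = [int(c) for c in number if c in "0123456789"]
    if not 12 <= len(digits) <= 19:  # assumed length bounds
        return False
    total = 0
    # Starting from the rightmost (check) digit, double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban):
    """ISO 13616 check: move the first four characters to the end,
    map letters A-Z to 10-35, and require the number mod 97 to equal 1."""
    s = iban.replace(" ", "").upper()
    if not (15 <= len(s) <= 34 and s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1


print(luhn_valid("4111 1111 1111 1111"))  # True (a common test card number)
print(iban_valid("GB82 WEST 1234 5698 7654 32"))  # True (a common example IBAN)
```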

Custom PII entities

In addition to Kapa's predefined PII types, you can define custom entities to protect using regular expression patterns. This is useful for protecting values such as API keys and passwords that appear in queries or sources.

Kapa supports Python-flavored regex. To validate that your regular expression pattern matches the expected strings, use a tool like Regex101 or pythex.
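
If you prefer to test locally, Python's own re module is the most direct check for Python-flavored syntax. The sketch below compiles a pattern (re.compile raises re.error on invalid syntax) and reports samples it misses or wrongly matches. The sk_live_ API key format and the check_pattern helper are hypothetical, and using re.search assumes a pattern may match anywhere within a message or document, which is an assumption about how Kapa applies it.

```python
import re

# Hypothetical custom entity: API keys of the form "sk_live_" + 24 letters/digits.
PATTERN = r"\bsk_live_[A-Za-z0-9]{24}\b"

SHOULD_MATCH = [
    "my key is sk_live_abcdefghijklmnopqrstuvwx",
    "sk_live_ABCDEFGHIJKLMNOPQRSTUVWX",
]
SHOULD_NOT_MATCH = [
    "sk_test_abcdefghijklmnopqrstuvwx",  # different prefix
    "sk_live_short",  # too few characters
]


def check_pattern(pattern, positives, negatives):
    """Return (missed, false_hits); two empty lists mean the pattern behaves as intended.

    re.compile raises re.error if the pattern is not valid Python regex syntax.
    Matching uses re.search, i.e. the pattern may match anywhere in the text (assumed).
    """
    compiled = re.compile(pattern)
    missed = [s for s in positives if not compiled.search(s)]
    false_hits = [s for s in negatives if compiled.search(s)]
    return missed, false_hits


print(check_pattern(PATTERN, SHOULD_MATCH, SHOULD_NOT_MATCH))  # ([], [])
```

Keeping a few negative samples next to the positive ones catches patterns that are too broad, which would otherwise redact ordinary text.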

When you've enabled custom PII entities, any text that matches one of the specified patterns is substituted with the entity's name in angle brackets (<name>).

[Image: PII substitution]
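
The substitution can be reproduced in Python to preview how text will look after redaction. In this sketch, each hypothetical (Name, Regex pattern) pair is applied with re.sub and every match becomes <name>. The entity names and patterns are made up, and applying entities in list order (so an earlier pattern wins when two overlap) is an assumption rather than documented Kapa behavior.

```python
import re

# Hypothetical custom entities: (Name field, Regex pattern field) pairs.
CUSTOM_ENTITIES = [
    ("api_key", r"\bsk_live_[A-Za-z0-9]{24}\b"),
    ("employee_id", r"\bEMP-\d{6}\b"),
]


def substitute_custom_entities(text, entities=CUSTOM_ENTITIES):
    """Replace each pattern match with <name>; entities are applied in list order (assumed)."""
    for name, pattern in entities:
        # A function replacement inserts "<name>" literally, so the name is never
        # parsed for backslash escapes or group references by re.sub.
        text = re.sub(pattern, lambda _match: f"<{name}>", text)
    return text


print(substitute_custom_entities("Key sk_live_abcdefghijklmnopqrstuvwx for EMP-123456"))
# Key <api_key> for <employee_id>
```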