
PII detection and data masking

Kapa protects Personally Identifiable Information (PII) with two complementary features:

  • User message PII detection
  • Knowledge source PII masking

These features ensure sensitive personal data is neither stored in Kapa's systems nor included in chatbot responses.

PII detection in user messages

If PII detection is enabled and PII is detected in a user message, the chatbot does not generate an answer. Instead, it notifies the user that their message contains PII and suggests they try again.

[Figure: PII input detection and masking]

How user message data is processed

When PII detection is enabled for user messages:

  1. Data is encrypted during transport.
  2. When data is received, the PII filter runs first. No data is persistently stored before the PII filter runs.
  3. If any PII is detected, sensitive fields in the question are redacted before the question is processed.
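The processing order above, together with the refusal behavior described earlier, can be sketched in Python. This is a minimal illustration under stated assumptions, not Kapa's implementation: the two regex detectors, the `<TYPE>` labels, and the `run_pii_filter`, `handle_message`, and `generate_answer` functions are all hypothetical. It shows the filter running before anything else, detected values being redacted, and no answer being generated when PII is found.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately simple detectors (assumption: not Kapa's detection logic).
DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


@dataclass
class FilterResult:
    redacted_text: str
    found_types: list[str]


def run_pii_filter(text: str) -> FilterResult:
    """Redact every detector match and record which PII types were found."""
    found = []
    for name, pattern in DETECTORS.items():
        text, count = pattern.subn(f"<{name}>", text)
        if count:
            found.append(name)
    return FilterResult(text, found)


def generate_answer(question: str) -> str:
    # Hypothetical stand-in for the downstream chatbot.
    return f"Answering: {question}"


def handle_message(raw_message: str) -> str:
    # The PII filter runs first; nothing is stored before this point.
    result = run_pii_filter(raw_message)
    if result.found_types:
        # PII found: skip answer generation and ask the user to try again.
        return "Your message appears to contain PII. Please remove it and try again."
    # Only filtered text moves on to storage and answer generation.
    return generate_answer(result.redacted_text)


print(run_pii_filter("Contact me at jane@example.com from 10.0.0.1").redacted_text)
# prints Contact me at <EMAIL> from <IP_ADDRESS>
```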

Enabling PII detection for user messages

PII detection and filtering is not enabled by default. To enable it:

  1. Open the Kapa platform.

  2. Click your user avatar to open the profile menu.

  3. Select 📁 Projects from the dropdown.

  4. Click the Edit button on the project for which you want to enable PII detection in user messages.

  5. Select the PII types that you want to mask.

    If you're adding a custom PII entity, enter a name for the entity type in the Name field and the regular expression pattern that matches the entity in the Regex pattern field. Each matching string is substituted with <name>, where name is the value of the Name field.

    Kapa supports Python-flavored regex.

  6. Save your changes.

[Figure: PII configuration for user questions]

PII masking in knowledge sources

If Kapa detects PII in the content of a crawled document, you can anonymize that information by replacing it with masked labels.

[Figure: PII source detection and masking]

How knowledge source data is processed

When PII masking is enabled for knowledge sources:

  1. During the crawling process, documents are scanned for PII.
  2. Detected PII is masked, that is, replaced with an anonymized label.
  3. Only the masked version of the content is stored in Kapa's system.
  4. As a result, sensitive data is never visible and is never directly mentioned in answers that Kapa generates.
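The crawl-time steps above can be sketched the same way. In this minimal Python sketch, the masking rules, the `<LABEL>` format, and the `mask_content` and `crawl_and_store` helpers are hypothetical, and a plain dictionary stands in for Kapa's storage. The point it illustrates is that only the masked text is ever written.

```python
import re

# Hypothetical masking rules for one source; crude patterns for illustration only.
MASKING_RULES = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def mask_content(text: str) -> str:
    """Replace each detected PII span with an anonymized label."""
    for label, pattern in MASKING_RULES.items():
        text = pattern.sub(f"<{label}>", text)
    return text


def crawl_and_store(documents: dict[str, str], store: dict[str, str]) -> None:
    """Scan each crawled document and persist only its masked version."""
    for url, raw_text in documents.items():
        store[url] = mask_content(raw_text)  # the raw text is never written to the store


store: dict[str, str] = {}
crawl_and_store(
    {"https://docs.example.com/support": "Email support@example.com or call +1 555-123-4567."},
    store,
)
print(store["https://docs.example.com/support"])
# prints Email <EMAIL> or call <PHONE>.
```

Because masking happens during crawling in this model, content crawled before a rule was enabled stays unmasked until the document is crawled again.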

Enabling PII masking for knowledge sources

To enable the PII masking feature for a specific source:

  1. Navigate to the Sources screen on the Kapa platform.

  2. Click the three-dot menu to the right of the source, and then click Configure.

  3. Select the PII types that you want to mask.

    If you're adding a custom PII entity, enter a name for the entity type in the Name field and the regular expression pattern that matches the entity in the Regex pattern field. Each matching string is substituted with <name>, where name is the value of the Name field.

    Kapa supports Python-flavored regex.

  4. Save your changes.

  5. Refresh the source for the changes to take effect.

Supported PII types

You can enable or disable each PII type individually, for data sources, user queries, or both. The following PII types are supported:

  • Phone numbers
  • Names
  • Email addresses
  • Credit card numbers
  • IBAN codes
  • IP addresses (only for user queries)
  • Custom PII entities
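Kapa configures these types in the platform UI. The Python sketch below only models the availability rules in the list above as data, for example to sanity-check a planned configuration. The `Scope` enum, the type identifiers, and the `unsupported_types` helper are hypothetical and not part of any Kapa API.

```python
from enum import Enum


class Scope(Enum):
    """Where a PII type can be applied (hypothetical model of the options above)."""
    SOURCES = "sources"
    USER_QUERIES = "user_queries"


BOTH = frozenset({Scope.SOURCES, Scope.USER_QUERIES})

# Hypothetical identifiers for the built-in types listed above.
SUPPORTED_SCOPES = {
    "PHONE_NUMBER": BOTH,
    "NAME": BOTH,
    "EMAIL_ADDRESS": BOTH,
    "CREDIT_CARD_NUMBER": BOTH,
    "IBAN_CODE": BOTH,
    "IP_ADDRESS": frozenset({Scope.USER_QUERIES}),  # user queries only
}


def unsupported_types(scope: Scope, selected: set[str]) -> list[str]:
    """Return the selected built-in types that cannot be used in the given scope.

    Names missing from SUPPORTED_SCOPES are reported as unsupported too.
    Custom entities are not covered by this check.
    """
    return sorted(t for t in selected if scope not in SUPPORTED_SCOPES.get(t, frozenset()))


print(unsupported_types(Scope.SOURCES, {"EMAIL_ADDRESS", "IP_ADDRESS"}))
# prints ['IP_ADDRESS']
```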

Custom PII entities

In addition to Kapa's predefined PII types, you can define custom entities to mask using regular expression patterns. This is useful for masking values such as API keys and passwords that appear in queries or sources.

Kapa supports Python-flavored regex. To validate that your regular expression pattern matches the expected strings, use a tool like Regex101 or pythex.
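Because Kapa supports Python-flavored regex, Python's built-in `re` module is another way to check a pattern before you enter it. The sketch below compiles a candidate pattern and tests it against strings it should and should not match. The `check_pattern` helper and the `sk_live_` key format are hypothetical, and the sketch assumes a pattern may match anywhere in the text (`re.search`), which may differ from how Kapa applies it.

```python
import re


def check_pattern(pattern: str, should_match: list[str], should_not_match: list[str]) -> list[str]:
    """Return a list of problems with a candidate custom-entity pattern (empty if none)."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        return [f"invalid regex: {exc}"]
    problems = []
    for sample in should_match:
        if not compiled.search(sample):
            problems.append(f"expected a match in {sample!r}")
    for sample in should_not_match:
        if compiled.search(sample):
            problems.append(f"unexpected match in {sample!r}")
    return problems


# Hypothetical API key format: "sk_live_" followed by 24 alphanumeric characters.
print(check_pattern(r"sk_live_[A-Za-z0-9]{24}", ["key=sk_live_" + "a" * 24], ["sk_test_" + "a" * 24]))
# prints []
```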

When you've enabled custom PII entities, strings that match the specified patterns are replaced with the PII name:

[Figure: PII substitution]
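The substitution can be reproduced locally with `re.sub`. This minimal sketch assumes matches are replaced with the entity name in angle brackets, as in the `<name>` label described in the configuration steps; the `api_key` and `password` entities, their patterns, and the `substitute_custom_entities` helper are hypothetical.

```python
import re

# Hypothetical custom entities: Name field -> Regex pattern field.
CUSTOM_ENTITIES = {
    "api_key": r"sk_live_[A-Za-z0-9]{24}",
    "password": r"(?<=password=)\S+",
}


def substitute_custom_entities(text: str, entities: dict[str, str]) -> str:
    """Replace every match of each pattern with the entity name in angle brackets."""
    for name, pattern in entities.items():
        # A function replacement inserts the label literally, so backslashes
        # in a name are not treated as group references.
        text = re.sub(pattern, lambda _match: f"<{name}>", text)
    return text


text = "Use password=hunter2 and key sk_live_" + "A" * 24
print(substitute_custom_entities(text, CUSTOM_ENTITIES))
# prints Use password=<password> and key <api_key>
```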