
PII detection and data masking

Kapa offers robust protection for Personally Identifiable Information (PII) through two complementary features:

  • User message PII detection
  • Knowledge source PII masking

These features ensure sensitive personal data is neither stored in Kapa's systems nor included in chatbot responses.

PII detection in user messages

If PII detection is enabled and PII is detected in a user message, the chatbot does not generate an answer. Instead, it notifies the user that PII exists in their message and suggests they try again.

[Figure: PII input detection and masking]

How user message data is processed

When PII detection is enabled for user messages:

  1. Data is encrypted during transport.
  2. When data is received, the first processing step is to run the PII filter. No data is persistently stored before the PII filter runs.
  3. If any PII is detected, the question is rejected and no data from it is stored. The question is also not counted as an "asked question" toward the question volume in your contract.
  4. The user receives a reply explaining that questions must not contain PII and asking them to remove any PII before submitting the question again.
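The processing steps above can be sketched in Python as follows. This is an illustrative sketch only, not Kapa's implementation: `detect_pii`, `answer_question`, `QuestionHandler`, and its in-memory `store` and `question_count` are hypothetical stand-ins, and the detector only recognizes email addresses with a simple regular expression.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detector: only flags email addresses. A real PII filter
# would cover every PII type configured for the project.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

REJECTION_MESSAGE = (
    "Your question appears to contain personal information (PII). "
    "Please remove it and submit your question again."
)


def detect_pii(text: str) -> bool:
    """Return True if the text contains anything the filter treats as PII."""
    return EMAIL_RE.search(text) is not None


def answer_question(question: str) -> str:
    """Placeholder for the actual answer generation."""
    return f"(answer to: {question})"


@dataclass
class QuestionHandler:
    # Hypothetical stand-ins for persistent storage and usage metering.
    store: list = field(default_factory=list)
    question_count: int = 0

    def handle(self, question: str) -> str:
        # Step 2: the PII filter runs first, before anything is persisted.
        if detect_pii(question):
            # Steps 3 and 4: reject, store nothing, do not count the
            # question, and tell the user to remove the PII.
            return REJECTION_MESSAGE
        # Only PII-free questions are stored and counted.
        self.store.append(question)
        self.question_count += 1
        return answer_question(question)


handler = QuestionHandler()
print(handler.handle("Reach me at jane@example.com"))  # rejection message
print(handler.store, handler.question_count)           # [] 0
print(handler.handle("How do I rotate an API key?"))  # (answer to: How do I rotate an API key?)
print(handler.question_count)                          # 1
```

Because the filter runs before anything is written, a rejected question leaves nothing in storage and never reaches the usage counter.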

During this brief processing window, all data is encrypted in transit and at rest with industry-standard encryption. Processing takes place in secure environments with strict access controls. Kapa maintains SOC 2 Type II certification for security controls.

Enabling PII detection for user messages

PII detection and filtering is not enabled by default. To enable it:

  1. Reach out to the Kapa team.
  2. Specify which types of PII you'd like to detect.
  3. The Kapa team configures the feature for you.

PII masking in knowledge sources

If PII is detected in the content of a document that Kapa crawls, you have the option to anonymize that information by replacing it with masked labels.

[Figure: PII source detection and masking]

How knowledge source data is processed

When PII masking is enabled for knowledge sources:

  1. During the crawling process, documents are scanned for PII.
  2. Detected PII is "masked" with an anonymized label.
  3. Only the masked version of the content is stored in Kapa's system.
  4. This ensures sensitive data is never visible and never directly mentioned in answers generated by Kapa.
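The masking steps above can be illustrated with a short Python sketch. It assumes each PII type is detected with a regular expression and masked with a `<TYPE>` label; Kapa's real detectors and label format may differ, and `PII_PATTERNS`, `mask_pii`, and `crawl_and_store` are hypothetical names. The patterns are deliberately simplified (for example, the IBAN pattern ignores spaces and does not verify the checksum).

```python
import re

# Hypothetical, simplified patterns standing in for built-in detectors.
PII_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IBAN_CODE": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}


def mask_pii(text: str, patterns: dict) -> str:
    """Replace every match of each enabled pattern with a <TYPE> label."""
    for label, pattern in patterns.items():
        text = pattern.sub(f"<{label}>", text)
    return text


def crawl_and_store(documents: dict, store: dict) -> None:
    """Mask each crawled document before storing it; raw text is never kept."""
    for doc_id, raw_text in documents.items():
        store[doc_id] = mask_pii(raw_text, PII_PATTERNS)


store = {}
crawl_and_store(
    {"faq.md": "Questions? Email support@example.com. Pay to DE89370400440532013000."},
    store,
)
print(store["faq.md"])
# Questions? Email <EMAIL_ADDRESS>. Pay to <IBAN_CODE>.
```

Since only the output of `mask_pii` reaches `store`, anything later retrieved to build an answer already contains the labels rather than the original values.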

Enabling PII masking for knowledge sources

To enable the PII masking feature for a specific source:

  1. Navigate to the Sources screen on the Kapa platform.

  2. Click the three-dot menu to the right of the source, then click Configure.

  3. Specify which types of PII you'd like to mask.

    If you're adding a custom PII entity, enter the entity type in the Name field and a regular expression pattern that matches the entity in the Regex pattern field. Each matching string is substituted with <name>.

    Kapa supports Python-flavored regex.

  4. Save your changes.

  5. Refresh the source for the changes to take effect.

Supported PII types

You can enable or disable specific types of PII detection to suit your requirements. You can enable masking for the following PII types in data sources, user queries, or both:

  • Phone numbers
  • Names
  • Email addresses
  • Credit card numbers
  • IBAN codes
  • IP addresses (only for user queries)
  • Custom PII entities
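One way to picture these settings is as a mapping from each PII type to the scopes it is enabled for. The sketch below is hypothetical: the type identifiers, the `{type: scopes}` format, and `validate_config` are not Kapa's configuration format. It only encodes the rule from the list above that IP address detection is limited to user queries, and it treats anything else as a custom entity.

```python
# Hypothetical identifiers for the built-in PII types listed above.
BUILT_IN_TYPES = {
    "PHONE_NUMBER", "PERSON_NAME", "EMAIL_ADDRESS",
    "CREDIT_CARD", "IBAN_CODE", "IP_ADDRESS",
}
# Scopes a type can be enabled for.
VALID_SCOPES = {"sources", "queries"}
# IP addresses can only be detected in user queries.
QUERY_ONLY_TYPES = {"IP_ADDRESS"}


def validate_config(config: dict) -> list:
    """Return problems found in a {pii_type: set_of_scopes} mapping (empty if valid)."""
    problems = []
    for pii_type, scopes in sorted(config.items()):
        if pii_type not in BUILT_IN_TYPES:
            problems.append(f"{pii_type}: not a built-in type; define it as a custom entity")
            continue
        unknown = scopes - VALID_SCOPES
        if unknown:
            problems.append(f"{pii_type}: unknown scope(s) {sorted(unknown)}")
        if pii_type in QUERY_ONLY_TYPES and "sources" in scopes:
            problems.append(f"{pii_type}: only supported for user queries")
    return problems


config = {
    "EMAIL_ADDRESS": {"sources", "queries"},  # masked in both places
    "IP_ADDRESS": {"sources"},                # invalid: query-only type
}
print(validate_config(config))
# ['IP_ADDRESS: only supported for user queries']
```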

Custom PII entities

In addition to Kapa's pre-defined PII types, you can define custom entities to mask using regular expression patterns. This is useful for masking things like API keys and passwords when they occur in queries or sources.

Kapa supports Python-flavored regex. To validate that your regular expression pattern matches the expected strings, use a tool like Regex101 or pythex.
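Because the patterns are Python-flavored, you can also check one locally with Python's standard `re` module before saving it. This is a minimal sketch; the API-key format (`sk_` followed by exactly 32 letters or digits) and the `check_pattern` helper are made-up examples. It uses `search` rather than `fullmatch` because masking replaces matches anywhere in the surrounding text.

```python
import re

# Hypothetical custom entity: API keys written as "sk_" plus exactly
# 32 letters or digits. \b stops the pattern from matching the first
# 32 characters of a longer token.
API_KEY_PATTERN = r"\bsk_[A-Za-z0-9]{32}\b"

should_match = ["key: sk_" + "a1B2" * 8]
should_not_match = ["sk_short", "pk_" + "a1B2" * 8, "sk_" + "a" * 40]


def check_pattern(pattern, positives, negatives):
    """Return (strings the pattern misses, strings it wrongly matches)."""
    regex = re.compile(pattern)  # raises re.error if the pattern is invalid
    missed = [s for s in positives if not regex.search(s)]
    false_hits = [s for s in negatives if regex.search(s)]
    return missed, false_hits


missed, false_hits = check_pattern(API_KEY_PATTERN, should_match, should_not_match)
print("missed:", missed)          # missed: []
print("false hits:", false_hits)  # false hits: []
```

Dropping the `\b` anchors would make the 40-character token a false hit, which is the kind of edge case worth including in your test strings.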

When you've enabled custom PII entities, strings that match the specified patterns are substituted with the PII name:

[Figure: PII substitution]
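The substitution can be reproduced with Python's `re.sub`. This sketch follows the `<name>` replacement form described in the source configuration steps; the two entities (`API_KEY`, `EMPLOYEE_ID`), their patterns, and `mask_custom_entities` are hypothetical examples, and Kapa's exact output may differ.

```python
import re

# Hypothetical custom entities: {Name field: Regex pattern field}.
CUSTOM_ENTITIES = {
    "API_KEY": r"\bsk_[A-Za-z0-9]{32}\b",
    "EMPLOYEE_ID": r"\bEMP-\d{6}\b",
}


def mask_custom_entities(text: str, entities: dict) -> str:
    """Replace every match of each custom pattern with <name>."""
    for name, pattern in entities.items():
        label = f"<{name}>"
        # Using a function as the replacement inserts the label literally,
        # so a backslash in a name is never read as a group reference.
        text = re.sub(pattern, lambda _match: label, text)
    return text


query = "My key sk_" + "x" * 32 + " fails for employee EMP-004217."
print(mask_custom_entities(query, CUSTOM_ENTITIES))
# My key <API_KEY> fails for employee <EMPLOYEE_ID>.
```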