PII detection and data masking
Kapa offers robust protection for Personally Identifiable Information (PII) through two complementary features:
- User message PII detection
- Knowledge source PII masking
These features ensure sensitive personal data is neither stored in Kapa's systems nor included in chatbot responses.
PII detection in user messages
If PII detection is enabled and PII is detected in a user message, the chatbot does not generate an answer. Instead, it notifies the user that PII exists in their message and suggests they try again.
How user message data is processed
When PII detection is enabled for user messages:
- Data is encrypted during transport.
- When data is received, the PII filter runs first. No data is persistently stored before the PII filter has run.
- If any PII is detected, sensitive fields in the question are redacted before the question is processed.
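The order of operations above can be sketched in Python. This is a minimal illustration, not Kapa's implementation. The two regex detectors, the <TYPE> redaction labels, the notice text, the answer_question placeholder, and the in-memory STORED_MESSAGES list are all hypothetical stand-ins.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-in detectors; the docs do not describe how Kapa detects PII.
DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

NOTICE = "Your message appears to contain personal information. Please remove it and try again."

# Hypothetical in-memory list standing in for persistent storage.
STORED_MESSAGES: list[str] = []


@dataclass
class FilterResult:
    redacted_text: str
    found_types: list[str] = field(default_factory=list)


def run_pii_filter(message: str) -> FilterResult:
    """Redact every detector match and record which PII types were found."""
    result = FilterResult(redacted_text=message)
    for name, pattern in DETECTORS.items():
        result.redacted_text, count = pattern.subn(f"<{name}>", result.redacted_text)
        if count:
            result.found_types.append(name)
    return result


def answer_question(text: str) -> str:
    """Hypothetical placeholder for answer generation, which is out of scope here."""
    return f"(answer to: {text})"


def handle_user_message(message: str) -> str:
    # 1. The PII filter runs first; nothing has been stored yet.
    result = run_pii_filter(message)
    # 2. Only the redacted text is ever persisted.
    STORED_MESSAGES.append(result.redacted_text)
    # 3. If PII was found, skip answering and ask the user to try again.
    if result.found_types:
        return NOTICE
    return answer_question(result.redacted_text)
```

For example, "Contact me at jane@example.com" returns the notice and only "Contact me at <EMAIL>" is stored, while a message without PII is passed on to answer generation.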
Enabling PII detection for user messages
PII detection and filtering are not enabled by default. To enable them:
- Open the Kapa platform.
- Click your user avatar to open the profile menu.
- Select 📁 Projects from the dropdown.
- Click the Edit button on the project for which you want to enable PII detection in user messages.
- Select the PII types that you want to mask.
  If you're adding a custom PII entity, specify the type of entity in the Name field, and the regular expression pattern that matches the entity in the Regex pattern field. The matching string is substituted with <name>. Kapa supports Python-flavored regex.
- Save your changes.

PII masking in knowledge sources
If PII is detected in the content of a document that Kapa crawls, you can anonymize that information by replacing it with masked labels.
How knowledge source data is processed
When PII masking is enabled for knowledge sources:
- During the crawling process, documents are scanned for PII.
- Detected PII is "masked" with an anonymized label.
- Only the masked version of the content is stored in Kapa's system.
- This ensures sensitive data is never visible and never directly mentioned in answers generated by Kapa.
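The flow above can be sketched as a mask-before-store step. This is a minimal illustration under stated assumptions: the regex detectors, the <TYPE> label format, and the mask and crawl_and_store helpers are hypothetical. Kapa's actual detection method is not documented here, and some types (such as names) cannot be caught reliably with regex alone.

```python
import re

# Hypothetical stand-in detectors; Kapa's real detection method is not documented.
MASKERS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){12,15}\d\b"),
}


def mask(text: str) -> str:
    """Replace each detected PII span with an anonymized <TYPE> label."""
    for label, pattern in MASKERS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


def crawl_and_store(pages: dict[str, str], store: dict[str, str]) -> None:
    """Scan each crawled page and persist only its masked version."""
    for url, raw_text in pages.items():
        # The raw text is never written to the store; only the masked copy is.
        store[url] = mask(raw_text)
```

Because only the masked copy reaches the store, any answer built from stored content can at most mention the label, never the original value.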
Enabling PII masking for knowledge sources
To enable the PII masking feature for a specific source:
- Navigate to the Sources screen on the Kapa platform.
- Click the three-dot menu to the right of the source, then click Configure.
- Select the PII types that you want to mask.
  If you're adding a custom PII entity, specify the type of entity in the Name field, and the regular expression pattern that matches the entity in the Regex pattern field. The matching string is substituted with <name>. Kapa supports Python-flavored regex.
- Save your changes.
- Refresh the source for the changes to take effect.
Supported PII types
You can enable or disable specific types of PII detection to suit your requirements. The following PII types can be masked in data sources, user queries, or both:
- Phone numbers
- Names
- Email addresses
- Credit card numbers
- IBAN codes
- IP addresses (only for user queries)
- Custom PII entities
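The scope rules in the list above can be modeled as a small lookup table. This is a hypothetical sketch: the type keys, the scope names, and the validate_selection helper are illustrative only and do not correspond to a documented Kapa API or configuration format.

```python
# Hypothetical model of where each PII type can be masked, per the list above.
BOTH = frozenset({"sources", "user_queries"})
SUPPORTED_SCOPES = {
    "phone_number": BOTH,
    "name": BOTH,
    "email": BOTH,
    "credit_card": BOTH,
    "iban": BOTH,
    "ip_address": frozenset({"user_queries"}),  # only for user queries
    "custom": BOTH,
}


def validate_selection(selected: dict[str, set[str]]) -> list[str]:
    """Return a list of problems with a selection; an empty list means it is valid."""
    errors = []
    for pii_type, scopes in selected.items():
        allowed = SUPPORTED_SCOPES.get(pii_type)
        if allowed is None:
            errors.append(f"unknown PII type: {pii_type}")
        elif not scopes <= allowed:
            unsupported = ", ".join(sorted(scopes - allowed))
            errors.append(f"{pii_type} is not supported for: {unsupported}")
    return errors
```

For instance, selecting IP addresses for data sources is flagged, while selecting email addresses for both scopes passes.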
Custom PII entities
In addition to Kapa's pre-defined PII types, you can define custom entities to mask using regular expression patterns. This is useful for masking things like API keys and passwords when they occur in queries or sources.
Kapa supports Python-flavored regex. To validate that your regular expression pattern matches the expected strings, use a tool like Regex101 or pythex.
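Since Kapa supports Python-flavored regex, Python's own re module is also a convenient local check. The sketch below uses a hypothetical check_pattern helper and an illustrative API-key pattern. It reports samples a pattern misses and strings it wrongly matches, assuming Kapa's matching behaves like re.search; the exact engine Kapa uses may differ in edge cases.

```python
import re


def check_pattern(pattern: str, should_match: list[str], should_not_match: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the pattern behaves as expected."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        return [f"invalid pattern: {exc}"]
    problems = []
    # Every positive sample should contain at least one match.
    for sample in should_match:
        if not compiled.search(sample):
            problems.append(f"missed: {sample!r}")
    # No negative sample should contain a match.
    for sample in should_not_match:
        if compiled.search(sample):
            problems.append(f"false positive: {sample!r}")
    return problems


# Illustrative pattern for a hypothetical "sk-" style API key.
print(check_pattern(r"sk-[A-Za-z0-9]{20,}",
                    ["my key is sk-abcdefghijklmnopqrstu"],
                    ["sk-short", "ask-me-anything"]))
```

Including near-miss strings in should_not_match is a quick way to catch patterns that are too broad before they start masking ordinary text.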
When you've enabled custom PII entities, entities that match the specified patterns are substituted with <name>, where name is the entity name you specified.
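As an illustration of that substitution, here is a minimal sketch using Python's re module. The entity names and patterns (API_KEY, PASSWORD) and the apply_custom_entities helper are hypothetical, and the sketch assumes <name> means the entity's Name wrapped in angle brackets.

```python
import re

# Hypothetical custom entities: Name field -> Regex pattern field.
CUSTOM_ENTITIES = {
    "API_KEY": r"sk-[A-Za-z0-9]{20,}",
    "PASSWORD": r"(?<=password=)\S+",  # lookbehind keeps the "password=" prefix
}


def apply_custom_entities(text: str, entities: dict[str, str]) -> str:
    """Substitute every match of each pattern with <Name>."""
    for name, pattern in entities.items():
        label = f"<{name}>"
        # A function replacement avoids re.sub interpreting backslashes in the label.
        text = re.sub(pattern, lambda _match: label, text)
    return text


print(apply_custom_entities("password=hunter2 and key sk-abcdefghijklmnopqrstu", CUSTOM_ENTITIES))
```

With these example patterns, the printed result is "password=<PASSWORD> and key <API_KEY>".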