PII protection
Kapa offers robust protection for Personally Identifiable Information (PII) by completely removing or substituting sensitive data through two complementary features:
- User message PII protection
- Knowledge source PII protection
These features ensure that sensitive personal data is permanently removed from content and neither stored in Kapa's systems nor included in chatbot responses.
User message PII protection
If PII protection is enabled and PII is detected in a user message, the chatbot does not generate an answer. Instead, it notifies the user that PII exists in their message and suggests they try again.
How user message data is processed
When PII protection is enabled for user messages:
- Data is encrypted during transport.
- When data is received, the PII filter runs first. No data is persistently stored before the PII filter has run.
- If any PII is detected, sensitive fields in the question are redacted before the question is processed.
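The ordering described above (filter first, persist afterwards) can be sketched in Python. This is an illustration only, not Kapa's implementation: the `redact` and `handle_message` functions, the simplified email-only regex, the `<EMAIL>` label, and the in-memory `stored_messages` list are all hypothetical stand-ins.

```python
import re
from typing import Tuple

# Hypothetical detector: a simplified email pattern standing in for a real PII filter.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Stand-in for persistent storage (assumption for this sketch).
stored_messages = []


def redact(message: str) -> Tuple[str, bool]:
    """Return the redacted message and whether any PII was found."""
    redacted, count = EMAIL_RE.subn("<EMAIL>", message)
    return redacted, count > 0


def handle_message(message: str) -> str:
    # The filter runs first; nothing has been stored before this point.
    redacted, has_pii = redact(message)
    # Only the redacted text is ever persisted.
    stored_messages.append(redacted)
    if has_pii:
        # No answer is generated; the user is asked to try again.
        return "Your message appears to contain PII. Please remove it and try again."
    return f"(answer to: {redacted})"
```

The key design point is that the raw message never reaches the storage step: `stored_messages` only ever receives the output of `redact`.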
Enabling user message PII protection
PII protection is not enabled by default. To enable it:
- Open the Kapa platform.
- Click your user avatar to open the profile menu.
- Select 📁 Projects from the dropdown.
- Click the Edit button on the project for which you want to enable PII protection in user messages.
- Select the PII types that you want to de-identify.
  If you're adding a custom PII entity, specify the type of entity in the Name field and the regular expression pattern that matches the entity in the Regex pattern field. The matching string is substituted with <name>. Kapa supports Python-flavored regex.
- Save your changes.

Knowledge source PII protection
If PII is detected in the content of a document that Kapa crawls, you can protect that information by removing it completely or substituting it with anonymized labels. The original PII data is permanently discarded and not stored anywhere in Kapa's systems.
How knowledge source data is processed
When PII protection is enabled for knowledge sources:
- During the crawling process, documents are scanned for PII.
- Detected PII is removed or replaced with anonymized labels.
- Only the sanitized version of the content is stored in Kapa's system.
- This ensures sensitive data is never visible and never directly mentioned in answers generated by Kapa.
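As a rough illustration of the steps above, the following Python sketch scans crawled text, replaces detected PII with labels, and keeps only the sanitized result. The `sanitize` and `ingest` functions, the two simplified patterns, and the `<EMAIL>` / `<PHONE>` label format are assumptions made for this example; Kapa's actual detectors and labels may differ.

```python
import re

# Simplified, illustrative patterns; real detectors are more thorough.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def sanitize(text: str) -> str:
    """Replace each detected PII span with an anonymized label such as <EMAIL>."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


def ingest(crawled_docs: dict) -> dict:
    """Return what would be stored: only sanitized text, keyed by URL."""
    return {url: sanitize(body) for url, body in crawled_docs.items()}


index = ingest(
    {"https://example.com/a": "Email support@example.com or call +1 (555) 010-9999."}
)
print(index["https://example.com/a"])
```

Because `ingest` returns only the output of `sanitize`, the original strings are discarded as soon as the function returns, which mirrors the "only the sanitized version is stored" step.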
Enabling knowledge source PII protection
To enable PII protection for a specific source:
- Navigate to the Sources screen on the Kapa platform.
- Click the three-dot menu to the right of the source, and then click Configure.
- Select the PII types that you want to protect.
  If you're adding a custom PII entity, specify the type of entity in the Name field and the regular expression pattern that matches the entity in the Regex pattern field. The matching string is substituted with <name>. Kapa supports Python-flavored regex.
- Save your changes.
- Refresh the source for the changes to take effect.
Supported PII types
You can enable or disable detection for each PII type individually. Protection is available for the following PII types in data sources, user queries, or both:
- Phone numbers
- Names
- Email addresses
- Credit card numbers
- IBAN codes
- IP addresses (only for user queries)
- Custom PII entities
Custom PII entities
In addition to Kapa's pre-defined PII types, you can define custom entities to protect using regular expression patterns. This is useful for protecting things like API keys and passwords when they occur in queries or sources.
Kapa supports Python-flavored regex. To validate that your regular expression pattern matches the expected strings, use a tool like Regex101 or pythex.
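Because the patterns are Python-flavored, you can also check a pattern locally with Python's standard `re` module before entering it. The sketch below is one possible way to do this; the `check_pattern` helper and the `sk_` key format are hypothetical examples, not part of Kapa.

```python
import re


def check_pattern(pattern: str, should_match: list, should_not_match: list) -> bool:
    """Return True if the pattern is found in every positive sample and in no negative one."""
    compiled = re.compile(pattern)  # raises re.error if the pattern is invalid
    hits = all(compiled.search(s) for s in should_match)
    misses = not any(compiled.search(s) for s in should_not_match)
    return hits and misses


# Hypothetical API key format: "sk_" followed by 24 alphanumeric characters.
API_KEY_PATTERN = r"\bsk_[A-Za-z0-9]{24}\b"

ok = check_pattern(
    API_KEY_PATTERN,
    should_match=["my key is sk_abcdefghijklmnopqrstuvwx"],
    should_not_match=["sk_short", "no key here"],
)
print(ok)
```

Testing against strings that should not match is as important as testing positives, since an overly broad pattern would redact ordinary content.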
When you've enabled custom PII entities, entities that match the specified patterns are substituted with the PII name:
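The substitution behavior can be approximated with `re.sub`. This is a minimal sketch that assumes the replacement is the entity name wrapped in angle brackets; the entity name `api_key`, the `sk_` key format, and the `substitute` helper are hypothetical.

```python
import re

# Hypothetical custom entity: Name = "api_key", Regex pattern = the string below.
ENTITY_NAME = "api_key"
ENTITY_PATTERN = r"\bsk_[A-Za-z0-9]{24}\b"


def substitute(text: str, name: str, pattern: str) -> str:
    """Replace every match of the pattern with the entity name in angle brackets."""
    # A function replacement avoids backslash escapes in the name being interpreted.
    return re.sub(pattern, lambda _match: f"<{name}>", text)


result = substitute(
    "Use sk_abcdefghijklmnopqrstuvwx to authenticate",
    ENTITY_NAME,
    ENTITY_PATTERN,
)
print(result)
```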