PII protection

Kapa protects Personally Identifiable Information (PII) by removing sensitive data completely or substituting it with anonymized labels. This protection is provided through two complementary features:

User message PII protection
Knowledge source PII protection

These features ensure that sensitive personal data is permanently removed from content and neither stored in Kapa's systems nor included in chatbot responses.

User message PII protection

If PII protection is enabled and PII is detected in a user message, the chatbot does not generate an answer. Instead, it notifies the user that PII exists in their message and suggests they try again.


How user message data is processed

When PII protection is enabled for user messages:

Data is encrypted during transport.
When data is received, the PII filter runs first. No data is stored persistently before the filter has run.
If any PII is detected, sensitive fields in the question are redacted before the question is processed.
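The filter-first ordering above can be illustrated with a minimal Python sketch. The detectors, the <LABEL> placeholder format, and the receive and pii_filter helpers are hypothetical simplifications, not Kapa's actual detection logic. The sketch only shows that redaction happens before anything is written to storage. The returned flag is one way the "try again" notification described earlier could be triggered.

```python
import re

# Hypothetical, simplified detectors for illustration only; they are not
# Kapa's actual detection logic.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def pii_filter(message: str) -> tuple[str, bool]:
    """Replace each detected value with a <LABEL> placeholder.

    Returns the redacted text and whether any PII was found.
    """
    found = False
    for label, pattern in DETECTORS.items():
        message, count = pattern.subn(f"<{label}>", message)
        found = found or count > 0
    return message, found


def receive(message: str, storage: list[str]) -> bool:
    # The filter runs first; the raw message is never written to storage.
    redacted, pii_found = pii_filter(message)
    storage.append(redacted)
    return pii_found


storage: list[str] = []
print(receive("Reach me at jane@example.com from 10.0.0.5", storage))  # True
print(storage)  # ['Reach me at <EMAIL_ADDRESS> from <IP_ADDRESS>']
```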

Enabling user message PII protection

PII protection is not enabled by default. To enable it:

Open the Kapa platform.
Click your user avatar to open the profile menu.
Select 📁 Projects from the dropdown.
Click the Edit button on the project for which you want to enable PII protection in user messages.
Select the PII types that you want to deidentify.

If you're adding a custom PII entity, specify the type of entity in the Name field and the regular expression pattern that matches the entity in the Regex pattern field. Each matching string is substituted with <name>, where name is the value of the Name field.

Kapa supports Python-flavored regex.
Save your changes.

Knowledge source PII protection

If PII is detected in the content of a document that Kapa crawls, you can protect that information by removing it completely or substituting it with anonymized labels. The original PII data is permanently discarded and not stored anywhere in Kapa's systems.


How knowledge source data is processed

When PII protection is enabled for knowledge sources:

During the crawling process, documents are scanned for PII.
Detected PII is removed and replaced with anonymized labels.
Only the sanitized version of the content is stored in Kapa's system.
This ensures that sensitive data is never exposed or directly mentioned in answers generated by Kapa.
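The crawl-time flow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the detectors, the <LABEL> placeholders, and the sanitize and ingest helpers are hypothetical. The point it shows is that only the sanitized text ever reaches the stored index.

```python
import re

# Hypothetical detectors; the labels and patterns are illustrative assumptions.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE_NUMBER": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def sanitize(text: str) -> str:
    """Replace every detected value with an anonymized <LABEL>."""
    for label, pattern in DETECTORS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


def ingest(crawled: dict[str, str]) -> dict[str, str]:
    """Scan each crawled document and keep only the sanitized version."""
    index: dict[str, str] = {}
    for url, raw in crawled.items():
        index[url] = sanitize(raw)  # the raw text is not kept after this step
    return index


docs = {"https://docs.example.com/support": "Call +1 555 010 9999 or mail help@example.com."}
print(ingest(docs))
# {'https://docs.example.com/support': 'Call <PHONE_NUMBER> or mail <EMAIL_ADDRESS>.'}
```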

Enabling knowledge source PII protection

To enable PII protection for a specific source:

Navigate to the Sources screen on the Kapa platform.
Click the three-dot menu to the right of the source, and then click Configure.
Select the PII types that you want to protect.

If you're adding a custom PII entity, specify the type of entity in the Name field and the regular expression pattern that matches the entity in the Regex pattern field. Each matching string is substituted with <name>, where name is the value of the Name field.

Kapa supports Python-flavored regex.
Save your changes.
Refresh the source for the changes to take effect.

Supported PII types

You can enable or disable detection for each PII type independently. The following PII types can be protected in data sources, user queries, or both:

Phone numbers
Full names (first name + family name)
Email addresses
Credit card numbers
IBAN codes
IP addresses (only for user queries)
Custom PII entities

Custom PII entities

In addition to Kapa's pre-defined PII types, you can define custom entities to protect using regular expression patterns. This is useful for protecting things like API keys and passwords when they occur in queries or sources.

Kapa supports Python-flavored regex. To validate that your regular expression pattern matches the expected strings, use a tool like Regex101 or pythex.
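Because the patterns are Python-flavored, you can also check them locally with Python's standard re module before saving them. The sketch below is one possible approach: the check_pattern helper and the sk- API key format are hypothetical examples, not a Kapa convention. It confirms that the pattern compiles, fully matches the sample strings it should, and rejects the ones it should not.

```python
import re


def check_pattern(pattern: str, should_match: list[str], should_not_match: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the pattern behaves as expected."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        return [f"invalid pattern: {exc}"]
    problems = [f"no match: {s!r}" for s in should_match if not compiled.fullmatch(s)]
    problems += [f"unexpected match: {s!r}" for s in should_not_match if compiled.fullmatch(s)]
    return problems


# Hypothetical API key format, assumed for illustration only.
api_key_pattern = r"sk-[A-Za-z0-9]{32}"
print(check_pattern(api_key_pattern, ["sk-" + "a1B2" * 8], ["sk-short", "pk-" + "a1B2" * 8]))  # []
print(check_pattern(r"sk-[A-Z", [], []))  # reports the unterminated character set
```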

When custom PII entities are enabled, each string that matches one of the specified patterns is substituted with the name of that PII entity.
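The substitution behavior can be sketched in Python, assuming the placeholder is the entity name wrapped in angle brackets (the <name> form used in the configuration steps). The entity names, patterns, and the substitute_custom helper are hypothetical examples. Each pattern is applied in turn with re.sub.

```python
import re

# Hypothetical custom entities (name -> Python-flavored regex), for illustration.
CUSTOM_ENTITIES = {
    "API_KEY": r"sk-[A-Za-z0-9]{32}",
    "EMPLOYEE_ID": r"\bEMP-\d{6}\b",
}


def substitute_custom(text: str, entities: dict[str, str]) -> str:
    """Replace every match of each pattern with <name>."""
    for name, pattern in entities.items():
        text = re.sub(pattern, f"<{name}>", text)
    return text


key = "sk-" + "Z9y8" * 8
print(substitute_custom(f"My key {key} fails for EMP-004217", CUSTOM_ENTITIES))
# My key <API_KEY> fails for <EMPLOYEE_ID>
```

Patterns are applied in dictionary order, so if two patterns can match overlapping text, the one listed first takes precedence in this sketch.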

Allow list

You can define an allow list of terms that should not be flagged as PII, even if they match a configured PII type. This is useful when certain values, such as a company email address or a well-known name, appear frequently in user queries or knowledge sources and should not be redacted.

For example, if email address detection is enabled, adding support@company.com to the allow list ensures that this specific address is never treated as PII, while all other email addresses continue to be detected and redacted as usual.

To configure the allow list, add entries in the Allow List section of the PII configuration panel when editing a project or source.
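The allow-list behavior from the email example can be sketched in Python. This is a minimal illustration: the simplified email pattern, the <EMAIL_ADDRESS> placeholder, and the redact_emails helper are assumptions. The sketch compares exact strings, and the source does not say whether Kapa's allow list matching is case-sensitive.

```python
import re

# Simplified email pattern, assumed for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str, allow_list: set[str]) -> str:
    """Redact email addresses unless they appear in the allow list."""

    def replace(match: re.Match) -> str:
        value = match.group(0)
        # Allow-listed values are kept verbatim; everything else is redacted.
        return value if value in allow_list else "<EMAIL_ADDRESS>"

    return EMAIL.sub(replace, text)


allow = {"support@company.com"}
print(redact_emails("Write to support@company.com, not jane.doe@gmail.com.", allow))
# Write to support@company.com, not <EMAIL_ADDRESS>.
```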
