PII detection and data masking
Kapa protects Personally Identifiable Information (PII) through two complementary features:
- User message PII detection
- Knowledge source PII masking
These features ensure sensitive personal data is neither stored in Kapa's systems nor included in chatbot responses.
PII detection in user messages
If PII detection is enabled and PII is detected in a user message, the chatbot does not generate an answer. Instead, it notifies the user that PII exists in their message and suggests they try again.
How user message data is processed
When PII detection is enabled for user messages:
- Data is encrypted during transport.
- When data is received, the PII filter runs first. No data is persistently stored before the PII filter has run.
- If any PII is detected, the question is rejected and no data from the user question is stored. The question is not counted as an "asked question" towards question volumes in the contract.
- The user receives a reply asking them not to submit questions containing PII and to remove any PII before submitting the question again.
During this brief processing window, all data is encrypted in transit and at rest with industry-standard encryption. Processing takes place in secure environments with strict access controls. Kapa maintains SOC 2 Type II certification for security controls.
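The processing order described above (filter first, store only if clean) can be sketched as follows. This is a minimal illustration, not Kapa's implementation: the regular expressions, the `handle_question` function, the `Result` type, and the in-memory `stored_questions` list are all hypothetical stand-ins, and a production detector would be far more thorough than two patterns.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately simple patterns (assumption, not Kapa's detector).
PII_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE_NUMBER": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


@dataclass
class Result:
    accepted: bool
    reply: str
    detected: tuple


# Stands in for persistent storage in this sketch.
stored_questions = []


def generate_answer(question):
    # Placeholder for answer generation.
    return f"(answer to: {question})"


def handle_question(question, enabled_types):
    # Step 1: run the PII filter before anything is persisted.
    detected = tuple(
        sorted(name for name in enabled_types if PII_PATTERNS[name].search(question))
    )
    if detected:
        # Step 2: reject the question; it is neither stored nor answered.
        reply = "Your message appears to contain PII. Please remove it and try again."
        return Result(False, reply, detected)
    # Step 3: only clean questions are stored and answered.
    stored_questions.append(question)
    return Result(True, generate_answer(question), ())


enabled = {"EMAIL_ADDRESS", "PHONE_NUMBER"}
rejected = handle_question("My email is jane@example.com, why does login fail?", enabled)
accepted = handle_question("How do I rotate an API key?", enabled)
```

In this sketch the rejected question never reaches `stored_questions`, which mirrors the guarantee that no data from a rejected question is stored.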
Enabling PII detection for user messages
PII detection for user messages is not enabled by default. To enable it:
- Reach out to the Kapa team.
- Specify which types of PII you'd like to detect.
- The Kapa team configures the feature for you.
PII masking in knowledge sources
If Kapa detects PII in a document it crawls, you can anonymize that information by replacing it with masked labels.
How knowledge source data is processed
When PII masking is enabled for knowledge sources:
- During the crawling process, documents are scanned for PII.
- Detected PII is "masked" with an anonymized label.
- Only the masked version of the content is stored in Kapa's system.
- This ensures sensitive data is never visible and never directly mentioned in answers generated by Kapa.
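The masking step can be illustrated with a short sketch. The patterns, the `mask_pii` function, and the `<TYPE>` label format are assumptions made for this example; the labels Kapa actually writes may look different.

```python
import re

# Hypothetical, simplified patterns (assumption, not Kapa's detector).
PII_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def mask_pii(text, enabled_types):
    # Replace every match of each enabled type with an anonymized label.
    for name in sorted(enabled_types):
        text = PII_PATTERNS[name].sub(f"<{name}>", text)
    return text


document = "Contact admin@example.com or connect to 192.168.0.10 for access."
masked = mask_pii(document, {"EMAIL_ADDRESS", "IP_ADDRESS"})
print(masked)
# Contact <EMAIL_ADDRESS> or connect to <IP_ADDRESS> for access.
```

Only the value of `masked` would be kept in this sketch; the original `document` text is discarded after crawling, so the raw values cannot surface in a generated answer.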
Enabling PII masking for knowledge sources
To enable the PII masking feature for a specific source:
- Navigate to the Sources screen on the Kapa platform.
- Click the three-dot menu to the right of the source, then click Configure.
- Specify which types of PII you'd like to mask.
- Save your changes.
Note that PII masking takes effect only after the source is refreshed.
Supported PII types
You can enable or disable detection of each PII type individually to suit your requirements.
Recommended PII types
- Phone Numbers
- Credit Cards
- Email Addresses
Other supported PII types
- IBAN Code
- IP Address
- US Bank Number
- US Driver License
- US ITIN
- US Passport
- US SSN
- Location
- Person
- Medical License
- URL
- Nationality, religious or political group
- Bitcoin Addresses
- Date and Time
Scanning for URLs can be part of PII detection, but it's optional. Depending on your product's use case, you might want to disable this option. For instance, if your users often share legitimate URLs as part of their queries, disabling URL detection prevents those questions from being rejected.
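The effect of toggling the URL type can be seen in a small sketch. The patterns and the `detect` function are hypothetical and exist only to show how the set of enabled types changes the outcome for the same question.

```python
import re

# Hypothetical, simplified patterns (assumption, not Kapa's detector).
PII_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "URL": re.compile(r"https?://\S+"),
}


def detect(question, enabled_types):
    # Return the enabled PII types that match the question.
    return sorted(name for name in enabled_types if PII_PATTERNS[name].search(question))


question = "Why does https://example.com/docs/install return a 404?"

with_url = detect(question, {"EMAIL_ADDRESS", "URL"})  # URL detection enabled
without_url = detect(question, {"EMAIL_ADDRESS"})      # URL detection disabled
```

With URL detection enabled, the question is flagged and would be rejected; with it disabled, the same question passes the filter and can be answered.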