
The Data Subject is Not the Only Exclusion

When organisations receive a data subject access request (DSAR), the instinct is straightforward: find every file that mentions the data subject, redact everyone else's information, and hand it over. The reality is more nuanced.


Beyond the Data Subject

Anyone who has actually processed a DSAR knows the truth. Not every file that mentions a person should be disclosed. Not every document type is relevant. And not every piece of text that looks like personal data actually is personal data in context.

In PIIQ, we built an exclusion system that reflects this reality. The data subject might be the reason you're searching, but they're far from the only thing you need to filter.


Discovery: Starting with the Data Subject

The process begins with Discovery. You point PIIQ at a directory, add your data subject's details — their name, email address, any known identifiers — and run a scan. Discovery uses fast pattern matching to identify which files actually mention the data subject, separating your file set into "active" files (containing matches) and "ignored" files (no matches found).
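As a rough illustration of that split, here is a minimal Python sketch. It is not PIIQ's implementation: the function names and the whole-term, case-insensitive matching rule are assumptions made for the example, and the input is simply a mapping of filename to already-extracted text.

```python
import re

def build_subject_pattern(identifiers):
    """Compile one case-insensitive pattern that matches any identifier as a whole term."""
    terms = [re.escape(term.strip()) for term in identifiers if term.strip()]
    if not terms:
        raise ValueError("at least one data subject identifier is required")
    # Lookarounds stop "Ann" from matching inside "Joanna" (assumed matching rule).
    return re.compile(r"(?<!\w)(?:" + "|".join(terms) + r")(?!\w)", re.IGNORECASE)

def partition_files(texts, identifiers):
    """Split {filename: text} into (active, ignored) lists of filenames."""
    pattern = build_subject_pattern(identifiers)
    active, ignored = [], []
    for name, text in texts.items():
        (active if pattern.search(text) else ignored).append(name)
    return active, ignored

# Hypothetical sample data, for illustration only.
sample = {
    "a.txt": "Email from Jane Doe about the renewal.",
    "b.txt": "Quarterly numbers for the board.",
    "c.txt": "Contact: JANE.DOE@example.com",
}
active, ignored = partition_files(sample, ["Jane Doe", "jane.doe@example.com"])
```

Compiling all identifiers into a single pattern means each file is scanned once, however many names, emails, or identifiers the data subject has.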

Screenshot: PIIQ Discovery Configuration dialog showing data subject entries and document scope settings

This alone is a significant optimisation. In a typical directory of a thousand files, perhaps only a third will actually reference the individual. By identifying those files up front, you avoid processing hundreds of irrelevant documents — saving time, cost, and effort.

But just because a file mentions the data subject doesn't automatically mean it should be disclosed.

Screenshot: PIIQ Discovery complete showing 34 in-scope files, 6 out of scope, and 85 DSAR matches


Document Scope: Filtering by Content Type

Imagine your Discovery scan finds the data subject's name in 200 files. Some are business contracts. Some are internal memos. Some are social media exports. And one is a document clearly marked as legally privileged.

Under most data protection frameworks, legally privileged communications are exempt from disclosure. Social media posts that happen to mention someone may not be relevant. Personal financial documents belonging to someone else shouldn't be included just because the subject's name appears in passing.

This is where Document Scope keywords come in. PIIQ lets you define two lists:

  • In-Scope keywords identify document types you do want to process — terms like "business correspondence", "employee record", "customer inquiry", or "contract".
  • Out-of-Scope keywords identify document types you want to exclude — terms like "attorney-client privilege", "personal", "social media", or "marketing campaign".

When a file's content matches an out-of-scope keyword, it's filtered out regardless of whether it mentions the data subject. The exclusion happens at the document level, before any PII detection or redaction takes place.
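A document-level scope check along these lines might look like the sketch below. The function name and return values are hypothetical, and two behaviours are assumptions rather than documented PIIQ rules: matching is a plain case-insensitive substring test, and an out-of-scope match wins when a file matches both lists.

```python
def classify_scope(text, in_scope, out_of_scope):
    """Return (status, matched_keyword) for one document's text."""
    lowered = text.lower()
    # Out-of-scope keywords are checked first, so an exclusion takes
    # precedence over any in-scope match (assumed precedence).
    for keyword in out_of_scope:
        if keyword.lower() in lowered:
            return "out_of_scope", keyword
    for keyword in in_scope:
        if keyword.lower() in lowered:
            return "in_scope", keyword
    return "unclassified", None

status, matched = classify_scope(
    "This contract is subject to attorney-client privilege.",
    in_scope=["contract", "employee record"],
    out_of_scope=["attorney-client privilege", "social media"],
)
```

Returning the matched keyword alongside the status keeps the decision explainable: a reviewer can see which term moved a file out of scope.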

Screenshot: PIIQ files list showing scoped and excluded documents


Standard Exclusions: Pre-Built Categories

Building keyword lists from scratch for every request would be tedious. That's why PIIQ ships with fourteen pre-configured Standard Exclusion categories, each containing curated keyword lists for common document types.

Typically Excluded (out-of-scope by default)

Category | Examples
Legally Privileged | Attorney-client privilege, work product, legal advice, litigation, settlement agreements
Social Media | Facebook, Twitter, Instagram, LinkedIn, TikTok, and related terminology
Personal Documents | Medical records, private correspondence, family matters, shopping receipts
Job Hunting | Resumes, cover letters, job applications, recruiter correspondence
Marketing Materials | Campaigns, advertisements, promotional content, newsletters, SEO analytics
Personal Financial | Bank statements, credit cards, mortgages, tax returns, investment portfolios
Spam & Junk | Phishing attempts, unsolicited messages, obvious spam patterns

Typically Included (in-scope by default)

Category | Examples
Business Communications | Internal memos, meeting minutes, project updates, status reports
Contracts & Agreements | NDAs, employment agreements, service level agreements, terms and conditions
HR & Employee Records | Personnel files, performance reviews, payroll, benefits enrollment
Business Financial | Financial statements, balance sheets, invoices, audit reports
Technical Documentation | Specifications, architecture documents, API documentation, release notes
Customer Communications | Support tickets, customer inquiries, complaints, sales correspondence
Regulatory & Compliance | GDPR filings, HIPAA documentation, privacy policies, audit records

Selecting a category applies all of its keywords in one click. You can then customise — adding or removing individual keywords as needed for your specific request.
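One way to picture "select a category, then customise" is as set operations over keyword lists. The sketch below is illustrative only: the category contents are a handful of terms borrowed from the examples above, not PIIQ's actual curated lists, and `build_keyword_list` is a hypothetical helper.

```python
# Illustrative subsets only; the real curated lists are not reproduced here.
STANDARD_EXCLUSIONS = {
    "Legally Privileged": {"attorney-client privilege", "work product", "legal advice"},
    "Marketing Materials": {"campaign", "advertisement", "newsletter"},
    "Spam & Junk": {"phishing", "unsolicited"},
}

def build_keyword_list(categories, add=(), remove=()):
    """Union the chosen categories, then apply per-request additions and removals."""
    keywords = set()
    for category in categories:
        keywords |= STANDARD_EXCLUSIONS[category]
    keywords |= {keyword.lower() for keyword in add}
    keywords -= {keyword.lower() for keyword in remove}
    return sorted(keywords)

out_of_scope = build_keyword_list(
    ["Spam & Junk"],
    add=["Project Falcon"],   # hypothetical internal code name
    remove=["unsolicited"],
)
```

Applying removals last means a request-specific removal always takes effect, even for a keyword that appears in more than one selected category.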


Anti-PII: When a Name Isn't Personal Data

There's another category of exclusion that operates at a completely different level. Sometimes PIIQ's detection identifies something as personal data when, in your context, it isn't.

Consider a company called "James & Partners". "James" looks like a person's name. Or a product called "Alice" — the detection engine will flag it every time it appears. In a large document set, these false positives add up quickly.

Anti-PII entries solve this. By adding a value as Anti-PII, you tell PIIQ: "this matches the pattern of personal data, but don't treat it as such." The value won't be redacted, no matter how many times it appears. It's an exclusion that operates at the detection level rather than the document level.
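Conceptually, this is a filter applied to detections rather than to files. The following sketch assumes a simple `Detection` record and exact, case-insensitive matching on the detected value; both are assumptions for illustration, not a description of PIIQ's internals.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Detection:
    text: str          # the detected value, e.g. "Alice"
    kind: str          # e.g. "name", "job_title", "address"
    confidence: float  # detector confidence between 0 and 1

def apply_anti_pii(detections, anti_pii):
    """Drop detections whose value is listed as Anti-PII (exact match, any case)."""
    blocked = {value.casefold() for value in anti_pii}
    return [d for d in detections if d.text.casefold() not in blocked]

detections = [
    Detection("Alice", "name", 0.91),      # a product name in this document set
    Detection("Bob Smith", "name", 0.88),  # a real person
]
to_redact = apply_anti_pii(detections, anti_pii={"alice"})
```

Exact matching on the whole detected value is a deliberately narrow choice in this sketch: an entry only suppresses detections with that exact text, so "Alice" does not affect a detection of "Alice Jones".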

Hunters help you identify these cases. When you run Hunters, the results show every detected name, job title, and address with confidence scores — making it easy to spot false positives and add them as Anti-PII before analysis begins.

Screenshot: PIIQ Hunters results showing detected names, job titles, and addresses with confidence scores


Manual Exclusions: The Human Override

Finally, there are cases where no automated rule captures what you need. A file passes every filter — it mentions the data subject, it matches in-scope keywords, it contains genuine PII — but for some specific reason, it shouldn't be disclosed.

PIIQ's Out of Scope tab provides a manual override. Individual files can be moved out of scope with a reason recorded for audit purposes. This is the highest-priority exclusion — it overrides everything else. If a human decides a file shouldn't be processed, that decision stands.
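The override behaviour can be sketched as a small record plus a lookup that is consulted before any automated result. The names below are hypothetical, and requiring a non-empty reason is an assumption made so the audit trail is always populated.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ManualExclusion:
    filename: str
    reason: str
    excluded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def exclude_manually(log, filename, reason):
    """Record a manual exclusion; a reason is mandatory in this sketch."""
    if not reason.strip():
        raise ValueError("a reason is required for the audit trail")
    log[filename] = ManualExclusion(filename, reason.strip())

def effective_scope(filename, automated_status, log):
    """A manual exclusion overrides whatever the automated layers decided."""
    return "out_of_scope" if filename in log else automated_status

log = {}
exclude_manually(log, "memo.docx", "Reviewed by counsel, not to be disclosed")
```

Keeping the timestamp and reason with each entry means the log itself can serve as the audit record.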

Screenshot: PIIQ Document Scope dialog showing standard exclusion categories like Legally Privileged and Social Media, with keyword entry and category selection


How the Layers Work Together

These exclusion mechanisms aren't alternatives — they're layers that work in sequence:

  1. Discovery narrows the universe to files mentioning the data subject
  2. Document Scope filters those files by content type and category
  3. Anti-PII refines what counts as personal data within the remaining files
  4. Manual Exclusions provide a final human checkpoint

Each layer reduces the working set. A directory of a thousand files might narrow to 350 after Discovery and to 200 after Document Scope filtering, with Anti-PII then preventing dozens of false-positive redactions across the remaining documents.
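To make the arithmetic explicit, the short sketch below turns the illustrative figures from this example (1,000, then 350, then 200) into a per-stage report. The function name is hypothetical and the numbers are the article's example, not measurements.

```python
def reduction_report(stages):
    """For each (label, remaining) stage, report files removed and percent of the original set."""
    total = stages[0][1]
    previous = total
    report = []
    for label, remaining in stages:
        removed = previous - remaining
        percent_left = round(100 * remaining / total, 1)
        report.append((label, remaining, removed, percent_left))
        previous = remaining
    return report

report = reduction_report([
    ("All files", 1000),
    ("After Discovery", 350),
    ("After Document Scope", 200),
])
# report[1] is ("After Discovery", 350, 650, 35.0)
# report[2] is ("After Document Scope", 200, 150, 20.0)
```

In this example, four out of every five files never reach analysis at all.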

Screenshot: PIIQ file statistics showing reduction at each processing stage


Why This Matters

Processing a DSAR isn't just about finding every mention of a name and redacting everything else. It requires judgement about what's relevant, what's exempt, what's proportionate, and what's actually personal data in context.

The data subject triggers the search. But the exclusions shape the response.

Getting exclusions right means faster processing, lower costs, fewer false positives, and responses that are legally sound rather than just technically complete. It means not disclosing legally privileged material, not wasting resources on spam, and not redacting your client's product name three thousand times across a document set.

The timing report below shows the full processing pipeline — from Discovery through to PDF generation — completing in under two minutes. That speed is only possible because the exclusion layers have already done the heavy lifting.

Screenshot: PIIQ timing report showing total processing time of under 2 minutes across all stages


Best Practice: Getting to the Right Document Set

The most effective DSAR responses come from methodically narrowing your document set before you spend a single credit on analysis. Here's the recommended workflow.

Step 1: Start broad, then scope with Standard Exclusions — Before you even enter a data subject name, apply the Standard Exclusion categories that fit your situation. Exclude Legally Privileged, Spam & Junk, and Marketing Materials first. This takes seconds and can eliminate entire classes of documents.

Step 2: Add your own scope keywords — Every organisation has its own document conventions. Add project code names, internal system names, or document prefixes as out-of-scope keywords.

Step 3: Enable filename checking — Many organisations use naming conventions that immediately reveal document type. Files prefixed with "LEGAL_", folders named "Marketing_2024", spreadsheets called "spam_log.xlsx".

Step 4: Run Discovery with the data subject — With scope keywords in place, Discovery works on a pre-filtered set. The result is a focused list of active files.

Step 5: Review and use manual exclusions — Browse the active files. Move anything that slipped through to Out of Scope manually. Record your reason for the audit trail.

Step 6: Set up Anti-PII before analysis — Think about false positives specific to this document set. Company names, product names, building names, office locations that match surnames.

Step 7: Analyse the refined set — Only now run analysis. Your document set has been through four layers of filtering. The files you're analysing are the files that actually matter.
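For the filename checking in Step 3, the idea can be sketched as matching scope keywords against each component of a file's path as well as its content. This is an assumption about how such a check could work, not a description of PIIQ's behaviour, and `filename_matches` is a hypothetical helper.

```python
from pathlib import PurePosixPath

def filename_matches(path, keywords):
    """Return the keywords found in any folder or file name along the path."""
    parts = [part.lower() for part in PurePosixPath(path).parts]
    return [k for k in keywords if any(k.lower() in part for part in parts)]

keywords = ["legal_", "marketing", "spam"]
hits = filename_matches("shared/Marketing_2024/plan.docx", keywords)
```

Because it only inspects path components, a check like this can run before any file content is opened.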

Screenshot: PIIQ Analysis in progress — processing 34 files with real-time PII detection

Review the results — PIIQ highlights every redaction in the analysis view, letting you verify accuracy across every document.

Screenshot: PIIQ analysis view processing a refined, scoped document set with redactions highlighted


The goal isn't to process the most files. It's to process the right files. Use PIIQ's exclusion layers in order — broad categories first, then targeted keywords, then Discovery, then manual review — and you'll arrive at a document set you can be confident in before analysis even begins.
