We’ve analyzed hundreds of healthcare organizations running traditional Data Loss Prevention tools in Google Workspace and Microsoft 365. The pattern is consistent. Protected Health Information (PHI) is technically covered by native DLP templates, but large amounts sit in over-shared drives, shared-to-anyone links, or files that have already left the SaaS boundary. The security team drowns in noisy, pattern-based alerts with no context or automation, so it can’t systematically reduce real PHI exposure.

That’s the core failure: traditional, in-platform DLP proves that rules exist, but it doesn’t continuously see and automatically reduce the real-world PHI blast radius across your SaaS estate.

The Pattern-Matching Problem

Traditional DLP engines treat PHI as static text patterns inside a single SaaS boundary. They focus on regexes and a fixed library of sensitive info types (MRNs, SSNs, insurance IDs) inside email bodies and documents.

This works for audits but misses non-templated narratives, consultations, and free-form notes that still clearly contain PHI. Healthcare was the worst-affected sector in 2025, accounting for 22% of ransomware attacks, with breaches costing an average of $10.22 million.

The result is two bad extremes: massive false negatives on unstructured clinical collaboration, or over-broad patterns that generate so many false positives that teams start ignoring alerts.

Screenshot Blindness

Clinicians constantly share chart screenshots, EHR views, and diagnostic images via Teams, Gmail, Drive, or SharePoint. Without robust, real-time OCR plus vision models, most of that PHI is invisible to native DLP.

A screenshot arrives as a binary blob. The traditional engine treats it as a file with no parseable text stream, so its regex detectors either never run or scan only the file name and limited metadata.

Even when OCR is available in premium tiers, it’s often asynchronous and kicks in after the content is already shared.
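The blind spot is easy to demonstrate in a toy sketch. The pattern library and file name below are invented for illustration, not any vendor’s actual info types: a fixed regex set fires on templated records but sees nothing in an image, because the only text a pattern engine ever gets from a screenshot is its name and metadata.

```python
import re

# Hypothetical pattern library of the kind native DLP ships: fixed info types.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "mrn": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
}

def scan_text(text):
    """Return the info types whose patterns match the text."""
    return [name for name, rx in PATTERNS.items() if rx.search(text)]

# A templated record trips the patterns...
print(scan_text("Patient SSN 123-45-6789, MRN: 00482913"))  # ['ssn', 'mrn']

# ...but a screenshot is a binary blob: the regex engine only ever sees the
# file name and metadata, so nothing fires even though the pixels show a chart.
print(scan_text("icu_flowsheet_2024-03-01.png"))            # []
```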
PHI is exposed first and protected second.

How AI Detection Works Differently

An OCR-and-AI pipeline first converts the screenshot into structured text regions and then reasons over those regions semantically. It can recognize a patient name, a date of birth, or a medical record number even when none of them appears in a simple pattern list.

The technical shift happens in stages. A vision model scans the image and outputs bounding boxes for regions that look like text (headers, table cells, labels, ID bands). Because it works at the level of shapes and edges, it finds text overlays inside EHR UI chrome, badge labels, and tiny timeline annotations that a regex engine would never know existed.

For each bounding box, an OCR model converts glyphs into characters, preserving structure like line breaks and layout coordinates. The output is now a structured representation tied back to its pixel location in the screenshot.

A PHI classifier (typically a named-entity recognition model or a fine-tuned LLM) then runs over the extracted text and labels spans as entities such as patient name, MRN, date of birth, address, provider, and facility. Because it sees the semantics of the snippet, it can flag PHI even when identifiers are partial, out of order, or formatted in ways that match no regex.

Machine learning models trained on EHR databases achieve 99% accuracy when detecting PHI-related fields in unseen datasets.
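The staged flow described above can be sketched end-to-end. The three model calls here are toy stand-ins (a lookup and a keyword rule) for a real vision detector, OCR engine, and NER classifier; the point is the data flow, image to text regions to characters to labeled PHI spans, each tied back to its pixel location.

```python
from dataclasses import dataclass, field

@dataclass
class Region:
    box: tuple                        # (x, y, w, h) pixel coordinates
    text: str = ""
    entities: list = field(default_factory=list)

def detect_text_regions(image):
    # Stage 1 (vision model): propose bounding boxes for text-like areas.
    return [Region(box=b) for b in image]        # toy: image is {box: text}

def ocr_region(image, region):
    # Stage 2 (OCR): convert glyphs in the box to characters, keeping the
    # link back to the pixel location.
    region.text = image[region.box]
    return region

def classify_phi(region):
    # Stage 3 (NER / LLM classifier): label spans semantically. A real model
    # reasons over context; this keyword rule only illustrates the output shape.
    cues = {"DOB": "date_of_birth", "MRN": "mrn", "Pt:": "patient_name"}
    region.entities = [label for cue, label in cues.items() if cue in region.text]
    return region

def pipeline(image):
    regions = [ocr_region(image, r) for r in detect_text_regions(image)]
    return [classify_phi(r) for r in regions if r.text]

# A fake "screenshot": a header band and a lab row from an EHR view.
screenshot = {
    (0, 0, 800, 40): "Pt: J. Rivera   DOB 03/14/1957   MRN 00482913",
    (0, 60, 300, 20): "Sodium 138 mmol/L",
}
for r in pipeline(screenshot):
    print(r.box, r.entities)
```

The header band comes back tagged with three PHI entities at known pixel coordinates, while the lab row carries none, which is what lets downstream policy act on (or redact) only the risky regions.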
AI detection misses approximately three PHI instances out of every 1,000 samples.

Contextual Understanding Catches What Regex Misses

We’ve seen outbound clinical summaries pasted into SaaS messaging that look human and harmless to regex, yet combine partial identifiers and local context into something that clearly meets the PHI threshold.

In a real Microsoft 365 tenant, a message between a hospitalist and an external consultant looked roughly like this: a 67-year-old female admitted last night from a specific dialysis center, with stage IV CKD, poorly controlled diabetes, a syncopal episode during her second run this week, CT showing no acute bleed, and a new LBBB on EKG.

No full name, no MRN, no SSN, no classic pattern identifiers. The only direct identifier in the text was an initial. But the combination of age, gender, unique clinical history, location, and event timing is more than enough to single out an individual in a real-world population.

A native, rule-driven DLP policy missed it because there was no 9-digit SSN, no recognizable MRN format, no insurance ID, no email address, and no full legal name. Generic medical words like “diabetes” or “CT” appear constantly in internal collaboration, so admins had tuned those down after prior alert fatigue.

An AI-driven PHI pipeline treated this as a short clinical note and ran NER and LLM-based classification over the full text. It tagged entities such as patient age, gender, location, partial name, conditions, and events.
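One way to picture how such tags combine into a decision is a weighted quasi-identifier score. The entity list below mirrors what an NER pass might tag on the hospitalist message above; the weights and threshold are invented for this sketch and are not any real product’s risk model.

```python
# Entities an NER pass might tag on the hospitalist message (illustrative).
note_entities = {
    "age": "67-year-old",
    "gender": "female",
    "location": "named dialysis center",
    "conditions": ["stage IV CKD", "poorly controlled diabetes", "new LBBB"],
    "event_timing": "syncope during second dialysis run this week",
}

# Made-up weights: each quasi-identifier narrows the candidate population.
QUASI_ID_WEIGHTS = {"age": 1, "gender": 1, "location": 2,
                    "conditions": 2, "event_timing": 2}

def phi_risk(entities, external=False, threshold=5):
    """Score the combination of quasi-identifiers; crossing the threshold
    means the person is readily identifiable in a bounded population."""
    score = sum(w for k, w in QUASI_ID_WEIGHTS.items() if entities.get(k))
    if external:            # stricter line for flows to external identities
        threshold -= 1
    return score, score >= threshold

score, is_phi = phi_risk(note_entities, external=True)
print(score, is_phi)   # 8 True: no single classic identifier, but the
                       # combination crosses the PHI threshold
```

Age plus gender alone would stay under the threshold; it is the stacking of location, clinical history, and timing, sent externally, that tips the decision.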
The model’s risk logic considered the combination of multiple clinical attributes plus location plus time window, and the fact that this context was being sent to an external identity. Even without a full last name or a classic ID number, the classifier elevated this as PHI because the combination of quasi-identifiers in a bounded population makes the individual readily identifiable.

False Positives Drop When Models Learn Context

When DLP generates constant noise, security teams quietly shift from risk reduction to alert survival. They start ignoring, bulk-closing, or downgrading whole classes of PHI alerts. That creates larger, better-documented compliance gaps instead of closing them.

Analysts adopt “handle 100 alerts per shift” as the success metric, so they skim payloads and close anything that looks vaguely like a false positive without validating downstream exposure. They create playbooks that auto-suppress entire policy categories just to get queues under control, which removes human review from scenarios that can still be HIPAA breaches when the context is wrong.

Most healthcare orgs see a step-change reduction in useless alerts from AI-powered PHI detection. You don’t get perfect precision on day one; there’s an initial tuning period where you dial the model into your workflows.

Out of the box, good PHI models are already more precise than regex because they understand context. One 12-hospital system saw investigation times drop by 94%, false-positive alerts decrease by 78%, and 27 previously undetected compliance gaps identified after adopting AI-driven security.

After that early calibration phase, alert volume drops, but the percentage of alerts that represent real PHI risk goes up significantly. Analysts spend more time on high-severity investigations and less on mechanical triage.

What the Office for Civil Rights Sees in an Audit

The difference in an Office for Civil Rights audit or breach investigation is that you’re no longer just saying “we bought DLP.”
You can show a living control system that continuously discovers PHI, automatically prevents leaks, and documents every decision end-to-end.

Your HIPAA risk analysis doesn’t just list email and file shares generically. It includes specific SaaS assets (Google Workspace, M365, Slack, AI tools), the PHI they carry, and how AI-DLP classifies and de-identifies that PHI in motion.

You can show that high-risk flows, such as outbound traffic to non-business associates or exports to AI platforms, are governed by stricter policies (block or de-identify by default), while lower-risk internal flows are monitored with appropriate safeguards and logging.

Each PHI incident file includes the exact content or screenshot, the PHI entities detected, whether de-identification was applied, who received it, and which automated containment actions ran, such as link revocation, quarantine, or redaction.

Logs show short detection-to-containment timelines, evidence of automated and manual steps, and a clear rationale for the breach risk assessment. In many investigations, you can show that what left your SaaS estate was already de-identified according to documented rules, so the event is either not a reportable breach or is materially lower impact.

In 2025, 76% of all OCR enforcement actions included a penalty for a risk analysis failure, making comprehensive risk analysis the number one enforcement priority.

Start With One High-Risk Channel

Pick one high-risk SaaS channel (for most healthcare orgs, that’s outbound email or external file sharing) and layer an AI-powered PHI detection and leak prevention control in monitor-only mode alongside your existing DLP.

Running it in parallel on a single, well-scoped channel lets you answer three critical questions with real data. What true PHI events is AI catching that legacy DLP missed? Which existing DLP alerts would the AI model not fire on?
How often can de-identification transform a risky event into a safe, allowed workflow instead of a block?

Once you can show that delta (same channel, same users, but AI plus de-identification gives you fewer, higher-quality PHI incidents and more safe allows), it becomes straightforward to tune thresholds, flip that channel into enforcement, and then expand the architecture across more SaaS apps without a big-bang rip-and-replace of your current stack.

AI-powered PHI controls are powerful risk reducers that still require governance, clinical context, and a clear line on where PHI is and is not allowed to flow. You still need clear PHI policies per channel, plus a tuning phase where your analysts label borderline alerts so the system learns your specific workflows and risk tolerance.

The orgs that win with this tech treat it like any other critical clinical system. They assign an owner, define success metrics like MTTR, true-positive rate, and PHI exposure surface, and iterate.

AI helps you see more PHI in screenshots and free-text narratives, transform more of it into lower-risk forms, and act faster when something goes wrong. But it doesn’t decide what “reasonable and appropriate safeguards” means for your institution. That still comes from your risk analysis, your culture, and your clinical reality.

References

HIPAA Journal. “Healthcare Data Breach Statistics.” https://www.hipaajournal.com/healthcare-data-breach-statistics/
National Center for Biotechnology Information. “Machine Learning Models for PHI Detection in EHR Databases.” https://pmc.ncbi.nlm.nih.gov/articles/PMC10785837/
HIPAA Journal. “2025 Healthcare Data Breach Report.” https://www.hipaajournal.com/2025-healthcare-data-breach-report/