DelPHI — De-Identify & Label PHI | AI-Powered De-identification

Detect and neutralize all 18 HIPAA Safe Harbor identifiers in clinical documents with context-aware NLP. DelPHI supports text, PDF, and scanned images via OCR.

Core Features

Context-Aware NLP Engine

Unlike keyword-matching tools, DelPHI uses clinical NLP to interpret context. It recognizes that 'Dr. Smith ordered labs' contains PHI (a person's name) while 'Smith & Nephew implant' does not (a manufacturer's name). This reduces false positives by up to 90% compared with regex-based solutions.

  • Clinical context understanding
  • False positive reduction
  • Medical terminology awareness
  • Entity disambiguation
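
To make the contrast with keyword matching concrete, here is a minimal Python sketch. It is not DelPHI's engine: the cue lists and the function names `keyword_match` and `context_match` are hypothetical, and a clinical NLP model would learn context rather than rely on hand-written rules like these.

```python
import re

# Hypothetical cue lists, for illustration only.
PERSON_TITLES = ("Dr.", "Mr.", "Mrs.", "Ms.")
ORG_CUES = ("&", "Inc", "LLC", "Corp")


def keyword_match(text: str, surname: str) -> bool:
    """Naive approach: flag every occurrence of a known surname."""
    return re.search(rf"\b{re.escape(surname)}\b", text) is not None


def context_match(text: str, surname: str) -> bool:
    """Flag the surname only when neighboring tokens suggest a person."""
    tokens = text.split()
    for i, tok in enumerate(tokens):
        if tok.strip(",.;") != surname:
            continue
        prev_tok = tokens[i - 1] if i > 0 else ""
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
        if next_tok in ORG_CUES:
            continue  # followed by an organization cue, e.g. "Smith & Nephew"
        if prev_tok in PERSON_TITLES:
            return True  # preceded by a personal title, e.g. "Dr. Smith"
    return False


for sentence in ("Dr. Smith ordered labs", "Smith & Nephew implant"):
    print(sentence, keyword_match(sentence, "Smith"), context_match(sentence, "Smith"))
```

The keyword matcher flags both sentences, while the context check flags only the first, which is the kind of false positive the paragraph above describes.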

HIPAA Safe Harbor Compliance

Complete coverage of all 18 HIPAA Safe Harbor identifiers with configurable redaction modes — mask, replace with synthetic data, or remove entirely. Full audit trail for compliance reporting.

  • All 18 identifier types
  • Configurable redaction modes (mask/replace/remove)
  • Expert determination method support
  • Audit trail generation
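
The three redaction modes and the audit trail can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not DelPHI's implementation: the `Span` type, the `SURROGATES` table, the `redact` function, and the audit record fields are all hypothetical, and the spans are assumed not to overlap.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # identifier type, e.g. "NAME" or "DATE"


# Hypothetical synthetic replacements, keyed by identifier type.
SURROGATES = {"NAME": "Jane Doe", "DATE": "2000-01-01"}


def redact(text: str, spans: list[Span], mode: str = "mask") -> tuple[str, list[dict]]:
    """Apply one redaction mode to non-overlapping spans; return text and audit log."""
    if mode not in {"mask", "replace", "remove"}:
        raise ValueError(f"unknown redaction mode: {mode}")
    out = text
    audit: list[dict] = []
    # Work right to left so offsets of earlier spans stay valid.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        if mode == "mask":
            substitute = "[REDACTED]"
        elif mode == "replace":
            substitute = SURROGATES.get(span.label, "[REDACTED]")
        else:  # remove
            substitute = ""
        out = out[:span.start] + substitute + out[span.end:]
        # The audit entry records what was changed, never the original PHI.
        audit.append({"label": span.label, "start": span.start,
                      "end": span.end, "mode": mode})
    audit.reverse()  # report entries in document order
    return out, audit
```

Keeping the original text out of the audit entries is a deliberate choice in this sketch, so that the log itself does not become a second copy of the PHI.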

Advanced OCR Pipeline

Process scanned documents and images with our medical-grade OCR pipeline. Layout-aware text extraction preserves document structure for accurate PHI detection even in complex clinical forms.

  • Multi-format support (PDF, images, scans)
  • Layout-aware extraction
  • Handwriting recognition
  • Clinical form parsing

Batch & Real-Time Processing

Process thousands of documents in batch mode or integrate via REST API for real-time de-identification within your existing workflow. The API provides OAuth 2.0 authentication, rate limiting, and comprehensive logging.

  • REST API with OAuth 2.0
  • Batch mode for large document sets
  • Real-time streaming mode
  • Progress tracking & error handling
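
As a rough idea of what a real-time call could look like, the Python sketch below builds (but does not send) an authenticated request. The endpoint URL, the JSON field names `text` and `mode`, and the function `build_request` are assumptions made for illustration; consult the actual DelPHI API documentation for the real contract. The only standard piece is the OAuth 2.0 bearer token in the `Authorization` header.

```python
import json
import urllib.request

# Hypothetical endpoint; the real DelPHI URL and schema may differ.
API_URL = "https://api.example.com/v1/deidentify"


def build_request(text: str, access_token: str, mode: str = "mask") -> urllib.request.Request:
    """Build a POST request carrying the document and an OAuth 2.0 bearer token."""
    payload = json.dumps({"text": text, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


request = build_request("Dr. Smith ordered labs", access_token="YOUR_TOKEN")
# Sending is left out on purpose because the endpoint above is a placeholder:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```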

How DelPHI Works

1. Document Ingestion

Upload clinical documents (text, PDF, or scanned images). The OCR pipeline extracts text while preserving layout.

2. PHI Detection

Context-aware NLP identifies all 18 HIPAA identifier types. Each detection includes a confidence score and an evidence span.
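
One way to picture a detection with its confidence score and evidence span is the Python sketch below. The `Detection` fields, the helper names, and the 0.8 threshold are hypothetical and chosen only for illustration; DelPHI's actual output format is not specified here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str         # identifier type, e.g. "NAME"
    start: int         # inclusive character offset of the evidence span
    end: int           # exclusive character offset of the evidence span
    confidence: float  # score in [0, 1]


def evidence(text: str, detection: Detection) -> str:
    """Return the text covered by the detection's evidence span."""
    return text[detection.start:detection.end]


def split_by_confidence(
    detections: list[Detection], threshold: float = 0.8
) -> tuple[list[Detection], list[Detection]]:
    """Separate detections at or above the threshold from those below it."""
    accepted = [d for d in detections if d.confidence >= threshold]
    below = [d for d in detections if d.confidence < threshold]
    return accepted, below
```

A consumer of such records could, for example, redact the accepted detections automatically and handle the lower-confidence ones separately.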

3. De-identification

Choose a redaction mode: mask ([REDACTED]), replace with synthetic data, or remove. The output preserves the document's formatting.