
De-Identify & Label PHI
DelPHI — AI-Powered De-identification
Detect and neutralize all 18 HIPAA Safe Harbor identifiers in clinical documents with context-aware NLP. Supports text, PDF, and scanned images via OCR.

Core Features
Context-Aware NLP Engine
Unlike keyword-matching tools, DelPHI uses clinical NLP to understand context. It knows 'Dr. Smith ordered labs' contains PHI while 'Smith & Nephew implant' does not. This reduces false positives by up to 90% compared to regex-based solutions.
- Clinical context understanding
- False positive reduction
- Medical terminology awareness
- Entity disambiguation
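To make the difference between keyword matching and context-aware matching concrete, here is a minimal sketch. It is a toy rule-based stand-in and not DelPHI's NLP model, which this page does not describe. The function names, the title list, and the organization cues are all assumptions chosen for illustration.

```python
import re

# Hypothetical illustration only: a tiny rule-based stand-in for
# context-aware detection. It is NOT DelPHI's actual model.
TITLE_BEFORE = re.compile(r"\b(?:Dr|Mr|Mrs|Ms)\.\s*$")
ORG_AFTER = re.compile(r"^\s*(?:&|and)\s+[A-Z]\w+|^\s+(?:Inc|LLC|Corp)\b")


def keyword_match(text, surname):
    """Naive approach: flag every occurrence of the surname."""
    pattern = rf"\b{re.escape(surname)}\b"
    return [m.span() for m in re.finditer(pattern, text)]


def context_match(text, surname):
    """Flag the surname only when nearby text suggests a person."""
    hits = []
    pattern = rf"\b{re.escape(surname)}\b"
    for m in re.finditer(pattern, text):
        before, after = text[:m.start()], text[m.end():]
        if ORG_AFTER.search(after):
            continue  # looks like a company name, e.g. "Smith & Nephew"
        if TITLE_BEFORE.search(before):
            hits.append(m.span())  # preceded by a personal title
    return hits
```

With these toy rules, keyword_match flags "Smith" in both 'Dr. Smith ordered labs' and 'Smith & Nephew implant', while context_match flags only the first. A production system would rely on a trained clinical model rather than hand-written cues like these.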
HIPAA Safe Harbor Compliance
Complete coverage of all 18 HIPAA Safe Harbor identifiers with configurable redaction modes: mask, replace with synthetic data, or remove entirely. A full audit trail supports compliance reporting.
- All 18 identifier types
- Configurable redaction modes (mask/replace/remove)
- Expert determination method support
- Audit trail generation
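The three redaction modes and the audit trail can be sketched as follows. This is an illustrative sketch under stated assumptions, not DelPHI's implementation: the Span type, the SYNTHETIC surrogate table, the redact function, and the audit record fields are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: int   # offset of the first character of the PHI
    end: int     # offset one past the last character
    label: str   # identifier type, e.g. "NAME" or "DATE"


# Hypothetical surrogate values used by "replace" mode.
SYNTHETIC = {"NAME": "Jane Doe", "DATE": "01/01/2000"}


def redact(text, spans, mode="mask"):
    """Apply one redaction mode and return (new_text, audit_trail)."""
    if mode not in {"mask", "replace", "remove"}:
        raise ValueError(f"unknown mode: {mode}")
    out, audit = text, []
    # Work right to left so earlier offsets stay valid after each edit.
    for s in sorted(spans, key=lambda s: s.start, reverse=True):
        if mode == "mask":
            sub = "[REDACTED]"
        elif mode == "replace":
            sub = SYNTHETIC.get(s.label, "[REDACTED]")
        else:  # "remove"
            sub = ""
        out = out[:s.start] + sub + out[s.end:]
        # Record what was done and where, never the original PHI value.
        audit.append({"label": s.label, "start": s.start,
                      "end": s.end, "mode": mode})
    audit.reverse()  # report entries in document order
    return out, audit
```

Applied to "Seen by Dr. Smith on 03/14/2023." with a NAME span and a DATE span, mask mode yields "Seen by Dr. [REDACTED] on [REDACTED]." and replace mode yields "Seen by Dr. Jane Doe on 01/01/2000.". The audit entries here deliberately store offsets and labels only, so the trail itself does not contain PHI.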
Advanced OCR Pipeline
Process scanned documents and images with our medical-grade OCR pipeline. Layout-aware text extraction preserves document structure for accurate PHI detection even in complex clinical forms.
- Multi-format support (PDF, images, scans)
- Layout-aware extraction
- Handwriting recognition
- Clinical form parsing
Batch & Real-Time Processing
Process thousands of documents in batch mode or integrate via REST API for real-time de-identification within your existing workflow. The API includes OAuth 2.0 authentication, rate limiting, and comprehensive logging.
- REST API with OAuth 2.0
- Batch mode for large document sets
- Real-time streaming mode
- Progress tracking & error handling
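A real-time request might be built roughly as shown below. This page does not publish the endpoint, payload schema, or response format, so the URL, the "text" and "mode" fields, and the build_request helper are placeholders, not the documented API. The one standard piece is the OAuth 2.0 access token sent as a Bearer credential in the Authorization header.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL from your DelPHI account.
API_URL = "https://api.example.com/v1/deidentify"


def build_request(text, access_token, mode="mask"):
    """Build (but do not send) a POST request carrying one document."""
    # Field names "text" and "mode" are assumed for illustration.
    body = json.dumps({"text": text, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Sending the request would look like this (not executed here):
#   with urllib.request.urlopen(build_request(doc, token)) as resp:
#       result = json.load(resp)
```

Keeping request construction separate from sending makes it easy to add retries or backoff around the network call when a rate limit is hit.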
How DelPHI Works
Document Ingestion
Upload clinical documents (text, PDF, scanned images). The OCR pipeline extracts text while preserving layout.
PHI Detection
Context-aware NLP identifies all 18 HIPAA identifier types. Each detection includes a confidence score and an evidence span.
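One way to picture a detection with its confidence score and evidence span is the sketch below. The Detection type, its field names, and the 0.8 review threshold are assumptions made for illustration; the actual output schema is not given on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str         # identifier type, e.g. "NAME" or "MRN"
    start: int         # evidence span: first character offset
    end: int           # evidence span: one past the last character
    confidence: float  # score between 0.0 and 1.0

    def evidence(self, text):
        """Return the exact text that triggered this detection."""
        return text[self.start:self.end]


def partition(detections, threshold=0.8):
    """Split detections into (auto_redact, needs_review) by confidence."""
    auto = [d for d in detections if d.confidence >= threshold]
    review = [d for d in detections if d.confidence < threshold]
    return auto, review
```

Routing low-confidence detections to human review is one possible use of the score. For de-identification, a cautious alternative is to redact everything detected and use the score only to prioritize spot checks.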
De-identification
Configurable redaction: mask ([REDACTED]), replace with synthetic data, or remove. The output maintains document formatting.