Normalization and Deduplication

Health Gorilla applies normalization and deduplication logic during data ingestion to ensure that retrieved records are accurate, complete, and free from unnecessary redundancy. This processing allows your system to operate on a unified clinical record while maintaining fidelity to the original data.

Ingestion and Mapping

When external data is ingested, the solution parses each resource and maps it to a standard FHIR structure. Relevant clinical concepts are normalized to widely adopted code systems:

  • Logical Observation Identifiers Names and Codes (LOINC) for laboratory tests and clinical observations
  • Systematized Nomenclature of Medicine—Clinical Terms (SNOMED CT) for problems, conditions, and procedures
  • RxNorm for medications
  • International Classification of Diseases, Tenth Revision (ICD-10) for diagnoses
  • Current Procedural Terminology (CPT) and Healthcare Common Procedure Coding System (HCPCS) for procedures and claims data

When a source record lacks codes, the ingestion engine applies heuristics or maps its values to codes based on document context or associated metadata.
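
To illustrate what normalized coding looks like from the consumer side, the following minimal sketch selects a coding from a FHIR CodeableConcept by code system. The system URIs are the standard FHIR identifiers for these terminologies (ICD-10-CM, the US clinical modification, is assumed for ICD-10). The helper name `find_coding` and the sample data are hypothetical, and the sketch does not describe the ingestion engine's internal logic.

```python
from typing import Optional

# Standard FHIR system URIs for some of the terminologies listed above.
# ICD-10-CM (US clinical modification) is an assumption for "ICD-10".
CODE_SYSTEMS = {
    "LOINC": "http://loinc.org",
    "SNOMED CT": "http://snomed.info/sct",
    "RxNorm": "http://www.nlm.nih.gov/research/umls/rxnorm",
    "ICD-10-CM": "http://hl7.org/fhir/sid/icd-10-cm",
    "CPT": "http://www.ama-assn.org/go/cpt",
}


def find_coding(codeable_concept: dict, system_name: str) -> Optional[dict]:
    """Return the first coding in the named code system, or None.

    Hypothetical helper for illustration only.
    """
    system_uri = CODE_SYSTEMS[system_name]
    for coding in codeable_concept.get("coding", []):
        if coding.get("system") == system_uri:
            return coding
    return None


# Hypothetical Observation.code value after normalization.
hemoglobin_a1c = {
    "coding": [
        {
            "system": "http://loinc.org",
            "code": "4548-4",
            "display": "Hemoglobin A1c/Hemoglobin.total in Blood",
        }
    ],
    "text": "HbA1c",
}

loinc = find_coding(hemoglobin_a1c, "LOINC")
print(loinc["code"] if loinc else "no LOINC coding")
```

Looking up codings by system URI rather than by display text keeps client logic stable when different sources label the same concept differently.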

Deduplication Logic

Structured data is deduplicated to reduce fragmentation and support consistent downstream processing. Deduplication is performed by evaluating:

  • Patient-level identifiers and demographic metadata
  • Clinical concept alignment using code, display name, and effective date
  • Source document lineage using Provenance and DocumentReference
  • Context-specific rules based on resource type

Examples of resource-specific matching:

  • Observation: deduplicated using LOINC code, value, effective date, and performer
  • MedicationStatement: matched on RxNorm code, dose, route, and timing
  • Condition: compared by clinical status, onset date, and SNOMED CT or ICD-10 code
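
The Observation rule above can be sketched as a match key built from the four listed fields. This is a simplified illustration, not Health Gorilla's actual algorithm: the function names, the choice to compare dates at day granularity, and the "keep the first record seen" policy are all assumptions.

```python
from typing import Dict, Iterable, List

LOINC = "http://loinc.org"


def observation_key(obs: dict) -> tuple:
    """Build a match key from LOINC code, value, effective date, and performer."""
    codings = obs.get("code", {}).get("coding", [])
    loinc_code = next(
        (c.get("code") for c in codings if c.get("system") == LOINC), None
    )
    quantity = obs.get("valueQuantity", {})
    value = (quantity.get("value"), quantity.get("unit"))
    # Assumption: compare on the date portion (YYYY-MM-DD) only.
    effective_date = (obs.get("effectiveDateTime") or "")[:10]
    performers = tuple(
        sorted(p.get("reference", "") for p in obs.get("performer", []))
    )
    return (loinc_code, value, effective_date, performers)


def deduplicate(observations: Iterable[dict]) -> List[dict]:
    """Keep the first Observation seen for each match key (assumed policy)."""
    seen: Dict[tuple, dict] = {}
    for obs in observations:
        seen.setdefault(observation_key(obs), obs)
    return list(seen.values())


def make_obs(obs_id: str, value: float) -> dict:
    """Create a hypothetical HbA1c Observation for the example."""
    return {
        "resourceType": "Observation",
        "id": obs_id,
        "code": {"coding": [{"system": LOINC, "code": "4548-4"}]},
        "valueQuantity": {"value": value, "unit": "%"},
        "effectiveDateTime": "2024-03-01T09:30:00Z",
        "performer": [{"reference": "Organization/lab-1"}],
    }


# obs-a and obs-b carry the same result from two sources; obs-c differs in value.
observations = [make_obs("obs-a", 6.1), make_obs("obs-b", 6.1), make_obs("obs-c", 7.0)]
unique = deduplicate(observations)
print([o["id"] for o in unique])  # ['obs-a', 'obs-c']
```

A comparable key for MedicationStatement or Condition would swap in the fields listed for those resource types.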

Document Handling

Unstructured documents such as clinical summaries, scanned reports, and progress notes are not deduplicated. Instead, they are preserved in full and surfaced as DocumentReference resources. When applicable, these documents are linked to associated structured records.
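
Because documents and structured records are handled differently, a client may want to separate them when processing a retrieved Bundle. The sketch below assumes a standard FHIR Bundle shape (`entry[].resource.resourceType`); the function name `split_bundle` and the sample data are hypothetical.

```python
from typing import List, Tuple


def split_bundle(bundle: dict) -> Tuple[List[dict], List[dict]]:
    """Separate DocumentReference resources from all other resources."""
    documents: List[dict] = []
    structured: List[dict] = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") == "DocumentReference":
            documents.append(resource)
        else:
            structured.append(resource)
    return documents, structured


# Hypothetical retrieved Bundle with one document and two structured records.
bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "DocumentReference", "id": "doc-1"}},
        {"resource": {"resourceType": "Observation", "id": "obs-1"}},
        {"resource": {"resourceType": "Condition", "id": "cond-1"}},
    ],
}

documents, structured = split_bundle(bundle)
print(len(documents), len(structured))  # 1 2
```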

Identity Reconciliation

All records are associated with a unified patient identity using Health Gorilla's enterprise Master Patient Index (eMPI). This process reconciles incoming identifiers and ensures that disparate records from different networks are attributed to the correct patient, while each record remains traceable to its original source.

Summary

Normalization aligns records to standardized clinical vocabularies, while deduplication ensures that structured data is consolidated across sources. These operations improve data quality, reduce noise, and enable meaningful clinical use. Unstructured documents are preserved without modification, and identity reconciliation ensures consistency across all retrieved data.