Table of contents

Data Transformation

Overview

This ETL tool has been designed to be documentation centric. This means that the same code is used to transform the data as is used to generate the documentation. They can never drift away from each other.

This tool extracts and transforms the data using a two step process.

graph LR
    source_file["Source files"] -- "Data staged using stage command" --> source_db[("Staging Database")]

    source_db -- "Aggregation SQL query" --> app["OMOP Transformer CLI tool"]

    app -- "Transformed Record" --> target_db[("OMOP Database")]

    app -- "Generated Documentation"--> docs["
- Structured JSON mapping explanations
- Query orientated SVG mapping diagrams
- Field orientated markdown explanations
    "]

Data extract

At this stage of a data extract we simply run a SQL query. This query is to be used to handle any data aggregations, joins or any special cases.

The query is declared within a named XML file. The file format importantly includes an explanation of the query.

Query file format

  • Sql this element contains the query definition.
  • Query/Explanations/Explanation/@columnName this attribute ties the explanation to one of the output fields of the query.
  • Query/Explanations/Explanation/Description a human readable explanation of what the field output from the query is including a summary of any logic (markdown supported)
  • Query/Explanations/Explanation/Origin an element that lists each NHS Data Dictionary element name that is used to produce the field. This element can occur many times per explanation.

Example

<Query>
    <Sql>
select
	distinct
		Patient_Postcode,
		NHS_Number,
		Date_Of_Birth,
		Person_Stated_Gender_Code
from omop_staging.sact_staging
	</Sql>

	<Explanations>
		<Explanation columnName="NHS_Number">
			<Description>Patient NHS Number</Description>
			<Origin>NHS NUMBER</Origin>
		</Explanation>

		<Explanation columnName="Patient_Postcode">
			<Description>Patient's Postcode.</Description>
			<Origin>POSTCODE</Origin>
		</Explanation>

		<Explanation columnName="Date_Of_Birth">
			<Description>Patient's date of birth.</Description>
			<Origin>PERSON BIRTH DATE</Origin>
		</Explanation>
		<Explanation columnName="Person_Stated_Gender_Code">
			<Description>The patient's Sex</Description>
			<Origin>PERSON GENDER CODE CURRENT</Origin>
		</Explanation>
	</Explanations>
</Query>

Transformation

Results from the first stage aggregation are transformed to the OMOP format using C# annotations.

Transformations are strongly typed between the incoming query record and the target OMOP record. This is achieved through polymorphism and attributes.

A transformation can be declared by inherting a class that derrives from one of the base OMOP classes. Each class represents a table in the OMOP database.

Supported OMOP classes

  • OmopConditionOccurrence
  • OmopDeath
  • OmopDrugExposure
  • OmopDeviceExposure
  • OmopLocation
  • OmopObservation
  • OmopPerson
  • OmopCareSite
  • OmopProvider
  • OmopMeasurement
  • OmopProcedureOccurrence
  • OmopVisitDetail
  • OmopVisitOccurrence

When inherting from the base class, the source type must be specified as type T. The source type should represent a row of data from the incoming query.

Example

Declare a class to represent a row of incoming Data that uses the OmopDemographics.xml query.

[DataOrigin("COSD")]
[Description("COSD Demographics")]
[SourceQuery("OmopDemographics.xml")]
internal class CosdDemographics
{
    public string? StreetAddressLine1 { get; set; }
    public string? StreetAddressLine2 { get; set; }
    public string? StreetAddressLine3 { get; set; }
    public string? StreetAddressLine4 { get; set; }
    public string? Postcode { get; set; }
    public string? NhsNumber { get; set; }
    public string? PersonBirthDate { get; set; }
    public string? DateOfBirth { get; set; }
    public string? EthnicCategory { get; set; }
}

Declare a class to form a relationship between the OmopLocation type and the incoming CosdDemographics type.

using OmopTransformer.Annotations;
using OmopTransformer.COSD.Demographics;
using OmopTransformer.Omop.Location;
using OmopTransformer.Transformation;

namespace OmopTransformer.COSD;

internal class CosdLocation : OmopLocation<CosdDemographics>
{
    [Transform(typeof(UppercaseAndTrimWhitespace), nameof(Source.StreetAddressLine1))]
    public override string? address_1 { get; set; }

    [Transform(typeof(UppercaseAndTrimWhitespace), nameof(Source.StreetAddressLine2))]
    public override string? address_2 { get; set; }

    [Transform(typeof(UppercaseAndTrimWhitespace), nameof(Source.StreetAddressLine3))]
    public override string? city { get; set; }

    [Transform(typeof(UppercaseAndTrimWhitespace), nameof(Source.StreetAddressLine4))]
    public override string? county { get; set; }

    [Transform(typeof(PostcodeFormatter), nameof(Source.Postcode))]
    public override string? zip { get; set; }

    [Transform(
        typeof(TextDeliminator),
        nameof(Source.StreetAddressLine1),
        nameof(Source.StreetAddressLine2),
        nameof(Source.StreetAddressLine3),
        nameof(Source.StreetAddressLine4),
        nameof(Source.Postcode))]
    public override string? location_source_value { get; set; }

    [CopyValue(nameof(Source.NhsNumber))]
    public override string? nhs_number { get; set; }
}

A relationship is formed between a number of source fields and a OMOP target field by overriding the OMOP base classes field and adding the Transform attribute.

[Transform(typeof(PostcodeFormatter), nameof(Source.Postcode))]
public override string? zip { get; set; }

Transform Attribute

The Transform attribute is used to specify a transform operation and the source fields that should be used as an input.

Example

Use the Postcode postcode field as a parameter for the PostcodeFormatter selector.

[Transform(typeof(PostcodeFormatter), nameof(Source.Postcode))]

Supported Transformations

General Purpose Operations

DateAndTimeCombiner
Combines a date with a time of day.

* Argument 1 - a date, eg `20240101`
* Argument 2 - a time, eg `100500`

For a complete list of transformations, please refer to our data dictionary.

Lookups

NhsGenderLookup
Finds the concept code for a NHS Data Dictionary gender code.

* Argument 1 - text, eg `1`
AccidentAndEmergencyDischargeDestinationLookup
Lookup discharge destination concept for A&E.

* Argument 1 - text code, eg `1` (In Patient Hospital)

Links:

NhsAEDiagnosisLookup
Accident and Emergency Diagnosis to OMOP Condition Concept IDs

* Argument 1 - text code, eg `01 02` (Laceration of Head)

Links:

NhsAETreatmentLookup
Accident and Emergency Treatment to OMOP Procedure Concept IDs

* Argument 1 - text code, eg `01` (Dressing)

Links:

NhsMainSpecialityCodeLookup
Maps NHS Main Speciality Codes to OMOP concepts

* Argument 1 - text code, eg `100` (General Surgery)

Links:

RelationshipSelector
Resolve Measurement domain ICD10 codes to `Maps To Value` concepts.

* Argument 1 - a concept code
NhsCriticalCareActivityDeviceLookup
CCMDS Critical Care Activity Code Device Concept IDs

* Argument 1 - text code, eg `1` (Respiratory support via a tracheal tube)

Links:

NhsCriticalCareActivityCodeLookup
CCMDS Critical Care Activity Code Concept IDs

* Argument 1 - text code, eg `1` (Respiratory support via a tracheal tube)

Links:

Notes

Note: JSON would be used instead of XML but it does not support strings that spans many lines. Multiline text can be stored, but only one a single line, which would have caused readability/review issues.