Table of contents

Data Transformation

Overview

This ETL tool has been designed to be documentation centric. This means that the same code is used to transform the data as is used to generate the documentation. They can never drift away from each other.

This tool extracts and transforms the data using a two step process.

graph LR
    source_file["Source files"] -- "Data staged using stage command" --> source_db[("Staging Database")]

    source_db -- "Aggregation SQL query" --> app["OMOP Transformer CLI tool"]

    app -- "Transformed Record" --> target_db[("OMOP Database")]

    app -- "Generated Documentation"--> docs["
- Structured JSON mapping explanations
- Query orientated SVG mapping diagrams
- Field orientated markdown explanations
    "]

Data extract

At this stage of a data extract we simply run a SQL query. This query is to be used to handle any data aggregations, joins or any special cases.

The query is declared within a named XML file. The file format importantly includes an explanation of the query.

Query file format

  • Sql this element contains the query definition.
  • Query/Explanations/Explanation/@columnName this attribute ties the explanation to one of the output fields of the query.
  • Query/Explanations/Explanation/Description a human readable explanation of what the field output from the query is including a summary of any logic (markdown supported)
  • Query/Explanations/Explanation/Origin an element that lists each NHS Data Dictionary element name that is used to produce the field. This element can occur many times per explanation.

Example

<Query>
    <Sql>
select
	distinct
		Patient_Postcode,
		NHS_Number,
		Date_Of_Birth,
		Person_Stated_Gender_Code
from omop_staging.sact_staging
	</Sql>

	<Explanations>
		<Explanation columnName="NHS_Number">
			<Description>Patient NHS Number</Description>
			<Origin>NHS NUMBER</Origin>
		</Explanation>

		<Explanation columnName="Patient_Postcode">
			<Description>Patient's Postcode.</Description>
			<Origin>POSTCODE</Origin>
		</Explanation>

		<Explanation columnName="Date_Of_Birth">
			<Description>Patient's date of birth.</Description>
			<Origin>PERSON BIRTH DATE</Origin>
		</Explanation>
		<Explanation columnName="Person_Stated_Gender_Code">
			<Description>The patient's Sex</Description>
			<Origin>PERSON GENDER CODE CURRENT</Origin>
		</Explanation>
	</Explanations>
</Query>

Transformation

Results from the first stage aggregation are transformed to the OMOP format using C# annotations.

Transformations are strongly typed between the incoming query record and the target OMOP record. This is achieved through polymorphism and attributes.

A transformation can be declared by inherting a class that derrives from one of the base OMOP classes. Each class represents a table in the OMOP database.

Supported OMOP classes

  • OmopConditionOccurrence
  • OmopDeath
  • OmopDrugExposure
  • OmopLocation
  • OmopObservation
  • OmopPerson
  • OmopProcedureOccurrence
  • OmopVisitDetail
  • OmopVisitOccurrence

When inherting from the base class, the source type must be specified as type T. The source type should represent a row of data from the incoming query.

Example

Declare a class to represent a row of incoming Data that uses the OmopDemographics.xml query.

[DataOrigin("COSD")]
[Description("COSD Demographics")]
[SourceQuery("OmopDemographics.xml")]
internal class CosdDemographics
{
    public string? StreetAddressLine1 { get; set; }
    public string? StreetAddressLine2 { get; set; }
    public string? StreetAddressLine3 { get; set; }
    public string? StreetAddressLine4 { get; set; }
    public string? Postcode { get; set; }
    public string? NhsNumber { get; set; }
    public string? PersonBirthDate { get; set; }
    public string? DateOfBirth { get; set; }
    public string? EthnicCategory { get; set; }
}

Declare a class to form a relationship between the OmopLocation type and the incoming CosdDemographics type.

using OmopTransformer.Annotations;
using OmopTransformer.COSD.Demographics;
using OmopTransformer.Omop.Location;
using OmopTransformer.Transformation;

namespace OmopTransformer.COSD;

internal class CosdLocation : OmopLocation<CosdDemographics>
{
    [Transform(typeof(UppercaseAndTrimWhitespace), nameof(Source.StreetAddressLine1))]
    public override string? address_1 { get; set; }

    [Transform(typeof(UppercaseAndTrimWhitespace), nameof(Source.StreetAddressLine2))]
    public override string? address_2 { get; set; }

    [Transform(typeof(UppercaseAndTrimWhitespace), nameof(Source.StreetAddressLine3))]
    public override string? city { get; set; }

    [Transform(typeof(UppercaseAndTrimWhitespace), nameof(Source.StreetAddressLine4))]
    public override string? county { get; set; }

    [Transform(typeof(PostcodeFormatter), nameof(Source.Postcode))]
    public override string? zip { get; set; }

    [Transform(
        typeof(TextDeliminator),
        nameof(Source.StreetAddressLine1),
        nameof(Source.StreetAddressLine2),
        nameof(Source.StreetAddressLine3),
        nameof(Source.StreetAddressLine4),
        nameof(Source.Postcode))]
    public override string? location_source_value { get; set; }

    [CopyValue(nameof(Source.NhsNumber))]
    public override string? nhs_number { get; set; }
}

A relationship is formed between a number of source fields and a OMOP target field by overriding the OMOP base classes field and adding the Transform attribute.

[Transform(typeof(PostcodeFormatter), nameof(Source.Postcode))]
public override string? zip { get; set; }

Transform Attribute

The Transform attribute is used to specify a transform operation and the source fields that should be used as an input.

Example

Use the Postcode postcode field as a parameter for the PostcodeFormatter selector.

[Transform(typeof(PostcodeFormatter), nameof(Source.Postcode))]

Supported Transformations

General Purpose Operations

DateAndTimeCombiner
Combines a date with a time of day.

* Argument 1 - a date, eg `20240101`
* Argument 2 - a time, eg `100500`
DateConverter
Converts text to dates.

* Argument 1 - a date, eg `20240101`
DayOfMonthSelector
Selects the day of the month or null if the date is null.

* Argument 1 - a date, eg `20240101`
MonthOfYearSelector
Selects the month of the year or null if the date is null.

* Argument 1 - a date, eg `20240101`
YearSelector
Selects the year from a date or null of the date is null.

* Argument 1 - a date, eg `20240101`
NumberParser
Converts text to integers.

* Argument 1 - a number as text, eg `123`
UppercaseAndTrimWhitespace
Converts text to uppercase.

* Argument 1 - text, eg `hello world`
TextDeliminator
Separates text with newlines. Trim whitespace.

* Argument 1 - text, eg `hello`
* Argument 2 - text, eg `world`
* etc - unlimited arguments are supported.
PostcodeFormatter
Uppercase the postcode then insert the space in the correct location, if needed.

* Argument 1 - text, eg `LS11 5DD`

Code resolvers

Icd10Selector
Resolves ICD10 codes to OMOP concepts. If code cannot be mapped, map using the parent code.

* Argument 1 - text, eg `T76`
Opcs4Selector
Resolve OPCS4 codes to OMOP concepts. If code cannot be mapped, map using the parent code.

* Argument 1 - text, eg `E66.8`
SnomedSelector
Converts ICD10/OPCS4 concepts to SNOMED concepts.

* Argument 1 - a concept id , eg `911784`

This selector usually uses the result of either the Icd10Selector or Opcs4Selector selectors.

[Transform(typeof(Icd10Selector), nameof(Source.DiagnosisCode))]
public override int? condition_source_concept_id { get; set; }

[Transform(typeof(SnomedSelector), useOmopTypeAsSource: true, nameof(condition_source_concept_id))]
public override int[]? condition_concept_id { get; set; }
Icdo3Selector
Resolve ICD-o-3 codes to OMOP concepts.

* Argument 1 - histology, eg `M81403`
* Argument 2- topography, eg `C18.9`

Lookups

NhsGenderLookup
Finds the concept code for a NHS Data Dictionary gender code.

* Argument 1 - text, eg `1`
RaceConceptLookup
Finds the concept code for a NHS Data Dictionary ethnicity code.

* Argument 1 - text, eg `A`
RaceSourceConceptLookup
Finds the source concept code for a NHS Data Dictionary ethnicity code.

* Argument 1 - text, eg `A`

Notes

Note: JSON would be used instead of XML but it does not support strings that spans many lines. Multiline text can be stored, but only one a single line, which would have caused readability/review issues.