Processing and linkage to NHS England datasets#

Last modified: 28 Aug 2024

Submission and general processing of NHS England datasets#

Datasets are submitted by health and social care organisations via a range of pathways, including upload to the Secondary Uses Service (SUS), the Strategic Data Collection Service in the cloud (SDCS Cloud) or the Message Exchange for Social Care and Health (MESH). The degree of data validation and derivation of variables depends on the dataset, as described in the individual dataset guides (where this information is available from NHS England).

Flow of LPS participants’ NHS England data into the UK LLC TRE#

Flows of data from contributing Longitudinal Population Studies (LPS) and NHS England are conducted through a ‘split file’ protocol, in which the flow of LPS participants’ identifiers (File 1s) is entirely separate from the flow of LPS participants’ NHS England data (File 2s) - see Figure 1. Updates to NHS England datasets are expected to flow into the UK LLC TRE on a quarterly basis.

../../../../_images/Linkage_UKLLCDataFlows_Figure1b.jpg

Figure 1: An overview of the flow of LPS participants’ NHS England data into the UK LLC TRE

1. Each LPS sends a File 1 to DHCW#

Each LPS generates a File 1, containing only participant identifiers (NHS number, name, date of birth, sex and address) and permission flags (set at the LPS or participant level), and sends it securely to UK LLC’s Trusted Third Party, NHS Digital Health and Care Wales (DHCW). As detailed in the opt-out section below, LPS can send updated File 1s prior to each quarterly flow of NHS England data.
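To make the shape of a File 1 concrete, the following is a minimal sketch of one record and of how a permission flag might be resolved. The class name, field names, date format and the precedence rule (a participant-level flag overrides the LPS-level default) are all assumptions made for illustration; the actual File 1 specification is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class File1Record:
    """Hypothetical File 1 row: identifiers plus a permission flag only."""
    nhs_number: str
    name: str
    date_of_birth: str  # assumed ISO 8601 date string, e.g. "1980-01-31"
    sex: str
    address: str
    # Participant-level flag; None means "not set for this participant".
    nhs_england_permission: Optional[bool] = None


def effective_permission(record: File1Record, lps_default: bool) -> bool:
    """Assumed precedence: participant-level flag wins, else the LPS default."""
    if record.nhs_england_permission is not None:
        return record.nhs_england_permission
    return lps_default


example = File1Record(
    nhs_number="0000000000",  # placeholder, not a real NHS number
    name="Example Person",
    date_of_birth="1980-01-31",
    sex="F",
    address="1 Example Street",
)
```

Note that the record holds no health data at all, which is the point of the split file protocol: identifiers travel separately from NHS England records.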

2. DHCW acts as UK LLC’s linkage broker#

DHCW encrypts the STUDY_IDs and sends an ID mapping file, containing the encrypted STUDY_IDs, to Swansea University. DHCW acts as UK LLC’s linkage broker by sending a permission-filtered file of unique identifiers and encrypted STUDY_IDs (the NHS Output) to NHS England for linkage and extraction of datasets.
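The brokering step can be sketched as follows. This is an illustration only: the function names, field names and file layouts are hypothetical, File 1 rows are assumed to carry a STUDY_ID, and a keyed HMAC-SHA256 digest stands in for whatever encryption DHCW actually applies. Whether the ID mapping file covers all participants or only those with permission is also an assumption here.

```python
import hashlib
import hmac


def encrypt_study_id(study_id: str, key: bytes) -> str:
    """Stand-in for DHCW's encryption: a keyed HMAC-SHA256 pseudonym."""
    return hmac.new(key, study_id.encode("utf-8"), hashlib.sha256).hexdigest()


def broker_outputs(file1_rows, key):
    """Return (id_mapping, nhs_output) built from a list of File 1 dicts.

    id_mapping: one row per participant (assumed layout).
    nhs_output: identifiers plus encrypted STUDY_ID, only for participants
                whose permission flag allows NHS England linkage.
    """
    id_mapping = []
    nhs_output = []
    for row in file1_rows:
        encrypted = encrypt_study_id(row["STUDY_ID"], key)
        id_mapping.append({"STUDY_ID": row["STUDY_ID"], "STUDY_ID_E": encrypted})
        if row["nhs_england_permission"]:
            # The unencrypted STUDY_ID is deliberately left out of this file.
            nhs_output.append({"nhs_number": row["nhs_number"], "STUDY_ID_E": encrypted})
    return id_mapping, nhs_output


rows = [
    {"STUDY_ID": "A1", "nhs_number": "0000000001", "nhs_england_permission": True},
    {"STUDY_ID": "A2", "nhs_number": "0000000002", "nhs_england_permission": False},
]
mapping, output = broker_outputs(rows, key=b"demo-key-not-for-real-use")
```

In this sketch only the permitted participant appears in the NHS Output, and the encrypted STUDY_ID is the only value shared between the two files.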

4. Swansea University receives and uploads data according to UK LLC specifications#

While the linkage process is the same for all datasets, the subsequent data pipelines are bespoke for each of the datasets according to UK LLC specifications.

The Population Data Science Development Team at Swansea University receives and uploads LPS participants’ deidentified NHS England records to the UK LLC database following variable-level rules detailed in specification documents provided by UK LLC. The rules require Swansea to encrypt or exclude particular variables to protect the anonymity of LPS participants. Geographical units smaller than region or strategic health authority are routinely encrypted - this includes Lower-layer Super Output Areas (LSOAs). These variables remain useful in encrypted form because they can still serve as grouping variables, for example in multi-level modelling. Other variables that are deemed to pose a disclosure risk and have no value in encrypted form are excluded.
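The variable-level rules can be pictured as a lookup from variable name to one of three actions. The sketch below is illustrative: the rule table, variable names and the use of a truncated HMAC-SHA256 digest as the "encryption" are assumptions, not the contents of the UK LLC specification documents.

```python
import hashlib
import hmac

# Hypothetical rule table: variable name -> action.
RULES = {
    "study_id_e": "keep",
    "region": "keep",
    "lsoa": "encrypt",      # small geography: keep only in encrypted form
    "postcode": "exclude",  # assumed to have no value once encrypted
}


def apply_rules(record, rules, key):
    """Apply keep/encrypt/exclude rules to a single record (a dict)."""
    result = {}
    for name, value in record.items():
        action = rules.get(name, "exclude")  # unknown variables are dropped
        if action == "keep":
            result[name] = value
        elif action == "encrypt":
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            result[name] = digest.hexdigest()[:16]
        # "exclude": the variable is simply not copied across
    return result


key = b"demo-key-not-for-real-use"
a = apply_rules({"study_id_e": "x1", "region": "South West", "lsoa": "E01000001", "postcode": "AB1 2CD"}, RULES, key)
b = apply_rules({"study_id_e": "x2", "region": "South West", "lsoa": "E01000001", "postcode": "AB1 2CE"}, RULES, key)
```

Because the same input always maps to the same token under a given key, two participants in the same LSOA share a token, so the encrypted variable still works as a grouping factor even though the real area code is hidden.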

5. Management and curation of data by the UK LLC Data Team#

The UK LLC Data Team checks that all the NHS England data that are uploaded to the UK LLC database align with the data sharing agreement between NHS England and the University of Bristol. They also perform the following tasks:

  • Clean and deduplicate data, table names and structures so that data can be provisioned efficiently while maintaining data integrity.

  • Load and integrate variable and value labelling, where available from the NHS API and other web sources, into master metadata tables.

  • Run the automated disclosure control risk assessment and manually review all flagged risks.
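Two of the tasks above, deduplication and automated flagging for manual review, can be sketched in a few lines. This is not the UK LLC Data Team's pipeline: the function names are hypothetical, and a small-cell-count check with a threshold of 10 is used only as one common example of a disclosure control check.

```python
from collections import Counter


def deduplicate(rows):
    """Drop exact duplicate rows (dicts), keeping first occurrences in order."""
    seen = set()
    unique = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


def flag_small_cells(rows, variable, threshold=10):
    """Return {value: count} for values seen fewer than `threshold` times.

    Flagged values are candidates for manual review, not automatic removal.
    """
    counts = Counter(row[variable] for row in rows)
    return {value: n for value, n in counts.items() if n < threshold}


records = [
    {"study_id_e": "x1", "diag": "A"},
    {"study_id_e": "x1", "diag": "A"},  # exact duplicate
    {"study_id_e": "x2", "diag": "A"},
    {"study_id_e": "x3", "diag": "B"},
]
clean = deduplicate(records)
flags = flag_small_cells(clean, "diag", threshold=2)
```

Deduplicating before counting matters in this sketch: otherwise a duplicated row could push a rare value over the threshold and hide it from review.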

LPS participants can opt out of linkage to their NHS England data#

LPS can send quarterly updates of their File 1s (participant identifiers and permission flags) to DHCW. If a participant has decided to opt out of UK LLC altogether or to opt out of linkage to their NHS records, or conversely has decided to opt into linkage to their NHS records, their instructions are communicated to NHS England in a delta file, which is based on the permission flags in the File 1. If a participant opts out, no further data about them will flow into the UK LLC TRE and the participant’s data will not be provisioned to new research projects. Researchers who already have access to that individual’s information are permitted to retain that access until the end of their project, but they will not receive any new data about that individual.
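One way to picture a delta file is as the difference between the permission flags in two successive File 1s. The sketch below assumes each File 1 can be reduced to a mapping from a participant identifier to a single boolean flag, and that a participant missing from a file is treated as not permitted; the function name, the "opt_in"/"opt_out" labels and both assumptions are illustrative only.

```python
def build_delta(previous, current):
    """Compare two {participant_id: permission_flag} mappings.

    Returns a sorted list of (participant_id, change) tuples, where change
    is "opt_in" or "opt_out". Unchanged participants are omitted.
    """
    delta = []
    for participant_id in sorted(set(previous) | set(current)):
        before = previous.get(participant_id, False)  # absent = not permitted
        after = current.get(participant_id, False)
        if before and not after:
            delta.append((participant_id, "opt_out"))
        elif after and not before:
            delta.append((participant_id, "opt_in"))
    return delta


last_quarter = {"P1": True, "P2": True, "P3": False}
this_quarter = {"P1": True, "P2": False, "P3": True, "P4": True}
changes = build_delta(last_quarter, this_quarter)
```

Only changes are carried in the result, so a participant whose flag is the same in both quarters (P1 here) does not appear.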

While some LPS that contribute data to UK LLC’s TRE rely solely on consent to determine which participants are included, other LPS operate a blended consent/section 251 model (section 251 applies to English and Welsh participants) to reduce bias and to improve research inclusivity. The NHS National Data Opt-Out is applied to all NHS data extractions, which means that any participant included using section 251 as a legal basis is excluded if they have set a national data opt-out.
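The interaction between legal basis and the national data opt-out can be expressed as a small decision function. This is a sketch of the rule as described above, with hypothetical names and string labels; it assumes that participants included on the basis of consent are not removed by a national data opt-out.

```python
def include_in_extraction(legal_basis: str, has_national_opt_out: bool) -> bool:
    """Decide whether a participant's NHS England data is extracted.

    legal_basis: "consent" or "s251" (labels are illustrative).
    """
    if legal_basis == "consent":
        # Assumption: consent-based inclusion is unaffected by the opt-out.
        return True
    if legal_basis == "s251":
        return not has_national_opt_out
    return False  # unknown legal basis: exclude to be safe


decisions = {
    (basis, opted_out): include_in_extraction(basis, opted_out)
    for basis in ("consent", "s251")
    for opted_out in (False, True)
}
```

Of the four combinations, only a section 251 participant with a national data opt-out is excluded.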