Working with NHS England data
Last modified: 01 Apr 2025
Does UK LLC check the accuracy of health records?
No, the UK LLC Data Team can only see de-identified records in the TRE and does not amend any participant data. The UK LLC Data Team only performs the following data curation tasks:
Clean and deduplicate data, dataset names and structures to enable data provisioning in an efficient manner while maintaining data integrity.
Load and integrate variable and value labelling, where available from the NHS API and other web sources, into master metadata tables.
Run the automated disclosure control risk assessment and manually review all flagged risks.
What medical codes are used in the NHS England data available in the TRE?
The main clinical classifications mandated by NHS England are SNOMED CT, ICD-10 and OPCS-4. More information on codes used in Electronic Health Records (EHRs) is available here: Coded variables
For which datasets do researchers need to provide codelists?
Researchers must provide codelists for their projects if they intend to use any of the following datasets:
HES (Hospital Episode Statistics)
GDPPR (General Practice Extraction Service (GPES) Data for Pandemic Planning and Research)
PCM (Primary Care Medicines)
The datasets use a range of clinical classifications, including:
ICD-9 (HES & cancer registrations)
ICD-10 (HES)
SNOMED-CT (GDPPR)
OPCS-4 (HES)
ODS (cancer registrations and PCM)
dm+d (PCM)
NHS national codes (all datasets)
More information on creating a codelist is available here: Codelists
How can I quantify the effect of applying codelists to my dataset?
The CORE file NHSD_Presence contains the number of appearances and the date of the most recent appearance for each participant for each available NHS data source. Comparing LPS participants’ presence in NHS data sources against the data provisioned to a project will identify which participants appear in the data source but are not included in the provisioned data.
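The comparison described above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the documented layout of NHSD_Presence: the field names `llc_id`, `source` and `appearances` are hypothetical stand-ins, and the rows are made-up values rather than real data.

```python
def missing_from_provision(presence_rows, provisioned_ids, source):
    """Return IDs that appear in an NHS data source but not in the provisioned data.

    presence_rows:   iterable of dicts shaped like NHSD_Presence rows
                     (field names are assumptions, not the real schema).
    provisioned_ids: participant IDs found in the dataset provisioned to the project.
    source:          name of the NHS data source to check, e.g. "HESOP".
    """
    present = {
        row["llc_id"]
        for row in presence_rows
        if row["source"] == source and row["appearances"] > 0
    }
    return present - set(provisioned_ids)


# Illustrative, made-up rows (not real data).
presence = [
    {"llc_id": "A1", "source": "HESOP", "appearances": 12},
    {"llc_id": "B2", "source": "HESOP", "appearances": 3},
    {"llc_id": "C3", "source": "HESOP", "appearances": 0},
    {"llc_id": "B2", "source": "HESAPC", "appearances": 1},
]
provisioned_hesop_ids = ["A1"]

excluded = missing_from_provision(presence, provisioned_hesop_ids, "HESOP")
print(sorted(excluded))
```

Participants returned by this kind of check have records in the source, but none of those records matched the project's codelist.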
What impact do the different levels of coding have on HES data?
The extent to which specific coding is used in HES data is important. For example, you may observe more records in your HESAPC (admitted patients) dataset than in your HESOP (outpatients) dataset, despite the national volume of HESOP records typically being about 5 times greater per year. This is because HESAPC consistently provides meaningful diagnosis codes, whereas generic codes are used more often in HESOP. As a result, when the codes provided by a researcher are matched with HES data in the TRE, fewer matches (‘hits’) are made on datasets with non-specific codes, and fewer records are included in the project.
Examples of non-specific codes include “R69=Not known” for diagnoses and “X997=Not known” for operations. These are used extensively in HESOP, but far less so in HESAPC.
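One rough way to gauge how much a dataset relies on non-specific codes is to compute the share of coded values that fall in a "not known" set. The sketch below is a minimal Python illustration. The only codes taken from the text are R69 and X997. The sample lists are made up, and exact string matching is an assumption, since the stored format of codes in the TRE (for example padding or filler characters) may differ.

```python
NON_SPECIFIC_DIAGNOSES = {"R69"}    # "Not known" diagnosis code quoted in the text
NON_SPECIFIC_OPERATIONS = {"X997"}  # "Not known" operation code quoted in the text


def non_specific_share(codes, non_specific):
    """Return the fraction of non-empty codes that belong to the non-specific set."""
    cleaned = [c.strip().upper() for c in codes if c and c.strip()]
    if not cleaned:
        return 0.0
    hits = sum(1 for c in cleaned if c in non_specific)
    return hits / len(cleaned)


# Illustrative, made-up values (not real data); "CODE_*" are placeholders.
outpatient_diagnoses = ["R69", "R69", "CODE_A", "R69"]
admitted_diagnoses = ["CODE_A", "CODE_B", "R69", "CODE_C"]

outpatient_share = non_specific_share(outpatient_diagnoses, NON_SPECIFIC_DIAGNOSES)
admitted_share = non_specific_share(admitted_diagnoses, NON_SPECIFIC_DIAGNOSES)
print(outpatient_share, admitted_share)
```

A high share suggests that a codelist built from specific diagnosis or operation codes will match relatively few records in that dataset.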
UK LLC is considering the way it makes linked health records available, by initially making unfiltered views available to researchers (with particularly sensitive records removed) rather than asking for codelists upfront. This will allow codelists to be developed whilst working with the data, but will also allow exploration of records which do not have specific codes assigned.
Why are there some missing variable and value labels in some datasets?
Variable labelling is primarily sourced from an NHS metadata API, but is not fully complete. Gaps in HES and MHSDS have been infilled from additional data dictionary sources. As part of ongoing work, we will be integrating additional sources to further complete the variable labelling and to add value labels. We will inform users as these are updated. Approximate current variable label completeness is:
HES, NPEX, COVIDSGSS: 100%
MHSDS: 70 - 90%
GDPPR, CVS, CVAR: 70%
PCM: 40%
DEMOGRAPHICS, CHESS, IELISA: not available.
What version of NHS England data was I provisioned?
NHS England data provisioned to projects are locked to a specific extract. The extract is identified by the extract_date variable found in the dataset, which records the date the data were extracted at NHS England.
All projects are ‘locked’ to an NHS quarterly extract as well as to a fixed table, which controls permissions/consent. This locking is based on the time of first provision of each project in the TRE. It prevents participant numbers from fluctuating during the course of a project (if, for example, more data or more participants are added to the TRE).
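To see which extract a provisioned dataset comes from, you can tabulate the extract_date values it contains. The following is a minimal Python sketch. The rows and the date format are made up for illustration, and only the variable name extract_date comes from the text.

```python
from collections import Counter


def extract_date_counts(rows, field="extract_date"):
    """Count how many records carry each extract_date value."""
    return Counter(row[field] for row in rows)


# Illustrative, made-up rows (not real data); the date format is an assumption.
rows = [
    {"extract_date": "2024-01-15"},
    {"extract_date": "2024-01-15"},
    {"extract_date": "2024-01-15"},
]

counts = extract_date_counts(rows)
single_extract = len(counts) == 1  # True when every record shares one extract_date
print(dict(counts), single_extract)
```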
Each fixed table is logged as a quarterly ‘freeze’. The freeze number, and freeze date, is provided in the ‘documentation’ folder in each TRE project space.
Why are some NHS England variables excluded or encrypted?
Prior to upload to the UK LLC TRE database, NHS data are assessed for disclosure risk. During this process, variables can be excluded from the upload if they are deemed to be disclosive. In cases where the variable has utility in an encrypted form, the variable is encrypted rather than excluded and an _e suffix is added to the end of the variable name, e.g. lsoa_e. Encryption is usually applied to variables which are, or provide, proxies for location information smaller than region.
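Because encrypted variables follow the _e naming convention, a quick way to list them in a dataset is to filter its column names by suffix. The sketch below is a minimal Python illustration. Apart from lsoa_e, the column names are hypothetical. A suffix check is only a heuristic: it would also match any variable whose original name happens to end in _e.

```python
def encrypted_variables(column_names):
    """Return the column names that carry the _e (encrypted) suffix."""
    return [name for name in column_names if name.lower().endswith("_e")]


# Hypothetical column names for illustration; only lsoa_e appears in the text.
columns = ["admission_date", "lsoa_e", "site_code"]
print(encrypted_variables(columns))
```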
What do the _ACP, _MAT and _OTR suffixes refer to in HES data?
_OTR is short for Other and is an extension of the HES record. There should be a 1:1 relationship between the main record (found in HESAPC, for example) and its extension in HESAPC_OTR.
_ACP is short for Augmented care period. This was collected from 1997–2006. It was replaced by HESCC (critical care) in 2008.
_MAT is short for Maternity and contains variables associated with maternity-related admissions.
See below for data and sub table lookup relationships. Note: HESCC is a subset of HESAPC

How can I link _ACP, _MAT, _OTR and HESCC data to their main record?
These sub tables do not contain an individual-level identifier. They therefore need to be linked to the main HESAPC / HESOP / HESAE datasets. See below for the linkage keys for each dataset:

How can I find test results in COVID-19 datasets?
NPEX and IELISA: Use the variable “testresult”. The result is SNOMED (SCT) coded. There are 6 codes used, e.g. “SCTID: 1240581000000104”: “Severe acute respiratory syndrome coronavirus 2 detected (finding)”.
COVIDSGSS: This dataset does not contain a test results field. We are awaiting confirmation from NHS England about how to interpret the presence of records in this dataset.
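For NPEX and IELISA, selecting positive results amounts to filtering on the testresult variable. The sketch below is a minimal Python illustration. The SNOMED code is the one quoted above, while the `llc_id` field and the rows are made up. Whether the stored value is the bare code or carries a prefix such as "SCTID: " is an assumption, so the comparison is made on digits only.

```python
DETECTED_CODE = "1240581000000104"  # SARS-CoV-2 detected (finding), quoted in the text


def is_detected(testresult):
    """Return True if the testresult value carries the 'detected' SNOMED code.

    Non-digit characters are dropped first, so both the bare code and a
    prefixed form such as 'SCTID: 1240581000000104' are recognised.
    """
    digits = "".join(ch for ch in str(testresult) if ch.isdigit())
    return digits == DETECTED_CODE


# Illustrative, made-up rows (not real data); "OTHER_CODE" is a placeholder.
rows = [
    {"llc_id": "A1", "testresult": "1240581000000104"},
    {"llc_id": "B2", "testresult": "SCTID: 1240581000000104"},
    {"llc_id": "C3", "testresult": "OTHER_CODE"},
]

positives = [row for row in rows if is_detected(row["testresult"])]
print([row["llc_id"] for row in positives])
```

The other codes used in testresult can be handled the same way once they have been looked up.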
How can I request additional data for my project?
Requests for new data should be submitted via an amendment to UK LLC. You may apply for additional linked data, additional data from already approved LPS, and/or data from additional LPS.
N.B. each type of data amendment requires a different level of review before being approved.