Understanding the HESAPC dataset#
Last modified: 27 Feb 2026
1. Introduction#
The APC (admitted patient care) dataset records summary information about all hospital admissions to NHS hospitals in England.
NHS England defines an admission as any activity that takes up a hospital bed. This includes:
day cases allocated a hospital bed
emergency and planned stays
births
N.B. What constitutes an inpatient procedure varies between hospitals.
Admission thresholds can also vary within hospitals depending on patient characteristics such as car ownership, caring responsibilities and distance from home.
2. Strengths of HESAPC#
Although not designed as a research resource, there are many advantages to using the HESAPC dataset for research. For example:
It is longitudinal, making it possible for researchers to study cohorts over time
It provides complete coverage of the population, including all hospital-based diagnoses and operations across all clinical specialities
It uses standardised coding systems, enabling comparative work to be undertaken internationally
Data can be validated against national disease registries and published audit or surveillance data
3. Limitations of HESAPC#
It is an administrative dataset, meaning only medical conditions that have financial implications for an admission are likely to be recorded in the dataset
The complex structure of the dataset means researchers may have to undertake a substantial amount of data cleaning before starting their analyses
There are inconsistencies in the definition of an ‘admission’, both between and within hospitals
Coding is unlikely to be consistent as there are variations between hospitals and over time
The severity of patients’ diagnoses is not recorded
Diagnosis dates refer to the date on which each diagnosis was recorded at hospital, so it is difficult to identify pre-existing conditions
There is no information about prescribed or dispensed medicines
It excludes private hospital care (unless funded by the NHS)
4. Scope and coverage#
As well as costing information, the dataset contains patient-level coded information about:
diagnoses
procedures (operations)
dates of admission, discharge & procedures
type of admission (e.g. elective or emergency)
discharge destination
hospital
patients’ ethnic group, GP practice & Index of Multiple Deprivation (IMD)
5. Data collection methodology#
The primary purpose of all NHS England hospital datasets is to record periods of care so that hospitals can be reimbursed for the care they have provided to patients. This process is summarised on Guidebook’s hospital datasets page.
Some data (e.g. postcode, date of birth) are generated automatically from local patient administration systems. In HESAPC, diagnosis and procedure codes are based on clinicians’ discharge summaries, and are generated by clinical coders using national clinical coding standards for ICD-10 and OPCS. Up to 20 diagnoses, and 24 procedures, are recorded for each hospital ‘episode’ (see Structure of the dataset below).
6. Structure of the dataset#
a) Admissions and periods of care#
Data in the HESAPC dataset are organised into episodes and spells. Each row in the dataset indicates a Finished Consultant Episode (FCE), which is a continuous period of care under one consultant at a single hospital. A spell is a continuous period of care within a single hospital from admission to discharge or death (more commonly called an ‘admission’).
Most patients in the HESAPC datasets are represented by one row of data (i.e. a spell comprising one episode), but others may be represented by multiple rows if they move between consultants within or between hospitals (see the scenario below). In the HESAPC_MAT dataset, each birth generates at least two episodes, one recording details of the delivery (relating to the mother) and one episode per child delivered (relating to the child).
If a patient is seen by more than one consultant during the same stay at the same hospital, the spell contains more than one FCE, i.e. more than one row of data for that admission (see figure 1). The first (or only) FCE in a spell can also be called a Finished Admission Episode (FAE) and the final (or only) FCE can also be called a Discharge Episode. This is why there are more FCEs than FAEs in the APC dataset: https://digital.nhs.uk/data-and-information/publications/statistical/hospital-admitted-patient-care-activity.
Figure 1 Episodes and spells in the HESAPC dataset - each row of data in the dataset corresponds to a single FCE
More information about HESAPC episodes and spells is in the tips for researchers below.
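The relationship between episodes and spells can be illustrated with a minimal Python sketch. It assumes that a spell can be identified by the combination of patient, hospital and admission date; `patient_id` and `hospital` are hypothetical column names used for illustration, while `admidate` and `epiorder` are HESAPC variables described under Key variables below. It is a sketch under those assumptions, not a definitive derivation.

```python
from collections import defaultdict

# Toy episode rows (one row per FCE). "patient_id" and "hospital" are
# hypothetical column names; admidate and epiorder are HESAPC variables.
episodes = [
    {"patient_id": "A", "hospital": "H1", "admidate": "2020-01-05", "epiorder": 2},
    {"patient_id": "A", "hospital": "H1", "admidate": "2020-01-05", "epiorder": 1},
    {"patient_id": "B", "hospital": "H2", "admidate": "2020-02-10", "epiorder": 1},
]


def group_into_spells(rows):
    """Group episode rows into spells, ordered by epiorder within each spell."""
    spells = defaultdict(list)
    for row in rows:
        # Assumption: patient + hospital + admission date identifies one spell.
        key = (row["patient_id"], row["hospital"], row["admidate"])
        spells[key].append(row)
    for rows_in_spell in spells.values():
        rows_in_spell.sort(key=lambda r: r["epiorder"])
    return dict(spells)


spells = group_into_spells(episodes)
for key, rows_in_spell in spells.items():
    admission_episode = rows_in_spell[0]   # first (or only) FCE in the spell
    discharge_episode = rows_in_spell[-1]  # final (or only) FCE in the spell
    print(key, len(rows_in_spell),
          admission_episode["epiorder"], discharge_episode["epiorder"])
```

The first and last rows only correspond to the admission and discharge episodes when every episode of the spell is present in the extract being analysed.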
b) Recording of diagnoses and procedures#
Each Finished Consultant Episode (see a) above) can contain up to 20 diagnoses, and up to 24 procedures. See Coding systems below for more information.
For diagnoses, the field diag_01 records the main reason for an individual being admitted to hospital. The subsequent fields diag_02…diag_20 record comorbidities. N.B. Dates of diagnosis are not recorded. If the main diagnosis is not known - most commonly for unfinished consultant episodes - the field diag_01 will be recorded as R69.X.
For procedures, the field opertn_01 records the main procedure (i.e. the one with the highest reimbursable cost). Secondary procedures are recorded in fields opertn_02…opertn_24. N.B. Dates of procedures are recorded in HESAPC (in fields opdate_01…opdate_24).
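As an illustration of how these positional fields might be searched, the Python sketch below checks whether a diagnosis code appears as the main diagnosis or in any diagnosis field of an episode. The field names diag_01…diag_20 follow the text above. The helper names are hypothetical, and stripping the dot so that R69.X and R69X compare equal is an assumption about how codes may be stored.

```python
# Field names diag_01 ... diag_20, as described in the text above.
DIAG_FIELDS = [f"diag_{i:02d}" for i in range(1, 21)]


def normalise(code):
    """Upper-case a code and drop any dot (assumption: 'R69.X' may be stored as 'R69X')."""
    return str(code or "").replace(".", "").strip().upper()


def has_diagnosis(episode, prefix, main_only=False):
    """True if any diagnosis field (or only diag_01) starts with the given code prefix."""
    fields = DIAG_FIELDS[:1] if main_only else DIAG_FIELDS
    return any(normalise(episode.get(f)).startswith(prefix) for f in fields)


def main_diagnosis_unknown(episode):
    """True if diag_01 holds the 'main diagnosis not known' code R69.X."""
    return normalise(episode.get("diag_01")) == "R69X"


# Toy episode: fields that are absent are treated as empty.
episode = {"diag_01": "K317", "diag_02": "I489"}
print(has_diagnosis(episode, "I48"))                  # searched across all fields
print(has_diagnosis(episode, "I48", main_only=True))  # searched in diag_01 only
print(main_diagnosis_unknown(episode))
```

Separating the main-diagnosis search from the any-position search matters because diag_01 records the reason for admission, whereas diag_02…diag_20 record comorbidities.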
7. Coding systems used#
HESAPC uses two main medical coding systems: ICD-10 (see note 1 below) for diagnoses and OPCS for procedures and operations. As in all HES datasets, NHS National Codes are used for administrative information such as source of admission and discharge destination.
ICD-10 (International Statistical Classification of Diseases and Related Health Problems): Contains 22 hierarchical chapters, based on body systems (e.g. respiratory, digestive). It is used to record all diagnoses and, where relevant, causes of injuries.
OPCS (Office of Population Censuses and Surveys): Contains 24 hierarchical chapters, based on body systems (e.g. respiratory, digestive). This coding system is UK-specific, so cannot be used for international comparisons. It records all procedures (e.g. surgery, MRI scans).
Note:
1 An earlier version (ICD-9) was used in HES prior to 1995, but does not occur in the HES data in the UK LLC TRE.
8. Evolution of the dataset#
Identifiers: The NHS started collecting Hospital Episode Statistics in England in 1989. However, NHS numbers were not required to be recorded in HES until 1997, meaning it is not possible to link individuals longitudinally in the dataset before that date. For this reason, people using HES for research tend to use 1997 as the starting point of the data.
Episode type: While the original focus was on inpatient episodes (‘admitted patient care’, APC), this has subsequently expanded to include diagnoses, procedures and patient demographics. Adult critical care (CC) episodes were included in HESAPC until 2008, when HESCC became a separate dataset. Although critical care episodes formed part of HESAPC prior to 2008, they were not flagged as ‘CC’ so it may not be possible to identify them in HESAPC.
Data collection: Data were originally submitted regionally, but this was changed to a national process in 1996, and the current Secondary Use Service (SUS) was introduced in 2007. HES data from before 2007 are less comprehensive than data from 2007 onwards, which are derived from SUS.
Diagnostic codes: Coding of diagnoses in HES changed from ICD-9 to ICD-10 in 1995. The maximum number of diagnostic fields recorded on each line of the dataset is now 20. This has increased from a maximum of 7 (prior to 2002), then 14 (in 2002-2007).
9. Availability in the UK LLC TRE#
The UK LLC TRE holds an extract of the HESAPC dataset, going back to 1998. The HESAPC records of participants in UK LLC’s partner LPS, where individual or LPS permissions allow linkage to NHS data, are included in the TRE. UK LLC does not hold any information about people who are not part of a partner LPS or about LPS participants who have requested that their NHSE data not be shared via UK LLC.
More detailed information about the UK LLC’s HESAPC extract is here.
10. Missing information#
Variable and value labels
UK LLC is infilling missing variable and value labels in the NHSE datasets in the TRE. Where variable labels have been added by UK LLC, rather than being found in NHSE documentation, this is made apparent by the phrase ‘label added by UK LLC’ being included in the variable label.
Missing data
The amount of missing data varies widely between variables and across datasets. Throughout 2026, we will update this section with information about missingness in HESAPC.
11. Tips for researchers using HESAPC in the UK LLC TRE#
a) Working with episodes and spells#
Continuous Inpatient (CIP) spells
In addition to the FCEs outlined above, a more complex scenario arises when a patient is transferred to a different hospital during a single admission. In this instance, a new spell begins. To identify and measure continuous hospital stays that include transfers to other hospitals, Continuous Inpatient (CIP) spells need to be derived (see figure 2).
Figure 2 Spells and CIP spells in the HESAPC dataset
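The idea of chaining spells into CIP spells can be sketched in Python as follows. This guide does not specify a linkage rule, so the sketch makes a simplifying assumption: two spells for the same patient belong to the same CIP spell when the later admission falls within `max_gap_days` of the earlier discharge. The `max_gap_days` parameter, its default value and the `patient_id` and `hospital` column names are hypothetical. A fuller derivation would probably also draw on administrative fields such as source of admission and discharge destination.

```python
from datetime import date

# Toy spell-level rows (one row per spell); column names other than
# admidate and disdate are hypothetical.
spell_rows = [
    {"patient_id": "A", "hospital": "H1", "admidate": date(2020, 1, 5), "disdate": date(2020, 1, 8)},
    {"patient_id": "A", "hospital": "H2", "admidate": date(2020, 1, 8), "disdate": date(2020, 1, 15)},
    {"patient_id": "A", "hospital": "H1", "admidate": date(2020, 6, 1), "disdate": date(2020, 6, 2)},
]


def derive_cip_spells(rows, max_gap_days=1):
    """Chain spells for the same patient into CIP spells.

    Simplifying assumption (not an official rule): a spell continues the
    previous CIP spell when it starts within max_gap_days of its discharge.
    Requires every spell to have a discharge date, i.e. finished spells only.
    """
    ordered = sorted(rows, key=lambda s: (s["patient_id"], s["admidate"], s["disdate"]))
    cips = []
    for spell in ordered:
        last = cips[-1] if cips else None
        if (
            last is not None
            and last["patient_id"] == spell["patient_id"]
            and (spell["admidate"] - last["disdate"]).days <= max_gap_days
        ):
            # Extend the current CIP spell to cover the transferred stay.
            last["disdate"] = max(last["disdate"], spell["disdate"])
            last["n_spells"] += 1
        else:
            cips.append({
                "patient_id": spell["patient_id"],
                "admidate": spell["admidate"],
                "disdate": spell["disdate"],
                "n_spells": 1,
            })
    return cips


cips = derive_cip_spells(spell_rows)
for cip in cips:
    print(cip)
```

In this toy data the first two spells are chained because the second admission is on the day of the first discharge, while the June spell starts a new CIP spell.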
Episodes/spells that span financial years
FCEs are entered into the HESAPC dataset according to the financial year in which they end. Consequently, episodes/spells that start in one financial year and end in another will be classified as unfinished in the starting financial year and finished in the ending financial year.
Depending on the research question, unfinished episodes/spells may need to be removed before analysis to prevent double counting.
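A minimal Python sketch of this filtering step is shown below. It uses the epistat codes listed under Key variables (1 = unfinished; 3 = finished). The `episode_id` and `fyear` columns are hypothetical, and whether epistat is stored as a number or as text is an assumption, so the comparison is made on its string form.

```python
# Toy rows: episode E1 spans two financial years, so it appears once as
# unfinished and once as finished. "episode_id" and "fyear" are hypothetical.
rows = [
    {"episode_id": "E1", "fyear": "2019/20", "epistat": 1, "epiend": None},
    {"episode_id": "E1", "fyear": "2020/21", "epistat": 3, "epiend": "2020-04-03"},
    {"episode_id": "E2", "fyear": "2020/21", "epistat": "3", "epiend": "2020-05-01"},
]


def finished_only(episode_rows):
    """Keep finished episodes only (epistat 3), dropping unfinished ones (epistat 1)."""
    return [r for r in episode_rows if str(r.get("epistat")).strip() == "3"]


finished = finished_only(rows)
print([r["episode_id"] for r in finished])
```

After filtering, each episode in the toy data is counted once, in the financial year in which it ended.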
b) Working with medical codes#
When applying to access linked HESAPC data in the UK LLC TRE, researchers must submit a codelist specifying the ICD-10 and / or OPCS codes which are relevant to their research question.
In HESAPC, ICD-10 and OPCS codes are provided as both 3-character and 4-character fields. The 3-character version is a truncation of the 4-character field, providing a higher-level (less specific) diagnosis or procedure code. N.B. Not all diagnoses and procedures have a 4-character version. Where this is the case, the final character is infilled with ‘X’.
The dataset also includes fields which concatenate the ICD-10 and OPCS codes, simplifying the process of identifying a specific diagnosis or operation across multiple fields. An example of how coded diagnosis (ICD-10) fields are structured is shown below:
| diag_3_01 | diag_3_02 | diag_3_03 | diag_3_concat | diag_4_01 | diag_4_02 | diag_4_03 | diag_4_concat |
|---|---|---|---|---|---|---|---|
| K31 | K22 | I48 | K31,K22,I48 | K317 | K229 | I489 | K317,K229,I489 |
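The concatenated fields in the example above could be matched against a codelist along the following lines. This Python sketch assumes the concatenated values are comma-separated, as in the example. The function names and codelist arguments are hypothetical.

```python
def split_concat(value):
    """Split a comma-separated concatenated code field into a list of codes."""
    return [code.strip() for code in (value or "").split(",") if code.strip()]


def matches_codelist(episode, codelist_3char=(), codelist_4char=()):
    """True if any 3- or 4-character diagnosis code of the episode is in the codelist."""
    codes3 = set(split_concat(episode.get("diag_3_concat")))
    codes4 = set(split_concat(episode.get("diag_4_concat")))
    return bool(codes3 & set(codelist_3char)) or bool(codes4 & set(codelist_4char))


# The example row from the table above.
episode = {
    "diag_3_concat": "K31,K22,I48",
    "diag_4_concat": "K317,K229,I489",
}

print(matches_codelist(episode, codelist_3char={"I48"}))
print(matches_codelist(episode, codelist_4char={"K210"}))

# The 3-character codes are truncations of the 4-character codes.
truncated = [code[:3] for code in split_concat(episode["diag_4_concat"])]
print(truncated == split_concat(episode["diag_3_concat"]))
```

Splitting the field into whole codes before matching avoids the false positives that a plain substring search on the concatenated text could produce.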
Notes:
The increase over time in the maximum number of diagnostic codes recorded per episode means that diagnoses recorded before 2007 cannot be directly compared with more recent diagnoses.
Payment by Results and changing NHSE priorities also have implications for which conditions are recorded. See Further Reading below for more information on how NHS costings have changed over time.
c) Key variables in HESAPC#
| Variable name | Variable label | Description | Additional information |
|---|---|---|---|
| admidate | Admission date | Date a patient was admitted at the start of a spell | The same date is recorded for all episodes in a spell |
| disdate | Discharge date | Date a patient was discharged from hospital | Only present in the last episode of a spell |
| epiend | Episode end date | Date a patient left the care of a particular consultant (through discharge, transfer to another consultant, or death) | Date is missing if the episode is ongoing at the end of the financial year (31st March) |
| epiorder | Order of episodes in a spell | The number of the episode in a spell, increasing by 1 for each new episode until a patient is discharged | All spells start with epiorder = 01 |
| epistart | Episode start date | Date a patient started care under a particular consultant | If a spell has more than one episode, each episode will have a new epistart date |
| epistat | Episode status | Whether the episode had finished before the end of the financial year (31st March) | 1 = unfinished; 3 = finished |
The full HES data dictionary can be downloaded from NHS England.
12. Useful syntax#
Below we will include syntax that may be helpful to other researchers in the UK LLC TRE. For longer scripts, we will include a snippet of the code plus a link to the UK LLC Github repository where you can find the full scripts.
13. Further reading#
Amies-Cull B, Luengo-Fernandez R, Scarborough P et al. NHS reference costs: a history and cautionary note. Health Economics Review. 2023;13:54. https://doi.org/10.1186/s13561-023-00469-0
Boyd A, Cornish R, Johnson L, Simmonds S, Syddall H, Westbury L, Cooper C, Macleod J. Understanding Hospital Episode Statistics (HES). London, UK: CLOSER; 2017. Available from: https://www.closer.ac.uk/wp-content/uploads/CLOSER-resource-understanding-hospital-episode-statistics-2018.pdf
Herbert A, Wijlaars L, Zylbersztejn A, Cromwell D, Hardelid P. Data Resource Profile: Hospital Episode Statistics Admitted Patient Care (HES APC). International Journal of Epidemiology. 2017 Mar 15. https://doi.org/10.1093/ije/dyx015