DATAMIND mental health datasets#
Last modified: 30 Jan 2026
Introduction#
Datasets will be available for selection shortly.
Identifying the earliest diagnosis date, as captured in secondary data sources, is critical for designing research projects, establishing clear inclusion criteria, distinguishing incident from prevalent cases, and examining changes in disease patterns over time. To provide this support for researchers, UK LLC has undertaken an initiative to derive the earliest recorded diagnosis date for selected mental health conditions using a combination of linked NHS England datasets and predefined diagnostic codes from the DATAMIND collection in the HDR UK Phenotype Library. This includes the following conditions: Eating disorders, Anxiety, Depression, and Severe Mental Illnesses (SMIs), including Schizophrenia; Bipolar disorder and other mood-related disorders; and Other psychotic disorders and severe mental illnesses. This approach allows researchers to use derived outcomes without needing to select diagnostic or procedural codes from phenotype libraries or to manually combine multiple datasets. However, the derived outcomes are dependent on the completeness and accuracy of diagnoses recorded in the underlying datasets.
Methodology: Severe Mental Illnesses (SMIs)#
The NHSE datasets included in the SMI datasets are HES OP, HES APC and MHSDS.
Data Sources#
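Anonymised data from NHS England were used to identify the earliest diagnosis dates. The datasets considered were Hospital Episode Statistics Outpatient (HES OP), Hospital Episode Statistics Admitted Patient Care (HES APC) and the Mental Health Services Data Set (MHSDS).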
The following MHSDS datasets were used for diagnosis identification:
MHS601 – Medical History (Previous Diagnosis)
MHS603 – Provisional Diagnosis
MHS604 – Primary Diagnosis
MHS605 – Secondary Diagnosis
Measures#
Participants diagnosed with the specific health conditions were identified using the disease-specific ICD-10 (4-digit) codes sourced from the DATAMIND collection in the HDR UK Phenotype Library.
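As an illustration only, matching recorded diagnosis codes against a code list could look like the sketch below. It assumes that source records may store codes without the dot and with an `X` filler (for example `F200` or `F21X`); that storage format, the field names and the short code-list excerpt are assumptions for the example, not a description of the UK LLC pipeline.

```python
def normalise_icd10(code: str) -> str:
    """Upper-case, trim, drop the dot and any trailing 'X' filler character."""
    cleaned = code.strip().upper().replace(".", "")
    return cleaned.rstrip("X")


# Hypothetical excerpt of a DATAMIND-style code list for schizophrenia.
SCHIZOPHRENIA_CODES = {
    normalise_icd10(c) for c in ["F20", "F20.0", "F20.6", "F21", "F21.X"]
}


def matches_code_list(recorded_code: str, code_list: set) -> bool:
    """Exact match after normalisation (no prefix matching)."""
    return normalise_icd10(recorded_code) in code_list


# Hypothetical records; 'cohort_key' and 'diag_code' are illustrative field names.
records = [
    {"cohort_key": 1, "diag_code": "F200"},
    {"cohort_key": 2, "diag_code": "f21x"},
    {"cohort_key": 3, "diag_code": "F32.3"},
]

matched = [r for r in records if matches_code_list(r["diag_code"], SCHIZOPHRENIA_CODES)]
matched_keys = [r["cohort_key"] for r in matched]
```

Exact matching is used here so that a code outside the list is never picked up through a shared three-character prefix.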
Derivation of Outcomes#
The first date of diagnosis is defined as the earliest recorded inpatient admission date, appointment date, diagnosis date, or referral date within secondary care datasets. For SMI, this date was derived from MHSDS and HES based on predefined criteria and assumptions (see below), using a predefined code list for the specific SMI health outcome. Records containing the relevant SMI-related codes were filtered, and the algorithm selected the earliest applicable date and its corresponding data source. The resulting diagnoses were processed according to the conventions of each dataset.
MHSDS#
The MHSDS contains two date columns: Diagnosis Date and Coded Diagnosis Timestamp. Only the first diagnosed date was considered. The dates were obtained using the following steps:
If a date was present in both columns, or only in the Coded Diagnosis Timestamp column, the Coded Diagnosis Timestamp was retained. If a date was present only in the Diagnosis Date column, it was copied into the Coded Diagnosis Timestamp column, which was then treated as the primary diagnosis date.
Participants with missing diagnosis dates in both date columns were removed.
The dates and diagnosis codes were filtered accordingly.
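One reading of the MHSDS date steps is a simple coalesce: prefer the Coded Diagnosis Timestamp, fall back to the Diagnosis Date, and drop records where both are missing. The sketch below follows that reading; the field names and sample rows are hypothetical.

```python
from datetime import date
from typing import Optional


def resolve_mhsds_date(
    diagnosis_date: Optional[date], coded_timestamp: Optional[date]
) -> Optional[date]:
    """Prefer the coded timestamp; otherwise fall back to the diagnosis date."""
    if coded_timestamp is not None:
        return coded_timestamp
    return diagnosis_date


# Hypothetical MHSDS rows (field names are illustrative).
rows = [
    {"cohort_key": 1, "diagnosis_date": date(2019, 3, 1), "coded_diagnosis_timestamp": date(2019, 3, 5)},
    {"cohort_key": 2, "diagnosis_date": date(2020, 7, 14), "coded_diagnosis_timestamp": None},
    {"cohort_key": 3, "diagnosis_date": None, "coded_diagnosis_timestamp": None},
]

resolved = []
for row in rows:
    first_date = resolve_mhsds_date(row["diagnosis_date"], row["coded_diagnosis_timestamp"])
    if first_date is None:
        continue  # both date columns missing: record set aside at this stage
    resolved.append({"cohort_key": row["cohort_key"], "first_date": first_date})
```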
HES#
For HES OP records the appointment date was used, and for HES APC records the admission date; each was taken to represent the date of first SMI diagnosis. No missing dates were identified in these records.
Dates and diagnosis codes from HES OP and HES APC were filtered.
All health outcome-specific filtered data from each dataset were then combined. Participants were grouped by cohort key (the individual-level identifier used to uniquely identify participants from the LPS) and by diagnosis. The earliest date for each diagnosis was retained.
Participants removed earlier because they had no diagnosis date were then re-examined. They fell into two groups:
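The combine-and-select step can be sketched as follows: tag each filtered record with its source, pool the three sources, and keep the earliest date per cohort key and diagnosis code. The tuples and values are made up for illustration, and this is a minimal sketch rather than the production algorithm.

```python
from datetime import date

# Hypothetical filtered records: (cohort_key, icd10_code, event_date).
mhsds = [(1, "F20.0", date(2018, 5, 2))]
hes_op = [(1, "F20.0", date(2017, 11, 20)), (2, "F25.1", date(2021, 1, 8))]
hes_apc = [(1, "F20.6", date(2019, 2, 14))]

# Pool all sources, carrying the source label alongside each record.
combined = (
    [(k, c, d, "MHSDS") for k, c, d in mhsds]
    + [(k, c, d, "HES OP") for k, c, d in hes_op]
    + [(k, c, d, "HES APC") for k, c, d in hes_apc]
)

# Keep the earliest date (and its source) per (cohort_key, diagnosis code).
earliest = {}
for key, code, event_date, source in combined:
    current = earliest.get((key, code))
    if current is None or event_date < current[0]:
        earliest[(key, code)] = (event_date, source)
```

Carrying the source label through the grouping is what later allows the source diagnosis column to report where the earliest date came from.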
For participants whose diagnoses were initially recorded with dates but later appeared as missing for the same diagnosis, the earlier dates were prioritised.
For participants with diagnoses recorded in the database without diagnosis dates, dates were imputed using the earliest recorded diagnosis date from a previously diagnosed condition.
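The handling of undated diagnoses might be sketched as below. The text does not specify which previously diagnosed condition supplies the imputed date, so this example assumes the participant's earliest dated diagnosis across all conditions; that rule, and the sample records, are assumptions for illustration.

```python
from datetime import date

# Hypothetical records: (cohort_key, icd10_code, diagnosis_date or None).
records = [
    (1, "F20.0", date(2016, 4, 1)),
    (1, "F20.0", None),  # same diagnosis also appears undated: dated record wins
    (1, "F25.1", None),  # undated diagnosis, participant has another dated one
    (2, "F22.0", None),  # no dated diagnosis at all for this participant
]

# Earliest dated record per (cohort_key, code).
dated = {}
for key, code, d in records:
    if d is not None and ((key, code) not in dated or d < dated[(key, code)]):
        dated[(key, code)] = d

# Earliest dated record per participant, across all diagnoses.
participant_earliest = {}
for (key, _code), d in dated.items():
    if key not in participant_earliest or d < participant_earliest[key]:
        participant_earliest[key] = d

# Resolve each diagnosis: own dated record first, otherwise the imputed date.
resolved = {}
for key, code, _d in records:
    if (key, code) in dated:
        resolved[(key, code)] = dated[(key, code)]
    else:
        resolved[(key, code)] = participant_earliest.get(key)  # None if nothing to impute from
```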
The dataset was then pivoted to one row per patient, and the earliest diagnosis date for each SMI category was derived from individual ICD-10 codes. The source diagnosis column indicates the dataset from which the first date of diagnosis was recorded, specifying whether the source was MHSDS, HES OP, or HES APC. For example, in cases where a participant had received diagnoses of both paranoid schizophrenia and simple schizophrenia on different dates, and these diagnoses originated from two separate data sources (HES and MHSDS, respectively), the source diagnosis column was populated with the data source linked to the earliest diagnosis date.
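A minimal sketch of the pivot, assuming a hypothetical mapping from ICD-10 code to SMI category and illustrative column names, is shown here. It mirrors the example of a participant with paranoid and simple schizophrenia recorded in different sources.

```python
from datetime import date

# Hypothetical excerpt mapping ICD-10 codes to SMI categories.
CATEGORY_OF = {
    "F20.0": "schizophrenia",
    "F20.6": "schizophrenia",
    "F23.0": "other_psychotic",
}

# Hypothetical per-code earliest dates: (cohort_key, code, first_date, source).
earliest_by_code = [
    (1, "F20.0", date(2017, 11, 20), "HES APC"),
    (1, "F20.6", date(2019, 2, 14), "MHSDS"),
    (2, "F23.0", date(2020, 6, 3), "HES OP"),
]

# Build one row per participant with a date and source column per category.
wide = {}
for key, code, first_date, source in earliest_by_code:
    category = CATEGORY_OF[code]
    row = wide.setdefault(key, {"cohort_key": key})
    date_col = f"{category}_first_date"
    source_col = f"{category}_source"
    if date_col not in row or first_date < row[date_col]:
        row[date_col] = first_date
        row[source_col] = source  # source of the earliest date in the category

rows = list(wide.values())
```

For participant 1, the schizophrenia date comes from the earlier HES APC record, so the source column reports HES APC rather than MHSDS.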
1. Schizophrenia#
All ICD-10 diagnosis codes for schizophrenia from the DATAMIND collection in the HDR UK Phenotype Library are included in the dataset.
| Diagnosis codes | Description |
|---|---|
| F20 | schizophrenia |
| F20.0 | paranoid schizophrenia |
| F20.1 | hebephrenic schizophrenia |
| F20.2 | catatonic schizophrenia |
| F20.3 | undifferentiated schizophrenia |
| F20.4 | post-schizophrenic depression |
| F20.5 | residual schizophrenia |
| F20.6 | simple schizophrenia |
| F20.8 | other schizophrenia |
| F20.9 | schizophrenia, unspecified |
| F21 | schizotypal disorder |
| F21.X | schizotypal disorder |
| F22 | persistent delusional disorders |
| F22.0 | delusional disorder |
| F22.8 | other persistent delusional disorders |
| F22.9 | persistent delusional disorder, unspecified |
| F24 | induced delusional disorder |
| F24.X | induced delusional disorder |
| F25 | schizoaffective disorders |
| F25.0 | schizoaffective disorder, manic type |
| F25.1 | schizoaffective disorder, depressive type |
| F25.2 | schizoaffective disorder, mixed type |
| F25.8 | other schizoaffective disorders |
| F25.9 | schizoaffective disorder, unspecified |
3. Other psychotic disorders and severe mental illnesses#
All ICD-10 diagnosis codes for other psychotic disorders and severe mental illnesses from the DATAMIND collection in the HDR UK Phenotype Library are included in the dataset.
| Diagnosis codes | Description |
|---|---|
| F23 | acute and transient psychotic disorders |
| F23.0 | acute polymorphic psychotic disorder without symptoms of schizophrenia |
| F23.1 | acute polymorphic psychotic disorder with symptoms of schizophrenia |
| F23.2 | acute schizophrenia-like psychotic disorder |
| F23.3 | other acute predominantly delusional psychotic disorders |
| F23.8 | other acute and transient psychotic disorders |
| F23.9 | acute and transient psychotic disorder, unspecified |
| F28 | other nonorganic psychotic disorders |
| F28.X | other nonorganic psychotic disorders |
| F29 | unspecified nonorganic psychosis |
| F29.X | unspecified nonorganic psychosis |
| F30 | manic episodes |
| F30.0 | hypomania |
| F30.1 | mania without psychotic symptoms |
| F30.8 | other manic episodes |
| F30.9 | manic episode, unspecified |
| F32.3 | severe depressive episode with psychotic symptoms |
| F33.3 | recurrent depressive disorder, current episode severe with psychotic symptoms |
| F38 | other mood [affective] disorders |
| F38.0 | other single mood [affective] disorders |
| F38.1 | other recurrent mood [affective] disorders |
| F38.8 | other specified mood [affective] disorders |
| F39 | unspecified mood [affective] disorder |
| F39.X | unspecified mood [affective] disorder |
Methodology: Eating disorders, Anxiety and Depression#
The NHSE datasets included in the Eating disorders, Anxiety and Depression datasets are HES OP, HES APC and IAPT.
Content will be added in due course.