DATAMIND mental health datasets#

Last modified: 30 Jan 2026

UK LLC has worked with DATAMIND to create a number of derived mental health datasets.

Introduction#

Datasets will be available for selection shortly.

Identifying the earliest diagnosis date captured in secondary data sources is critical for designing research projects, establishing clear inclusion criteria, distinguishing incident from prevalent cases, and examining changes in disease patterns over time. To support researchers, UK LLC has derived the earliest recorded diagnosis date for selected mental health conditions using a combination of linked NHS England datasets and predefined diagnostic codes from the DATAMIND collection in the HDR UK Phenotype Library. The conditions covered are Eating disorders, Anxiety, Depression, and Severe Mental Illnesses (SMIs). The SMIs comprise Schizophrenia; Bipolar disorder and other mood-related disorders; and Other psychotic disorders and severe mental illnesses. This approach allows researchers to use derived outcomes without needing to select diagnostic or procedural codes from phenotype libraries or to manually combine multiple datasets. However, the derived outcomes depend on the completeness and accuracy of the diagnoses recorded in the underlying datasets.

Methodology: Severe Mental Illnesses (SMIs)#

The NHS England (NHSE) datasets included in the SMI datasets are HES OP, HES APC and MHSDS.

Data Sources#

Anonymised data from NHS England were used to identify the earliest diagnosis dates. The datasets considered were:

  1. Mental Health Services Data Set (MHSDS).

  2. Hospital Episode Statistics – Outpatients (HES OP).

  3. Hospital Episode Statistics – Admitted Patient Care (HES APC).

The following MHSDS datasets were used for diagnosis identification:

  • MHS601 – Medical History (Previous Diagnosis)

  • MHS603 – Provisional Diagnosis

  • MHS604 – Primary Diagnosis

  • MHS605 – Secondary Diagnosis

Measures#

Participants diagnosed with the specific health conditions were identified using the disease-specific ICD-10 (4-digit) codes sourced from the DATAMIND collection in the HDR UK Phenotype Library.
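Matching recorded codes against a code list can be sketched as follows. This is a minimal illustration, not the UK LLC implementation: the code list is a small hypothetical excerpt, the function names are invented for the example, and the handling of a trailing "X" filler (as in F21.X) is an assumption about how codes are padded in the source records.

```python
import re

# Hypothetical excerpt of a code list; the full list is the one published in
# the DATAMIND collection of the HDR UK Phenotype Library.
SCHIZOPHRENIA_CODES = {"F20", "F20.0", "F20.6", "F21", "F21.X", "F25.1"}


def normalise_icd10(code: str) -> str:
    """Upper-case a code, strip punctuation, and drop a trailing 'X' filler."""
    cleaned = re.sub(r"[^A-Z0-9]", "", code.upper())
    # Assumption: a 4-character code ending in 'X' is a padded 3-character code.
    if len(cleaned) == 4 and cleaned.endswith("X"):
        return cleaned[:3]
    return cleaned


NORMALISED_CODES = {normalise_icd10(code) for code in SCHIZOPHRENIA_CODES}


def is_match(recorded_code: str) -> bool:
    """Return True when a recorded code is on the normalised code list."""
    return normalise_icd10(recorded_code) in NORMALISED_CODES
```

Normalising both the code list and the recorded codes means that "F20.0", "f20.0" and "F200" are treated as the same code, which avoids missed matches caused by formatting differences between datasets.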

Derivation of Outcomes#

The first date of diagnosis is defined as the earliest recorded inpatient admission date, appointment date, diagnosis date, or referral date within secondary care datasets. For SMI, this date was derived from MHSDS and HES based on predefined criteria and assumptions (see below), using a predefined code list for each SMI health outcome. Records containing the relevant SMI-related codes were filtered, and the algorithm selected the earliest applicable date and its corresponding data source. Diagnoses were processed according to the conventions of each dataset.

MHSDS#

The MHSDS contains two date columns: Diagnosis Date and Coded Diagnosis Timestamp. Only the first diagnosed date was considered. The dates were obtained using the following steps:

  1. If a date was present in both columns, or only in the Coded Diagnosis Timestamp column, the Coded Diagnosis Timestamp value was retained. If a date was present only in the Diagnosis Date column, it was copied into the Coded Diagnosis Timestamp column, which was then treated as the primary diagnosis date.

  2. Participants with missing diagnosis dates in both date columns were removed.

  3. The dates and diagnosis codes were filtered accordingly.
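The MHSDS steps above amount to coalescing the two date columns and setting aside records with no date. The sketch below illustrates this under stated assumptions: the field names (`cohort_key`, `code`, `diagnosis_date`, `coded_timestamp`) and the sample records are hypothetical, and the Coded Diagnosis Timestamp is assumed to take precedence when both dates are present.

```python
from datetime import date
from typing import Optional


def resolve_mhsds_date(diagnosis_date: Optional[date],
                       coded_timestamp: Optional[date]) -> Optional[date]:
    """Prefer the coded timestamp; fall back to the diagnosis date."""
    return coded_timestamp if coded_timestamp is not None else diagnosis_date


# Hypothetical records for illustration only.
records = [
    {"cohort_key": "A1", "code": "F20.0",
     "diagnosis_date": date(2019, 3, 1), "coded_timestamp": date(2019, 3, 4)},
    {"cohort_key": "B2", "code": "F25.1",
     "diagnosis_date": date(2020, 7, 15), "coded_timestamp": None},
    {"cohort_key": "C3", "code": "F22.0",
     "diagnosis_date": None, "coded_timestamp": None},
]

dated, undated = [], []
for rec in records:
    resolved = resolve_mhsds_date(rec["diagnosis_date"], rec["coded_timestamp"])
    if resolved is None:
        # Step 2: records with no date in either column are set aside.
        undated.append(rec)
    else:
        dated.append({"cohort_key": rec["cohort_key"], "code": rec["code"],
                      "date": resolved, "source": "MHSDS"})
```

Keeping the undated records in a separate list, rather than discarding them, makes it possible to revisit them later in the derivation.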

HES#

  1. HES OP records used the appointment date and HES APC records used the admission date; each was taken to represent the first diagnosis of an SMI. No missing dates were identified in these records.

  2. Dates and diagnosis codes from HES OP and HES APC were filtered.
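The choice of date field for each HES dataset can be expressed as a simple lookup. In this sketch the field names `appointment_date` and `admission_date`, the record layout, and the helper `to_diagnosis_row` are hypothetical; the actual HES column names may differ.

```python
from datetime import date

# Hypothetical field names; the real HES column names may differ.
DATE_FIELD_BY_SOURCE = {
    "HES OP": "appointment_date",
    "HES APC": "admission_date",
}


def to_diagnosis_row(record: dict, source: str) -> dict:
    """Convert a HES record into a common (key, code, date, source) layout."""
    date_field = DATE_FIELD_BY_SOURCE[source]
    return {
        "cohort_key": record["cohort_key"],
        "code": record["code"],
        "date": record[date_field],
        "source": source,
    }


op_row = to_diagnosis_row(
    {"cohort_key": "A1", "code": "F20.0", "appointment_date": date(2018, 5, 2)},
    "HES OP",
)
apc_row = to_diagnosis_row(
    {"cohort_key": "A1", "code": "F20.0", "admission_date": date(2017, 11, 20)},
    "HES APC",
)
```

Converting every source to the same row layout is what allows the datasets to be combined and compared on a single date field.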

All health outcome-specific filtered data from each dataset were then combined. Participants were grouped by cohort key (an individual-level identifier that uniquely identifies participants from the LPS) and by diagnosis. The earliest date for each diagnosis was retained.
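Combining the filtered records and keeping the earliest date per participant and diagnosis might look like the sketch below. The row layout and sample values are hypothetical, and the function name is invented for the example.

```python
from datetime import date

# Hypothetical combined rows from MHSDS, HES OP and HES APC.
rows = [
    {"cohort_key": "A1", "code": "F20.0", "date": date(2019, 3, 4), "source": "MHSDS"},
    {"cohort_key": "A1", "code": "F20.0", "date": date(2018, 5, 2), "source": "HES OP"},
    {"cohort_key": "A1", "code": "F25.1", "date": date(2021, 1, 10), "source": "HES APC"},
    {"cohort_key": "B2", "code": "F20.0", "date": date(2020, 7, 15), "source": "MHSDS"},
]


def earliest_per_diagnosis(rows: list) -> dict:
    """Keep the earliest-dated row for each (cohort_key, code) pair."""
    earliest = {}
    for row in rows:
        key = (row["cohort_key"], row["code"])
        if key not in earliest or row["date"] < earliest[key]["date"]:
            earliest[key] = row
    return earliest


earliest = earliest_per_diagnosis(rows)
```

Because the whole row is kept rather than only the date, the data source of the earliest record travels with it.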

Participants who had been removed earlier because they had no diagnosis date were then re-examined. Two types of missing dates were handled:

  1. For participants whose diagnoses were initially recorded with dates but later appeared as missing for the same diagnosis, the earlier dates were prioritised.

  2. For participants with diagnoses recorded in the database without diagnosis dates, dates were imputed using the earliest recorded diagnosis date from a previously diagnosed condition.
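One possible reading of these two rules is sketched below. It is an interpretation rather than the documented algorithm: it assumes that "a previously diagnosed condition" means any other dated diagnosis for the same participant, and that participants with no dated diagnosis at all remain unresolved. All names and sample values are hypothetical.

```python
from datetime import date

# Hypothetical earliest dated diagnoses: (cohort_key, code) -> date.
earliest = {
    ("A1", "F20.0"): date(2018, 5, 2),
    ("A1", "F25.1"): date(2021, 1, 10),
    ("B2", "F20.0"): date(2020, 7, 15),
}

# Hypothetical diagnoses that were recorded without any date.
undated = [("A1", "F20.0"), ("B2", "F22.0"), ("C3", "F22.0")]


def resolve_undated(earliest: dict, undated: list) -> tuple:
    """Apply the two rules to undated diagnoses; return (resolved, unresolved)."""
    resolved = dict(earliest)
    unresolved = []
    for cohort_key, code in undated:
        if (cohort_key, code) in resolved:
            # Type 1: a dated record exists for the same diagnosis, so it wins.
            continue
        known = [d for (key, _), d in earliest.items() if key == cohort_key]
        if known:
            # Type 2 (assumed reading): impute from the participant's
            # earliest dated diagnosis of another condition.
            resolved[(cohort_key, code)] = min(known)
        else:
            unresolved.append((cohort_key, code))
    return resolved, unresolved


resolved, unresolved = resolve_undated(earliest, undated)
```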

The dataset was then pivoted to one row per participant, and the earliest diagnosis date for each SMI category was derived from the individual ICD-10 codes. The source diagnosis column indicates the dataset in which the first date of diagnosis was recorded: MHSDS, HES OP, or HES APC. For example, where a participant had received diagnoses of both paranoid schizophrenia and simple schizophrenia on different dates, and these diagnoses originated from two separate data sources (HES and MHSDS, respectively), the source diagnosis column was populated with the data source linked to the earliest diagnosis date.
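The pivot and the source diagnosis column can be illustrated with the example from the text. This is a sketch only: the output column names (`schizophrenia_date`, `schizophrenia_source`), the row layout, and the sample dates are hypothetical.

```python
from datetime import date

# Hypothetical long-format rows: one row per participant and ICD-10 code.
rows = [
    {"cohort_key": "A1", "code": "F20.0", "date": date(2018, 5, 2), "source": "HES APC"},
    {"cohort_key": "A1", "code": "F20.6", "date": date(2019, 3, 4), "source": "MHSDS"},
    {"cohort_key": "B2", "code": "F20.6", "date": date(2020, 7, 15), "source": "MHSDS"},
]

# Excerpt of the schizophrenia category: paranoid (F20.0) and simple (F20.6).
SCHIZOPHRENIA_CODES = {"F20.0", "F20.6"}


def pivot_category(rows: list, codes: set, category: str) -> dict:
    """Return one entry per participant with the category's earliest date and source."""
    wide = {}
    for row in rows:
        if row["code"] not in codes:
            continue
        current = wide.get(row["cohort_key"])
        if current is None or row["date"] < current[f"{category}_date"]:
            wide[row["cohort_key"]] = {
                f"{category}_date": row["date"],
                f"{category}_source": row["source"],
            }
    return wide


wide = pivot_category(rows, SCHIZOPHRENIA_CODES, "schizophrenia")
```

Participant A1 has two schizophrenia codes from different sources; the pivoted row keeps the earlier date and records the source of that earlier record.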

1. Schizophrenia#

All ICD-10 diagnosis codes for schizophrenia from the DATAMIND collection in the HDR UK Phenotype Library are included in the dataset.

| Diagnosis codes | Description |
|---|---|
| F20 | schizophrenia |
| F20.0 | paranoid schizophrenia |
| F20.1 | hebephrenic schizophrenia |
| F20.2 | catatonic schizophrenia |
| F20.3 | undifferentiated schizophrenia |
| F20.4 | post-schizophrenic depression |
| F20.5 | residual schizophrenia |
| F20.6 | simple schizophrenia |
| F20.8 | other schizophrenia |
| F20.9 | schizophrenia, unspecified |
| F21 | schizotypal disorder |
| F21.X | schizotypal disorder |
| F22 | persistent delusional disorders |
| F22.0 | delusional disorder |
| F22.8 | other persistent delusional disorders |
| F22.9 | persistent delusional disorder, unspecified |
| F24 | induced delusional disorder |
| F24.X | induced delusional disorder |
| F25 | schizoaffective disorders |
| F25.0 | schizoaffective disorder, manic type |
| F25.1 | schizoaffective disorder, depressive type |
| F25.2 | schizoaffective disorder, mixed type |
| F25.8 | other schizoaffective disorders |
| F25.9 | schizoaffective disorder, unspecified |

3. Other psychotic disorders and severe mental illnesses#

All ICD-10 diagnosis codes for other psychotic disorders and severe mental illnesses from the DATAMIND collection in the HDR UK Phenotype Library are included in the dataset.

| Diagnosis codes | Description |
|---|---|
| F23 | acute and transient psychotic disorders |
| F23.0 | acute polymorphic psychotic disorder without symptoms of schizophrenia |
| F23.1 | acute polymorphic psychotic disorder with symptoms of schizophrenia |
| F23.2 | acute schizophrenia-like psychotic disorder |
| F23.3 | other acute predominantly delusional psychotic disorders |
| F23.8 | other acute and transient psychotic disorders |
| F23.9 | acute and transient psychotic disorder, unspecified |
| F28 | other nonorganic psychotic disorders |
| F28.X | other nonorganic psychotic disorders |
| F29 | unspecified nonorganic psychosis |
| F29.X | unspecified nonorganic psychosis |
| F30 | manic episodes |
| F30.0 | hypomania |
| F30.1 | mania without psychotic symptoms |
| F30.8 | other manic episodes |
| F30.9 | manic episode, unspecified |
| F32.3 | severe depressive episode with psychotic symptoms |
| F33.3 | recurrent depressive disorder, current episode severe with psychotic symptoms |
| F38 | other mood [affective] disorders |
| F38.1 | other recurrent mood [affective] disorders |
| F38.0 | other single mood [affective] disorders |
| F38.8 | other specified mood [affective] disorders |
| F39 | unspecified mood [affective] disorder |
| F39.X | unspecified mood [affective] disorder |

Methodology: Eating disorders, Anxiety and Depression#

The NHS England (NHSE) datasets included in the Eating disorders, Anxiety and Depression datasets are HES OP, HES APC and IAPT.

Content will be added in due course.

References#

  1. John, A., McGregor, J., Jones, I., Lee, S. C., Walters, J. T. R., Owen, M. J., O’Donovan, M., DelPozo-Banos, M., Berridge, D., & Lloyd, K. (2018). Premature mortality among people with severe mental illness - New evidence from linked primary care data. Schizophrenia Research, 199, 154-162.

  2. John, A., Friedmann, Y., DelPozo-Banos, M., Frizzati, A., Ford, T., & Thapar, A. (2022). Association of school absence and exclusion with recorded neurodevelopmental disorders, mental disorders, or self-harm: a nationwide, retrospective, electronic cohort study of children and young people in Wales, UK. The Lancet Psychiatry, 9(1), 23-34.

  3. First Occurrence of Health Outcomes defined by 3-character ICD10 code, report 2019, UK Biobank.

  4. Algorithmically Defined Outcomes (ADOs), report 2022, UK Biobank.