{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "beb97ca4",
   "metadata": {},
   "source": [
    "# LPS Harmonised Demographic Dataset (reduced)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "e3b8790c",
   "metadata": {
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       ">Last modified: 27 Oct 2025"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
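    "import os\n",
    "import sys\n",
    "\n",
    "# Make the helper module in the scripts directory importable.\n",
    "sys.path.append(os.path.abspath('../../../../scripts/'))\n",
    "from data_doc_helper import UKLLCDataSet as DS, last_modified\n",
    "\n",
    "# Read the API key from the environment; raises KeyError if FASTAPI_KEY is unset.\n",
    "API_KEY = os.environ['FASTAPI_KEY']\n",
    "ds = DS(\"rtn_lps_sociodemo_harmonised_reduced\")\n",
    "last_modified()"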
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31920e47",
   "metadata": {},
   "source": [
    "<div style=\"background-color: rgba(0, 178, 169, 0.3); padding: 5px; border-radius: 5px;\"><strong>UK LLC has created a harmonised dataset of key demographic variables across the partner LPS.</strong></div>  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "367be529",
   "metadata": {},
   "source": [
    "<div style=\"background-color: rgba(229, 106, 84, 0.3); padding: 5px; border-radius: 5px;\"><strong>More information about this dataset is available <a href=\"LPS_derived.html#harmonisation-methodology\" target=\"_blank\">here</a>.</strong></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fe073008",
   "metadata": {},
   "source": [
    "## 1. Summary"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c938a7b1",
   "metadata": {},
   "source": [
    "The reduced LPS harmonised demographic dataset contains harmonised variables for **sex, gender, year of birth** and **ethnic group**. This dataset retains only the **most recent response** provided by a participant for each variable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "fb0becc7",
   "metadata": {
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style type=\"text/css\">\n",
       "#T_047ee th {\n",
       "  text-align: left;\n",
       "}\n",
       "#T_047ee_row0_col0, #T_047ee_row0_col1, #T_047ee_row1_col0, #T_047ee_row1_col1, #T_047ee_row2_col0, #T_047ee_row2_col1, #T_047ee_row3_col0, #T_047ee_row3_col1, #T_047ee_row4_col0, #T_047ee_row4_col1, #T_047ee_row5_col0, #T_047ee_row5_col1, #T_047ee_row6_col0, #T_047ee_row6_col1, #T_047ee_row7_col0, #T_047ee_row7_col1, #T_047ee_row8_col0, #T_047ee_row8_col1, #T_047ee_row9_col0, #T_047ee_row9_col1, #T_047ee_row10_col0, #T_047ee_row10_col1, #T_047ee_row11_col0, #T_047ee_row11_col1 {\n",
       "  text-align: left;\n",
       "}\n",
       "</style>\n",
       "<table id=\"T_047ee\" style=\"font-size: 14px\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th id=\"T_047ee_level0_col0\" class=\"col_heading level0 col0\" >Dataset Descriptor</th>\n",
       "      <th id=\"T_047ee_level0_col1\" class=\"col_heading level0 col1\" >Dataset-specific Information</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td id=\"T_047ee_row0_col0\" class=\"data row0 col0\" >Name of Dataset in TRE</td>\n",
       "      <td id=\"T_047ee_row0_col1\" class=\"data row0 col1\" >UKLLC_rtn_lps_sociodemo_harmonised_reduced</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_047ee_row1_col0\" class=\"data row1 col0\" >Citation (APA)</td>\n",
       "      <td id=\"T_047ee_row1_col1\" class=\"data row1 col1\" >UK Longitudinal Linkage Collaboration. (2025). <i>UK LLC Managed: LPS Harmonised Demographic Dataset (reduced).</i> UK Longitudinal Linkage Collaboration (UK LLC).  <a href=\"https://doi.org/10.71760/ukllc-dataset-00437-01\" rel=\"noopener noreferrer\" target=\"_blank\">https://doi.org/10.71760/ukllc-dataset-00437-01</a></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_047ee_row2_col0\" class=\"data row2 col0\" >Download Citation</td>\n",
       "      <td id=\"T_047ee_row2_col1\" class=\"data row2 col1\" > <a href=\"https://api.datacite.org/application/vnd.citationstyles.csl+json/10.71760/ukllc-dataset-00437-01\" rel=\"noopener noreferrer\" target=\"_blank\">Citeproc JSON</a>&nbsp;&nbsp;&nbsp;&nbsp; <a href=\"https://api.datacite.org/application/x-bibtex/10.71760/ukllc-dataset-00437-01\" rel=\"noopener noreferrer\" target=\"_blank\">BibTeX</a>&nbsp;&nbsp;&nbsp;&nbsp; <a href=\"https://api.datacite.org/application/x-research-info-systems/10.71760/ukllc-dataset-00437-01\" rel=\"noopener noreferrer\" target=\"_blank\">RIS</a></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_047ee_row3_col0\" class=\"data row3 col0\" >Series</td>\n",
       "      <td id=\"T_047ee_row3_col1\" class=\"data row3 col1\" > <a href=\"https://guidebook.ukllc.ac.uk/docs/ukllc_managed_data/ukllc_data\">UK LLC Managed</a></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_047ee_row4_col0\" class=\"data row4 col0\" >Owner</td>\n",
       "      <td id=\"T_047ee_row4_col1\" class=\"data row4 col1\" >UK Longitudinal Linkage Collaboration</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_047ee_row5_col0\" class=\"data row5 col0\" >Temporal Coverage</td>\n",
       "      <td id=\"T_047ee_row5_col1\" class=\"data row5 col1\" >Unknown - Unknown</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_047ee_row6_col0\" class=\"data row6 col0\" >Keywords</td>\n",
       "      <td id=\"T_047ee_row6_col1\" class=\"data row6 col1\" >harmonised sociodemographic ethnicity age sex</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_047ee_row7_col0\" class=\"data row7 col0\" >Participant Count</td>\n",
       "      <td id=\"T_047ee_row7_col1\" class=\"data row7 col1\" >331675</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_047ee_row8_col0\" class=\"data row8 col0\" >Number of variables</td>\n",
       "      <td id=\"T_047ee_row8_col1\" class=\"data row8 col1\" >8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_047ee_row9_col0\" class=\"data row9 col0\" >Number of observations</td>\n",
       "      <td id=\"T_047ee_row9_col1\" class=\"data row9 col1\" >1426986</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_047ee_row10_col0\" class=\"data row10 col0\" >Specific Restrictions to Data Use</td>\n",
       "      <td id=\"T_047ee_row10_col1\" class=\"data row10 col1\" >None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_047ee_row11_col0\" class=\"data row11 col0\" >Build a Data Request</td>\n",
       "      <td id=\"T_047ee_row11_col1\" class=\"data row11 col1\" > <a href=\"https://explore.ukllc.ac.uk/\" rel=\"noopener noreferrer\" target=\"_blank\">https://explore.ukllc.ac.uk/</a></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n"
      ],
      "text/plain": [
       "<pandas.io.formats.style.Styler at 0x16adc61acf0>"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds.info_table()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "799ad0f1",
   "metadata": {},
   "source": [
    "## 2. Variables"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff885b97",
   "metadata": {},
   "source": [
    "| Variable name | Variable description |\n",
    "|---|---|\n",
    "| LLC_xxxx_stud_id | Individual identifier (unique to each project in the TRE) |\n",
    "| cohort | LPS name |\n",
    "| source | LPS dataset holding the original demographic variable(s) for each participant (e.g. ALSPAC_wave1y) |\n",
    "| object | Label indicating which of the harmonised variables is represented by the value (e.g. llc_sex, llc_gender) |\n",
    "| value | Numeric value for each of the objects |\n",
    "| label | Description of what each of the values represents |\n",
    "| llc_timestamp | Date (month and year) on which the information was provided by the participant to the LPS |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a469c49",
   "metadata": {},
   "source": [
    "## 3. Version History"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "41fd9910",
   "metadata": {
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style type=\"text/css\">\n",
       "#T_c699a th {\n",
       "  text-align: left;\n",
       "}\n",
       "#T_c699a_row0_col0, #T_c699a_row0_col1, #T_c699a_row1_col0, #T_c699a_row1_col1, #T_c699a_row2_col0, #T_c699a_row2_col1, #T_c699a_row3_col0, #T_c699a_row3_col1, #T_c699a_row4_col0, #T_c699a_row4_col1 {\n",
       "  text-align: left;\n",
       "}\n",
       "</style>\n",
       "<table id=\"T_c699a\" style=\"font-size: 14px\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th id=\"T_c699a_level0_col0\" class=\"col_heading level0 col0\" >Version</th>\n",
       "      <th id=\"T_c699a_level0_col1\" class=\"col_heading level0 col1\" >1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td id=\"T_c699a_row0_col0\" class=\"data row0 col0\" >Version Date</td>\n",
       "      <td id=\"T_c699a_row0_col1\" class=\"data row0 col1\" >04 Jun 2025</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_c699a_row1_col0\" class=\"data row1 col0\" >Number of Variables</td>\n",
       "      <td id=\"T_c699a_row1_col1\" class=\"data row1 col1\" >8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_c699a_row2_col0\" class=\"data row2 col0\" >Number of Observations</td>\n",
       "      <td id=\"T_c699a_row2_col1\" class=\"data row2 col1\" >1426986</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_c699a_row3_col0\" class=\"data row3 col0\" >DOI</td>\n",
       "      <td id=\"T_c699a_row3_col1\" class=\"data row3 col1\" > <a href=\"https://doi.org/10.71760/ukllc-dataset-00437-01\" rel=\"noopener noreferrer\" target=\"_blank\">10.71760/ukllc-dataset-00437-01</a></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_c699a_row4_col0\" class=\"data row4 col0\" >Change Log</td>\n",
       "      <td id=\"T_c699a_row4_col1\" class=\"data row4 col1\" > <a href=\"https://api.datacite.org/dois/10.71760/ukllc-dataset-00437-01/activities\" rel=\"noopener noreferrer\" target=\"_blank\">10.71760/ukllc-dataset-00437-01/activities</a></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n"
      ],
      "text/plain": [
       "<pandas.io.formats.style.Styler at 0x16adc527d90>"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds.version_history()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bedf5551",
   "metadata": {},
   "source": [
    "## 4. Useful Syntax"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "29e9d5bb",
   "metadata": {
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "Below we will include syntax that may be helpful to other researchers in the UK LLC TRE. For longer scripts, we will include a snippet of the code plus a link to Git where you can find the full scripts."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "ds.useful_syntax()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "jupbook",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}