Guidance

Ministry of Justice: Data First

Data First is a pioneering data-linking, research and academic engagement programme led by the Ministry of Justice and funded by ADR UK.

Data First unlocks the potential of the wealth of data created by the Ministry of Justice (MOJ) by making linked administrative datasets from across the justice system available for research. The programme is led by MOJ and funded by Administrative Data Research UK (ADR UK), an investment by the Economic and Social Research Council (ESRC).

Data from the courts, prison and probation services in England and Wales have been linked to enable new and innovative analysis of user journeys, interactions, and outcomes across the justice system. The programme is also enhancing the linking of justice data with other government departments, including education data from the Department for Education’s (DfE) National Pupil Database (NPD).

Data First enables researchers across government and academia to access these datasets in an ethical and responsible way via secure platforms in the ONS Secure Research Service and SAIL Databank.

By working in partnership with academic experts to facilitate and promote research in line with evidence priorities set out in the MOJ Areas of Research Interest (ARI), Data First is generating new insights to inform the development of government policy and drive real progress in improving justice outcomes.

General programme information

The Data First user guide provides further information about the programme, including the processes for accessing the data for research. The privacy and data protection statement provides information about how we use and share data.

Datasets

Data catalogues are available for all Data First datasets, providing information on the variables contained within each. These data catalogues are currently draft versions that provide basic details of each dataset and will be updated soon with final versions.

Data First has shared six datasets from administrative sources across the courts, prison and probation services in England and Wales: magistrates’ courts, the Crown Court, prisoner custodial journeys, probation services, and the family and civil courts.

The cross-justice system linking dataset can be used to join these six different datasets at a person level. This linking dataset also contains a table which can be used to join magistrates’ courts and Crown Court data at a case level.
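As a rough illustration of person-level joining, the sketch below groups records from different datasets under a shared person identifier and pairs up records belonging to the same person. The column names (`person_id`, `dataset`, `record_id`) and all values are hypothetical and synthetic; the actual variable names are defined in the relevant data catalogues and will differ.

```python
from collections import defaultdict

# Hypothetical linking table: one row per source record, each carrying a
# shared person-level identifier. All names and values are illustrative only.
linking_table = [
    {"person_id": "P1", "dataset": "magistrates", "record_id": "M-001"},
    {"person_id": "P1", "dataset": "prison", "record_id": "PR-104"},
    {"person_id": "P2", "dataset": "magistrates", "record_id": "M-002"},
    {"person_id": "P1", "dataset": "probation", "record_id": "PB-310"},
]


def records_by_person(rows):
    """Group record identifiers by person, then by source dataset."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        grouped[row["person_id"]][row["dataset"]].append(row["record_id"])
    return {pid: dict(datasets) for pid, datasets in grouped.items()}


def linked_pairs(rows, left, right):
    """Return (person_id, left_record, right_record) for people in both datasets."""
    pairs = []
    for pid, datasets in sorted(records_by_person(rows).items()):
        for left_id in datasets.get(left, []):
            for right_id in datasets.get(right, []):
                pairs.append((pid, left_id, right_id))
    return pairs


# People who appear in both the magistrates' courts and prison data.
pairs = linked_pairs(linking_table, "magistrates", "prison")
print(pairs)
```

People who appear in only one of the two datasets (here `P2`) produce no pair, which mirrors an inner join on the person identifier.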

Separately, data on criminal histories from the Police National Computer (PNC) have been linked to education and social care data in England from the DfE NPD as part of the MOJ-DfE data share. Please contact DataLinkingTeam@justice.gov.uk or data.sharing@education.gov.uk for the latest available metadata for the MOJ-DfE data share.

MOJ cross-justice system datasets

Limitations of data linking

Users should note that the accuracy of data linking is determined by the availability of personal identifying information in source data. All available identifiers have been used during the matching process, but the availability of demographic information varies by dataset. The Data First criminal justice datasets (magistrates’ courts, Crown Court, prisoner custodial journey and probation) contain numerous, well-populated identifiers. The family and civil court datasets, however, are less well populated. This affects how linkage occurs and should be considered when designing research projects. Further details can be found within the user guide and relevant data catalogues, and researchers are welcome to contact us as early as possible to discuss project ideas so that we can advise on viability.
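When scoping a project, one simple way to gauge how far linkage quality may vary is to check how well populated each identifier is in an extract. The sketch below is a minimal example under stated assumptions: the field names (`forename`, `surname`, `dob`, `postcode`) and the sample rows are hypothetical and synthetic, and are not taken from any Data First dataset.

```python
def identifier_completeness(records, identifiers):
    """Return the share of records with a non-empty value for each identifier."""
    total = len(records)
    result = {}
    for name in identifiers:
        # Treat missing keys, None and empty strings as unpopulated.
        populated = sum(1 for r in records if r.get(name) not in (None, ""))
        result[name] = populated / total if total else 0.0
    return result


# Synthetic sample rows with hypothetical identifier fields.
sample = [
    {"forename": "A", "surname": "Smith", "dob": "1990-01-01", "postcode": ""},
    {"forename": "B", "surname": "Jones", "dob": None, "postcode": "AB1 2CD"},
    {"forename": "", "surname": "Khan", "dob": "1985-05-05", "postcode": None},
    {"forename": "D", "surname": "Lee", "dob": None, "postcode": None},
]

completeness = identifier_completeness(
    sample, ["forename", "surname", "dob", "postcode"]
)
for field, share in completeness.items():
    print(f"{field}: {share:.0%} populated")
```

Identifiers with low completeness contribute less evidence to matching, so a dataset dominated by such fields is likely to link less reliably.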

Applying for data access  

Data First datasets can be accessed through the ONS Secure Research Service (SRS) or SAIL Databank (except for the MOJ-DfE data share, which is only available through the ONS SRS).

Requests to access data through the ONS SRS require completion of the Secure Access to Data Form, which can be accessed here along with additional supporting documents.

Guidance for completing the application form can be found in the Data Sharing Guidance, and the list of datasets and access routes can be found here. Further information on the process overall is included within the Data First user guide above.

To access data within the SAIL Databank, please apply through SAIL.

A register of external research projects which have been approved to use MOJ data is available to view here.

Analytical outputs

Statistical and social research publications using Data First data have been delivered by MOJ analysts or in collaboration with other government departments. Outputs have also been produced by ADR UK-funded Research Fellows. These publications can be found below:

Splink: Data linkage at scale

Through Data First, MOJ has developed a free and open-source software library to enable data linkage at scale. This software has been used to link some of the largest datasets held by MOJ as part of Data First.

Splink is a freely available, open-source Python package that is:

  • faster and more accurate than other free tools
  • able to link large datasets, of tens of millions of records or more
  • developed with advice from academic experts in data linkage
  • able to produce a wide range of interactive data visualisations that help to build effective models, explain linkage predictions, diagnose problems and quality assure models
  • compatible with multiple databases and big data processing engines, meaning it can run on a wider range of computer systems

You can find out more on the Splink website, where you can download and start using Splink. You can also ask us a question or raise an issue on the public GitHub repository. The Splink team is happy to hear from researchers interested in using the software for their work.

Awards and Recognition

Contact

Contact the Data First team at datafirst@justice.gov.uk if you would like further information or have any queries.

Published 30 June 2020
Last updated 1028 June 2024
  1. An additional section, 'Limitations of data linking', has been added to the main text.

  2. Updated magistrates' courts data catalogue added. Outdated criminal courts and prisons linking data catalogue removed.

  3. Updating data catalogues for magistrates' courts, Crown Court, prisoner custodial journey and probation datasets.

  4. General user information has been updated to reflect new datasets and linkages. Updates to the User Guide and data catalogues will follow. The order of sections of the document has changed. New contact information has been added.

  5. Splink information added.

  6. Data First Family Court data catalogue updated.

  7. Data First prisoner custodial journey data catalogue updated.

  8. Analytical outputs section added.

  9. User guide updated and Data First probation data catalogue, Data First criminal courts, prisons and probation linking data catalogue published.

  10. User guide updated and Data First Family Court data catalogue published.

  11. User guide, privacy statement, Data First magistrates' court defendant data catalogue, Data First Crown Court defendant data catalogue and Data First criminal courts and prisons linking data catalogue updated.

  12. User guide updated and Data First prisoner custodial journey data catalogue published.

  13. User guide updated and Data First linked magistrates’ and Crown Court data catalogue published.

  14. Documents updated and Data First Crown Court defendant data catalogue published.

  15. First published.