How Secure Data Environments can help drive advances in health data research

by Ben Jones | In depth

8 August 2024

A transparent microplate, made up of small test tubes, over a screen showing a DNA profile.
Westend61 / Andrew Brookes

NHS England (NHSE) is changing the way that researchers access health data in England by moving to a network of Secure Data Environments (SDEs).

We are broadly supportive of the direction of the changes, but for the SDE Network to deliver all the benefits it promises, the new UK Government must guarantee sufficient and sustained funding beyond 2025 in its next spending review. This funding needs to be commensurate with the size of the challenge and the opportunity presented by NHS health data. As such, it must allow for meaningful and continuous patient and public involvement in the creation and governance of the SDE Network, ensuring that patients’ data is used with their consent and endorsement.

With that level of support and investment, the SDE Network could greatly improve how researchers access and use health data, which could have profound impacts on public health for years to come.

Secure access to health data is driving significant advances in the ways that we prevent, diagnose and treat cancer. The NHS Research SDE Network has the potential to make those advances happen faster, so we're looking forward to continuing to work with it to deliver research powered by NHS data.

Now, to safeguard the SDE Network's success, the new Government must commit to funding it beyond 2025 - ensuring that our research can help more people live longer, better lives.

- Dr Ian Walker, our executive director of policy

Using health data to create better health

In our Longer, Better Lives programme for government, we laid out how government can unlock the potential of data as a driver for change within research and health. The opportunities presented by the depth of UK health data are almost unparalleled. This is because the UK has very detailed nationwide, life-long health datasets – datasets that researchers can harness to improve our ability to prevent, detect, diagnose, and treat cancer.

However, there are currently barriers preventing researchers from using this data to its full life-saving potential.

Historically, the fragmented nature of NHS systems has made it difficult for researchers to access NHS health data. This has delayed or even stopped research, as well as raising some security concerns. Encouragingly, following on from the Data Saves Lives strategy, NHSE is now looking to change this through its Data for R&D Programme. That will involve moving from a “data-dissemination” model, where most data is sent to researchers who request it, to an SDE-based “data-access” model, where researchers request access to data that always remains within NHS systems. 

We believe that this is an exciting shift. If the new network is adequately financed and executed correctly, with patients and the public at the centre of its creation and governance, it has the potential to help researchers access health data more efficiently while providing greater safety and security.

How does data save lives?  

High-quality health data is critical for multiple strands of traditional research, including clinical trial recruitment and health system examination. Previously, the cancer research we focus on mostly involved relatively small-scale studies of essential pathways and genes. But now, the rise of modern computing and the proliferation of large-scale detailed data have also put data-driven research at the heart of much of the most cutting-edge cancer science.

For instance, for over 10 years, Professor Rebecca Fitzgerald at the University of Cambridge has been spearheading a huge project that brings together scientists, doctors and nurses from across the UK with the goal of better understanding oesophageal cancer. This impressive programme of work has seen researchers collecting tumour and blood samples from people with oesophageal cancer and decoding the cancer’s genetic sequence so that they have a complete map of the cancer’s DNA.

We need a lot of data. We can’t find the needle in the haystack by just using data from a few people.

- Rebecca Fitzgerald, OBE, Professor of Cancer Prevention, University of Cambridge

Our Research Data Strategy reflects just how important and impactful health data has been for cancer research. It has enabled us to develop our data science community while ensuring the trust and involvement of patients and the public. We have also recently launched the Cancer Data Collaborative, a forum bringing NHSE, Cancer Research UK and other charities together with patient representatives to tackle the biggest challenges in cancer data.  

So, how is health data currently stored and accessed in England?

At present, research and analysis using health data in England predominantly happens according to the data dissemination model. This involves the NHS transferring de-identified data to third parties via data sharing agreements. Researchers submit applications to NHS data controllers, who assess both the application and the intended data usage. Upon approval, the data is transmitted directly to the researchers, who can then analyse the data with any tools they have at their disposal.

Researchers are legally bound to use the data the NHS provides them in the ways specified in their initial application, but this can be difficult to monitor under the current system. For example, researchers are required to delete the data themselves once their access rights elapse, but the NHS doesn’t have an easy way to verify whether this has happened. There are also other issues, such as the creation of many copies of datasets, which may be error-prone and require significant amounts of computer memory.

Currently, more than half my time is spent trying to get data in the first place.

- Dr Colin Mclean, senior research fellow in data science and health economics, Edinburgh University

Data dissemination essentially means that the NHS acts as a lending library. Researchers come to NHS data controllers, are verified, and are then sent health data to use outside of the NHS’s ecosystem. However, because of the size and complexity of the NHS, there is not just one point of contact. Instead, estimates suggest there are around 7,000. This means researchers need to spend a lot of time and effort to get the information they need. Imagine if reading a book involved having to apply to multiple different libraries, each with their own complex lending policies, one chapter at a time. That’s often how gathering health data for research works in England today. Simplifying access routes is necessary to make sure researchers can focus on research.

What is the new system?  

NHSE is currently setting up a new network of SDEs as part of the Data for R&D Programme, with £175m initially allocated for this and another programme called NHS DigiTrials. The SDE Network will be formed of two parts: one large SDE covering the whole of England, containing high-level national datasets, and 11 smaller subnational ones covering areas like London or the north-east, containing more detailed datasets. This should reduce the 7,000 points of data access to 12.

If the new network is properly implemented, with more effective methods of data application and granting processes, researchers will be able to obtain the data needed for their research more quickly and efficiently. To ensure this happens, the Data for R&D Programme must deliver a single front door researchers can use to access the different SDEs.

But what is a Secure Data Environment?  

An SDE is a protected space for sensitive data that can only be accessed by authorised researchers remotely. This approach ensures that patient data remains confined within the environment: while users can extract results such as tables or graphs, the raw data itself can never leave the host system. This setup grants custodians of the data more control over its usage and increases safeguards preventing misuse of data. It also allows for improvements to the data to be more easily implemented, as everyone is working from the same dataset and not multiple copies.

‘Data access by default’?

The SDE approach is sometimes referred to as ‘data access by default’. Some media reports have implied that this means that anyone will be able to access health data on demand. This is not true. Data access by default refers to the fact that the default model is based on applying to access relevant data on NHS systems rather than applying to export it elsewhere.

A simple way to think about this is to picture an SDE as a reference library for health data, rather than a lending library. After appropriate checks have been made, researchers can (remotely) enter the library to read and analyse specific data, but they can’t take it away with them. Instead, they’re only able to export the results they get from using the data. In short, the SDE approach means data remains within NHS systems, making it fundamentally more secure. This also protects the quality of the data by ensuring it remains in a single dataset, which is easier to maintain and manage.
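For readers who think in code, the reference library idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of how NHSE or any real SDE is built: the class `SecureDataEnvironment`, the `PatientRecord` fields and the rule that only simple aggregates may be exported are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical de-identified record; the fields are illustrative only."""
    age: int
    diagnosis: str


def _is_aggregate(value):
    """Assumed export rule: a number, or a dict of labelled numbers."""
    def is_number(x):
        return isinstance(x, (int, float)) and not isinstance(x, bool)

    if is_number(value):
        return True
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_number(v) for k, v in value.items())
    return False


class SecureDataEnvironment:
    """Toy 'reference library': analyses run inside, only results leave."""

    def __init__(self, records, approved_users):
        self._records = tuple(records)       # data stays inside the environment
        self._approved = set(approved_users)

    def run(self, user, analysis):
        # Step 1: only authorised researchers may enter the library.
        if user not in self._approved:
            raise PermissionError(f"{user} is not an authorised researcher")
        # Step 2: the analysis runs where the data lives.
        result = analysis(self._records)
        # Step 3: only aggregate results may be exported, never raw records.
        if not _is_aggregate(result):
            raise ValueError("only aggregate results can be exported")
        return result


# Hypothetical data and users, invented for the example.
sde = SecureDataEnvironment(
    records=[
        PatientRecord(61, "oesophageal"),
        PatientRecord(70, "oesophageal"),
        PatientRecord(55, "lung"),
    ],
    approved_users={"researcher_a"},
)


def mean_age(records):
    return mean(r.age for r in records)


print(sde.run("researcher_a", mean_age))  # exports a single summary number
```

In this sketch, an unapproved user is refused at the door, and an approved user who tries to return the records themselves is stopped at the exit. A simple type check like this is only a stand-in for the access and output controls a real environment would need.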

What are the potential benefits of the new system?

The main potential benefits are:   

  1. Efficiency – Properly implemented, a reduction in access points should make it easier and quicker for researchers to access health data.
  2. Security – Because data remains on NHS systems, it is easier to monitor what researchers are doing with it.

These benefits are reflected in the fact that there is a great deal of support for SDE-like systems, often called Trusted Research Environments (TREs), across the sector and in UK research communities. Examples of TREs include Scotland’s Data Safe Haven Programme, the SAIL databank in Wales and Genomics England’s Research Environment.

Does Cancer Research UK have an SDE-like platform?

Yes! Our Trusted Research Programme provides a Trusted Research Environment (TRE), an SDE-like system, alongside advisory services to support researchers dealing with sensitive health and related data.

Our TRE also provides access to high-performance computing facilities for advanced analytics, machine learning, and AI development for researchers across the UK who don’t have a suitable and safe setup to store and analyse patient data. Our Trusted Research Programme is currently onboarding two pilot projects and is helping to maximise the safe and effective collection and re-use of cancer-related data.

Our ambition is to develop a portfolio of research projects and data that can be used to advance research across multiple different areas and help us better understand cancer. We want to harness emerging technologies, including AI, to achieve our cancer-beating goals and our TRE will help us do this.

If you are a researcher and think that your research would benefit from using our TRE, please contact [email protected].

Sounds good! But what’s needed for the NHS Research SDE Network to be a success?

There are a few things that NHSE and the Data for R&D Programme need to do to take full advantage of this opportunity to improve data access and data security. We’re currently developing a policy paper that will outline some of the specifics, including points on pricing, avoiding further fragmentation, and ensuring patients and the public can be meaningfully involved in the creation and governance of the network.

However, most importantly, the new UK Government needs to use its next spending review to ensure that the central SDE and all subnational SDEs are on a sustainable footing beyond 2025. This includes providing the Data for R&D Programme with the resources it needs to make sure that access to data is more timely and less complicated than it is today, and that it can be delivered at a price that doesn’t stop researchers from carrying out their vital work. The SDE Network also needs resources and support to meaningfully involve patients and the public.

The guaranteed funding must be commensurate with the size of this challenge. The SDE Network has the potential to improve data access for a generation, but only if it can secure sustained, significant long-term support and investment.

Ben Jones

Ben is a policy advisor in our policy development team.
