Watching the detectives – meet the experts chasing down cancer risk

by Cancer Research UK | In depth

22 January 2025

Health data

A world where no one gets a late cancer diagnosis – it’s quite the thought. But how do we get there? Understanding cancer risk is key, and the Cancer Data Driven Detection – or CD3 – project aims to do so in an unprecedented fashion. Here, a range of experts tell us what they’ll be bringing to the table and their hopes for the project…

Early detection of cancer saves lives, but currently around half of all cancers are detected at the later stages of disease, when treatment choices are limited and outcomes are frighteningly bad.

In 2020, Cancer Research UK set ourselves the challenge of predicting what a future world would look like where no one gets a late cancer diagnosis. How would that ideal situation be delivered and what would need to change to get us there?

To do this, we convened every major stakeholder organisation and key individuals across the entire ecosystem which would be needed to deliver early detection for all. We worked with researchers, industry, the NHS, research funders, investors, regulators, government and, crucially, patients and the public.

One of the main recommendations from this consultation concerned cancer risk. Risk of cancer varies greatly between people; if we knew their risk, we could adapt cancer testing and prevention to each person’s needs, making sure those at highest risk benefit most.

So CRUK took action – we brought together a consortium of funders, research institutes, scientists, patients and the public to work on this huge challenge, and this became the CD3 initiative.

There are many different types of data about each of us, each of which can carry signals informing our risk of cancer. Our genes influence our risk of getting cancer, as does our family history, our heritage, where we live, what we have eaten or been exposed to, our socioeconomic status, our digital footprint. And, of course, we all know that the computational power to analyse and make predictions from data has exploded over the last few years.

CD3 has the potential to integrate and use these vast amounts of data now available, coupled with today’s cutting-edge data analytic methods, to utterly transform the way we think about cancer risk and detection. We are working towards a future where computational models could inform the use of cancer screening methods, for example recommending mammography breast screening at younger ages or more frequently in those women at highest risk. Another possibility is AI-enabled decision support software for GPs, advising on appropriate cancer testing for patients reporting very early symptoms.

If we get this right, the impact on lives saved and improved will be vast. We know that early detection saves lives – for example, a patient diagnosed today with stage IV (advanced) colorectal cancer has a less than 10% chance of surviving for ten years or more. However, a patient whose cancer is detected at stage I (small and localised) has a greater than 90% chance of survival. The difference is that stark. If we know who is at higher risk and then target testing early and effectively in those at higher risk, we will transform cancer outcomes in the UK, and worldwide.

CRUK are thrilled to have convened and to be supporting the CD3 initiative alongside our partners NIHR, EPSRC, HDR-UK and ADR-UK. We can’t wait to see the outputs!

Dr David Crosby, Head of Prevention and Early Detection Research, Cancer Research UK

The AI expert

“Even the most sophisticated model is of little use if its insights cannot be applied in clinical settings.”

Owen Rackham, Associate Professor of Systems Biology at the University of Southampton

In the fight against cancer, the scale of the challenge can often feel insurmountable. As data grows in complexity and volume, so too does the need for innovative approaches that can translate it into actionable insights. This is precisely where the CD3 project steps in, and I am excited to lead its Advanced Analytics and Multifactorial Modelling workstream.

I am a computer scientist by training, and my research group, based at the University of Southampton, has long focused on combining machine learning and multimodal data to predict cellular behaviours. The messy, large-scale datasets we work with – omic data in particular – demand ingenuity to uncover meaningful patterns.

CD3 offers a unique opportunity to extend these tools to broader data types, from clinical records to administrative data, focusing on how their combination can improve cancer risk prediction. A cornerstone of this work is the use of artificial intelligence.

Machine learning and statistical techniques will ensure that our models are robust, equitable, and clinically useful for diverse populations. This involves blending established statistical methods with emerging AI concepts to make tools that will significantly impact patients. After all, even the most sophisticated model is of little use if its insights cannot be applied in clinical settings.

Balancing these demands is a challenge, but I believe that interdisciplinary collaboration within CD3 will allow us to push the boundaries of what’s possible in cancer risk prediction. The team behind CD3 has expertise ranging from patient and public involvement to general practice, from surgery to medical regulation and biostatistics to advanced machine learning. This range exemplifies the collaborative spirit needed to face this challenge. Together, we aim to not only address gaps in cancer risk prediction but also set a new standard for how large-scale disease risk stratification is approached.

But the road ahead is not without its challenges. Chief among them is the sheer complexity of integrating the data, expertise, and infrastructure needed to change how cancer risk is predicted. The success of CD3 hinges on our ability to overcome these obstacles, building tools that are not only powerful but also clinically relevant. CD3’s potential extends far beyond its immediate goals. If successful, it could serve as a template for tackling similar challenges in the era of big data and AI, addressing both human and technological hurdles. A key part of this aim is sharing our findings and tools widely, ensuring the project’s impact resonates across the research community.

Cancer remains one of humanity’s greatest challenges, but projects like CD3 remind us that, together, we can push the boundaries of what’s possible. By uniting diverse expertise, embracing innovative technologies, and keeping our eyes on the ultimate goal – improving lives – I strongly believe that we have the opportunity to chart a new course in cancer research. I, for one, am eager to get started.

The data scientist

“A major challenge is handling missing or incomplete data where patient information may be inconsistent.”

Angela Wood, Professor of Health Data Science at the University of Cambridge

I’ll be focussed on developing advanced analytical tools for building multifactorial cancer risk models to uncover hidden cancer risk factors and build clinically valid, equitable risk prediction tools.

One of our core objectives is to develop and validate risk prediction models that integrate data from diverse sources, such as electronic health records (EHR), administrative records, and other multimodal datasets. This data will include genetic profiles, environmental exposures, medication histories and lifestyle factors, recorded at various time points in an individual’s life. By integrating these data, we aim to provide a more comprehensive understanding of cancer risk.

However, we face challenges. Missing data, biases in the data itself, and ensuring models are accurate, equitable, adaptable and transferable must be tackled head-on. To do so, our team will employ techniques from artificial intelligence, machine learning, biostatistics, and epidemiology to develop robust, scientifically valid models that can be used to guide clinical decision-making in diverse patient populations.

A major challenge is handling missing or incomplete data, particularly in EHRs, where patient information can be inconsistent due to differences in healthcare practices or be influenced by patient characteristics. We will fully explore the patterns of missing data and identify where potential biases may arise. We will also consider a range of imputation techniques, from simple to advanced methods.

These methods predict what missing information might have been and incorporate it, along with uncertainty about the predicted information, into our models. But we’ll explore alternative approaches to this problem as well – for example building cancer risk models that inherently allow for patients to have missing or incomplete data.
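As a rough illustration of how imputation can carry uncertainty forward, the sketch below fills each gap several times and pools the results using Rubin's rules. It is not the CD3 team's method: the function names, the BMI readings and the simple donor-based imputation step (random draws from a bootstrap resample of the observed values) are all assumptions made for the example.

```python
import random
import statistics


def impute_once(values, rng):
    """Fill each None with a draw from a bootstrap resample of the observed values."""
    observed = [v for v in values if v is not None]
    # Resampling the donors first lets the imputations vary between runs.
    donors = [rng.choice(observed) for _ in observed]
    return [v if v is not None else rng.choice(donors) for v in values]


def pool_mean(values, m=20, seed=0):
    """Estimate the mean from m imputed datasets and pool with Rubin's rules."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(values, rng)
        estimates.append(statistics.fmean(completed))
        # Squared standard error of the mean within this completed dataset.
        variances.append(statistics.variance(completed) / len(completed))
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)      # average within-imputation variance
    between = statistics.variance(estimates)  # variance between imputations
    total = within + (1 + 1 / m) * between    # Rubin's total variance
    return pooled, total


# Hypothetical body-mass-index readings; None marks a missing record.
bmi = [22.1, None, 27.4, 31.0, None, 24.8, 29.3, None, 26.0, 23.5]
estimate, total_variance = pool_mean(bmi)
print(f"pooled mean = {estimate:.2f}, standard error = {total_variance ** 0.5:.2f}")
```

The between-imputation term is what makes the final standard error wider than it would be if the filled-in values were treated as real observations.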

As well as biases due to missing data, we’ll also address other potential biases in the data and models, with a view to narrowing (rather than perpetuating or increasing) inequity and unfairness across protected characteristics such as ethnicity.

Another key focus is on building dynamic prediction models for cancer risk.

These models can provide real-time updates on cancer risk based on a person’s health information unfolding over time. By incorporating continuous updates of health information, these models provide a nuanced, evolving perspective on cancer risk – crucial for the proactive management of patient health. To ensure long-term future implementation in clinical settings, it’s also important that our models can easily be adapted and updated to reflect changes in lifestyle trends and new medical treatments, or be transferred to different populations.

And, of course, we’ll evaluate the performance of our developed models across diverse datasets and across subgroups, including by ethnicity and geography, and identify any aspects that may impact their effectiveness.
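One simple way to picture subgroup evaluation is to score the same model separately for each group and compare the results. The sketch below does this with the Brier score (the mean squared difference between predicted risk and what actually happened, where lower is better). The group labels, risk values and function name are hypothetical, and the article does not say which metrics CD3 will use.

```python
from collections import defaultdict


def brier_by_group(records):
    """Return the Brier score (mean squared error of predicted risk) per subgroup."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, predicted_risk, outcome in records:
        sums[group] += (predicted_risk - outcome) ** 2
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Hypothetical (subgroup, predicted risk, observed outcome) triples;
# outcome is 1 if cancer was diagnosed and 0 otherwise.
records = [
    ("region_a", 0.10, 0),
    ("region_a", 0.80, 1),
    ("region_a", 0.20, 0),
    ("region_b", 0.10, 1),
    ("region_b", 0.60, 0),
    ("region_b", 0.30, 0),
]

scores = brier_by_group(records)
for group, score in sorted(scores.items()):
    print(group, round(score, 3))
```

A large gap between groups would flag that the model serves some people worse than others, which is the kind of signal an equity review would want to investigate.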

Through these efforts we hope to create cancer risk prediction tools that are not only accurate and comprehensive but also equitable and adaptable to a broad range of real-world conditions.

The clinician

“While cancer prevention and early detection initiatives have often focussed on single cancer types, I’m interested in looking at multicancer prevention and detection approaches.”

Dr Garth Funston, GP and Clinical Senior Lecturer in Primary Care Cancer Research at the Wolfson Institute of Population Health.

As an academic GP I use the UK’s incredible healthcare data resources in my research on cancer prevention and detection, but challenges in accessing and linking large data sets have often been a barrier in my work.

Although we can learn a lot about cancer risk from individual data types, such as GP records, administrative records and cohort study data, combining information across these data sources provides a real opportunity to pick up new patterns and key combinations of factors associated with cancer risk. That’s why I’m particularly excited about CD3 – it aims to tackle these challenges by bringing together diverse healthcare and administrative datasets which contain a wide range of potential cancer risk factors.

This is a unique opportunity to improve understanding of cancer risk and develop clinically useful models to support doctors and patients in making informed, individualised decisions about cancer prevention and testing options.

While cancer prevention and early detection initiatives have often focussed on single cancer types, I’m interested in looking at multicancer prevention and detection approaches as there is significant overlap between cancer risk factors. Indeed, some individuals are at higher risk of multiple cancer types due to a combination of factors.

Within CD3 I’ll work on building models which provide individualised multicancer risk and examine the potential impact of prevention and multicancer screening interventions. At the national level I believe that knowledge of multicancer risk could help target preventative interventions and tailor future multicancer detection screening programmes to maximise benefits and minimise harms from over-testing and false positive screening test results.

As a GP I also see real value in having accurate multicancer risk information within primary care. My patients are often concerned about their cancer risk and uncertain about the value that screening or cancer prevention activities, such as weight loss programmes or chemoprevention, will have for them. There are currently no tools which provide the combined multicancer risk in the primary care setting, so it is difficult to provide them with accurate information. Being able to share individualised information on their risk across a range of cancers, and the potential impact of prevention and screening interventions on that risk, would make a massive difference in helping me to support patients to make informed choices.

As a clinical academic I’m very much focussed on how any research can change my practice and have a tangible impact on patient care. So, for me, a key strength of CD3 is bringing together a diverse group of patient and public representatives, clinicians, policy makers and stakeholders from across the UK to collaborate on the project.

Given advances in data analytics, I believe there’s a real opportunity to use the UK’s unique data resources in new ways to drive improvements in cancer prevention and early detection.

The equity and diversity expert

“Everyone is different and depending on a person’s experience and background, they may have different views about how their data are used.”

Dr Ameeta Retzer, Research Fellow at the University of Birmingham’s Centre for Patient Reported Outcomes Research

CD3 represents a huge opportunity to understand who is most at risk of developing cancer and to enhance how we prevent, detect and diagnose cancer early. But there are some important considerations.

First, we need to do this work in partnership with the public, so we know how people feel about their data, how it is used, and their consent. Everyone is different and depending on a person’s experience and background, they may have different views about how their data are used. We plan to involve a diverse range of people, including those not often involved in research, to explore how we can make sure that CD3 is acceptable to all.

Second, data is not perfect – there may be gaps for several reasons. Perhaps the data is not categorised in the same way, or maybe some people are not represented in the data (which can be due to inequalities in access or opportunity). If there are gaps in the data, the resulting output may work for some groups better than others. For the CD3 project, the aim is to assess the data to understand the gaps and then address them.

Third, we need to understand how the outputs of CD3 will be implemented in practice, and how they will interact with health inequalities. It’s known that some groups have better or worse cancer outcomes than others for reasons that include differences in access to, and quality of, healthcare. As CD3 is likely to impact upon how we prevent, detect and diagnose cancer early through screening, we need to understand how to best engage people to be sure that everyone can experience the benefits. Without this, CD3 could add to existing health inequalities or create new ones.

Research funders and institutions are increasingly aware of the barriers experienced by particular people when working towards a career or involvement in research. An investment such as CD3 presents the chance to make a step change in who can have a sustainable and fulfilling research career.

We will do this in four ways:

  • Broker access to inclusive infrastructures (staff networks, support services, leadership programmes targeted at those with protected characteristics) in participating institutions for the CD3 research community.
  • Advertise CD3 opportunities through diverse and non-traditional routes, offering opportunities on a part-time or full-time basis with flexible arrangements in line with institutional policies.
  • Take part in local and national research activities that are intended to make research more inclusive, so we can share best practice and learn from others, and so CD3 researchers can access a wider range of development opportunities.
  • Support involvement by members of the public who are diverse and representative in CD3 research through outreach initiatives and inclusive terms.

Take a look at CRUK’s Research Data Strategy to see how else we are unleashing the enormous power of big data

    Comments

  • Garth Murphy
    31 January 2025

    I’m a Research Champion at my local Trust, would be delighted to be involved in this study

  • amy
    27 January 2025

    this makes me want 2 learn more about cancer on cancer research UK :-)

  • John Reeve
    23 January 2025

    A great initiative of major importance. I’ve become involved in supporting the better use of cancer data as a patient representative via several projects over the last 2 years. I’d be very happy to support Data Driven Cancer Detection

  • Louise
    23 January 2025

    This is an exciting and much-needed initiative, and it’s inspiring to see the breadth of expertise brought together for the CD3 project. The potential for improving early cancer detection and addressing health inequalities is immense. However, I feel it’s important to highlight an underlying issue that the project/article doesn’t fully address: data access beyond CD3.

    The project has clearly overcome significant challenges in integrating and accessing diverse datasets for itself, but there’s little discussion about how this work will improve access to health data more broadly. Many researchers and clinicians outside of initiatives like CD3 still face systemic barriers to accessing and linking the data needed for impactful work. Without addressing these wider issues, there’s a risk that CD3 becomes an impressive but isolated example, rather than fostering the systemic change needed across the research and healthcare ecosystem.

    I’m also concerned about equity in sharing the outputs of CD3. While the focus on inclusivity and representation in data models is commendable, how will the findings, tools, and infrastructure be shared? Open access and transparency could be key to ensuring that the benefits of this work extend beyond the consortium and prevent further exclusivity in research.

    Finally, while the article touches on technical challenges like missing data and bias, it doesn’t acknowledge the institutional barriers to data sharing, such as ownership and regulation. These are crucial issues that must be tackled to ensure that initiatives like CD3 create a lasting impact across the research community.

    CD3 represents an incredible opportunity not only to advance cancer detection but to set a new standard for how health data is accessed, integrated, and shared. I hope the project will consider how to drive systemic improvements in data access, so that its benefits can ripple out to researchers, clinicians, and ultimately patients everywhere.

