Harnessing the power of data for paediatric cancer research

by Cancer Research UK | In depth

29 May 2024


There are many challenges we must address to get better at meeting the needs of children and young people with cancer, and advances in how we utilise data could really accelerate this. Here, we dig into new Pilot awards using the power of data science to answer questions we’ve never been able to ask before…

Dr Emma Woodward, a clinical geneticist, works with patients who have hereditary conditions that make them more likely to develop cancer. “Just 30 years ago, a lot of children with these conditions wouldn’t survive their first primary tumour. Modern medicine is fantastic,” she says. “It’s also unleashed a lot of challenging clinical questions that we need to get better at answering.”

The seed of a new research project idea was planted during an appointment with a young adult who, after surviving a primary tumour as a child, had been diagnosed with a second primary tumour. “They had questions I couldn’t answer about their risk and survival. I was sure the answers would be in a database somewhere – but I had no way to access that information.”

Emma’s team recently received a Data for Children’s and Young People’s Cancer (Data4CYP) Pilot Award: £250k over two years to develop new, data-driven solutions to common challenges in children’s and young people’s cancers.

Together with the research community, CRUK has developed a programme that channels the ambitions of our Research Data Strategy to support data-intensive CYP research. This programme secured a £5 million investment to be spent over the next five years. Eight new projects have been funded by CRUK, with four of those jointly funded through partnerships – two with Great Ormond Street Hospital Children’s Charity and two with Children with Cancer UK.

“At CRUK, we have a dedicated research strategy to drive increased survival and long-term quality of life for children and young people with cancer. And it’s clear that an enabler of this research is data,” explains Dr Laura Danielson, CRUK’s research lead for children’s and young people’s cancer. “Advances in data sciences mean we’re now able to answer questions, like Emma’s, that we’ve never been able to even ask before.”

Speaking with Emma, Laura and other recipients of the Data4CYP award, there is a sense of momentum and hope: as if the research community is at a tipping point in overcoming some of the challenges that continue to frustrate progress for children affected by cancer and their families. There’s a wealth of data at our fingertips – and we’re on the cusp of unlocking its potential.

The Data4CYP awardees come from a breadth of computational, clinical and experimental backgrounds. We caught up with four of them to explore what they hope to achieve and where their pilot awards could take us.

Emma Woodward
Lead Principal Researcher for the project, Emma Woodward of the University of Manchester and Saint Mary’s Hospital, part of Manchester University NHS Foundation Trust.

Linking siloed databases to better understand life-long risk and survivorship

Faced with a patient whose questions couldn’t be answered, “I realised we’ve got to do better than anecdotal information,” Emma says. “The NHS is awash with rich but siloed data – surely we can use it in a more meaningful way.”

Working with patient representatives and experts in genomics and software engineering, Emma’s seed of an idea grew into CanCYP: an online calculator that clinicians could use to predict primary tumour risk in children and young people with a predisposing gene alteration.

The team will explore whether it’s possible to link individuals’ pseudonymised genomic data – captured through diagnostic and predictive tests and held by the NHS – with clinical records held by the National Cancer Registration and Analysis Service, including layers of family history, personal history and other risk factors. Initially, they’ll focus on TP53 and RB1 – two genes they’re confident could prove the principle of their idea.

Thanks to advances in genomic technology, and increasing survival rates for children and young people with cancer, data often now exists that spans from before a child is diagnosed with cancer, through their treatment journey and on to long-term survival, relapse or new primary tumour. Emma is hopeful that linking these data is the key to supporting clinicians to calculate personalised risk for their patients, so they can put in appropriate steps to support prevention or early cancer detection.

“I truly believe risk prediction and early detection are the most effective ways we can improve cancer outcomes – but only if they come hand in hand,” she says. Understanding risk in children and young people offers unique challenges, but also opportunities when it comes to keeping them healthy for the long-term. “We’re looking at life-long risk prediction and clinical decision making to support the kids in our clinics for the rest of their lives.”

Professor Anindita Roy
Joint lead Principal Investigator of the project, Professor Anindita Roy of The University of Oxford. Her co-PI is Dr Jack Bartram, Great Ormond Street Hospital for Children.

Connecting disjointed data to advance our understanding of rare diseases

Jointly funded by Great Ormond Street Hospital Children’s Charity

Infant ALL (iALL) is a very rare disease that affects babies under a year old. “Sadly, only around half of babies with iALL are cured,” says paediatric haematologist Professor Anindita (Andi) Roy. “Patients often respond differently to treatment, and we don’t know why.”

Like Emma, Andi and her Data4CYP team hope to tackle the challenge of multifaceted healthcare data.

Most children with cancer in the UK receive hub-and-spoke model care, with treatment coordinated by an oncology centre and delivered by district and community teams. “This means diagnostic tests, blood work, toxicity data… it’s all sitting in different places,” describes Andi. “The only way to know if we’re treating our patients in the best way is to pull all this data together into one place.”

The team hopes to use their pilot funding to create a national database that can longitudinally collect and store this data. In parallel, they’re performing tests on patient samples to enrich the dataset – such as immunophenotyping, single-cell RNA sequencing and multi-omics assays.

Beyond the clinic and the lab, Andi and the team want to understand the impact of treatment from a family’s perspective – qualitative data that isn’t usually captured. “We want to know about the days the children are too sick for school or nursery, or their sleep is affected, or their parents couldn’t go to work,” she says. Working with parent advocates and data strategists, they plan to develop a portal for parents to submit this information in a way that maximises interactive usage while preserving patient privacy.

Eventually, machine learning will be used to analyse this database of clinical, molecular and real-world data, looking for clues that could help build a holistic understanding of iALL – with the ultimate aim of refining patient stratification and treatment to improve disease outcomes.

It might sound ambitious for a pilot project – but Andi’s eyes are firmly on the bigger picture. And with other childhood cancers and rare diseases presenting similar challenges of disjointed data, the team hopes their efforts could be easily scaled to other conditions. “We’ll spend these first two years building the pipelines, but if we’re going to improve outcomes for our patients, we’ve got to think long-term,” she says. “These are rare diseases, and too often they’re neglected. But for the families affected, they’re everything.”

Dr Ben Hall
Joint lead on the project, Dr Ben Hall of University College London. His co-lead is Professor Tariq Enver, also at UCL.

Combining FAIR data principles with multidisciplinary collaboration

“Cancer research really is such a data-driven field,” says Dr Ben Hall, senior research scientist in UCL’s medical physics and biomedical engineering department. “Everybody is producing data – which creates a great environment for theoreticians like me.”

Ben approaches his Data4CYP award from a “twisty” career that has touched on molecular biophysics, systems biology and computational biology – fuelled by an interest in using computational tools to understand how mutations cause cancer through physical and behavioural changes to a cell. The award represents Ben’s first foray into the field of children’s and young people’s cancer, sparked by a new collaboration with joint lead applicant Professor Tariq Enver.

Together, the team is using computational modelling to understand how mutational and epigenetic changes influence cell behaviour in childhood B cell precursor acute lymphoblastic leukaemia (BCP-ALL).

“BCP-ALL is a genetically well-defined disease – but we still don’t know why some children relapse or respond differently to treatment,” Ben explains. “By understanding the interaction networks of genes and proteins, and layering this with mutational data, we might work out how to stratify patients to make treatment more effective or identify new therapeutics for children who relapse.”

These integrated layers of data will be presented as an interactive web dashboard, so that anyone can use the team’s code to interrogate the data, such as looking at their own genes of interest or hoping to unlock new insights for other genetically well-defined conditions. This dashboard approach was inspired by a similar output on a recent metabolism study – which, “full disclosure, was the suggestion of the reviewer, but made so much sense when we saw it in action,” he says.

FAIR (findable, accessible, interoperable, reusable) data principles are an important part of all Data4CYP data sharing plans. But for Ben, the most exciting thing is when open access combines with meaningful collaboration to spark new avenues of exploration. “Data doesn’t exist as an island. It should be interrogated and challenged from multiple angles to really make the most of it,” he describes. “That’s what really leads to transformative research.”

Dr Simon Bomken
Project lead, Dr Simon Bomken of Newcastle University.

Ensuring early clinical trials are built on the best possible evidence

Proof of the many hats our clinician scientists wear, Dr Simon Bomken joined our chat fresh from an MDT meeting about one of his patients with Burkitt lymphoma – rare, fast-growing, and the most common type of non-Hodgkin lymphoma in children and young people.

While many children with Burkitt lymphoma survive their cancer, the treatment is intense, and a small percentage of patients experience relapse.

“When Burkitt lymphoma comes back, there’s a complete change in its behaviour,” Simon describes. “It switches from a disease we’re very good at treating, to one we’re extremely poor at treating.” With largely ineffective second-line therapies, the community is working hard to provide early phase clinical trial opportunities for those children. The recently opened Glo-BNHL trial is a great example.

Simon’s research is focused on understanding the fundamental cause of that switch, which could identify new therapeutics for those whose cancer has returned. It’s only recently that advances in technology have caught up with our ambitions to make progress against Burkitt lymphoma. “Just three to four years ago, we’d have said it wasn’t possible to stratify patients into molecular subgroups – that it was a pretty homogenous disease,” Simon says.

In 2022, with access to advanced sequencing technologies, the team identified TP53 status as the disease’s first risk stratifier. Now, with all paediatric malignancies eligible for whole genome sequencing, “there’s a lot of rich data available to us”.

The team’s Data4CYP award makes use of high quality, single-cell molecular, functional and transcriptomic data, collected from patient biopsies, the VIVO Biobank and healthy patients. By linking these datasets – and considering how further analyses, such as epigenetics, could be layered in the future – they hope to identify a core set of Burkitt lymphoma-specific pathways to explore further.

“There’s an inordinate amount of information stored in that data. The challenge is knowing how to interrogate it,” he says. “It’s thanks to our bioinformatics colleagues that we’ve identified a way to make sense of that data. There’s real value in stepping out of your comfort zone and looking beyond your field to find the right step forward.”

The team hope this could provide the starting point to bring our understanding of Burkitt lymphoma closer to that of more common childhood cancers, such as leukaemia. This could also provide proof-of-principle for what can be achieved through multidisciplinary collaboration and a small number of high-quality samples, to help advance progress in other poorly understood conditions.

“For children who have relapsed, an early phase clinical trial may be the last glimmer of optimism in overcoming their illness,” Simon describes. “It’s paramount we’re deepening our understanding of Burkitt lymphoma, so we’re building those clinical trials on the highest quality preclinical evidence we can generate.”

What’s next?

The eight Data4CYP recipients start their pilot projects this summer, running for two years. As well as funding, they’ll receive a range of wraparound support and form a network that shares their learnings and resources with each other and the wider research community.

“We received fantastic applications for this round of pilot funding. The ideas that came through really demonstrate the impact that could be made for children and young people, if we can harness the power of data,” says Dr Melissa Lewis-Brown, Head of Research Data Strategy at CRUK. “We’re excited about where these awards could take the field – not just the direct outputs of their pilot funding, but their plans for a wider legacy.”

Larger-scale awards are expected to be announced in 18-24 months’ time, though the scope of this call is still under development.


Data4CYP awards

CRUK has made a £5 million investment in a new research programme that supports data-driven research to develop new, scalable and generalisable solutions to common challenges in children’s and young people’s cancers.

The first phase of this will be eight Pilot awards, which will run for up to two years. Four of the awards have been jointly funded through partnerships – two with Great Ormond Street Hospital Children’s Charity and two with Children with Cancer UK.

Here are the awardees and what they will be working on…

Anthony Moorman, Newcastle University
ALLTogether for Research (A2G4R) Knowledge Hub 

There are at least 25 different genetic subtypes of acute lymphoblastic leukaemia (ALL). A current clinical trial for CYP-ALL (ALLTogether-01) uses 10 leukaemia genetic subtypes to guide treatment. For the remaining subtypes, however, there is a lack of robust data on their relationship with risk of relapse.

This project will investigate the clinical relevance of the remaining 15 subtypes and establish whether they can be used prospectively to guide treatment.

Jointly funded by Children with Cancer UK

Anindita Roy, University of Oxford
Scientific Advances in Infant ALL (SAIL) 

The project will bring together the clinical and scientific data available for all infants with ALL in the UK to learn about the efficacy of newer therapies compared to historic treatments. A consortium of expert clinicians, scientists, data strategists and parent advocates will work on SAIL. The aim is to use the integrated datasets to understand why some infants with ALL do better than others, hopefully leading to a personalised medicine approach in the future.

Jointly funded by Great Ormond Street Hospital Children’s Charity

Martin McCabe, University of Manchester
Routinely collected treatment data to evaluate the uptake and utility of UK paediatric early phase trial infrastructure 

The evidence base for treatment of recurrent childhood cancer is poor. And, although 500-600 UK children and adolescents/young adults (AYA) die of cancer annually, there has been no systematic analysis of treatments or treatment efficacy in this setting. This project will use data collected by NHS England on cancer treatments, genetic data from the SMPaeds study and whole genome sequencing with individual patients’ records to describe treatment patterns and treatment efficacy across the childhood/AYA cancer spectrum.

Jointly funded by Great Ormond Street Hospital Children’s Charity

Katie Harron, Institute of Child Health and University College London
Education and longer-term health outcomes for childhood cancer survivors: linkage of cancer registration data to the ECHILD database for use by UK researchers 

The project will create a longitudinal, population-level database of hospital and education records linked with childhood cancer registration data. It will link national cancer registration data with existing linked health and education data captured for 20 million children in ECHILD – a dataset capturing information from NHS hospitals and state schools in England for children born since 1984. The aim is to quantify the difference in academic attainment trajectories and school support up to age 16, and hospitalisations and mortality into adulthood, between childhood cancer survivors and healthy peers.

Jointly funded by Children with Cancer UK

Emma Woodward, University of Manchester
CanCYP: a tool to enable cancer risk prediction in children and young people with a cancer predisposing gene alteration 

This study will use hereditary genetic tests carried out in the NHS, along with NHS records about cancers that children and young people have already developed and the treatment they received. The aim is to use this information to calculate the actual chance of developing cancer for a person with a gene alteration that can increase cancer risk.

Simon Bomken, Newcastle University
Identification of prognostic biomarkers for childhood Burkitt lymphoma through multiomic data integration 

This project will look at how the core Burkitt lymphoma (the most common childhood non-Hodgkin lymphoma) transcriptome differs from normal germinal states. It will use a highly curated clinical cohort of childhood sBL cases to identify relapse-associated transcriptional features and investigate their potential as risk-stratification biomarkers. Integration of multiomic data from rare sBL patient samples will provide an understanding of the transcriptome of this disease and a novel dataset for future additional integration.

Maria Hawkins, University College London
PROVIDENTIA 1000 Proton and radiation data combined with biology, imaging and long term outcomes to advance radiation combined modality treatments in CYP 

The project will assemble a comprehensive database of 1000 children and young people treated with radiation (proton and photon) focussing on brain malignancies to answer three questions: Can we refine radiation indications to improve outcomes using unbiased molecular and imaging data? Can we propose new solutions to mitigate toxicity development? Can we use AI to generate high-fidelity virtual synthetic data on a relatively limited real-world dataset?

Benjamin Hall, University College London
Elucidating the molecular pathways to relapse in childhood leukaemia through computational modelling

The project will use computational modelling, combining the gene levels with information about how genes interact, to build simulations that can predict the behaviour of cancer cells. The team will then exploit these models to find differences associated with poor patient outcomes and, in turn, to predict vulnerabilities that could be targeted to improve treatments.



Emily Farthing

Emily is a freelance science writer and communicator. 
