Big data and cancer – our greatest obstacles present a real opportunity

by Cancer Research UK | Analysis

23 April 2025

The incredible potential of big data must be realised in the right way. To do this, says Gemma Codner, we must overcome many challenges, but with the right leadership it is possible…

Laboratory techniques, imaging modalities and biotechnologies have advanced rapidly in recent years, leading to a vast increase in the volume of data generated in cancer research.

Prime examples include next-generation sequencing (NGS) to profile mutations and guide targeted therapies, higher-resolution magnetic resonance imaging, and wearable devices that track vital signs and health parameters for treatment monitoring.

Alongside the increased breadth and depth of data collected, data science and the tools available to mine data have evolved rapidly. For example, large language models (LLMs), artificial intelligence (AI) for workflow optimisation and digital twins are now widely applied in cancer research and precision medicine. These developments offer a huge opportunity to advance knowledge, embed prevention, and improve treatments for better patient outcomes.

Whilst clinical trials remain the gold standard of research, real-world data (routinely collected during healthcare interactions) and associated biological samples hold immense potential, particularly in advancing cancer studies. However, several factors make this data challenging to use: its often unstructured nature (doctors' notes recorded as free text are a prime example), a lack of data standardisation, and a lack of join-up between primary and secondary care settings.

A community takes action

To adapt to this rapidly evolving data landscape, Cancer Research UK introduced its data strategy in 2022, followed by the creation of its research data community.

Six Special Interest Groups (SIGs) lead this community, focusing on critical challenges in cancer data research. Recently, the Health Systems Data group and the Data and Samples Reuse group have organised workshops, webinars, and surveys to engage the wider research community. These initiatives aim to identify the barriers researchers face in accessing data and reusing samples while exploring potential solutions to advance cancer research.

These activities brought together a diverse range of self-selected stakeholders, including healthcare professionals, clinical research scientists, patients, and epidemiologists. The findings highlighted recurring challenges in cancer research, such as governance issues and legal and ethical barriers, all of which hamper the timeliness and availability of valuable datasets to researchers. Furthermore, the community voiced concerns over data quality, including poor curation, a lack of standardisation across systems and inadequate metadata.

Whilst big data holds wide-reaching potential for cancer research, the challenges associated with data sharing and knowledge exchange mean that this opportunity remains untapped. The real-world data survey found that only 21% of respondents considered the data (easily) accessible – a finding that was mirrored in the subsequent sample and data reuse survey. Additionally, only around one-third of respondents were based at centres with established processes for access to real-world data for research.

The value of real-world data was strongly endorsed: over 90% of survey respondents supported both its ability to inform clinical decision-making and its potential to provide evidence that clinical trials cannot. Stakeholders expressed an interest in using a range of data types for their research, including patient records, biological samples, genomic data, and pathological data.

However, a quarter of survey respondents deemed the quality of these datasets often too poor to be useful. This concern was echoed in discussions at the workshop, coupled with uncertainty as to whether quality control measures were in place at participants' institutes.

A clear demonstration…

Another strong theme to emerge from these activities is the need for funding of infrastructure and training. Aside from disseminating knowledge about access to existing resources, ensuring that datasets are better curated, through wider training on data standards and/or funding for retrospective data harmonisation, would drastically enhance the portability and interoperability of the data.

The Demonstration Project Award call, launched by Cancer Research UK (CRUK) in autumn 2024, was open to the special interest group leads to encourage cross-interest group collaboration. Leads had the opportunity to apply for collaborative awards of up to £250,000 for a duration of 18 months. Following peer review, four projects were awarded funding, with CRUK providing nearly £590,000 in support. Kicking off in April 2025, the funded demonstration projects focus on key areas such as data standardisation, data diversity, patient engagement in children and young people’s (CYP) cancer research, and improving the findability and accessibility of CRUK-funded datasets.

Awardees will be invited to provide an update at the 2026 CRUK data-driven cancer research conference. The projects funded through the Demonstration Project Award aim to begin addressing challenges affecting the entire cancer research community and to prove that, with appropriate investment and infrastructure, these obstacles can be overcome.

Gemma Codner

Author

Gemma is Cancer Research UK's Data Community Coordinator.
