Joining the dots: How our new research data strategy will unlock the power of big data

by Cancer Research UK | In depth

7 July 2022


The reams of data produced by modern research hold incredible potential to advance our understanding of cancer and how we detect and treat the disease. But capitalising on this needs a joined-up approach – here, Dr Melissa Lewis-Brown tells us why Cancer Research UK’s new research data strategy has an important role to play...

Big data is becoming a vital tool to beat cancer. However, our ability to generate cancer-relevant data of various types has outpaced our capability to use it to maximum effect. Capitalising on the full value of data will require its integration and analysis – thankfully, innovative data science now offers this opportunity.

Alongside data generated by research, such as omic data and clinical data, cancer researchers are increasingly using a wide variety of other data types. Health data held in electronic health records, the explosion of data acquired through wearable sensors or apps, as well as other modes of data, like environmental data, can all now play an important role in cancer research. And that is to mention just a few. Overcoming the numerous challenges of using these data is the only way to unleash their potential to unlock new discoveries about cancer, leading to better prevention, earlier detection and more effective treatments.

Cancer Research UK has traditionally supported bench and clinical scientists. But data scientists and other data experts are increasingly working with other researchers to take a data-driven approach to the now incredibly data-rich field of discovery and translational cancer science. Cancer research has grown to be multi-disciplinary over the years, and it is now embracing data scientists and other data experts who have the skills to apply these approaches to make discoveries in cancer science.

Imaging, patient stratification and discovery science… data science is everywhere

So far, some of the best examples of effective data science have used artificial intelligence to glean more information from images than the human eye can manage. Traditionally, for example, radiologists eyeball images looking for signs of cancer. Increasingly, machine learning and computer vision are used not only to automate this process, but to detect cancer earlier. These techniques can ‘see’ things the human eye cannot – what’s more, they can often do it a lot quicker.

And we have seen a data-centric approach used for treatments as well as detection. When it comes to radiotherapy, for example, clinicians must first pinpoint where to apply radiotherapy – a process which can be time-consuming when done manually. Again, machine learning and computer vision can take data from thousands or millions of previous patient pathways to predict which segmentation and treatment are most likely to lead to the best health outcomes.

An example of this is Professor Raj Jena’s work. Raj applies advanced imaging techniques and radiotherapy treatment modalities to improve outcomes for patients with central nervous system solid tumours. He has a specific interest in magnetic resonance imaging of these tumours and intensity-modulated radiotherapy.

Another example can be found in the field of breast cancer. CRUK funded the generation of the OPTIMAM Mammography Image Database – a sharable resource with processed and unprocessed mammography images from UK breast screening centres, with annotated cancers and clinical details. The database includes serial screening mammograms that were collected over a 10-year period with data from nearly 173,000 women. The database is made up of data on all breast cancers in a screened population including interval cancers. This resource has been widely reused to develop and evaluate artificial intelligence algorithms for breast cancer detection.

Data science is not limited to pure data science teams, however. More and more researchers are incorporating data science to address specific challenges in their own field. Many research group leads, who may not identify as data scientists themselves, are increasingly bringing expertise such as machine learning into their teams where it adds the most value to their work.

Take Professor Rebecca Fitzgerald’s work in developing the Cytosponge, for example. The device – a small capsule attached to a fine string which allows cell collection from the oesophagus – aids the early detection of Barrett’s oesophagus and oesophageal cancer. Rebecca and her team took their early detection strategy for oesophageal adenocarcinoma – widely recognised to be a cancer of unmet need – from discovery science all the way through to implementation in the clinic.

The development of the Cytosponge has become a celebrated story – with Rebecca presenting the work as a plenary lecture at the recent American Association for Cancer Research conference. However, less well known is the role that data science and multimodal big data played in the development of the device. One of the pivotal steps in Rebecca’s work was to identify whether Barrett’s is a necessary step in the development of this cancer type. After single cell RNA sequencing uncovered a number of candidates, a computational modelling approach was used to identify the obligate cell precursor that was always associated with Barrett’s oesophagus. What’s more, this result was then validated using a multiscale computational model based on population-level health data.

However, this cancer is not sufficiently common to justify screening whole populations, so the challenge was to identify who is at highest risk and should be screened using the Cytosponge. The team knew that this cancer is more prevalent in men and in certain parts of the world, and is associated with symptoms such as heartburn, as well as the more common risk factors for cancer – age, BMI and smoking status. This, then, was a problem perfectly suited to a solution involving the integration of different modes of data. An algorithm was developed to do just that, enabling the team to identify people at highest risk so they could target screening efforts.
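
As a rough illustration of how risk factors like those mentioned above might be combined into a single score, here is a minimal Python sketch. It is not the team's algorithm: the `Person` fields, the weights, the intercept and the screening threshold are all hypothetical values chosen purely to show the idea of integrating several data types to prioritise who is offered screening.

```python
import math
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical record combining demographic, lifestyle and symptom data."""
    person_id: str
    age: int
    is_male: bool
    bmi: float
    smoker: bool
    heartburn: bool


# Illustrative weights only; NOT taken from the article or from any study.
INTERCEPT = -4.0
WEIGHTS = {
    "decades_over_50": 0.4,
    "is_male": 0.8,
    "bmi_over_30": 0.5,
    "smoker": 0.6,
    "heartburn": 1.0,
}


def risk_score(p: Person) -> float:
    """Combine the risk factors into a value between 0 and 1 (logistic model)."""
    z = INTERCEPT
    z += WEIGHTS["decades_over_50"] * max(0, p.age - 50) / 10
    z += WEIGHTS["is_male"] * int(p.is_male)
    z += WEIGHTS["bmi_over_30"] * int(p.bmi > 30)
    z += WEIGHTS["smoker"] * int(p.smoker)
    z += WEIGHTS["heartburn"] * int(p.heartburn)
    return 1.0 / (1.0 + math.exp(-z))


def select_for_screening(people: list[Person], threshold: float = 0.2) -> list[str]:
    """Return IDs of people at or above the threshold, highest score first."""
    scored = [(risk_score(p), p.person_id) for p in people]
    selected = [item for item in scored if item[0] >= threshold]
    selected.sort(key=lambda item: item[0], reverse=True)
    return [person_id for _, person_id in selected]


if __name__ == "__main__":
    cohort = [
        Person("A", age=70, is_male=True, bmi=32.0, smoker=True, heartburn=True),
        Person("B", age=40, is_male=False, bmi=22.0, smoker=False, heartburn=False),
    ]
    for person in cohort:
        print(person.person_id, round(risk_score(person), 3))
    print("Offer screening to:", select_for_screening(cohort))
```

In practice, weights like these would be estimated from data and validated before use; the point of the sketch is only that different modes of data can feed one ranking that directs limited screening capacity to those most likely to benefit.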

Together, wet lab work to find a biomarker and data science involving multimodal data led to the Cytosponge being used to identify ten times more Barrett’s oesophagus cases than the gold standard approach. It has been a tremendous success story, and much of the data used, at least in the pre-clinical work, was sourced from UK samples. That is testament to what can be achieved with collaboration around an infrastructure which is fit for purpose.

The Cytosponge work is not the only example of how data science can stratify patients for improved cancer care. Computational techniques are also helping to stratify patients undergoing immunotherapy and radiotherapy. Around 60% of all people diagnosed with cancer undergo radiation therapy at some point, so being able to identify those most likely to suffer severe side effects and using AI to improve the application of radiotherapy is vital. Computational biologist, Professor Bissan Al-Lazikani, used data on outcomes, clinical metrics and genetic profiles from nearly 1,000 prostate cancer patients to generate a machine learning model. She found many parameters that seemed to affect an individual’s likelihood of severe toxicity and managed to identify those that mattered most.

And, of course, data science can also be hugely effective in discovery science as well as clinical work. Professor Serena Nik-Zainal and her team use computational methods to describe the scars of historic DNA damage and repair processes that have occurred throughout the development of a tumour. These passenger mutations – although not cancer-driving – can help us to understand the aetiologies that underpin cancer. The insights that Serena and her team have gained through the combination of computational analysis and experiments in cell-based systems have led to the development of clinical algorithmic tools – tools that they intend to translate into clinical utility soon.

Our Research Data Strategy

These are all examples that have overcome the many obstacles facing data science. But others remain, and they prevent us from fully utilising the power of this approach for cancer research. It is these challenges that our recently launched Research Data Strategy aims to solve, and it is that power that we want to realise, for the benefit of cancer patients.

In the cancer community we are in the service of those affected by cancer. Patient trust is integral to cancer research, particularly in the way that we use patient data for research. Patient privacy, understanding, trust and support are all essential ingredients of data science in cancer research. So, the patient voice will be at the heart of this strategy. We commit to understanding what people’s concerns are and what we can do to co-develop activities to mitigate those concerns. And we’ll help the public to see how patient data benefits people affected by cancer through our engagement and involvement work.

For all the promise of machine learning, computer vision, advanced statistics and so on, each relies on getting timely access to high-quality data. But understanding how to use that data effectively, and fully realising its potential, requires the linking together of various data types – and that can be a real challenge.

Our new Research Data Strategy aims to address many of these challenges and build the foundations of a thriving data science community with the support, trust and involvement of patients and the public, and unimpeded by issues of data access, quality, linkage etc. This is a long-term ambition, but we are putting the full weight of CRUK behind making this vision a reality for the benefit of people affected by cancer.

CRUK-funded research generates phenomenal amounts of research data. We’ll be encouraging and facilitating the sharing of high-quality research data for further cancer research, because research data is intellectually valuable for the entire cancer research community. We intend to work in partnership with others to solve the challenges and realise the patient benefits of secure data-sharing to facilitate innovative data science and unlock insights from big data.

The foundations of our Research Data Strategy are to:

  1. Earn and maintain public and patient trust, confidence and support, and amplify their voice in discussions around patient data.
  2. Make research data more findable, accessible, interoperable and reusable, whilst preserving security and privacy of patient data.
  3. Strengthen the national cancer data science environment.
  4. Build a collaborative and supportive cancer data science community.
  5. Translate discoveries into patient benefits through transparent and fair partnership with public, private and charity sectors, including universities, the NHS and commercial entities.
  6. Attract and retain data talent.
  7. Ensure equality, diversity and inclusion in those who do data science and patients who benefit from it.
  8. Improve environmental sustainability and efficiency of data science.


The whole driving force of this strategy is to improve outcomes for cancer patients. Maximising the amount of knowledge the cancer research community can glean from all available data is the very thing that will allow us to learn from the patients who have suffered from, and been lost to, cancer. The sheer power of the footprint they have left in the form of research and health data holds incredible potential. And it’s a potential we will work relentlessly to achieve because it will not only help improve outcomes for patients today, but also help bring forward the day when all cancers are cured.

We need you…

We want your views as a researcher on this. If you would like to share your thoughts on what we are doing in this space, please contact me and let me know at [email protected]

Dr Melissa Lewis-Brown is Head of Research Data Strategy at Cancer Research UK