Linked genomic and patient cancer registry data can advance research and guide treatment decisions. We’re working with Public Health England (PHE) to help scientists, clinicians and patients unleash the potential of linked cancer and genomic data. The National Lung Matrix Trial results published today in Nature highlight the power of genomic profiling for precision medicine.

A treasure trove of data

Every year, the NHS cares for 500,000 people with cancer in England. The National Cancer Registration and Analysis Service (NCRAS), which is part of PHE’s National Disease Registration Service, collects data from all of these patients. In addition to demographic and clinical data, NCRAS holds somatic mutation data from more than 55,000 tumours. These data come from tumour testing done at molecular genetic, cytogenetic, molecular pathology and haematology labs across the country. NCRAS also holds germline, inherited mutation data collected from more than 90,000 individuals using a pseudonymisation approach. This wealth of genomic information is a powerful tool to study, predict, diagnose and guide the treatment of cancer.

Bench and bedside insights

The routine clinical care of patients with cancer has become increasingly dependent on genetic and genomic information. “We sequence blood or biopsy samples collected from patients, and we need to be able to tell whether any variants we detect are associated with the disease or indicate that a patient is likely or unlikely to respond to a particular drug,” says Rachel Butler MBE, Professor of Genomic Medicine and Operational Director of SW Genomic Laboratory Hub, North Bristol NHS Trust. The NCRAS database is an invaluable tool. “Without being able to release our data and link it with other types of data, we wouldn’t be able to make sense of them.”

Beyond immediate patient care, “linking patients’ clinical data with genomic data can help answer research questions relevant to patient health that would otherwise be difficult to tackle,” says Dr Maria Antonietta Cerone, Research Programme Manager in our Precision Medicine team. “We have set up a project with PHE that combines clinical data from NCRAS with genomic data from our Stratified Medicine Programme 2 (SMP2) – a molecular screening programme for patients with lung cancer. The goal of the project is to improve patients’ stratification, treatment and, therefore, outcome.”

When a patient is diagnosed with advanced non-small-cell lung cancer, they have a biopsy of their tumour. Consenting patients enrolled in SMP2 give permission for a surplus of their biopsy to be used to determine their genetic profile according to a 28-gene-panel test. On the basis of these results, patients are stratified to different biomarker and treatment cohorts within the National Lung Matrix Trial, an early Phase 2 clinical trial we fund. Peter Fletcher is the statistician on the trial, which opened for enrolment in March 2015 and has now recruited more than 300 patients. “It’s what we call an efficacy-signal-finding trial,” says Peter. “We’re looking to see whether a particular targeted treatment has a noticeable positive effect on the patient’s disease.”

The current results of the trial, which is still ongoing, were published today in Nature. They reveal that some of the drug-biomarker combinations that looked promising in pre-clinical studies and have been evaluated in the trial do indeed have clinical benefits in patients. These findings highlight the power of genomic profiling for precision medicine.

On 2 October 2019 we hosted a workshop with PHE in Birmingham focused on the collection, access and usage of NCRAS clinical and genomic cancer data to advance research and patient care. The attendees included academic and clinical researchers, patients, staff from NHS genetics labs, and representatives from the pharmaceutical industry and charities such as use MY data and Blood Cancer UK.

Sir John Burn, Professor of Clinical Genetics at Newcastle University and Chairman of Newcastle Hospitals, chaired the workshop. With Jem Rashbass, Fiona McRonald and Steven Hardy from PHE, he is one of the driving forces behind the collection of germline data by NCRAS. Sir John has been studying the effects of aspirin in people with Lynch syndrome, a hereditary cancer predisposition syndrome, with our funding support. “The data from NCRAS have helped us establish the preventative effects of aspirin in people with Lynch syndrome in our 10-year follow-up study,” he says.

Clare Turnbull is Professor of Medical Genomics at the Institute of Cancer Research. Clare and her colleagues set up CanVIG-UK (Cancer Variant Interpretation Group UK) to try to harmonise different types of data, including case-control frequencies, and create a resource that helps genetic labs understand their data. In 2018, Clare received funding from us for the CanGene-CanVar project, which also involves NCRAS. In an article published this year in the Journal of Medical Genetics, CanVIG-UK summarise their activities and achievements, which include a data sharing platform. “The Cancer Research UK (CRUK) award gave us the manpower to develop methodology about how best to use these data to classify cancer susceptibility variants and understand the life-course pattern of the disease in people who have these variants.”

One of the first projects Clare and her colleagues have embarked on is a proof-of-principle project focused on two genes, BRCA1 and BRCA2. All 17 genetics laboratories in England and Wales that test these genes are engaged with the project and are submitting data to NCRAS. The service feeds the full national data set into the BRCA Exchange as part of the BRCA Challenge, an international project that aims to pool information on variants within BRCA1 and BRCA2.

“We at the Human Variome Project are a co-sponsor of the BRCA Challenge,” says Sir John. “When we started, there were several databases, which collectively had about 6,000 variants. We have now gathered some 25,000 variants.”

Challenges in data collection and usage

Despite recent advances, collecting and using genomic cancer data at scale remains challenging, for various reasons. First, genomic datasets are not standardised across labs and require clean-up and reformatting before they can be transferred to a common database. The demographic data associated with genomic data also have errors and inconsistencies. “A lot of time and effort goes into just tidying things up,” says Peter.

Second, data clean-up and deposition are not formal responsibilities of the genetic labs. “These jobs have to be done by some fairly stressed people, in an increasingly demanding world,” comments Sir John. “There’s no remuneration for the labs to submit their data,” says Clare. “This is done out of goodwill.”

Third, there are ethical and governance issues to consider in any work that involves sharing of sensitive personal data. These data enable experts to interpret complex pieces of biological evidence, leading to scientific and medical advances that ultimately improve care for all patients. But these advantages must be balanced against the duty to protect the confidentiality of each individual patient. Section 251 of the NHS Act 2006 allows NCRAS to collect and hold data from patients with cancer in the interest of improving patient care or in the public interest. However, “patients won’t necessarily be comfortable with that,” says Pete Wheatstone, who is on the CRUK and British Heart Foundation Patient Data Reference Panel and who attended the Birmingham workshop. “Genomic data may feel very personal.”

Even if patients want their data to be used, they may be concerned about data security. “There are many restrictions and practical implications in keeping patients’ data secure while using the data,” Peter Fletcher comments. The collection of personally identifiable germline sequencing data is a particular area of concern. “These data don’t come under section 251 and are very sensitive,” explains Sir John. “That’s why we use pseudonymisation.” This process makes it possible to share important genetic information anonymously and securely, whilst still enabling linkage to diagnosis, treatment and outcome data held within NCRAS.

How to access NCRAS genomic data

“We are in the final stages of a robust quality assurance of the genomic data collected by NCRAS,” says Steven Hardy, Head of Molecular Diagnostics for the PHE service that includes NCRAS. “Once this has been completed, the data will be accessible through PHE’s Office for Data Release (ODR) for appropriate requests.” Interested researchers will be asked to submit a standard proposal detailing the data they want to use and the rationale for their request; all proposals received by the ODR are considered on an individual basis.

“In the meantime, we encourage researchers to contact the ODR to see whether we have the data that can answer their specific questions or enable them to undertake pilot, collaborative, proof-of-principle studies,” explains Steven.

The road ahead

“We want to collect more data on more genes and do more analysis on the data that have been collected,” says Clare. “For example, we can now link pathology records with the genetic change the patients carry.” Sir John is also planning to add more data to the repository. “The high-volume microsatellite instability (MSI) test we developed here at Newcastle University will potentially allow us to put MSI data into NCRAS at scale, which would be very useful for disease management.” By combining data on inherited DNA variants with molecular patterns seen in tumours, researchers can gain a better understanding of the typical features of tumours in people who are genetically predisposed to developing cancer.

“Linking molecular data with a patient’s medical history to build up a picture of long-term outcomes for patients is a great area of opportunity,” says Peter Fletcher. Sir John wants to push forward large-scale initiatives focused on high-impact, frequently mutated genes like BRCA1, BRCA2 and the mismatch repair genes. “If we have an international database of variants that are properly curated, we can then work to optimise surveillance, screening and intervention strategies, like we did with the aspirin intervention.” Sir John would also like to ask people with hereditary cancers for their consent to have their name associated with their data. “We can then use Hospital Episode Statistics data and study their life course more easily and explicitly. We’d like to try our consenting process, as a pilot, in the Great North Care Record and then see if we can scale up.”

“Patients are often very supportive of our projects, but they also bring up salient points that maybe we haven’t considered in our enthusiasm,” says Rachel. Pete Wheatstone highlights the role that institutions that host and promote data sharing should have in explaining the benefits of collecting and using genomic patient data. “What data are being collected? Why is this being done, and what are the benefits to patients? There is a need for public education on genetics, to take the frightening aspect away.”

NCRAS is also developing standard reports that can then be fed back to the contributing labs to help them audit their processes and monitor their performance. According to the participants of the genomic cancer data workshop, these reports should include lab-specific metrics, testing methods, turn-around times and genes tested. They should also include information about the type of tests requested by each NHS trust, to ensure equity of access to medical care; whether test results provided by the genetic labs correlate with data in the Systemic Anti-Cancer Therapy dataset within NCRAS, to check whether genomic information is being used to guide treatment; how patients’ treatment and outcome correlate with tumour biology as indicated by the genomic aberrations identified; and cancer family history.

All of the participants of the Birmingham workshop call for more resource and coordination between various parties. “This is about developing a 3-way system involving clinical care within the NHS, researchers and funders such as CRUK,” says Clare. “The system should incentivise, remunerate and reward people in the NHS who collect data that helps research.” Rachel agrees: “We need the resources to be able to share our data, which will help us do our job properly.”

The UK may be a particularly favourable environment to explore the power of genomic cancer data. “We have a unique position in Britain, with the love of the NHS and the willingness to allow the NHS to care for people,” says Sir John. “There’s an opportunity for us to be world leaders in this area, and it’s not very expensive.”

