Researchers have found fingerprints of the processes that cause cancer

We know a great deal about the environmental and lifestyle factors that cause cancer on a population level. But being able to say for certain what ‘caused’ a particular patient’s cancer is almost always impossible.

This is because, in nearly every case, a cancer’s underlying genetic damage has many sources – chiefly ‘natural’ damage that occurs as we age, but also in the form of exposure to carcinogens like tobacco smoke and ultraviolet (UV) radiation, or from faulty maintenance processes in our cells.

For some of us, our bodies repair this damage. Others aren’t so fortunate.

And as much as ageing is inevitable, it’s also impossible to live life without coming across something carcinogenic at some point (although perhaps not as often as some would have you believe).

So mapping ‘cause’ to ‘effect’ is beyond the current abilities of medical knowledge. Indeed, we may never be able to say 100 per cent for sure what molecular event (or more properly, events) in a patient’s life triggered their cancer.

But the current explosion in both gene sequencing technology and the ability to mine the vast quantities of data it generates is leading to some incredible insights into the different forces at work inside our cells, and how these are linked to the disease.

Just last week, researchers in Singapore discovered the ‘fingerprints’ of a carcinogen found in some medicinal Chinese herbs all over the DNA from people with a certain type of kidney cancer.

And today, an international team of researchers, led by the Wellcome Trust’s Sanger Institute in Cambridge, have revealed the signatures of at least twenty different processes at work on the DNA of different types of cancer.

So what are these ‘processes’, and how will understanding them help researchers to help doctors to help patients?

Gene sequencing and ‘big data’

The ability to analyse rapidly the entire DNA contents of a patient’s tumour – so-called ‘whole-genome sequencing’ – has revolutionised cancer research, and allowed researchers to do things that were unthinkable just a decade ago.

We can now track how tumours evolve as they spread, spot different subtypes of cancer, and track how the disease responds to treatment. We’re learning more about the faulty genes that cause cancer and mediate drug resistance. And we’re learning about the profound chaos that develops in cancer’s DNA.

But we know surprisingly little about the molecular events that cause this chaos – and what we do know has generally come from laboratory studies. We know, for example, that the chemical carcinogens in tobacco smoke physically stick to the DNA molecules in our cells. We can watch these chemical additions – in the Petri dish – as they interfere with the machinery that copies and repairs our DNA. And we can spot similar-looking damage in cancer genes in smokers.

We can also study other types of damage – for example when particular repair processes go wrong inside cells, which seem particularly important as cancers develop over time.

Yet until the advent of large-scale gene sequencing studies, researchers had never been able to look at the resulting patterns of damage across different types of cancer, and work out how important each one is in the diseases’ development.

But now researchers can play ‘CSI’ on a tumour’s DNA, and look for the fingerprints of the culprits causing cancer’s genetic chaos.

It’s all about context

DNA is made of four different chemicals.

Our DNA is composed of long strings of four chemicals – adenine, guanine, cytosine and thymine – usually abbreviated to A, G, C and T.

It’s the precise order – or sequence – of thousands of these letters that makes up a gene, and it only takes one ‘letter’ to be changed to scramble the gene’s ‘meaning’ and – potentially – cause cancer.

Researchers are fast realising that different processes that change and damage DNA tend to leave behind specific fingerprints.

For example, the bulky chemical attachments left by tobacco smoke usually lead to the replacement of a cytosine with an adenine (C→A).

Similarly, UV light can cause adjacent cytosines to stick to each other. When the cell’s repair machinery detects this, it converts the cytosine to a thymine (C→T) to repair the damage.

If either of these happens in the middle of a gene, the result can cause chaos.

But it’s not just the single ‘letter’ swap that’s revealing. It’s the context. Different processes tend to change DNA letters in different ‘contexts’. For example, the changes caused by UV light can be spotted because they occur at a cytosine that’s preceded by another cytosine in the DNA sequence (in other words, CC becomes CT).
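To make the idea of ‘context’ concrete, here is a minimal Python sketch that labels a single-letter change together with its neighbouring letters and checks for the UV pattern described above. The function names (`mutation_context`, `looks_like_uv`), the label format and the short example sequence are all invented for illustration; real analyses also have to account for the fact that DNA has two complementary strands, which this sketch ignores.

```python
from collections import Counter


def mutation_context(sequence, position, new_base):
    """Label a single-letter change with the letters on either side.

    Returns a string such as 'C[C>T]G': the letter before, the change
    itself (old letter > new letter), and the letter after.
    `position` is a 0-based index into `sequence`.
    """
    if position <= 0 or position >= len(sequence) - 1:
        raise ValueError("the changed letter needs a neighbour on each side")
    old_base = sequence[position]
    if new_base == old_base:
        raise ValueError("the new letter must differ from the old one")
    before = sequence[position - 1]
    after = sequence[position + 1]
    return f"{before}[{old_base}>{new_base}]{after}"


def looks_like_uv(label):
    """True for a C>T change at a cytosine preceded by a cytosine (CC -> CT)."""
    return label.startswith("C[C>T]")


# Toy example: a made-up six-letter sequence with two made-up changes.
sequence = "ATCCGA"
changes = [(3, "T"), (2, "A")]  # (position, new letter)

tally = Counter(mutation_context(sequence, pos, base) for pos, base in changes)
for label, count in tally.items():
    print(label, count, "UV-like" if looks_like_uv(label) else "")
```

Tallying labels like these across a whole tumour genome gives the kind of profile in which a fingerprint can show up.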

These studies led to a seminal finding in 2009. Researchers at the Sanger Institute looked at the entire DNA sequence in tumour samples from two patients – one with melanoma, one with lung cancer. As predicted, they found the hallmarks of tobacco damage all over the DNA from the lung tumour, and widespread UV damage in the melanoma DNA.

It was the first time the size of these carcinogens’ fingerprints had been measured across the whole human genome.

Since then, researchers have gone hunting for other such fingerprints, and have spotted several more. Many cancers are riddled with C→T changes that happen in a CG sequence. Last May, certain breast cancers were found to contain ‘signature’ C→T and C→G changes in particular contexts. These are suspected to be caused by enzymes called APOBECs, which are thought to form part of our natural defence against viruses.

And last November, US researchers spotted a T→G change in a type of oesophageal cancer called adenocarcinoma, which itself is linked to chronic inflammation and acid reflux.

Looking across different types of cancer

The finding published today in Nature takes this work a step further.

The Sanger team, led by Ludmil Alexandrov, have drawn together data from more than 7,000 tumour samples, representing 30 common types of cancer. The data came from a whole slew of previous work by the cancer research community – some publicly available on the web, some drawn from various international collaborations like the International Cancer Genome Consortium and The Cancer Genome Atlas, and some ‘donated’ by labs around the world.

Using analytical software, they mined this vast trove of data for patterns – not just the signatures previously discovered, but any other patterns in the data.

They discovered more than 20 distinct signatures in the tumours’ DNA – some common to all types of cancer, others specific to just a handful of types. All of the cancers had at least two signatures. Some, like liver cancer, had as many as six.
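This article doesn’t name the software or algorithm the team used, so the following is only a sketch of one common way to pull recurring patterns out of count data: non-negative matrix factorisation (NMF). It takes a table of mutation counts (rows are types of change, columns are tumour samples) and splits it into a small number of candidate ‘signatures’ plus a measure of how strongly each appears in each sample. The choice of NMF, the function names and the toy numbers are all assumptions made for illustration, not the researchers’ actual pipeline or data.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


def squared_error(v, w, h):
    """Sum of squared differences between v and the product w x h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate v by w x h, keeping every entry non-negative.

    Uses the standard multiplicative update rules for squared error.
    w has one column per candidate signature; h says how much of each
    signature is present in each sample.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # Update h: h * (w^T v) / (w^T w h)
        wt = transpose(w)
        numer = matmul(wt, v)
        denom = matmul(matmul(wt, w), h)
        h = [[h[k][j] * numer[k][j] / (denom[k][j] + eps)
              for j in range(n_cols)] for k in range(rank)]
        # Update w: w * (v h^T) / (w h h^T)
        ht = transpose(h)
        numer = matmul(v, ht)
        denom = matmul(w, matmul(h, ht))
        w = [[w[i][k] * numer[i][k] / (denom[i][k] + eps)
              for k in range(rank)] for i in range(n_rows)]
    return w, h


# Made-up counts: 4 types of change (rows) in 4 tumour samples (columns),
# built by mixing two hidden patterns in different amounts.
counts = [
    [24, 0, 8, 16],
    [3, 2, 2, 3],
    [3, 2, 2, 3],
    [0, 16, 8, 8],
]

signatures, exposures = nmf(counts, rank=2)
print("remaining error:", round(squared_error(counts, signatures, exposures), 3))
```

In a real study the number of signatures to look for (`rank` here) is itself something that has to be estimated from the data, and the table would have far more rows and thousands of columns.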

But they were only able to deduce a ‘known’ cause for eleven of their signatures. We’ll look at what these are below, but what this means is profound – there are at least ten unknown processes causing gene damage in cancer. We urgently need to find out what these are, and whether halting or targeting them could help prevent or treat the disease.

Let’s look at the ‘known knowns’ – the signatures that were linked to a known cause.

Hallmarks of cancer


Processes that damage DNA’s structure can lead to cancer

The first signature was clearly linked to ageing – it became more common the older patients were, and was found in all types of cancer and in 60 per cent of the samples overall.

This is entirely to be expected: age is the biggest risk factor for cancer.

For two other signatures, the prime suspects are the APOBEC enzymes mentioned above in the context of breast cancer. But the startling finding was that these signatures turned up in sixteen different types of cancer.

This ties in with other studies, and suggests that these enzymes, when inappropriately switched on, might be a far more important cause of cancer than anyone had previously suspected. Finding out what’s switching them on, says Alexandrov, is a hugely important area for future research.

Another signature, found in certain breast, pancreatic and ovarian cancers, bore the hallmarks of exactly the genetic damage likely to be caused when the BRCA1 and BRCA2 genes are faulty, and thus the team are fairly confident they’ve identified the culprit here.

As expected, the hallmarks of UV damage were found in melanoma and head and neck cancers, while tobacco-related damage was spotted in lung, head and neck, and liver cancers.

Other signatures look like the result of faulty biological processes such as DNA repair (linked to several cancer types), DNA transcription (the first step in the production of proteins; linked to bowel and uterine cancers) and deliberate genetic rearrangements in the immune system (linked to some types of leukaemia and lymphoma, as you might expect).

Finally, they spotted a unique signature in patients that had been treated with the cancer drug temozolomide.

How does this help patients?

Understanding which processes are at play in different cancers is a great step forward, and allows researchers to home in on newer ways to prevent and treat the disease.

A crucial next step for the researchers, Alexandrov told us, is to work out how cancers bearing different signatures behave. “That’s the really important thing. How do these signatures affect clinical outcomes?” he said. The researchers are now starting to look at exactly this, but it requires the genetic data to be linked to clinical data, and this can be hard to come by.

“We also want to look at more cancer types, including rarer cancers. But also, we really need to find out what the other signatures are – what do they do? This paper is a roadmap for future research, and I hope by publishing it we encourage more labs around the world to collaborate and share their data.”

But the technique could have wider implications too – as demonstrated by a second piece of research that came out last week, looking at a carcinogen called aristolochic acid.


Plants of the Aristolochia family can cause cancer

Aristolochic acid is a chemical found naturally in plants of the genus Aristolochia, some of which are used in traditional Chinese medicine. It was found to cause a rare form of cancer called upper urinary tract urothelial carcinoma (UTUC), and herbs containing it were banned in the early 2000s.

Researchers in Singapore obtained tumour samples from nine patients with UTUC, and looked for genetic signatures in a very similar way to the team at the Sanger.

They found an extraordinary degree of damage in the samples – far in excess of the level found in smokers or UV-exposed melanoma patients – and a unique A→T change in a very specific context. They confirmed in the lab that this was exactly the type of damage caused by aristolochic acid.

But then they looked at DNA data from nearly 100 liver cancer patients. Startlingly, they spotted the hallmarks of aristolochic acid damage in eleven of them. This suggested that the chemical, or one like it, can also cause liver cancer – something no-one had previously suspected.

These researchers now say that restrictions on the sale of these herbs need to be tightened and properly monitored. But it also shows that the technique of looking for signatures from a particular carcinogen across different types of cancer can lead to new insights and ways to prevent the disease on a societal and regulatory level, as well as a scientific and medical one.

The urge to know what caused an individual’s cancer is understandable, and an entirely natural response to a disease as fearful as cancer. At the moment, for most of us the answer is, perhaps thankfully, unknown and unknowable.

But research into the genetic make-up of different tumours is leading to an extraordinary acceleration in the pace of research and – we hope – towards new ways to improve things for people affected by this terrible disease.



  • Alexandrov L.B., Nik-Zainal S., Wedge D.C., Aparicio S.A.J.R., Behjati S., Biankin A.V., Bignell G.R., Bolli N., Borg A., Børresen-Dale A.L. et al. (2013). Signatures of mutational processes in human cancer, Nature, DOI:

Images via Wikimedia Commons: DNA structure, DNA composition, Fingerprint and Aristolochia