Everything you really need to know about DNA sequencing

by Misha Gajewski | Analysis

25 April 2016

DNA code

It’s becoming increasingly clear cancer is a disease rooted in our genes. But figuring out how our genes control cancer is a mystery that is still very much unresolved.

However, recent advances in DNA sequencing technology are helping us piece the clues together – and the more we find out, the closer we get to better treatments.

So how exactly does that technology work, and how is it helping? To answer that question we have to go back to where it all began.

The birth of genetics

It all started with some peas.

Back in the 1800s, Gregor Mendel, a friar in Austria, was inspired to look at how pea plants inherited different traits.

Over the course of seven years, Mendel cultivated and tested more than 29,000 pea plants, figuring out the fundamental principles of heredity we now know as ‘Mendel’s laws’.

But while Mendel could show how different traits tended to be inherited down the generations, the actual physical mechanism by which this happened was a complete mystery.

Fast forward a century or so, and Francis Crick and James Watson (with a lot of help from Maurice Wilkins and Rosalind Franklin) figured out that mechanism by deciphering the 3D structure of DNA, which encodes the genetic information in all our cells.

From this, scientists have built up a detailed picture of how our genes work within our cells, acting as biological ‘recipes’ for the molecules that build our bodies and keep us alive.

Humans have around 20,000 genes inside their cells, and they control (or at least influence) virtually all aspects of life and health. But while it’s easy to draw links between certain damaged genes and the diseases they cause – such as the gene CFTR, which causes cystic fibrosis in children who inherit two faulty versions of it – in most cases, things are a lot more complicated.

Figuring out which gene controls what, especially when it comes to complex diseases like cancer, is a huge challenge.

So where do we start?

Well, similar to figuring out how a recipe is going to turn out, you need to look at the raw ingredients. And in the case of genes, it’s our DNA.

From peas to ATGC

DNA is made up of long strings of four different chemicals (also known as bases, or ‘letters’): adenine (A), guanine (G), cytosine (C) and thymine (T). These are strung together in pairs (A with T, C with G) to form the famous ladder-like double helix.
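The pairing rule is simple enough to write down as a lookup table. Here is a minimal sketch in Python, with function names that are purely illustrative, showing how knowing one strand of the ladder tells you the other:

```python
# Pairing rules from the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base that sits opposite each letter of `strand`."""
    return "".join(PAIRS[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own direction (i.e. reversed)."""
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

The second function reflects the fact that the two strands of the helix run in opposite directions, so the partner strand is conventionally read backwards.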

The structure of DNA

These four letters are the alphabet in which the genetic recipes of life are written. So to figure out how our genes work, we first need to ‘read’ their order – a process known as DNA sequencing.

The first methods for doing this were worked out back in the 70s. One, known as chain-termination, was developed in the UK in 1977 by Fred Sanger. A second method was published the same year by two US scientists, Allan Maxam and Walter Gilbert.

In the end, Sanger’s method was simpler, quicker and safer, so it prevailed until the mid ‘00s. But despite being quicker and simpler than the Maxam-Gilbert technique, sequencing DNA the Sanger way was no easy task.

“You had to do everything by hand,” says Dr Kathy Weston, a former Cancer Research UK researcher, who circumstantially became (unofficially) the world’s fastest DNA sequencer thanks to the advanced technology in her lab during her PhD.

“It was a huge effort to sequence even tiny things,” she continues, explaining the many tedious steps involved in reading DNA back then. It took her an entire year to manually sequence 50 kilobases (50,000 bases) – a task that now takes around 1/100th of a second with today’s automated technology.

So how exactly did she – and all of the other early DNA sequencing experts – do it?

Sanger sequencing made simple


The original Sanger method is beautifully simple, yet extremely laborious. First, the researcher isolates DNA from whatever cells they’re studying – be they human, animal, bacterium or plant. This acts as a ‘template’ (i.e. the DNA ‘recipe’ you want to read).

Then there are a few more ingredients to add to the mix:

  • A bit of DNA called a primer, which tells you where to start reading (like the capitalised word at the start of a sentence).
  • An enzyme called DNA polymerase that makes an exact copy of DNA from the template.
  • Plenty of normal bases (known as dNTPs) – the As, Cs, Ts and Gs that make up DNA – that the DNA polymerase uses to make new DNA.
  • Modified bases (ddNTPs), which act as ‘full stops’, halting the DNA polymerase in its tracks. These can also be A, C, T or G.

Next, the template, primer, polymerase and normal bases are mixed together in a test tube, along with either A, C, T or G ddNTPs. This mixture is then gently warmed to the right temperature for DNA polymerase to work at, and it gets busy making new strands of DNA from the template, using the normal bases. But at random points the polymerase puts in a ddNTP, which stops the strand from growing any further.

The researcher then repeats this process many times with the same ddNTP, eventually ending up with a test tube full of different-length DNA strands, all ending at a different point (but the same ‘letter’, depending on which ddNTP was used).

Then they repeat it three more times, once with each remaining type of ddNTP, and now they have everything they need to ‘read’ the sequence.

To do this, the researcher makes use of the fact that DNA molecules carry a negative electrical charge. They place the four reaction mixtures in separate channels along one end of a slab of squidgy gel. An electric current is run through it, sending the strands of DNA on the move towards the other end (a technique called gel electrophoresis). The shorter DNA strands travel furthest, while the longer ones can’t get very far since they get held up in the gel.

After some processing to reveal the locations of the strands within the gel, the result looks vaguely like a multiple-choice test answer sheet. This is ‘read’ from the bottom up, checking each ‘letter’ in turn to get the order of the DNA sequence.
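To see why four tubes and a gel are enough to spell out a sequence, the logic can be sketched in a few lines of Python. This is a toy model rather than a simulation of the real chemistry: the sequence `GATTACA` is made up, the function names are illustrative, and it assumes enough copies are made that the polymerase stops at every possible position at least once.

```python
def chain_termination_fragments(new_strand: str, dd_base: str) -> list[int]:
    """Lengths of the strands found in the tube containing one type of ddNTP.

    Every strand in this tube stops at a position where `dd_base` occurs.
    """
    return [i + 1 for i, base in enumerate(new_strand) if base == dd_base]


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the gel from the bottom (shortest strands) upwards."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))


new_strand = "GATTACA"  # hypothetical sequence copied from the template
lanes = {base: chain_termination_fragments(new_strand, base) for base in "ACGT"}

print(lanes)            # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_gel(lanes))  # GATTACA
```

Each lane only tells you where one letter sits, but because every strand length turns up in exactly one lane, sorting the bands by length recovers the full order.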


The copyright to this image is retained by John Schmidt (JWSchmidt) CC-BY-3.0

“It was very boring to do but it generated a huge amount of data. It really was the start of something amazing,” said Dr Weston.

Now scientists had a way to read DNA in normal cells, they could use the same technique to identify changes in genes associated with diseases like cancer.

And it wasn’t long before they started finding things. For example, in 1983 they tracked down the location of the Huntington’s disease gene, and a few years later in 1989 they found the mutation responsible for cystic fibrosis.

Moreover, understanding these defects raised the idea that researchers could potentially make new drugs targeting those changes, or the faulty molecules they produce.

But many researchers believed the crucial next step in this new approach to medicine was to undertake what was considered, at the time, a Herculean task: using this slow, arduous technique to read the entire human genome, made up of a sequence of around three billion ‘letters’.

This was vast compared to previous successes – around 4.6 million for bacteria or a hundred million in tiny nematode worms.

But not all scientists saw the point of this exercise.

“Many thought ‘that’s completely stupid’. They didn’t think there was anything more they could learn,” Kathy Weston recalls.

Others figured DNA sequencing was so inefficient that it wasn’t even worth trying to scale it up to the level of the human genome.

However, by the late 80s the sequencing process was slowly starting to become automated, making it quicker, safer and cheaper. So the view among scientists began to shift from ‘why bother?’ to ‘why not?’

And it was in fact one of our very own, Sir Walter Bodmer – then head of the Imperial Cancer Research Fund (ICRF), which later became Cancer Research UK – who originally proposed the idea for the Human Genome Project.

Moving towards the human genome

It took three years of scientific meetings, international wrangling and political lobbying to persuade the US government to fund The Human Genome Project, with further funding coming in from other organisations like the UK’s Medical Research Council and the Wellcome Trust.

Teams in the US, UK, Japan, France, Germany and China all pulled together with one simple goal: to create a “complete mapping and understanding of all the genes of human beings.”

The project officially got underway in 1990, and was estimated to take 15 years to read the whole thing. But there was one big roadblock standing in the way: technology.

“The rate at which DNA can be sequenced will not be sufficient for sequencing the whole genome,” wrote Francis Collins, the project leader, in the 1993 five-year plan. And the technology required for cost-effective large-scale sequencing was still years away.

So when the project’s scientists officially started sequencing on 11th April, 1996 they were still doing it by hand, and progress was painfully slow, managing about 15 per cent of the human genome by 1999.

Despite the crawling speed, exciting findings were starting to turn up. In 1996, a gene linked to Parkinson’s was the first such disease gene to emerge. And the BRCA1 and BRCA2 genes, associated with an increased risk of breast, ovarian and prostate cancer, were pinned down around the same time, in 1994 and 1995.

The race to crack the code of life

Some researchers, however, were dissatisfied with this snail’s-pace progress. Chief among them was biochemist and entrepreneur Craig Venter, who claimed he would sequence the human genome in just two years.

How? With money, technology and some short cuts.

The first decision Venter made was to eschew the time-consuming step-by-step approach to Sanger sequencing, opting instead for a new method called whole-genome shotgun sequencing. Through his company, Celera Genomics, he bought a fleet of shiny new sequencing machines and set to work.

Traditional Sanger sequencing starts from known stretches of DNA. By using primers matching specific sequences, researchers can ‘read’ along a strand of DNA for a few hundred bases. Then they design new primers based on the end of the sequence they’ve just got to and repeat the process. This generates overlapping sequences, effectively ‘walking’ along long stretches of DNA.

Although based on Sanger’s original technique, Venter’s favoured shotgun method starts by blasting the entire genome of an organism into tiny pieces and reading them all at once. These multiple sequences are then assembled by advanced computer software, which puts everything back together by looking for overlapping bits.

Venter’s method wasn’t perfectly accurate, but it was far quicker than the efforts of Collins and his team – and the race between the public and private sector was officially on.


The final sprint

Collins didn’t want to completely ditch the traditional Sanger method. Not only was it more reliable, but because his team was sequencing DNA fragments whose location they already knew, it was much simpler to put the whole picture of the genome back together.

But knowing he had to do something to speed things up, he also bought a fleet of fancy shotgun machines. Soon the number of letters read jumped from millions to hundreds of millions in a matter of months.

On June 22, 2000 the race ended, with the publicly funded Human Genome Project in first place. Venter came in a close second three days later, on June 25. And on June 26 President Bill Clinton announced that the “wonderful map” of our human genome (or at least a first draft of it) was complete.

Light speed ahead

There have been major advances in sequencing technology since then, and the price has dropped over 100-fold.

While some of this progress relied on improvements to the Sanger method, the real game changer was the emergence in 2005 of so-called ‘next-generation’ sequencing (NGS) – a catchall term for any machine not doing sequencing the Sanger way.

“The step up was incredible,” reminisced Dr James Hadfield, head of Cancer Research UK’s genomics lab at Cambridge University.

“We went from sequencing a few pieces of DNA in a day, to sequencing millions. Genomes became a dream, then rapidly a reality for anyone to consider sequencing.”

It started with a system developed by the company 454 Life Sciences, which was used to read the entire genome of the bacterium Mycoplasma genitalium in one go.

But different methods began popping up almost simultaneously. Over the last few years they’ve battled it out, but it was a technique devised by two scientists in Cambridge that came out on top.

In 2006, two researchers, Shankar Balasubramanian – now at our Cambridge Institute – and David Klenerman developed their own sequencing technique called sequencing-by-synthesis, which is now widely used in machines like those made by the company Illumina.


It works by taking DNA from a sample of cells and smashing it up, creating a library of the genome made from short overlapping fragments. Next, each piece gets special short pieces of DNA, known as adapters, stuck to both ends.

These fulfil two purposes. Firstly, the adapters work as a kind of molecular ‘Blu Tack’, sticking all the DNA fragments down onto a glass slide. And they also act as a starting point for a process called the polymerase chain reaction, or PCR, which copies each fragment of DNA many times while it’s stuck in place. This creates tiny clusters of identical sequences arrayed on the slide.

Then there’s a second round of copying, using DNA bases tagged with fluorescent dyes – a different one for each of the four ‘letters’, A, C, T and G. As each letter is incorporated into the growing strands of DNA, the slide is illuminated with a pulse of laser light, creating a flash of colour.

By capturing the order of the flashes coming from each cluster of DNA strands on the slide, researchers can figure out the sequence of the original genomic fragment it came from: for example, yellow, red, green then blue might correspond with A, C, T then G.
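Turning flashes into letters is essentially a lookup, done separately for every cluster on the slide. Here is a rough sketch in Python, using the colour scheme from the example above; the cluster names and the second set of flashes are invented for illustration, and the real image analysis is far more involved.

```python
# Colour-to-letter scheme taken from the example in the text (illustrative only).
COLOUR_TO_BASE = {"yellow": "A", "red": "C", "green": "T", "blue": "G"}


def call_bases(flashes: list[str]) -> str:
    """Translate the ordered flashes from one cluster into a DNA sequence."""
    return "".join(COLOUR_TO_BASE[colour] for colour in flashes)


# Hypothetical flashes recorded from two clusters, one colour per cycle.
clusters = {
    "cluster_1": ["yellow", "red", "green", "blue"],
    "cluster_2": ["blue", "blue", "yellow", "green"],
}

reads = {name: call_bases(flashes) for name, flashes in clusters.items()}
print(reads)  # {'cluster_1': 'ACTG', 'cluster_2': 'GGAT'}
```

Because every cluster is read at the same time, each round of laser light adds one letter to millions of sequences at once.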

The final step is to figure out the original DNA sequence, based on all this data from the small overlapping fragments. Imagine you have different pieces of a sentence:

  • “From so simple a beginning
  • beautiful and most wonderful have been
  • been, and are being, evolved.”
  • beginning endless forms most beautiful

By lining up the overlapping words (‘beginning’, ‘beautiful’ and ‘been’), the whole quote – from Darwin’s On the Origin of Species – becomes clear: “From so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved.”
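The same trick can be sketched in code using the four pieces of the Darwin quote. This is a simple ‘greedy’ approach that assumes error-free fragments: it repeatedly joins whichever two pieces share the longest overlap. The function names and the three-character minimum overlap are choices made for this illustration, and real assembly software has to cope with sequencing errors and repeated stretches of DNA, so it is considerably more sophisticated.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest end of `left` that matches the start of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(fragments: list[str]) -> list[str]:
    """Greedily merge the pair with the biggest overlap until none overlap."""
    pieces = list(fragments)
    while len(pieces) > 1:
        size, i, j = max(
            (overlap(a, b), i, j)
            for i, a in enumerate(pieces)
            for j, b in enumerate(pieces)
            if i != j
        )
        if size == 0:
            break  # nothing left that overlaps
        merged = pieces[i] + pieces[j][size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (i, j)] + [merged]
    return pieces


fragments = [
    "From so simple a beginning",
    "beautiful and most wonderful have been",
    "been, and are being, evolved.",
    "beginning endless forms most beautiful",
]

result = assemble(fragments)
print(result[0])
```

With these four fragments the pieces collapse into a single sentence. With real DNA, where the alphabet has only four letters and many stretches look alike, finding the right overlaps is much harder, which is why a minimum overlap length matters.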


Looking to the future

Illumina machines are widely used around the world, churning out billions of bases every day. And they’ve revolutionised our understanding of human disease, including cancer, but there are even more exciting developments on the horizon.

Chief among these is the invention of so-called ‘nanopore’ technology, which is still relatively new, but tipped to become the next big thing in the sequencing world.

It’s based on the idea of squeezing a strand of DNA through a tiny hole in a membrane, with each base popping through like a bead on a string. By measuring the change in electrical conductance as each different ‘letter’ moves through the hole, it’s possible to directly read the sequence of that strand.
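As a very rough illustration of the idea, imagine each letter produced its own characteristic signal level. The numbers below are invented for this sketch, as are the function names, and real nanopore signals are messier and need much cleverer software to decode. But the principle of matching each measurement to the closest known level looks something like this in Python:

```python
# Made-up signal levels for each base, in arbitrary units (an assumption
# for illustration, not real measurements).
LEVELS = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def call_base(measurement: float) -> str:
    """Pick the base whose expected level is closest to the measurement."""
    return min(LEVELS, key=lambda base: abs(LEVELS[base] - measurement))


def read_strand(signal: list[float]) -> str:
    """Decode a series of measurements, one per base passing through the pore."""
    return "".join(call_base(value) for value in signal)


signal = [3.1, 0.9, 4.2, 3.8, 1.1, 2.05, 0.95]  # hypothetical noisy readings
print(read_strand(signal))  # GATTACA
```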

As technology continues to advance and becomes more powerful and affordable, the future of DNA sequencing is full of possibilities. James Hadfield believes the future lies in personalised sequencing.

“The hope is that the cost gets to a point that [anyone] can consider sequencing their own genomes – probably via their GPs – and that the data is easily interpreted to allow it to inform medical decisions,” he says.

Along the same lines, Clive Brown, from Oxford Nanopore Technologies, thinks the future is fully scalable devices that could be put into, say, your toothbrush and would track your daily health.

“The big dream,” he told the audience at the Wired Health 2015 conference, “is a move towards self-quantification, daily tracking the presence of markers and changes in daily biology to preventably change the way people live”.

But to reach the point where the visions of Dr Hadfield and Brown become a reality, the data produced by sequencing will need to be cheap and quick to analyse, as well as easy to interpret.

The genome era of cancer research

We’ve come a long way from Kathy Weston and her colleagues painstakingly tracing out a few hundred letters of DNA at a time, like a child following the words in a picture book with a tentative finger.

Today, our researchers and others around the world are sequencing the genomes of thousands of people and tumour samples, uncovering a wealth of data about the genetic changes that underpin cancer.

For example, our scientists are using advanced sequencing techniques to track how lung cancers evolve and change over time within each individual patient.

And we’re also starting to see genetic testing come into the clinical trials for different types of cancer, informing doctors about which drugs are most likely to work for which person.

Another application that’s showing promise is reading the DNA shed by tumours into the bloodstream. This could become a powerful way to non-invasively diagnose and monitor cancer in the future.

Finally, genetic knowledge could be the key to guiding potent immunotherapies with the potential to bring new cures.

Whatever comes next, there’s no doubt that the future of DNA sequencing will be as transformative as its past.



  • Azmina Verjee
    26 April 2016

    What a wonderful whistle-stop tour of DNA sequencing! Thank you Misha and thank you Cancer Research UK for this brilliant blog.

