The genome as a repository of information: how and why to artificially store data in DNA

What information is in DNA

DNA is a sequence of nucleotides. There are only four of them:

  • adenine,
  • guanine,
  • thymine,
  • cytosine.

To encode information, each nucleotide is assigned a numeric code: for example, thymine is 0, guanine is 1, adenine is 2, and cytosine is 3.

The sequence of nucleotides encodes information about different types of RNA. All of these RNA types are synthesized on a DNA template during transcription, when the DNA sequence is copied into an RNA sequence, and they take part in protein biosynthesis (translation).

In addition to coding sequences, cellular DNA contains sequences that perform regulatory and structural functions. The genomes of eukaryotes also often contain regions belonging to "genetic parasites" such as transposons.

Encoding begins by converting all letters, numbers, and images into binary code, that is, a sequence of zeros and ones. That binary sequence is then converted into a sequence of nucleotides, that is, a quaternary code.
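The two-step conversion described above (data to binary, binary to quaternary) can be illustrated with a minimal sketch. It reuses the example mapping from the text (thymine 0, guanine 1, adenine 2, cytosine 3); the function names `encode` and `decode` are hypothetical, and real DNA storage schemes add error correction and avoid problematic sequences, which this sketch does not attempt.

```python
# Digit-to-nucleotide mapping taken from the example in the text:
# thymine = 0, guanine = 1, adenine = 2, cytosine = 3.
DIGIT_TO_BASE = "TGAC"
BASE_TO_DIGIT = {base: digit for digit, base in enumerate(DIGIT_TO_BASE)}


def encode(data: bytes) -> str:
    """Turn bytes into a nucleotide string, two bits per nucleotide."""
    bases = []
    for byte in data:
        # Split each byte into four 2-bit groups, most significant first.
        for shift in (6, 4, 2, 0):
            bases.append(DIGIT_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases)


def decode(sequence: str) -> bytes:
    """Invert encode(): every four nucleotides give back one byte."""
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[i:i + 4]:
            byte = (byte << 2) | BASE_TO_DIGIT[base]
        out.append(byte)
    return bytes(out)


message = "DNA".encode("utf-8")
strand = encode(message)
print(strand)                          # GTGTGTCAGTTG
print(decode(strand).decode("utf-8"))  # DNA
```

Each byte becomes exactly four nucleotides, so the three-letter message yields a twelve-letter strand.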

There are many ways to read DNA. In the most common technique, the DNA strand is copied using bases that each carry a colored label. A highly sensitive detector then reads the labels, and a computer uses the colors to reconstruct the nucleotide sequence.

How new information appears in DNA

New information is written into DNA using CRISPR-Cas9 technology, also called genetic scissors. It was developed eight years ago, and its developers were awarded the Nobel Prize in Chemistry in 2020.

Previously, recording information took a long time and required special equipment. However, a team of scientists from Columbia University has automated this process.

We managed to teach cells to talk to a computer through electronic signals and thus download information from any electronic medium.

Harris Wang, Professor of Systems Biology

The authors explain that they translate the binary code of a computer program into electrical impulses that are sent to the cell. Receptors on the cell surface perceive these signals and translate them into the language of DNA, automatically building the required genome sequence.

As a result, a so-called trailer, or extra piece, is added to the DNA strand. Unlike digital computer information, it is a set of letters of the genetic code, that is, an analog cipher, so the scientist compares this segment to a magnetic tape.

Interaction of the transcription factor STAT3 with DNA (shown as a blue helix)

How much information can be recorded in DNA

Using the new technology, Columbia University researchers were able to encode and read 2.14 megabytes of information. The final physical recording density was 215 million gigabytes per gram of nucleic acid.
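To put these two figures side by side, the following sketch converts the quoted density to bytes per gram and estimates how little DNA would be needed for 2.14 megabytes at that density. It assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes); the result is purely illustrative arithmetic, not the amount of DNA actually used in the experiment.

```python
# Assumption: decimal units, 1 GB = 10**9 bytes and 1 MB = 10**6 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_MB = 10**6

# 215 million gigabytes per gram, as quoted in the text.
density_bytes_per_gram = 215_000_000 * BYTES_PER_GB

# 2.14 megabytes, the amount encoded and read back.
file_bytes = 2.14 * BYTES_PER_MB

# Mass of DNA that would hold the file at the quoted density.
mass_grams = file_bytes / density_bytes_per_gram
mass_picograms = mass_grams * 1e12

print(f"density: {density_bytes_per_gram:.2e} bytes per gram")
print(f"mass for 2.14 MB: about {mass_picograms:.1f} picograms")
```

Under these assumptions the density is about 2.15 × 10^17 bytes per gram, and the file would fit in roughly ten picograms of DNA.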

One turn of a DNA helix in B-form is about 10 base pairs. One of the strands is the coding strand; the second is always complementary to the first.

Thus, there are 10 positions, each of which can hold one of four letters:

  • A,
  • T,
  • G,
  • C.
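The counting above can be sketched in a few lines: ten places per turn with four possible letters each, and a second strand that is fully determined by the first. The ten-letter sequence `turn` is a hypothetical example, and the helper name `complementary_strand` is an assumption made for illustration.

```python
import math

# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(coding: str) -> str:
    """Return the complementary strand, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(coding))


POSITIONS_PER_TURN = 10  # about 10 base pairs per turn of B-form DNA
LETTERS = 4              # A, T, G, C

# Number of distinct sequences that fit in one turn, and the
# equivalent number of bits (log base 2 of that count).
combinations = LETTERS ** POSITIONS_PER_TURN
bits_per_turn = math.log2(combinations)

turn = "ATGCGTTACG"  # hypothetical coding strand for one turn
print(complementary_strand(turn))   # CGTAACGCAT
print(combinations, bits_per_turn)  # 1048576 20.0
```

Because the second strand follows from the first, it adds redundancy rather than extra capacity, which is why only one strand is counted.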

With quaternary coding, each position carries two bits of information in binary terms, that is, 20 bits per turn of the helix. One turn has a linear size of approximately 3.4 nanometers and a volume of about 11 cubic nanometers.

Today it is possible to build processors in which 1 bit occupies 10 nanometers. Based on linear dimensions alone, DNA can therefore hold about 60 times more information.
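The "about 60 times" figure follows directly from the numbers given in the text, as this short calculation shows. It compares linear density only and uses no values beyond those quoted above.

```python
# Figures quoted in the text.
DNA_BITS_PER_TURN = 20   # 10 positions, 2 bits each
DNA_TURN_LENGTH_NM = 3.4 # length of one helix turn
CHIP_BITS = 1
CHIP_LENGTH_NM = 10      # 1 bit per 10 nanometers

dna_bits_per_nm = DNA_BITS_PER_TURN / DNA_TURN_LENGTH_NM
chip_bits_per_nm = CHIP_BITS / CHIP_LENGTH_NM

# How many times denser DNA is along one dimension.
ratio = dna_bits_per_nm / chip_bits_per_nm

print(f"DNA:  {dna_bits_per_nm:.2f} bits per nanometer")
print(f"chip: {chip_bits_per_nm:.2f} bits per nanometer")
print(f"ratio: about {ratio:.0f}x")
```

The ratio comes out at roughly 59, which the text rounds to 60.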

How reliable is it to record information on DNA

In March 2017, Science published an article by American scientists who managed to write 2 × 10^17 bytes per gram of DNA. The biologists emphasize that not a single byte was lost.

The undoubted advantages of recording information on DNA include the enormous storage density of data, as well as the stability of the carrier - albeit only at low temperatures.

In DNA, information is recorded in a three-dimensional analog form, and this is the most stable form. In this form, data can be stored for hundreds of thousands, if not millions, of years, said Harris Wang, professor of systems biology.


Despite all these advantages, the technology for recording information in DNA is at an early stage of development. Today, DNA synthesis is still very expensive: a megabyte of data recorded on a DNA "flash drive" costs about 3.5 thousand dollars.
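To give a sense of scale, the sketch below applies the quoted price of about 3.5 thousand dollars per megabyte to two data sizes. It assumes the price scales linearly with size and that a gigabyte is 1,000 megabytes; the function name `synthesis_cost_usd` is hypothetical.

```python
COST_PER_MB_USD = 3500  # approximate price per megabyte quoted in the text


def synthesis_cost_usd(megabytes: float) -> float:
    """Estimate the cost, assuming it grows linearly with data size."""
    return megabytes * COST_PER_MB_USD


# The 2.14 MB recorded in the experiment described earlier.
print(f"2.14 MB: ${synthesis_cost_usd(2.14):,.0f}")

# One gigabyte, taken here as 1,000 megabytes (an assumption).
print(f"1 GB:    ${synthesis_cost_usd(1000):,.0f}")
```

At this rate a 2.14-megabyte file costs roughly 7,500 dollars and a single gigabyte about 3.5 million dollars.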

Scientists have yet to develop a technology for automatically transferring information out of DNA. It is also important to simplify the way information is transferred from the computer to the cell. At present a stream of electrons is used, but in the future it could be replaced by something else.

For example, an alternating magnetic field or the ambient temperature could serve this purpose, or even an ordinary ray of light, since most living organisms have photoreceptors.
