The genome as a repository of information: how and why to artificially store data in DNA

What information is in DNA

DNA is a sequence of nucleotides. There are only four of them:

  • adenine,
  • guanine,
  • thymine,
  • cytosine.

To encode information, each of them is assigned a digit. For example, thymine is 0, guanine is 1, adenine is 2 and cytosine is 3.

The nucleotide sequence makes it possible to “encode” information about different types of RNA. All these types of RNA are synthesized on a DNA template during transcription, when a DNA sequence is copied into an RNA sequence, and they take part in protein biosynthesis (translation).

In addition to coding sequences, cellular DNA contains sequences that perform regulatory and structural functions. The genome of eukaryotes also often contains regions belonging to "genetic parasites", for example, transposons.

Coding begins by converting all letters, numbers and images into binary code, that is, a sequence of zeros and ones. These are then converted into a sequence of nucleotides, that is, a quaternary code.
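The conversion from binary code to a quaternary code of nucleotides can be sketched in Python. This is a minimal illustration, not the scheme used by any particular research group: it assumes the digit assignment given earlier in this section (thymine 0, guanine 1, adenine 2, cytosine 3), and the function names `encode` and `decode` are hypothetical. Real storage schemes add error correction and other constraints that are not shown here.

```python
# Digit assigned to each nucleotide, following the example in the text.
DIGIT_TO_BASE = {0: "T", 1: "G", 2: "A", 3: "C"}
BASE_TO_DIGIT = {base: digit for digit, base in DIGIT_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into nucleotides, two bits (one base-4 digit) per nucleotide."""
    bases = []
    for byte in data:
        # Split each byte into four 2-bit digits, most significant first.
        for shift in (6, 4, 2, 0):
            bases.append(DIGIT_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases)


def decode(sequence: str) -> bytes:
    """Reverse of encode: every four nucleotides become one byte again."""
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[i:i + 4]:
            byte = (byte << 2) | BASE_TO_DIGIT[base]
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    strand = encode("Hi".encode("utf-8"))
    print(strand)                          # GTATGAAG
    print(decode(strand).decode("utf-8"))  # Hi
```

Each byte becomes exactly four nucleotides, so the two-letter text "Hi" is stored as an eight-letter strand.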

There are many ways to read DNA. In the most common technique, a DNA strand is copied using bases that each carry a colored label. A very sensitive detector then reads the data, and a computer uses the colors to reconstruct the nucleotide sequence.

How new information appears in DNA

New information is written into DNA using CRISPR-Cas9 technology, also called genetic scissors. It was developed in 2012, and its creators were awarded the 2020 Nobel Prize in Chemistry.

Previously, recording information took a long time and required special equipment. However, a team of scientists from Columbia University has automated this process.

We managed to teach cells to talk to a computer through electronic signals and thus download information from any electronic medium.

Harris Wang, professor of systems biology

The authors explain that they translate a binary computer program into electrical impulses that are sent to the cell. Receptors on the cell surface perceive these signals and translate them into the language of DNA, automatically building the required genome sequence.

As a result, a so-called trailer, or extra piece, is added to the DNA chain. Unlike digital computer information, it is a set of letters of the genetic code, that is, an analog cipher, so the scientist compares this segment to a magnetic tape.

Interaction of the transcription factor STAT3 with DNA (shown as a blue helix)

How much information can be recorded in DNA

Using the new technology, Columbia University researchers were able to encode and read 2.14 megabytes of information. The final physical recording density was 215,000,000 gigabytes per gram of nucleic acid.
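To get a feel for these figures, the arithmetic can be sketched in Python. The sketch assumes decimal units (1 gigabyte = 10^9 bytes, 1 megabyte = 10^6 bytes), which the article does not specify, and the variable names are illustrative only.

```python
# Assumption: decimal units, which the article does not specify.
BYTES_PER_GB = 10**9
BYTES_PER_MB = 10**6

# Quoted density: 215,000,000 gigabytes per gram of nucleic acid.
density_bytes_per_gram = 215_000_000 * BYTES_PER_GB

# Quoted payload: 2.14 megabytes.
payload_bytes = 2.14 * BYTES_PER_MB

# Mass of nucleic acid that would hold the payload at the quoted density.
grams_needed = payload_bytes / density_bytes_per_gram

print(f"density: {density_bytes_per_gram:.2e} bytes per gram")
print(f"mass for 2.14 MB: {grams_needed:.2e} g")
```

At the quoted density, the 2.14 megabytes would fit in roughly 10^-11 grams of nucleic acid, that is, about ten picograms.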

One turn of the B-form DNA helix is approximately 10 base pairs. Only one of the strands carries the encoding, because the second is always complementary to the first.
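Complementarity means that the second strand adds no new information: it can always be computed from the first. A minimal Python sketch, assuming the standard base pairing (A with T, G with C); the function names and the 10-base example are hypothetical.

```python
# Standard base pairing: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a strand."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Complement read in the opposite direction (the strands are antiparallel)."""
    return complement(strand)[::-1]


turn = "ATGCCGTAAG"  # hypothetical 10-base stretch, about one helix turn
print(complement(turn))          # TACGGCATTC
print(reverse_complement(turn))  # CTTACGGCAT
```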

Thus one turn of the coding strand has 10 cells, each of which can contain one of four letters:

  • A,
  • T,
  • G,
  • C.

With quaternary coding, the information density in DNA is two bits per cell, that is, 20 bits per turn of the helix. One turn has a linear size of approximately 3.4 nanometers and a volume of about 11 cubic nanometers.

Today it is possible to create processors in which 1 bit occupies 10 nanometers. Thus, based on linear dimensions alone, DNA can record about 60 times more information.
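The estimate of about 60 times can be reproduced from the numbers in the text. The following Python sketch is a back-of-the-envelope check; the variable names are illustrative, and the comparison is purely linear, as in the article.

```python
import math

BITS_PER_BASE = math.log2(4)   # four possible letters give 2 bits per cell
BASES_PER_TURN = 10            # one turn of the B-form helix
TURN_LENGTH_NM = 3.4           # linear size of one turn, in nanometers

bits_per_turn = BITS_PER_BASE * BASES_PER_TURN     # 20 bits
dna_bits_per_nm = bits_per_turn / TURN_LENGTH_NM   # about 5.9 bits per nm

# Figure quoted in the text for processors: 1 bit per 10 nm.
processor_bits_per_nm = 1 / 10

ratio = dna_bits_per_nm / processor_bits_per_nm
print(f"{bits_per_turn:.0f} bits per turn, ratio about {ratio:.0f}x")
```

The ratio comes out at roughly 59, which the article rounds to 60.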

How reliable is it to record information on DNA

In March 2017, Science magazine published an article by American scientists who managed to record 2 × 10^17 bytes per gram of DNA. Biologists emphasize that they have not lost a single byte.

The undoubted advantages of recording information on DNA include the enormous storage density of data, as well as the stability of the carrier - albeit only at low temperatures.

In DNA, information is recorded in three-dimensional analogue form, and this is the most stable form. In this form, data can be stored for hundreds of thousands, or even millions, of years, said Harris Wang, professor of systems biology.

Conclusion

Despite all the advantages, the technology for recording information on DNA is at the initial stage of its development. Today, DNA synthesis remains very expensive, so a megabyte of data recorded on a DNA “flash drive” costs about 3.5 thousand dollars.

Scientists have yet to develop a technology for automatically transferring information from DNA. It is also important to simplify the way information is transferred from the computer to the cell. Currently this is done with a stream of electrons, but in the future it could be replaced by something else.

This could be, for example, an alternating magnetic field or the ambient temperature, or even an ordinary ray of light: after all, most living organisms have photoreceptors.
