
Why does this RNA virus look like DNA?

The other day, I was diffing coronaviruses: taking the long strings of GCAT characters that make up the COVID-19 genome and pairing them up with the GCAT characters of similar viruses. I didn’t realize it at the time, but there’s an oddity here. Coronavirus is not made of DNA; it’s made of RNA. RNA is written with the letters GCAU, with uracil (U) in place of thymine (T). But these files are full of GCAT, like DNA. What was going on here? Was the genome sequence lying to me, or does COVID-19 really contain DNA rather than RNA?

The answer, it turned out, is that the sequence is a lie! We must replace all Ts with Us to get a faithful sequence of the coronavirus RNA. So, why is it represented this way?
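As a minimal sketch, assuming the file holds the sequence in the same orientation as the viral RNA (which is what "replace all Ts with Us" implies), the conversion is a single character substitution. The function name `dna_to_rna` and the sample string are illustrative, not taken from any particular tool or genome record.

```python
# Map DNA thymine to RNA uracil, in both upper and lower case.
T_TO_U = str.maketrans("Tt", "Uu")


def dna_to_rna(seq: str) -> str:
    """Rewrite a GCAT sequence in the RNA alphabet (GCAU).

    Assumes `seq` is already in the same orientation as the RNA,
    so each T simply becomes U and every other letter is unchanged.
    """
    return seq.translate(T_TO_U)


# An illustrative 20-base string, not copied from any specific record.
dna = "ATTAAAGGTTTATACCTTCC"
print(dna_to_rna(dna))  # AUUAAAGGUUUAUACCUUCC
```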

The reason is that what was sequenced is DNA generated from the original RNA. Apparently, nearly all RNA sequencing is done this way, because tooling for DNA sequencing is cheaper and more mature, and DNA is more stable than RNA. The process uses an enzyme, reverse transcriptase, to convert the RNA to DNA. More precisely, it creates a complementary DNA, or “cDNA”.
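To make the “complementary” part concrete, here is a simplified Python sketch. It models reverse transcription only as base pairing (RNA A, C, G, U pair with DNA T, G, C, A), reversed because paired strands run antiparallel, and it ignores primers, second-strand synthesis and sequencing errors. The helper names `reverse_transcribe` and `reverse_complement` are hypothetical, and the input is assumed to be uppercase. Reading the strand complementary to the cDNA gives back the RNA with every U written as T, which is consistent with the GCAT files.

```python
# RNA base -> the DNA base that pairs with it (A-T, C-G, G-C, U-A).
RNA_TO_PAIRED_DNA = str.maketrans("ACGU", "TGCA")
# DNA base -> its complementary DNA base.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_transcribe(rna: str) -> str:
    """Simplified first-strand cDNA: pair each RNA base with a DNA base,
    then reverse so the result reads 5' to 3' like the original."""
    return rna.translate(RNA_TO_PAIRED_DNA)[::-1]


def reverse_complement(dna: str) -> str:
    """The other DNA strand, also read 5' to 3'."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


rna = "AUUAAAGGUUUAUACCUUCC"  # illustrative, not from a real record
cdna = reverse_transcribe(rna)
print(cdna)                      # GGAAGGTATAAACCTTTAAT
print(reverse_complement(cdna))  # ATTAAAGGTTTATACCTTCC
```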

I was able to answer this with the help of the Bioinformatics Stack Exchange.

Tagged #programming, #bioinformatics. All content copyright James Fisher 2020. This post is not associated with my employer.