On the Origin of a virus

Skepticism plays a crucial role in science. Different possibilities or speculations should be investigated, and the results obtained must be reviewed by experts (nowadays discussions happen on multiple platforms: journal websites, media and social media) and evaluated on the scientific evidence presented. Science cannot accept any speculation blindly, no matter how attractive it is or how big the name of the scientist who proposes it. Any new hypothesis must go through rigorous review, and science endorses it only when there is strong supporting evidence. A person in science knows this routine, but it is hard for someone outside science to understand why scientists argue, propose alternative hypotheses, and seek further evidence. Science cannot dismiss any ‘extremely unlikely’ scenario until there is scientific evidence to do so. The ‘questions asked’ by skeptics are easy for the public to understand; they may ignite the public’s own skepticism, and people can easily jump to wrong conclusions without searching deeply for evidence. The difficult part is understanding the answers to those questions! Listeners need patience and basic knowledge of the subject, sometimes a bit more than the fundamentals, to understand the language scientists speak when clarifying genuine doubts. Let’s dive into the current speculations and the ‘explosive’ evidence on the various origin hypotheses.

Why does the origin of this pandemic virus matter?

This pandemic has devastated the world’s economy, affected the lives of billions, and killed millions, and the toll is still rising. It is vital to understand the origin of this killer virus, ‘SARS CoV2’, the culprit behind these catastrophes. Pandemics are rare events that give us invaluable opportunities to find critical flaws and to develop precautionary steps that prevent such incidents in the future. For that very reason, understanding the origin of this pandemic is extremely important. Where did this virus come from: a natural origin or a lab origin? The answer will help us devise corrective steps. Suppose we could scientifically prove that a natural spillover from bats or an intermediate host was the origin; that would help us create strict measures to prevent such spillovers in the future (natural spillovers have occurred even in the recent past). Conversely, if we had scientific evidence that this virus leaked from a research lab (accidentally or deliberately), discovering that would be crucial for strengthening safety measures to avoid any such incidents.

WHO Phase-I report

A concrete answer to the origin question would be ideal; however, for scientific and political reasons, getting one is not going to be a cakewalk. A team of multidisciplinary experts (17 from China and 17 from other countries) conducted a joint study in the city of Wuhan, China, over a 28-day period from January 14 to February 10, 2021, and produced a detailed 120-page report (plus an extensive annex). First, understand that the WHO (World Health Organization) team was not there to carry out forensic investigations, reanalyze samples, or interrogate researchers. Phase I of the joint study reviewed the available epidemiological, environmental, genomic, molecular, and bioinformatics data, reached a consensus report, and set the stage for a follow-up Phase-II study. The WHO team enumerated the possible origin scenarios and graded them from most likely to least likely. The team also visited the Wuhan Institute of Virology (WIV), discussed its research activities, interviewed lab personnel, and examined the investigations conducted by Chinese authorities into the lab-origin scenario. It is very naive to presume that the WHO team should have barged in to inspect lab freezers and databases as part of this study. The WHO report proposed that only future collaborative efforts and detailed investigations will shed light on the events that led to this pandemic.

The possible origin scenarios

The WHO report discusses four possible scenarios (1-4), weighing the arguments for and against each and assessing its likelihood.

  1. Natural origin: direct zoonotic transmission from an animal reservoir host to the human population, as with influenza, Nipah, HIV, etc. A natural precursor virus from an animal reservoir spilled over and infected someone, who ignited the pandemic fire. WHO team’s assessment: a possible-to-likely pathway.
  2. Natural origin through an intermediate host: the virus jumped from an animal reservoir to another animal, adapted in the latter, and finally jumped to the human population, as with SARS or MERS. WHO team’s assessment: a likely-to-very-likely pathway.
  3. Natural origin through frozen food: the carcass of an infected animal was stored as food, which could have been the vehicle for transmission from the infected animal to the human population. WHO team’s assessment: a possible pathway.
  4. Laboratory origin: the virus was introduced to the human population through a lab incident, either an accidental infection of lab personnel or a deliberate engineering and release of the virus. WHO team’s assessment: an extremely unlikely pathway.

Though scenario 4 is an ‘extremely unlikely pathway’ in their assessment, I would like to elaborate on it through possible sub-scenarios 4a-4d. Note that there is no evidence so far for any of these sub-scenarios; my aim is only to list possible lab-origin plots.

4a. Natural virus/accidental leak: a person in a coronavirus research lab gets infected and accidentally spreads the virus to the human population.

4b. Natural virus/deliberate release: a person in a coronavirus research lab deliberately releases the virus into the human population.

4c. Engineered virus/accidental leak: a person in a research lab that studies engineered coronaviruses gets infected with the virus and accidentally spreads it to the human population.

4d. Engineered virus/deliberate release: a person in a research lab that studies engineered coronaviruses deliberately releases the virus into the human population.

Why is natural origin a ‘possible-to-likely’ scenario while laboratory origin is an ‘extremely unlikely’ one?

A deliberate release of a deadly virus into one’s own population seems illogical (the WHO report did not consider such a plot), because a virus can mutate into more virulent variants, as we see now, and wreak havoc on its creators themselves. By contrast, spillovers of viruses from their natural reservoirs or through intermediate hosts, subsequently causing outbreaks, are not at all bizarre. Such events have happened several times in the past for many viral diseases, e.g., Ebola, SARS, MERS, influenza, Nipah, and HIV. Under the right circumstances, a ready-to-go precursor virus could easily ignite a local outbreak that evolves into a pandemic. Spillovers can happen from infected wild animals in a wet market, from an animal farm, or from a bat cave. In a zoonotic transfer scenario, patient zero could easily be a wet-market employee, a buyer, a caretaker at a wildlife farm, a farmer who collects bat guano, or a researcher who collects samples from a bat cave. Although the accidental lab-origin (lab-leak) hypothesis rests on conjecture with no scientific evidence so far, and the WHO graded it as an ‘extremely unlikely’ scenario, it is important to investigate the lab-leak hypothesis further as well.

One argument against the natural-origin hypothesis is the delay in identifying the natural host of the precursor virus of SARS CoV2. Let’s discuss that point in detail: why is it difficult to find the SARS CoV2 precursor virus? It is true that after the SARS and MERS outbreaks, scientists quickly identified the animal hosts. That said, quick reservoir identification cannot be generalized to all zoonotic outbreaks (Ebola and HIV, for example, are counterexamples). During SARS1, samples could be collected from infected animals in markets, and scientists identified civets as intermediate hosts within 4 months, although it later took more than a decade to identify some of the precursor bat viruses that spilled into civets. For MERS, camels were also identified as the reservoir host quickly, thanks to rapid sample collection from infected animals. In both of those earlier coronavirus outbreaks, samples were indeed retrieved quickly from the suspected places. But the situation in Wuhan was different. Because of initial speculation that the Wuhan seafood and wildlife market was the epicenter of the SARS CoV2 outbreak, authorities ordered the market sanitized, as per protocol, to prevent any further spread of the virus. This made it difficult to collect samples from the remaining animals in Wuhan markets, and a golden opportunity to rapidly discover the natural reservoir link to SARS CoV2 was lost. Later, SARS CoV2 was detected in environmental samples from the drains of the Huanan market. However, the chance to identify any infected host reservoir had already been missed because of the sanitation of Wuhan’s market places. The WHO team reports the detection of the virus in environmental samples and the presence of susceptible hosts in the Huanan market. One direction now left is to trace the legal and illegal animal farms that supplied animals to Wuhan markets, collect samples from those wildlife farms to identify any precursor viruses, and perform thorough epidemiological analyses of the animal caretakers.
Furthermore, China has imposed stricter regulations to curtail wildlife farms, which may make it even harder to identify the intermediate host among the farms that supplied those exotic animals to markets. The golden chances of identifying the reservoir fade as time goes by. Rapid follow-up analyses are warranted to pinpoint the real epicenter of this spillover. The WHO team proposed an extensive, timely, collaborative Phase-II investigation to solve this origin conundrum.

The tale of two lineages

According to the WHO report, two lineages of SARS CoV2 were found among the early cases from Wuhan markets. What is a lineage? When a virus multiplies, mutations occur in its genome, and such mutations can create a variant. Scientists track these mutations as they are passed down a ‘lineage’, which can be compared to a branch of the viral family tree. The report says two lineages, A and B, circulated in Wuhan during November and December 2019. Note that the lineage currently sweeping through the globe is lineage B (yes, the B in the B.1.1.7 and B.1.617 variants). Lineages A and B differ at two nucleotide positions, and A is assumed to be ancestral to B because, at those positions, it matches the bat CoVs. The WHO team looked at genomic data from early cases in Wuhan and reported that the Huanan market cases belong to lineage B. Interestingly, the lineage A cases were linked to other Wuhan markets. Based on the samples analyzed, we can currently conclude that cases in the Huanan seafood wet market were caused by lineage B SARS CoV2, which sits lower in the viral family tree, while cases in other Wuhan markets were caused by lineage A, which sits further up the tree than B. More interestingly, genomic analysis of the first case in Washington State shows that its virus (the WA1 isolate) belongs to lineage A, suggesting that this patient may have contracted a lineage A virus that spread from those other markets.
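To make ‘nucleotide differences between lineages’ concrete, here is a minimal Python sketch that compares two already-aligned sequences position by position and reports where they differ. The sequences are short, made-up toy strings, not real SARS CoV2 genomes, and the function name `differing_positions` is hypothetical; real lineage assignment works on full-genome alignments with dedicated phylogenetic tools.

```python
def differing_positions(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in seq_a, base in seq_b) for every mismatch.

    Assumes both sequences are already aligned to the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Hypothetical toy sequences standing in for an "ancestral" and a "derived" genome.
ancestral = "ATGCTTACGGATCCA"
derived = "ATGCTCACGGATCTA"

diffs = differing_positions(ancestral, derived)
print(len(diffs), diffs)  # → 2 [(6, 'T', 'C'), (14, 'C', 'T')]
```

Counting mismatches like this is the simplest way to say how far apart two lineages are; building the family tree itself requires comparing many such sequences at once.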

Lineage tracing points toward a natural-origin scenario involving infected market animals. Infected animals (carrying different lineages) might have reached a hub in Wuhan and then been routed to different markets, which could explain why one lineage became more prevalent in one market. The lab-origin hypothesis does not fit the lineage tracing data very well: one individual spreading lineage A in one place and lineage B in another market seems an improbable plot. Another scenario does seem plausible: an infected person spread lineage A in one market, the virus circulated mildly or asymptomatically for a while, changed further into lineage B, and came to dominate in another location. But how did that first person get the virus that began this avalanche? That is the big question. Again, it could be a natural spillover or a lab accident; we don’t have a concrete answer yet.

How easy is it to design a zombie virus?

Nowadays, we have cutting-edge technologies to manipulate viral genomes by introducing specific mutations and to make genetically modified viruses for research and treatment (to understand the virus, to develop cures, and to make life-saving vaccines). Molecular biology has advanced to the point where we can introduce desired mutations or fragments of foreign genes into a viral genome without leaving any signs of genetic engineering. However, our approaches are very modest compared with nature’s massive approach to creating variants. Though we have the ‘know-how’, predicting the function of engineered mutations is very hard and may not be accurate. One must screen a huge number of mutants to identify the one with the expected traits. Hence, compared with nature’s massive-scale evolution and screening through animal reservoirs, our technology is meager.

Let’s dig into the SARS CoV2 genome now. Some regions of the SARS CoV2 spike protein, such as the receptor binding domain (RBD), are peculiar compared with bat CoVs. The RBD of SARS CoV2 resembles the RBD of a pangolin CoV; both bind ACE2 well (which is initially why we thought pangolins could be the intermediate reservoir of SARS CoV2). Outside the RBD, however, the similarity declines, which led to the hypothesis that some intelligent designer inserted the RBD from a pangolin CoV into a bat-CoV precursor backbone to make it more ‘humanized’.

To create a novel virus like SARS CoV2, a researcher needs a precursor virus, or its sequence, to serve as a backbone or scaffold. The most discussed close relative from the WIV lab is a bat CoV named RaTG13, which is only 96% identical in sequence to the original SARS CoV2 first detected in Wuhan. That means one would have to make nearly 1,200 nucleotide changes in RaTG13 to match SARS CoV2. Only an extraordinary researcher with so-far-unknown skills could predict, design, and modify the RaTG13 genome to create the currently circulating SARS CoV2 genome. Whether there were any unpublished precursor viral genomes in WIV freezer stocks is unknown! However, all the features of SARS CoV2 can be found in other similar viruses, and this class of viruses has peculiar mechanisms, such as template switching and recombination, that can acquire bits and pieces from other similar viruses to create the ‘perfect’ SARS CoV2 virus. One can speculate that a distant precursor of SARS CoV2 jumped from a bat to an intermediate host (e.g., a pangolin, a cat, a mink, or even a human), and while adapting to its new hosts, the precursor encountered related viruses. During this adaptation, the precursor’s genome could have randomly picked up gene fragments from its cousin viruses. These processes might have taken several years under the radar (i.e., with only mild or asymptomatic infections) before the virus gained all the features that propelled it to become the ‘pandemic’ virus.
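The figure of roughly 1,200 nucleotide changes follows from simple arithmetic: the number of differing positions is about the genome length times the fraction that is not identical. The sketch below assumes a genome length of about 29,903 nucleotides (the length of the Wuhan-Hu-1 reference sequence) and treats ‘96% similar’ as 96% identity spread over the whole length, ignoring insertions and deletions, so it is only a rough estimate. The function name is hypothetical.

```python
def estimated_differences(genome_length: int, percent_identity: float) -> int:
    """Rough count of differing positions; ignores insertions and deletions."""
    return round(genome_length * (1 - percent_identity / 100))


# Assumed genome length (~29,903 nt) and the 96% identity quoted in the text.
print(estimated_differences(29_903, 96.0))  # → 1196, i.e. "nearly 1,200"
```

Each percentage point of identity over a genome this size corresponds to roughly 300 positions, which is why a few percent already means over a thousand changes.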

Recently, Dr. Shi Zheng-Li’s group at WIV reported nearly 8 more SARS-related CoVs obtained from bats. One of them, RaTG15, shows 97.2% amino acid similarity with SARS CoV2, more than the previously discussed precursor RaTG13 (interestingly, both came from bats found at the same location). Intriguingly, this virus binds the bat ACE2 receptor but not the human ACE2 receptor! This suggests that RaTG15, or any similar virus in WIV freezers, may not be the direct backbone for a SARS CoV2 that uses human ACE2 efficiently. The precursor viruses identified so far would need several years of evolution in bats, an intermediate host, or humans to produce the current SARS CoV2 genome.

Is the FCS a smoking gun?

Nobel laureate virologist Prof. David Baltimore recently commented that the presence of a furin cleavage site (FCS) is a smoking gun hinting that SARS CoV2 is man-made!

What is an FCS? It is a stretch of positively charged amino acids (the building blocks of proteins*) where a host-cell protease called furin (another protein) binds and cuts. Cleavage at the FCS by furin plays a crucial role in SARS CoV2’s entry into the host cell and hence in successful infection. An FCS is not found in RaTG13 or in other SARS-related coronaviruses (see figure; green letters are present in RaTG13, red letters are extra sequences in SARS CoV2), which created genuine suspicion, even among experts, that it was deliberately introduced through gain-of-function research. Portraying the FCS in SARS CoV2 as a suspiciously ‘unique’ feature is not accurate, however, since furin cleavage sites are seen in related viruses: an FCS is present in other betacoronaviruses such as MERS and HKU1, and in feline CoVs. FCSs were also recently detected in other bat coronaviruses, suggesting that such stretches can be acquired, probably through recombination. Further, O-linked glycan sites near the FCS were cited as another sign of intelligent design, on the notion that these sites were introduced to help the virus evade the host’s immune system. We later learned that such sites are present in other CoVs, and the one near the FCS regulates furin-mediated cleavage.

*The spike protein is a string of different amino acids, and the instructions for making that string are coded in the viral genome sequence. Each amino-acid bead is coded by a group of 3 nucleotides (a triplet codon) in the genome, and most amino acids can be coded by several different triplet codons (redundancy). Nature exploits random single-nucleotide mutations, which sometimes change the triplet codon of an amino acid and thereby change the message of the protein. There are several examples where even a single mutation produces a wrong message, and thereby a bad protein, creating risk factors or causing detrimental genetic diseases (e.g., sickle cell anemia).
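The footnote’s two ideas, triplet codons and redundancy, can be illustrated with the standard genetic code. The minimal sketch below builds the standard codon table from its usual compact 64-letter encoding, translates DNA codon by codon, lists the six synonymous codons for arginine (R), and shows the single-letter change GAG to GTG (glutamate to valine) that underlies sickle cell anemia. The function names are illustrative, not from any particular library.

```python
from itertools import product

# Standard genetic code: codons ordered T, C, A, G at each position; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate uppercase DNA codon by codon from the first base; leftover bases are ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def synonymous_codons(amino_acid: str) -> list[str]:
    """All codons that encode the given one-letter amino acid (redundancy)."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


print(synonymous_codons("R"))  # → ['AGA', 'AGG', 'CGA', 'CGC', 'CGG', 'CGT']
print(translate("GAG"), translate("GTG"))  # → E V  (one-letter change: Glu -> Val)
```

Note that redundancy is uneven: arginine has six codons, while methionine (ATG) and tryptophan (TGG) have only one each.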

Figure legend: The gene sequence corresponding to the FCS ‘RRAR’ is shown (blue box). Those 12 nucleotides are missing in RaTG13, a closely related bat CoV. Note that the 12-nucleotide fragment is not in frame (arguing that an experienced molecular biologist would not have designed it this way). This results in an extra P in the insert, creating SPRRAR, which becomes a furin cleavage site (FCS).
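Why does SPRRAR count as a furin cleavage site? Furin recognition is often summarized by the minimal motif R-X-X-R (arginine, any two residues, arginine). Assuming that simplified motif, the sketch below scans a protein string with a regular expression. It also removes four residues, taken here to be PRRA (the commonly cited inserted residues), to show that the motif disappears without them. The function name is hypothetical, and real cleavage depends on neighboring residues, structure, and glycosylation, so this only illustrates the idea.

```python
import re

# Simplified minimal furin motif R-X-X-R (an assumption for illustration).
# The lookahead lets overlapping matches be reported.
FURIN_MINIMAL_MOTIF = re.compile(r"(?=(R..R))")


def find_minimal_furin_sites(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for every R-X-X-R occurrence."""
    return [(m.start() + 1, m.group(1)) for m in FURIN_MINIMAL_MOTIF.finditer(protein)]


with_insert = "SPRRAR"                            # stretch quoted in the figure legend
without_insert = with_insert.replace("PRRA", "")  # same stretch minus 4 residues -> "SR"

print(find_minimal_furin_sites(with_insert))     # → [(3, 'RRAR')]
print(find_minimal_furin_sites(without_insert))  # → []
```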

The furin cleavage site in the SARS CoV2 spike protein, a short stretch rich in positively charged arginines (RRAR), is very minimal. This stretch can be encoded by many different combinations of triplet codons. The exact extra stretch present in the SARS CoV2 viral genome is ‘CTCCTCGGCGGG’. An insertion at this specific locus could result from various naturally occurring phenomena already known, such as mutations, polymerase slippage, template switching, or recombination between a potential SARS CoV2 precursor and another related virus that carried this notorious stretch of nucleotides. Coronaviruses can undergo such recombination events, and one can speculate that an ancestral virus, after acquiring an FCS, adapted further to become the perfect human respiratory virus, SARS CoV2, and started the pandemic fire!
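How many nucleotide spellings can encode RRAR? With six codons for arginine and four for alanine in the standard genetic code, the count is 6 × 6 × 4 × 6 = 864. The short sketch below enumerates them explicitly. The codon lists are the standard ones; the function name is hypothetical, and the snippet only counts possibilities, saying nothing about how likely any one spelling is to arise.

```python
from itertools import product

# Synonymous codons from the standard genetic code.
SYNONYMOUS = {
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],  # arginine
    "A": ["GCT", "GCC", "GCA", "GCG"],                # alanine
}


def encodings(peptide: str) -> list[str]:
    """Every DNA sequence that encodes the peptide, using the codons listed above."""
    return ["".join(codons) for codons in product(*(SYNONYMOUS[aa] for aa in peptide))]


rrar = encodings("RRAR")
print(len(rrar))  # → 864
# Spellings that use CGG for both leading arginines: 1 * 1 * 4 * 6
print(sum(s.startswith("CGGCGG") for s in rrar))  # → 24
```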

Out-of-frame FCS: an intelligent but bad designer?

If a researcher inserted a gene fragment coding for an FCS, one of the basic rules of cloning is that the insert should be in frame with the spike-coding message of the backbone viral genome. But here, compared with the RaTG13 sequence, the inserted sequence is not in frame with the spike’s genetic code. So, if RaTG13 were the backbone for SARS CoV2, the designer would have been intelligent enough to change over a kilobase of nucleotides but would have forgotten to put the most important element, the FCS, in frame! An experienced molecular biologist would not do that. Nature, on the other hand, has the freedom to try all permutations and combinations of codes and to perform brute-force screening, which might have produced the perfect villain, SARS CoV2.
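‘In frame’ here means that an insert begins and ends on the backbone’s codon boundaries. The sketch below translates the 12-nucleotide stretch quoted in the text (‘CTCCTCGGCGGG’, taken as given) in each of its three reading frames using the standard genetic code. Only the frame that starts two bases in yields the proline-arginine-arginine (PRR) run, illustrating how the same letters say something different depending on where the reading starts. It does not model the flanking spike sequence, so it illustrates the idea rather than reproducing the figure; the function name is hypothetical.

```python
from itertools import product

# Standard genetic code (codons ordered T, C, A, G); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_frame(dna: str, frame: int) -> str:
    """Translate starting `frame` bases in (0, 1 or 2); incomplete trailing codons are dropped."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3)
    )


insert = "CTCCTCGGCGGG"  # 12-nucleotide stretch as quoted in the text
for frame in range(3):
    print(frame, translate_frame(insert, frame))
# → 0 LLGG
# → 1 SSA
# → 2 PRR
```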

Unusual codons

The presence of unusual codons for the two consecutive arginines (RR) in the ‘RR’AR furin cleavage site also raised suspicions. Arginine (R) coded by the triplet CGG is an unusual instruction in a viral genome. The argument goes that such a sequence would invoke immune-cell activation and is evolutionarily not ideal for a virus adapting in a host reservoir, so the presence of RR with unusual codons looks as if the virus had been propagated in a system without an immune system, i.e., dissociated cells in a dish. But if you look closely at viral genomes, the use of CGG for R is not very rare: it is present in the FCS of a feline CoV genome and in other genes of SARS CoV2. Prof. Kristian Andersen explained in a Twitter thread that the CGG codon is rare in SARS CoV2, but that does not mean it is absent: other Rs in the SARS CoV2 genome are coded by CGG, nearly 3% of them. So, portraying the presence of CGG as a signature of human intervention is not scientifically accurate. Until we find a precursor virus from an animal host with the FCS and those unusual codons, this issue will remain a smoking gun for conspiracy theorists.
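A figure like ‘nearly 3% of Rs are coded by CGG’ is a codon-usage fraction: count the arginine codons in an in-frame coding sequence, then divide the number of CGG codons by that total. The minimal sketch below does this for a short, made-up coding sequence; the 3% figure in the text comes from the real SARS CoV2 genome and is not reproduced here. Function and variable names are hypothetical.

```python
ARGININE_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}


def codon_fraction(cds: str, codon: str, synonyms: set[str]) -> float:
    """Among codons in `synonyms`, the fraction equal to `codon` (in-frame sequence assumed)."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    matching = [c for c in codons if c in synonyms]
    if not matching:
        return 0.0
    return matching.count(codon) / len(matching)


# Made-up in-frame toy sequence: ATG AGA CGT GCT CGG AGG TAA (four arginine codons, one CGG).
toy_cds = "ATGAGACGTGCTCGGAGGTAA"
print(codon_fraction(toy_cds, "CGG", ARGININE_CODONS))  # → 0.25
```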

Recently, Dr. Amy Maxmen of the journal Nature asked Prof. David Baltimore for his opinion on Prof. Kristian Andersen’s explanation of the FCS and the unusual codons. Baltimore explained, “natural origin could well be the scenario, but one cannot completely rule out any other possibilities”. To anyone in science, this is absolutely reasonable and nothing unusual, but what the public heard was a Nobel laureate endorsing a man-made, lab-leak hypothesis.

Safety first

Recently, Dr. Danielle Anderson, a Singapore-based Australian virologist who was doing research at WIV, shared her experiences. She was working at WIV while SARS CoV2 infections were slowly taking off in China. Dr. Anderson was quite impressed by WIV’s safety regulations, which were like those of any other high-containment lab. Many experts, including Prof. Ralph Baric (a scientific collaborator of Dr. Shi Zheng-Li in CoV research), find it unacceptable that, as Dr. Shi revealed, her bat-CoV research was done in biosafety level 2 labs. WIV did not do anything wrong here: according to Chinese safety regulations, bat CoVs can be handled in a level 2 facility. This points to a very important issue, the need for consensus on safety regulations for work with dangerous pathogens (regulations vary from country to country!). The current case calls for a universal regulatory body with uniform protocols, recommendations, and safety measures for such research.

Early origin?

The WHO Phase-I investigation also reports the possibility that SARS CoV2 was circulating in the human population ‘under the radar’ for months before the first case was recorded. Intriguingly, molecular analysis of sewage water collected on March 12, 2019, in Barcelona (a famous tourist destination in Spain) detected pieces of the SARS CoV2 genome. These data are not peer reviewed and consist only of PCR results (no sequencing), so take them with a grain of salt. Similar early detections of SARS CoV2 were reported from Italy, France, and the USA, before the first recorded cases in those places. If these data are correct, this invisible villain was with us long before, slowly setting the stage for the show. The WHO team proposes retrospective surveillance of stored frozen samples in other countries to identify any clusters that preceded the first recorded case of COVID-19.

Obfuscating wildlife trade in markets

Illegal wildlife trade and sales took place in Wuhan seafood markets: along with fish, the markets were selling and slaughtering wild animals. The WHO report says “no verified reports of live mammals being sold around 2019 were found”, while a recently published study documents wildlife trade in Wuhan markets with photographic evidence. Serendipitously, its team conducted that study in the context of other diseases and their links to seafood markets. Though China has strict regulations on selling such exotic species, enforcement does not seem very strict. The article describes the poor welfare of the live animals in cages, some of which had gunshot or trap wounds, suggesting they were harvested from the wild. The data show that nearly 38 terrestrial wild animal species were sold in the markets, but importantly, no bat or pangolin species were for sale. However, it is interesting to see potential SARS CoV2 reservoirs like mink and raccoons on the menu. This new report makes it all the more urgent to trace the suppliers of wild animals to Wuhan markets, to perform genomic analyses of animal samples from the suspected wildlife farms, and to run seroconversion analyses in caretakers, before we miss that boat.

Internet sleuths 

A group of scientific and non-scientific analysts formed a club called DRASTIC to dig out data buried deep in several databases, giving air to the lab-origin hypothesis. In 2003, during the SARS1 outbreak, Chinese authorities were criticized for not being open with the world in a timely manner. The current pandemic has sparked similar accusations about timely reporting, about live animals in markets, about a viral genome database that went offline in September 2019, about research on the Mojiang miners, and so on. An anonymous India-based origin hunter, ‘theseeker268’, brought up relevant WIV theses containing data on the infected miners. The Bloom lab recovered a set of sequence data from early COVID-19 cases in China that had been deleted from a public database. In the current circumstances, even routine practices are cast in shadows of doubt. Being transparent in science is important: it avoids casting such shadows and creating trust issues. When someone else digs up data that had been made inaccessible, it nurtures suspicion, even if it adds nothing new to the already known facts.

Intelligence reports

Science is not simple; you must spend time and energy to understand it. For the public, a statement from their political hero or a comment from a Nobel laureate is enough. But science does not work like that. As Prof. Richard Feynman, one of the greatest minds who ever lived, said: “It doesn’t make a difference how beautiful your guess is, it doesn’t matter how smart you are, who made the guess, or what his name is. If it disagrees with experiment, it’s wrong.” Without scientific evidence, a claim remains a conjecture! Former Secretary of State Mike Pompeo was the first to claim that an intelligence report proves the lab-leak theory. According to him, “there is a pile of evidence 100 feet high”, yet he did not settle the case while he was in power. Mr. Pompeo should have given such evidence to the WHO team to investigate further while they were in China. Recently, the intelligence report came under the spotlight again over 3 WIV researchers who reportedly showed COVID- or flu-like symptoms in November 2019 and visited a hospital. According to the information WIV provided to the WHO, serological data on its employees show no COVID-19 infections before December 2019. Suppose we do find that some WIV personnel visited hospitals in November 2019: is that enough evidence to prove the lab-leak theory? We would need more solid evidence, such as medical reports, leftover blood samples, serological analyses, or further analyses of immune cells, to prove the case. US President Joe Biden ordered US intelligence officials almost a month ago to investigate and report back to him within 90 days. It will be scientifically very interesting to learn what supporting evidence they provide if they conclude it was a lab leak. The big question is this: even if the Chinese authorities provide evidence that frees the Wuhan Institute of Virology from the clutches of suspicion or supports a natural origin of SARS CoV2, will skeptical minds trust that evidence?
No explanation or proof will satisfy minds tuned to conspiracy theories; they will only come up with more questions. Instead of accusations, what we need are collaborative efforts to understand the real origin of this virus, which will help us avoid such incidents in the future. We need to trace earlier symptomatic cases, analyze frozen samples if available, trace the suppliers of animals to Wuhan markets, and analyze samples from the suspected wildlife farms. The clock is ticking!

Collaborative Phase-II investigation

Generally, we learn and fantasize about pandemics from history lessons and sci-fi movies. When we first heard about the 2019 outbreak in China, nobody contemplated the extent of its global spread and damage. Yet experts had been warning us about such scenarios for years, and their warnings fell on deaf ears. Even months after COVID cases started piling up in China, we thought the virus might not cross the border. Warnings about the transmissibility of this respiratory virus were neglected, and many ridiculed the authorities for overreacting. Governments failed to prepare in time and downplayed it as an ordinary flu that would disappear within a few weeks. Some declared early victory over the pandemic at a time when their unimmunized populations were highly susceptible to circulating variants. Political reasons aside, deducing the origin of this chaos is essential to preventing similar outbreaks in the future. The circumstantial fact that coronavirus research labs exist in Wuhan is enough for conspiracy theorists to cook up a lab-origin story. No doubt, we should investigate every possibility in figuring out the origin. But real scientific evidence is needed to prove any origin hypothesis; one cannot declare a verdict based only on media guesses, expert intuitions, social media posts, or circumstantial evidence. The origin hypotheses, namely natural spillovers and laboratory incidents, should be seriously investigated until we have sufficient data. Science needs concrete evidence to prove a hypothesis, and so far we don’t have such evidence for either a natural or a lab origin.


