Evidence for a natural origin of Covid-19 no longer dispositive after scientific peer review
By Alina Chan (on social media hiatus; tweets on #OriginOfCovid can be found @ayjchan)
Declaration of competing interests: The author of this article has co-authored a book, VIRAL: The Search for the Origin of Covid-19, with science writer Matt Ridley. The updated paperback of VIRAL was published in June 2022 by HarperCollins UK and USA. The book explores both natural and lab Covid-19 origin hypotheses.
In February 2022, a group of scientists posted a pair of preprints online that were widely reported by the popular media as proof of a natural origin of the pandemic. The New York Times reported, “Two new studies say the virus was present in animals at the Huanan seafood market in 2019” and that these studies constituted “a significant salvo in the debate over the beginnings of a pandemic that has killed nearly six million people across the world.”
The NYT journalists who covered these preprints were so eager to report the then not-yet-peer-reviewed findings that the first online version of the story was only a dozen sentences long and only quoted the lead author of the preprints. The story was featured as front page breaking news on the NYT website just as the Ukraine war was unfolding.
Many of the news reports covering the February preprints failed to mention that several of the preprints’ senior authors had been among the first to rule out a lab origin of the pandemic in early 2020, and that these scientists had been privately convened in early 2020 by top scientific leaders in the US and UK, some of whom had hoped that their efforts would shut down speculation around an accidental lab origin, which they considered a “very destructive conspiracy.”
At the time, several scientists and analysts, including myself, quickly identified scientific flaws that significantly undermined the February preprints’ claims.
Five months later, both preprints have passed scientific peer review at Science…
How did the process of peer review change their claims?
And did the scientists address the major problems in their analysis?
(1) After peer review, unscientific language was removed from the Worobey et al. manuscript. However, these strong claims in the preprint had already been widely reported in the media back in February 2022. Needless to say, those journalists likely will not be making corrections or publishing new stories to clarify that these excessive claims have now been eliminated by scientific peer review.
For example, both of the following strong assertions that were in the preprint have been revised; there are no longer claims of dispositive or incontrovertible evidence in the peer-reviewed paper:
“Together, these analyses provide dispositive evidence for the emergence of SARS-CoV-2 via the live wildlife trade and identify the Huanan market as the unambiguous epicenter of the COVID-19 pandemic.”
“Collectively, these results provide incontrovertible evidence that there was a clear conduit, via susceptible live mammals, for the zoonotic emergence of SARS-CoV-2 at the Huanan market towards the end of 2019.”
(2) The peer-reviewed paper has an entirely new section on “Study Limitations” which acknowledges that the scientists do not have access to the early Covid-19 case data or locations, lack direct evidence of a market animal infected with the pandemic virus, and lack complete details of how the market had been sampled for the virus.
Despite lacking access to data, Worobey et al. 2022 surprisingly claim in their preprint and the peer-reviewed paper that “positive environmental samples [were] linked both to live mammal sales and to human cases at the Huanan market.”
This stands in contrast to what the Chinese CDC claims in their own February 2022 preprint — that there was no link between positive environmental samples and the type of product sold by vendors. “The market might have acted as an amplifier due to the high number of visitors every day, causing many initially identified infection clusters in the early stage of the outbreak.”
Who is correct? Only the data can say. The Chinese CDC manuscript is still undergoing peer review. I hope that the journal and peer reviewers ask them to publish their complete dataset so that this question can be put to rest.
FYI the Worobey et al. Science article indicates that the supporting data for their analysis can be found in their preprint: “Data and materials availability: Data and code for this manuscript are available from (53).” Their preprint reveals that the data underlying their analysis came from two key sources: a media leak in June 2020 by the Epoch Times, and the China-WHO joint report in early 2021 which consisted of only information that had been provided by scientists in China.
(3) The paper still fails to acknowledge that early Covid-19 cases, regardless of whether the patient had a known link to the market, had been identified with ascertainment bias — reinforcing the initial perception that the Huanan seafood market was the early epicenter of the Wuhan outbreak.
Multiple reports from the early days of the pandemic, and even the China-WHO joint study, tell us that the initial suspicion that the virus had spilled over from illegal wildlife sold at the market shaped how early cases were found. The case definition used for the retrospective identification of December 2019 coronavirus cases included a link to the Huanan market, and early surveillance focused on hospitals near the market and on the neighborhood of the market. This market bias was only removed from the criteria for identifying potential Covid-19 cases on January 18, 2020.
This cannot be stressed enough: Early cases with no links to the market had been identified almost solely through searching the hospitals near the market and the neighborhood of the market. This is why even the unlinked cases look like they cluster around the market.
Later retrospective searches that removed the requirement for a link or proximity to the market (or its neighborhood and nearby hospitals) for early Covid-19 cases in 2019 in Wuhan were reportedly unproductive (see the China-WHO joint study report). This means that, due to the ascertainment bias, early clusters and cases without links to the market, hospitals near the market, or the neighborhood of the market would have easily been missed.
The only citation that Worobey et al. 2022 offer to substantiate their claim that early cases had not been identified with ascertainment bias is Worobey’s own 2021 single author correspondence (also published in Science): “These early reports were free from ascertainment bias as they were based on signs and symptoms before the Huanan market was identified as a shared risk factor (5).”
In other words, Worobey et al. are claiming that there is no ascertainment bias because their lead author said so. It is unclear where this confidence comes from, as the authors lack access not only to the early Covid-19 case data but also to the methods by which the data had been collected.
On a related note, the peer review process seems to have failed to call out the fact that Worobey et al. base their claim about the market origin of an early lineage of the pandemic virus on the location of only two cases and one environmental sample from the market collected in early 2020: “That both identified lineage A cases had a geographical connection to the market, in combination with the detection of lineage A within the market (24), support the likelihood that during the early epidemic lineage A was, like lineage B, disseminating outward from the Huanan market into the surrounding neighborhoods.”
At the very least, after peer review, the Worobey et al. 2022 paper now says: “However, the observation that the preponderance of early cases were linked to the Huanan market does not establish that the pandemic originated there.”
Bonus criticism: One of my pet peeves is scientists suggesting that Wuhan is a likely location for the natural spillover of SARS-like viruses from bats or other animals into people. The Worobey et al. 2022 paper suggests this by saying: “This region of Hubei contains extensive cave complexes housing Rhinolophus bats, which carry SARSr-CoVs (49).” Readers who make the effort to read the literature on this topic will realize that the few SARS-like viruses found in Hubei province are distantly related to the pandemic virus and most likely do not use the human ACE2 receptor to infect cells, which makes these viruses deeply unlikely to spill over from bats and cause an outbreak in people. Not all SARS-like viruses can cause human outbreaks; otherwise the hundreds of SARS-like viruses discovered by Wuhan scientists and brought back to Wuhan labs would constitute an immense biosafety and biosecurity risk…
This story represents a prime example of how sampling/ascertainment bias can mislead scientists into believing that there is a true correlation.
Because Wuhan investigators searched only for early Covid-19 cases with links to the market or its nearby hospitals and neighborhood, most (if not all) of the identified December 2019 cases ended up matching these criteria. At the time, after only four cases had been reported with links to the Huanan seafood market, local investigators assumed that they were experiencing a repeat of the 2003 SARS outbreak, with a wildlife trade/market origin of the virus. Therefore, they logically went to the market and took the bulk of their samples from the stores that sold wildlife or stores where known patients had worked. They also performed retroactive and proactive searches for potential patients in hospitals near the market and in the neighborhood of the market. Medical personnel at hospitals and clinics across the city were instructed to report suspect cases only if they were linked to the market (also see this report by the Chinese CDC and the China-WHO joint report page 42 and its annexes pages 125 and 161). These details have been publicly confirmed (see above), and we write about this in detail in VIRAL (Chapter 3 and the Epilogue; citations are available in the book).
By the time the search for cases was expanded beyond the market and its vicinity or nearby hospitals (after January 18, 2020), a large majority of the early cases considered in the China-WHO joint study and the Worobey et al. 2022 analysis had already been identified.
Two years later, Worobey et al. 2022 looked at the limited available early case information, unaware that these had been collected with ascertainment bias, and claimed it as evidence that the market was the origin of the Wuhan outbreak…
In particular, this statement in their Science paper directly conflicts with the source of their early case information (the China-WHO joint report) and reveals that the authors don’t understand how early cases had been identified: “It is also noteworthy that the December 2019 COVID-19 cases we consider here were identified based on reviews of clinical signs and symptoms, not epidemiological factors such as where they resided or links to the Huanan market (7).”
Because of this lack of awareness, Worobey et al. don’t understand why market-linked cases were detected with home addresses both near and far from the market. If they had been aware that suspected early cases had been reported by hospitals across Wuhan based on criteria that included a link to the market, this observation would not have surprised them at all. In other words, regardless of where an early patient lived or which hospital they were admitted to, if they had a known connection to the market and displayed Covid-like symptoms, they would have been tested and reported to public health officials.
Worobey et al. were also puzzled that hospitalized/contact-traced patients with no links to the market were found in the hospitals and neighborhoods around the market. Again, they only need to have more carefully read the China-WHO report to understand that this was a result of ascertainment bias. The report explicitly said that early cases were identified by surveilling hospitals and neighborhoods proximal to the market — these suspected cases did not have to be linked to the market itself in order to be tested and reported to public health officials.
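Both patterns described above (market-linked cases living near and far from the market, and unlinked cases clustering around it) can be reproduced in a toy simulation. The sketch below is purely illustrative and uses no real case data. It assumes that infections are scattered uniformly across a circular city with the market at its center, that a small fraction of cases have a market link regardless of where they live, and that a case is only found if it is market-linked or falls within a search radius of the market. Every parameter (city radius, search radius, link rate) is a hypothetical placeholder, not an estimate.

```python
import math
import random


def simulate_case_finding(n_cases=5000, city_radius_km=20.0,
                          search_radius_km=3.0, market_link_rate=0.05, seed=0):
    """Toy model of biased case finding; all parameters are hypothetical.

    Returns (all_cases, found_cases), each a list of
    (distance_from_market_km, is_market_linked) tuples.
    """
    rng = random.Random(seed)
    all_cases, found_cases = [], []
    for _ in range(n_cases):
        # Uniform position in a disc centred on the market: r = R * sqrt(u).
        distance = city_radius_km * math.sqrt(rng.random())
        # Assumption: having a market link is independent of home location.
        linked = rng.random() < market_link_rate
        case = (distance, linked)
        all_cases.append(case)
        # Biased case definition: a market link OR proximity to the market.
        if linked or distance <= search_radius_km:
            found_cases.append(case)
    return all_cases, found_cases


def mean_distance(cases):
    """Average distance from the market, in km."""
    return sum(d for d, _ in cases) / len(cases)


all_cases, found_cases = simulate_case_finding()
found_unlinked = [c for c in found_cases if not c[1]]
found_linked = [c for c in found_cases if c[1]]

print(f"mean distance, all infections: {mean_distance(all_cases):.1f} km")
print(f"mean distance, found cases:    {mean_distance(found_cases):.1f} km")
print(f"max distance, found unlinked:  {max(d for d, _ in found_unlinked):.1f} km")
print(f"max distance, found linked:    {max(d for d, _ in found_linked):.1f} km")
```

In this toy setup the found unlinked cases all sit inside the search radius by construction, while found linked cases turn up at every distance, even though the simulated infections have no spatial relationship to the market at all.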
Worobey et al. finally point out that seroprevalence in Wuhan was highest in the “districts around the market”. Readers who make the effort to delve into the cited seroprevalence studies (see here and here; blood samples were only collected in April 2020 after the city-wide outbreak) and excess death data (see pages 39 and 40 of the China-WHO joint report) will immediately see that these “districts around the market” really refer to the entire city center of Wuhan. I am not sure why it is surprising to Worobey et al. that the city center, which has the highest general and elderly population density in Wuhan as well as the main hospitals to which all Covid-19 patients were sent, also had the greatest excess death and seroprevalence. Because human-to-human transmission was not acknowledged at the time, Covid-19 was allowed to spread and infect the healthcare workers and local community in the city center. In other words, there is not enough geographical resolution for this data to be interpreted as pointing to the market or any location in the city as an epicenter of the Wuhan outbreak. For more discussion on the geographic distribution of Wuhan Covid-19 cases, please see my other Medium post, A response to “The Origins of SARS-CoV-2: A Critical Review”.
In summary, all of the observations that Worobey et al. cite as signs that the market was the outbreak epicenter can be easily explained by the ascertainment bias with which early cases were identified.
Update March 4, 2023: It has been pointed out that Chinese investigators likely found other early cases in 2019 without requiring a link or proximity to the market, but that this data, importantly, has not been shared in full with the WHO and was therefore not used by Worobey et al. in their analysis. It is also notable that on January 3, 2020, the Chinese authorities instructed the destruction of patient samples, which effectively made it much more difficult, if not impossible, to track down the earliest cases. In this situation, until we get a clear breakdown of the early cases (symptom onset, when diagnosed, when added to the total count) and actually see the data (not a synopsis and a highly pixelated figure), I am not confident that what has been shared with the world is comprehensive or representative. It is unscientific for Worobey et al. to assert that the data they used in their analysis is representative of the 2019 novel coronavirus cases in Wuhan.
A note on the one cage environmental sample from the Wuhan market that tested positive for virus genetic material:
According to the Chinese CDC study preprinted in February this year, “a total of 923 environmental samples from different locations within and around this market and 457 animal samples including animal bodies, stray animals and their feces were collected, with some stray animals sampled until March 30th.”
None of the animal samples tested positive for the virus.
Out of the 923 environmental samples, 73 tested positive.
The complete sampling scheme and dataset were not released in the preprint (but hopefully will be shared in the final peer-reviewed publication). The Chinese CDC said that their data did not point to any significant association of the known market Covid-19 cases or environmental samples with the sale of any particular type of product (e.g., wildlife products). They interpreted the data to mean that the virus “SARS-CoV-2 might have been circulating in the market, especially the western zone, for a period of time in December 2019, leading to an extensive distribution of the virus within the market, which might have been facilitated by the crowded buyers and the contaminated environment.”
A great deal of attention has been drawn to the fact that out of the 73 positive environmental samples, 5 of these were from a store that sold live wild animals and that 1 of the 5 samples was from a cage stored in an inner room. One NPR reporter even described this as “physical evidence” of a market origin of the virus.
However, this fails to consider 2 points:
(1) The Chinese CDC sampling strategy specifically targeted stores that sold wildlife because they suspected an animal origin of the virus, similar to the 2003 SARS outbreak in Southern China. Therefore, it should not be surprising that a good portion of the samples and the positive samples are from stores selling wildlife products. Again, they (and the China-WHO report) reported that they did not observe any significant association of positive environmental samples with the sale of wildlife. To verify whether this is true, we need the Chinese CDC to release their full sampling scheme.
(2) There is no reason why cages or other equipment for processing live animals would magically repel contamination or aerosolized virus from the many infected people at the market. The human outbreak of the virus at the market was well underway by the time the Chinese CDC arrived on the scene to collect samples on 1 Jan 2020. Several of the vendors lived at the market in inner rooms close to their stalls. To show how widespread the virus was in the market by this point, the Chinese CDC reported that “Of the 110 samples collected from sewers or sewerage wells in the market, 24 samples were positive for SARS-CoV-2 nucleic acid. All the four sewerage wells in the market tested positive.” A reminder here that the market is the size of about 10 USA football fields. By 1 Jan 2020, we also know that the outbreak had already reached other parts of the city far away from the Huanan Seafood market. Therefore, in my opinion, the excessive attention given to a single positive sample from a cage in an inner room, as compared to all the positive samples on floors, walls, sewage, etc. spread across the market, seems a bit desperate, particularly in the absence of any known infected animal at the market or in its supply chain.
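To put the sample counts quoted in this note side by side, the short calculation below converts them into proportions. It uses only the numbers cited above from the Chinese CDC preprint; the labels and the helper name `share` are my own shorthand for illustration.

```python
# Counts quoted in this post from the Chinese CDC preprint: (positives, total).
counts = {
    "environmental samples testing positive": (73, 923),
    "animal samples testing positive": (0, 457),
    "sewer/sewerage-well samples testing positive": (24, 110),
    "positive samples from the live-wildlife store": (5, 73),
    "positive samples from the inner-room cage": (1, 73),
}


def share(numerator, denominator):
    """Simple proportion; no confidence interval is attempted here."""
    return numerator / denominator


for label, (k, n) in counts.items():
    print(f"{label}: {k}/{n} = {share(k, n):.1%}")
```

These are raw proportions only. Without the full sampling scheme, they cannot say whether any type of stall was over- or under-represented among the positives.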
A note on the “2 strains 2 spillover” hypothesis proposed by Worobey and Pekar et al. 2022 Science:
For most of you, this is probably the first time you’re hearing about 2 strains of the pandemic virus that spilled over from animal to human in the Wuhan market.
Why haven’t you heard about this till now? Wasn’t it just one virus that emerged in Wuhan in late 2019?
This is because it was just the 1 virus (1 strain) that emerged in Wuhan in late 2019.
The so-called “2 strains” (one called lineage A and the other lineage B) that you’re hearing about in the news today are the same virus, with only 2 mutations differentiating A and B. Consider that the virus genome is close to 30,000 letters long. It’s a bit of a stretch to say that A and B are two different strains. For context, the Alpha variant has 44 mutations compared to the original Wuhan virus; Delta has 50 mutations compared to the Wuhan version; and Omicron has 113 mutations compared to the Wuhan virus. Within a single infected individual, there are billions of virus particles, many of which carry different sets of mutations. If we start to separate “strains” of the virus by only 1–2 mutations, this would mean that every Covid-19 patient is carrying possibly thousands of different strains of the virus.
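For a sense of scale, the snippet below expresses the mutation counts quoted above as a fraction of the genome and as a multiple of the A/B difference. The genome length is rounded to 30,000 letters as in the text, and the variable names are illustrative.

```python
GENOME_LENGTH = 30_000  # approximate length, "close to 30,000 letters"

# Mutation counts as quoted in this post.
mutation_counts = {
    "lineage A vs lineage B": 2,
    "Alpha vs original Wuhan virus": 44,
    "Delta vs original Wuhan virus": 50,
    "Omicron vs original Wuhan virus": 113,
}


def fraction_of_genome(n_mutations):
    """Share of genome positions that differ, using the rounded length."""
    return n_mutations / GENOME_LENGTH


ab_difference = mutation_counts["lineage A vs lineage B"]
for label, n in mutation_counts.items():
    print(f"{label}: {n} mutations, "
          f"{fraction_of_genome(n):.4%} of the genome, "
          f"{n / ab_difference:.1f}x the A/B difference")
```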
With only 2 mutations separating the “2 strains”, there is no need for more than one introduction of the virus into the human population to have occurred. Each time a person infects another person with the virus, there is already an estimated 5–10% chance the virus will pick up 2 mutations. For example, in the Diamond Princess cruise outbreak in early 2020, a single introduction of the virus into the cruise passenger population resulted in some of the virus picking up 5 mutations within ~3 weeks of the first infected person developing symptoms.
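As a rough illustration of how quickly that per-transmission chance adds up, the sketch below takes the 5–10% figure quoted above at face value and asks how likely it is that at least one transmission in a chain produces a 2-mutation jump. It is a deliberate simplification: it treats transmissions as independent, ignores mutations accumulating one at a time across several transmissions, and counts any two mutations rather than a specific pair. The chain lengths are hypothetical.

```python
def chance_of_two_mutation_step(p_per_transmission, n_transmissions):
    """Probability that at least one of n independent transmissions
    picks up 2 mutations, given a per-transmission chance p."""
    return 1.0 - (1.0 - p_per_transmission) ** n_transmissions


for p in (0.05, 0.10):          # the 5-10% range quoted in the text
    for n in (1, 5, 10, 20):    # hypothetical numbers of transmissions
        prob = chance_of_two_mutation_step(p, n)
        print(f"p = {p:.0%}, {n:2d} transmissions: {prob:.0%}")
```

Because an outbreak involves many transmission chains running in parallel, the chance that some chain somewhere picks up 2 mutations would be higher than this single-chain figure.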
So it is much more likely that a single version of the virus was introduced into the human population in Wuhan, transmitted undetected, and quickly picked up 2 mutations. The alternative is the “2 strains 2 spillover” hypothesis proposed by Worobey and Pekar et al. 2022 Science, in which there have to be 2 separate transmissions of the same virus from still-missing infected animals into people at the Wuhan market. Considering that no direct evidence exists for even a single zoonotic spillover at one market, a single introduction of the virus into the Wuhan human population is most consistent with the data available thus far.
So why are Worobey and Pekar et al. pitching a “2 strains 2 spillover” hypothesis if there was really only the 1 strain and no direct evidence of even a single spillover?
This is because the early cases at the Wuhan market were all infected only by the B version of the virus. So, the question is: where did the A version of the virus come from? Furthermore, the A version is widely considered to have preceded the B version (indeed, it was named lineage A because it was presumed to come before lineage B), and several early patients who did not visit the market were found to be infected with A. For instance, one family from Shenzhen visited Wuhan between 29 December and 4 January and, without any known exposure to the market, picked up the A virus. (On a more technical note: It is very unlikely that A evolved from B because the virus is more likely to mutate away from the closest bat virus relative. There are so many possible mutations a virus can pick up to make it better at spreading in humans and other animal species, so why would it pick the 2 mutations that revert towards the bat version? See these two studies for more details.)
Under these circumstances, the scenario most strongly supported by the available genetic and epidemiological evidence is that the virus had started infecting people somewhere in the city, and that a patient carrying the (later) B version visited the market and sparked the early superspreader event. Remember that the market, with a retail space of ~10 USA football fields, had an estimated ten thousand visitors a day, and was situated in one of the most densely populated and centrally located neighborhoods in Wuhan. It is also located close to several of the main hospitals in the city center of Wuhan. It is not a surprise that the poorly ventilated, crowded wet market would have been a prime location for an early Covid-19 superspreader event.
This obviously puts proponents of the Wuhan market origin in a tough spot. How can they explain that the available evidence doesn’t seem to point to the market being the original site of an animal-to-human or even the first human-to-human transmission of the virus?
This February, the Chinese CDC reported that a single environmental sample out of hundreds collected from the market was found to carry an A version of the virus. This was a sample taken from gloves on 1 Jan 2020, weeks after people in Wuhan — both linked and unlinked to the market — had already been infected and were transmitting the virus to other people. In addition, this glove virus sequence had already picked up other mutations, suggesting that it was likely from a person infected later in the outbreak who visited the market. However, Worobey and Pekar et al. assert that this non-definitive piece of evidence points to a second spillover of the virus from (still missing) animals to people at the market.
In order to argue for a market origin of the pandemic, Worobey and Pekar et al. had to (i) rule out the more likely scenario of a later infected case in the outbreak bringing the B variant to the market (sparking a superspreader event) and (ii) explain why none of the early market cases had been infected with the A variant.
To achieve this, Worobey and Pekar et al. had to make it so that the start of the outbreak was much later in 2019. After removing data that could challenge their conclusion (e.g., finding ways to ultimately remove ALL early virus sequences that point to intermediates of A and B circulating in people, and not applying the same exclusion criteria to other sequences in their analysis), their estimate is that an animal only passed the B version of the virus to a person around 18 November, and that another animal passed the A version of the virus to a person a week later around 25 November.
Part of their assumption is that (i) the early cases had not been identified with significant ascertainment bias (read the first half of this Medium post, where this assumption is countered by statements from Wuhan and China CDC investigators) and (ii) there was not a substantial number of early cases that were missed (or not reported) by investigators. The second is a strange assumption considering how transmissible and stealthy the virus is and the fact that the majority of infected people do not develop severe symptoms. (Here, I also recommend reading about the deletion of early virus sequences by Wuhan scientists which made it much more difficult for other scientists to know about these and include them in their analysis of how the pandemic started.)
Consider that the incubation time (time of infection to symptom onset) of the original Wuhan version of the virus was 5–14 days. Worobey and Pekar et al.’s assumptions would mean that within about a month of the first person infected by an animal developing Covid-19 symptoms, the Chinese authorities had already detected the outbreak, realized it was not the common cold or flu, and sequenced the genome of the virus (this was obtained by 27 December 2019); and that within this same month, the virus had already spread beyond Wuhan/Hubei province and likely the borders of China. Biotechnology and travel have advanced tremendously in the past decade, but 1 month after patient zero develops symptoms is a stunning turnaround time for the identification of a novel pathogen and for the virus to have spread internationally.
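The timeline implied by these figures can be laid out with simple date arithmetic. The sketch below uses only the dates and incubation range quoted in this post (first infection around 18 November 2019, incubation of 5–14 days, genome obtained by 27 December 2019); the variable names are illustrative, and the dates are treated as exact only for the sake of the calculation.

```python
from datetime import date, timedelta

# Figures as quoted in this post; the real estimates are approximate.
first_infection = date(2019, 11, 18)   # estimated first spillover (B version)
genome_obtained = date(2019, 12, 27)   # virus genome sequenced by this date
incubation_days = (5, 14)              # infection to symptom onset

earliest_onset = first_infection + timedelta(days=incubation_days[0])
latest_onset = first_infection + timedelta(days=incubation_days[1])

print(f"symptom onset of first case: {earliest_onset} to {latest_onset}")
print(f"days from onset to genome:   "
      f"{(genome_obtained - latest_onset).days} to "
      f"{(genome_obtained - earliest_onset).days}")
print(f"days from infection to genome: {(genome_obtained - first_infection).days}")
```

Under these inputs, the window between the first patient's symptom onset and the sequenced genome comes out at roughly three and a half to five weeks.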
Ultimately, it should be clarified that there was only 1 strain of the virus that emerged in Wuhan in late 2019. Describing variants that only differ by 2 mutations as 2 different strains has (mis)led people into arguing about whether it is more or less likely for 2 spillovers at a market to occur as opposed to 2 leaks in a laboratory. As described above, the available genetic and epidemiological evidence makes Worobey and Pekar et al.’s “2 strains 2 spillover” hypothesis very improbable. However, the authors are forced to endorse this unlikely hypothesis — very vocally in the media and to people without a technical understanding of what they mean by “2 strains” — because they would otherwise have to explain why only a (later) variant of the virus was circulating among early cases at the market, and why they do not interpret this to be more consistent with a human superspreader event at the market.
For readers who would like a breakdown of the current state of evidence for a natural vs lab origin of Covid-19, please see my pinned Medium post.