Tuesday, September 25, 2012

Retraction of Scientific Papers to Correct the Scientific Literature

Wikimedia - Colin Smith

Most biomedical researchers encounter situations in their careers in which they are unable to replicate the findings of a published scientific paper. This prompts a nerve-wracking search for the underlying causes. Frequently, the reason is quite trivial, such as the use of slightly different reagents or cells from those in the published paper, or minor discrepancies in the experimental protocol. Nevertheless, even when all the experimental procedures are followed appropriately, some published findings simply cannot be replicated. As frustrating as this sounds, it is unfortunately not a rare occurrence. In a recent paper entitled "Drug development: Raise standards for preclinical cancer research" published in Nature, the scientists Glenn Begley and Lee Ellis describe the attempts of a biotechnology company to replicate the results of "landmark" scientific papers in the field of cancer biology. The company was able to replicate the scientific findings in only 11% of the cases! This is a shockingly poor rate of replicability, especially since many of these studies appear to have been published in very prestigious biomedical journals. Begley and Ellis appropriately call for higher standards in biomedical research to improve replicability, both from the researchers who conduct the experiments and from the peer-review process that currently allows the publication of so many papers that cannot be replicated.
            Many of us have had similarly frustrating encounters with published papers that cannot be replicated, but we also realize that there are no "quick fixes" to solve the problem. The current peer-review process is based on the opinion of one or more editors who depend on the assessments of multiple scientific experts. Neither the editors nor the scientific experts have any way of testing the replicability of the results before deciding whether or not a study should be published. They simply have to trust the authors of the manuscript and believe that the authors took all the necessary steps to ensure the replicability of the results. If it turns out that the central findings of a published paper cannot be replicated, this information is frequently not officially published or acknowledged; instead, it is shared unofficially among scientists who have had problems replicating the findings of that paper. In many cases, there is a presumption that the authors of the published paper may have made errors in how they conducted the experiments or interpreted the data, or that they perhaps forgot to disclose some key details that are necessary to conduct the experiments and obtain the same results. Yet even if multiple colleagues feel that the experiments, data or conclusions in a paper are flawed, published papers are rarely retracted by a journal. Retractions are usually reserved for gross misconduct by the authors of a paper, such as overt fabrication of data. The science journalists Ivan Oransky and Adam Marcus founded the website Retraction Watch, which tracks papers retracted by scientific journals, and nearly all the posted retractions are a consequence of such gross misconduct and ethical violations coming to light. Retractions because of honest scientific errors are rare.
            This may change. In a recent blog post, Virginia Barbour (editor of PLOS Medicine) and Kasturi Haldar (editor of PLOS Pathogens) propose that a scientific paper can and should be retracted by a journal if there is ample evidence that its major conclusions are wrong, even if there was no overt misconduct and the erroneous conclusions were the result of an "honest error". They reference a 2006 paper that was recently retracted by PLOS Pathogens because its claim that the novel gammaretrovirus XMRV was associated with prostate cancer did not hold up; the virus may have been a laboratory contaminant. The editors state:
 At PLOS our mission is to accelerate progress in science and medicine by leading a transformation in research communication. We firmly believe that acceleration also requires being open about correcting the literature as needed so that research can be built on a solid foundation. Hence as editors and as a publisher we encourage the publication of studies that replicate or refute work we have previously published. We work with authors (through communication with the corresponding author) to publish corrections if we find parts of articles to be inaccurate. If a paper’s major conclusions are shown to be wrong we will retract the paper.

            This is quite a major decision, because it suggests that papers without gross misconduct (such as fabricated data or plagiarism) can be retracted. The paper that PLOS Pathogens retracted did not disappear from the website; it can still be read in full (as of today) and carries a prominent "Retraction" warning. I have to admit that in many ways I am glad the editors recognize the importance of formally flagging scientific errors. This will prevent future researchers from wasting their time and resources trying to replicate the results of flawed papers. However, I am concerned about the idea of "retracting" papers because of scientific errors. Even though the PLOS editors say "there is no shame in correcting the literature", the word retraction already carries a connotation of shame, because in past years it has been associated with fraud and overt misconduct rather than with honest scientific errors. It may therefore be better to use a distinct terminology. Just as there is a huge difference between a murder and an accidental killing, there is also a big difference between the intentional fabrication of data and the unintentional oversight of a viral contaminant. The editors' goal of correcting the literature and highlighting errors in published papers is laudable, but perhaps one could introduce a more neutral terminology that is not burdened with the connotation of fraud. One such example would be "flagging" papers and indicating why they are being flagged: "Category 1 flagging" could be reserved for intentional fraud and misconduct (i.e. the traditional retraction); "Category 2 flagging" could indicate an honest scientific error, when the authors agree that the conclusions of their published paper were erroneous; and "Category 3 flagging" would indicate that the overwhelming majority of scientists have failed to replicate the results, but the authors maintain that there was no error and that their conclusions and results are solid.
Such post-publication "flagging" would depend on formal post-publication peer review and tracking of replicability, as Ivan Oransky and Adam Marcus have previously suggested.
            The discussion of how to address scientific errors after publication is very important, and it comes at a time when the internet provides the tools necessary for us to easily document and share our attempts to replicate published scientific data. Whether the outcome is to formally retract flawed papers or to develop new categories for flagging flawed and questionable papers, we can be optimistic that scientific research will benefit from this discussion and from the realization that published scientific work still needs to undergo scrutiny.


  1. One of the misconceptions in the responses to the PLOS blog is that retraction entails complete removal of the article. If this were the case, it would indeed be a problem, because then one could not learn from the mistakes made in the retracted/flagged article. I do not think that this is what the PLOS editors are suggesting. For example, the retracted PLOS Pathogens article is still available on the website and is prefaced with a warning that the virus was laboratory derived and the conclusions are thus wrong. I think this type of notification/flagging needs to be expanded to also document the history of how the error was found, which studies attempted replication, etc., so that we can all learn even more from the process.

  2. I've thought about this for a long time and I'm glad to say we're finally doing something about it. Mendeley has partnered with Science Exchange, Figshare, and PLOS to launch the Reproducibility Initiative, which invites researchers to submit their best work for replication by Science Exchange's network of third-party experts. This provides a stamp of validation and approval for the work, which is a positive reinforcement for good work, as opposed to flagging the bad stuff like Retraction Watch and others do. We think this positive reinforcement approach is promising because it rewards people for doing the right thing, creates a trusted collection of the best stuff, and obviates the need for scanning all the literature (an impossible task) in an attempt to find and flag the bad stuff.

  3. This just goes to show that a lot of scientific research is fraudulent.

    1. We do not know how much research is actually fraudulent. Retractions represent a tiny fraction of the literature, roughly 0.01% of all published papers. Even if the true "fraud" percentage were ten times higher, we would still only be talking about 0.1% of published scientific papers.

  4. Speaking as an ecologist (and I suspect medicine has the same issue), there is often another reason why results cannot be replicated: the mind-boggling complexity of the systems being studied. Isolating a few key variables with a small sample size often leads to results that cannot be replicated, even in apparently similar settings and with identical methods.