Principal findings
Our cross sectional analysis of all papers retracted for originating from paper mills until June 2022, identified from the Retraction Watch database, suggests that these paper mill retractions are increasing in frequency. Nearly all authors of these papers came from China and were predominantly affiliated with hospitals. The median time to retraction of a paper mill paper was close to two years and decreased as the ranking of the journal in which it was published rose, so that the higher the Journal Citation Reports impact factor, the shorter the period until retraction. These papers affect legitimate journals and do not seem to be exclusive to predatory journals. Furthermore, this study showed the impact and visibility of these retracted papers: some were highly cited, with the potential consequences that this entails. To our knowledge, this is the first study to analyse the growing phenomenon of paper mill retractions and their characteristics.
Our findings suggest that the publication of paper mill papers increased between 2017 and 2019, when about 5 to 10 were published and eventually retracted for this reason per 100 000 publications. In 2020, the number of identified paper mill papers published in the scientific literature fell sharply. This decrease may have occurred for a number of reasons. Firstly, some papers published between 2020 and 2022 that will eventually be retracted might not yet have been identified. Retraction of a paper takes a long time, and more retractions will possibly appear in the future. Secondly, as a result of investigations initiated in early 2020 by a number of editors and researchers,12 the scientific community has become aware of the problem, and guidelines have been published to help editors identify such papers.4 Even though these guidelines do not enable a paper mill paper to be unequivocally recognised, they do make screening and identification of papers originating from paper mills possible. Hence, numbers might be smaller than they would otherwise have been because scientific journals have improved their methods for identifying such papers during editorial review and peer review, thereby preventing their publication. Thirdly, the increased attention to this type of fraud might also have deterred authors from engaging the services of paper mills, because of the consequences of scientific fraud, especially in some countries such as China.13 Then again, increased exposure could have caused paper mill organisations to change their mode of operation, thus hindering detection.14
Although this issue is relatively new, particularly in America and Europe, for some years now the use of these types of organisations has been widespread in other countries, such as China.1015 China encouraged its researchers to publish papers in return for money and career promotions.16 Furthermore, medical students at Chinese universities are required to produce a scientific paper in order to graduate.15 In fact, these organisations openly advertise their services on the Internet and maintain a presence on university campuses, not only in China but also in other countries, such as Russia.815
Perhaps unsurprisingly, most papers retracted for being paper mill papers come from China. These results are in line with the findings of other researchers and editors of scientific journals, although paper mill papers have also been reported in other countries, such as Iran or Russia.81217 The activity of the largest paper mill organisation in Russia, named International Publisher, has recently been acknowledged.810 Although this paper mill has published approximately 1000 papers, its own website announces that more than 5000 authors have bought the coauthorship of at least one paper.8
We also note that most authors of identified paper mill papers were hospital affiliated, which is consistent with previous research.15 The main reason for this might be that Chinese doctors are affiliated not with medical schools but with hospitals. Of note, pressure to publish is greater in biomedical sciences than in other specialties, and publications are usually needed to get a university degree or a promotion in China.15
Most paper mill papers were published in pharmacy and clinical medicine journals, but many were also published in basic science journals, in fields such as cellular and molecular biology or biochemistry. Therefore, this problem is not confined to clinical medicine. This research did not specifically analyse whether paper mill papers are published more frequently on clinical medicine topics than on basic research, and we are of the opinion that this aspect should be further analysed. According to our results, no major variations over time have been observed in the topics covered by the paper mill papers so far. However, the latest COPE report indicates that this pattern could change over time, for example in topic areas or types of journals.18
The main problem that paper mill papers pose for editors and reviewers of scientific journals is the difficulty of identifying them through the peer review process, because the papers appear to be legitimate. Analysis of images in a manuscript has been identified as one of the possible strategies for detecting paper mill papers because most images tend to be manipulated or duplicated, or both.14 Although various software tools are capable of detecting image manipulation, paper mill papers often use duplicated images (or stock images)519 because they are more difficult to detect than manipulated ones. At present, no software is capable of detecting image duplication in a reliable way, thus leaving this task to editors and reviewers. That said, not all papers contain images that allow for scrutiny. Another strategy for screening questionable papers is the Problematic Paper Screener software. This software identifies so-called “tortured phrases,” that is, unusual phrases used instead of established ones, which might be an indicator of suspected scientific misconduct.20 Also, COPE has published a list of common indicators for paper mill papers that could serve as a screening tool for suspicious articles.18
With the aim of preventing and detecting scientific misconduct, some countries already have offices and specific bodies that address aspects relating to scientific integrity, but many others do not have structures of this type.21 Countries that have no body or policies governing scientific misconduct incur a higher risk of producing fraudulent papers.22 Countries such as Denmark, Sweden, and China have passed laws against scientific fraud. Ironically, China has the most severe penalties for research fraud. The paucity of consequences that scientific misconduct has historically had in this country might have played an important role in the increase in unethical behaviour, including the use of paper mills.15 In 2018, after a number of scandals in China, the law against scientific fraud was strengthened by imposing sanctions that go beyond the purely academic and occupational sphere.23 This tougher approach appears to have started yielding results and, in December 2021, more than 300 researchers were reportedly penalised for scientific misconduct. Among other things, the penalties included revocation of academic degrees and cancellation of promotions.24 Because practically all paper mill papers come from China, this recent penalty policy might have contributed to the reduction in the number of paper mill papers since 2020.
Strengths and limitations
This study had limitations. Retractions of paper mill papers continue over time. Because of this, our investigation will need to be updated, as the conclusions could well vary as the list of retractions grows. The characteristics of retracted and non-retracted paper mill papers can differ, which could explain why some papers were identified but not others, although all represent fraudulent science. Another limitation was the difficulty in assigning the cause of retraction in some cases; hence, misclassification is a risk. In this study, we included only formally retracted paper mill papers and did not take into account suspicious papers (ie, those from the list compiled by EB and others), which might be a limitation of the present research. However, including papers that have not been formally retracted would carry a risk of misclassification if they are not ultimately retracted as paper mill products. A limitation of the citation analysis is that citations before and after retraction were not differentiated in this study; this issue should be considered in future research.
The main strength of this study was the use of the Retraction Watch database to identify retracted paper mill papers because this source is the main database on retractions and should currently be considered as the gold standard for aggregated information on retracted articles. The Retraction Watch database has three times the coverage of PubMed and five times the coverage of CrossRef (Retraction Watch, personal communication, 2022). Taking this into account, we consider that the number of missing retractions should be minimal.