Curated COVID-19 Sources
Works are tagged with the source of their inclusion in this COVID-19 corpus:
- Allen Institute for AI CORD-19 corpus
- WHO Database of publications on coronavirus disease (COVID-19)
- Wanfang corpus of Chinese COVID-19 papers
- CNKI corpus of Chinese COVID-19 papers
- Fatcat (based on keyword queries against the full catalog)
The fatcat catalog is intended to be a "universal" preservation and access archive, not a narrow curated collection of only the highest-quality research content. This means that not all content has undergone peer review, and some may have been uploaded to services like academic social networks (e.g., ResearchGate) or institutional repositories with no human editorial review or filtering at all.
The catalog aims to capture metadata such as publication stage (draft, published, retracted), venue, and medium (journal article, web post, encyclopedia entry, front matter) to help filter this content. In some cases, however, this metadata is incomplete or inaccurate. For example, preprint PDF files may be incorrectly associated with the final published version of a work, or vice versa.
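Because stage and medium metadata can be missing or wrong, anyone filtering this content has to decide how to treat records whose metadata is unknown. A minimal sketch follows. It assumes each record is a plain Python dict with hypothetical `stage` and `medium` keys, which are illustrative and not the catalog's actual field names.

```python
# Hypothetical release records. The "stage" and "medium" keys are
# illustrative assumptions, not the catalog's real schema.
releases = [
    {"title": "A", "stage": "published", "medium": "journal-article"},
    {"title": "B", "stage": "draft", "medium": "journal-article"},
    {"title": "C", "stage": "retracted", "medium": "journal-article"},
    {"title": "D", "stage": None, "medium": "web-post"},
]


def filter_releases(records, stages, media, keep_unknown=False):
    """Keep records whose stage and medium fall in the allowed sets.

    A missing (None) value cannot confirm a match, so such records are
    dropped unless keep_unknown is True.
    """
    kept = []
    for record in records:
        stage = record.get("stage")
        medium = record.get("medium")
        stage_ok = stage in stages or (stage is None and keep_unknown)
        medium_ok = medium in media or (medium is None and keep_unknown)
        if stage_ok and medium_ok:
            kept.append(record)
    return kept


# Strict filter: published journal articles only.
strict = filter_releases(releases, {"published"}, {"journal-article"})
# Lenient filter: also keep records whose stage is unknown.
lenient = filter_releases(
    releases, {"published"}, {"journal-article", "web-post"}, keep_unknown=True
)
```

The `keep_unknown` flag makes the trade-off explicit. A strict filter drops records with incomplete metadata, and a lenient one keeps them at the risk of including unreviewed or mislabeled content.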
Sources of Metadata
The source of all bibliographic information is recorded in edit history metadata, which allows the provenance of all records to be reconstructed. A few major sources are worth highlighting here:
- Release metadata from Crossref, via their public REST API
- Release metadata and linked full-text content from NIH PubMed and arXiv.org
- Release metadata and linked public-domain full-text content from the JSTOR Early Journal Content collection
- Creator names and de-duplication from ORCID, via their annual public data releases
- Journal title metadata from DOAJ, ISSN ROAD, and SHERPA/RoMEO
- Full-text URL lists from CORE, Unpaywall, Semantic Scholar, CiteseerX, and Microsoft Academic Graph.
- The Fatcat Guide lists more major sources