COVID-19: How to share, discover and reuse COVID-19 related data and code
In January, the Wellcome Trust published a joint statement in response to the global impact of the novel coronavirus (COVID-19) outbreak calling for the sharing of research findings and data relevant to the virus. The statement was signed by more than a hundred publishers, funding agencies and research organisations and included a commitment to helping researchers ‘share interim and final research data relating to the outbreak, together with protocols and standards used to collect data, as rapidly and widely as possible’.
The purpose of this guidance is to help researchers share data related to COVID-19 in a timely and responsible manner without compromising research integrity and data quality.
The same principles of best practice for data sharing outlined below can also be applied to research software and code. For specific guidance on making software publicly available see our web page Making research software open and shareable.
Guidelines on sharing and reusing research data related to Coronavirus COVID-19
1. Manage ethical and legal obligations
Although making data and software open and shareable can make a positive contribution to attempts to control the COVID-19 outbreak, it is important that data sharing is done responsibly and in accordance with ethical and legal obligations.
Ethical approval
Healthcare authorities and agencies such as the Health Research Authority (HRA) and Medicines and Healthcare products Regulatory Agency (MHRA) have taken steps to make it easier for researchers to proceed with COVID-19 related research projects while still meeting ethical requirements. For example:
- The HRA have made available an expedited ethical review process for studies relating to COVID-19
- The Confidentiality Advisory Group (CAG) are likewise providing an expedited review for studies requiring access to patient information without consent.
Visit the NHS-HRA web page COVID-19: Guidance for sponsors, sites and researchers for more details.
- The MHRA are prioritising clinical trials applications for trials relating to COVID-19 and have published guidance on Managing clinical trials during Coronavirus (COVID-19).
Researchers requiring assistance with ethical approval applications should contact the College’s Research Ethics team. They have also published a COVID-19 update on their web site.
The Inform System is a College hosted secure data management platform which supports the collection, analysis and management of clinical trials data.
The Big Data and Analytical Unit (BDAU) provide data management services including data storage and support for data analysis and visualisation for research groups working with de-identified healthcare data.
Data protection
Researchers collecting or accessing patient identifiable data during their research project should ensure that data confidentiality is protected in compliance with the GDPR and UK Data Protection Act.
- Where possible, data containing identifiable information should be anonymised to protect participants and enable data sharing.
- Anonymisation involves the removal of both direct and indirect identifiers. Pseudonymised data – e.g. data which contain a patient ID for which there is a key, or where there exists other information that could lead to re-identification – still count as personal data under the GDPR and should be managed accordingly.
- Ensure that consent includes permission for data sharing, including for anonymised data.
Useful guidance on data anonymisation techniques can be found on the UK Data Service website.
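As an illustration of the difference between direct and indirect identifiers, the following is a minimal Python sketch, not a complete anonymisation procedure. The column names (patient_id, name, nhs_number, birth_year, postcode) and the helper functions are hypothetical and chosen only for this example. Removing and coarsening fields in this way does not by itself guarantee that data are anonymous, so the result should still be assessed for re-identification risk.

```python
# Hypothetical column names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"patient_id", "name", "nhs_number"}


def age_band(birth_year: int, reference_year: int, width: int = 10) -> str:
    """Replace an exact birth year with a coarse age band such as '40-49'."""
    age = reference_year - birth_year
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def anonymise(record: dict, reference_year: int) -> dict:
    """Drop direct identifiers and coarsen indirect ones in a single record."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Indirect identifier: the exact birth year becomes a ten-year age band.
    if "birth_year" in cleaned:
        cleaned["age_band"] = age_band(cleaned.pop("birth_year"), reference_year)
    # Indirect identifier: keep only the outward part of a UK postcode.
    if "postcode" in cleaned:
        cleaned["postcode_area"] = cleaned.pop("postcode").split()[0]
    return cleaned


# Entirely made-up example record.
record = {
    "patient_id": "P-0001",
    "name": "Example Person",
    "birth_year": 1975,
    "postcode": "SW7 2AZ",
    "test_result": "positive",
}
anonymised = anonymise(record, reference_year=2020)
print(anonymised)
```

Note that simply replacing patient_id with another code while keeping a lookup key would produce pseudonymised data, which remain personal data under the GDPR.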
Researchers requiring advice on GDPR compliance and anonymisation should contact their faculty Data Protection Coordinator. Support is also available from the Faculty of Medicine Information Governance team. Contact details and advice can be found on their web pages and SharePoint site (login required).
While it is essential to protect data confidentiality, even sensitive data can be shared if appropriate safeguards are in place and sharing neither conflicts with ethical approval or consent agreements nor is prohibited by contractual agreements. Consider using a data sharing agreement to determine who can access the data and under what conditions. A data sharing agreement template is available for download from the College’s web page on sharing personal data.
2. Deposit with a data repository
The Wellcome Trust statement on COVID-19 recommends that data supporting published findings should be shared as quickly as possible. This includes data supporting pre-prints as well as peer reviewed publications.
The easiest way to share your data is to deposit with a trusted data repository. Depositing with a data repository will ensure the long-term preservation and accessibility of your data. In addition, your data will be assigned a persistent identifier such as a DOI or accession number making it easier for others to cite the data and track its impact.
- If possible, deposit your data with a domain or subject specific repository. Examples of repositories accepting datasets relating to COVID-19 include the European Nucleotide Archive (ENA), EMBL-EBI, GenBank, and GISAID. Additional links to subject repositories accepting COVID-19 related data can be found on the ‘Submit data’ page of the European COVID-19 data portal.
- re3data is a registry of data repositories which allows you to search by subject area.
- If no domain specific repository exists, we recommend using a general-purpose repository such as Zenodo, Figshare or Dryad. Zenodo is free to use and hosts a dedicated community site for COVID-19 related research materials. Depositors are encouraged to make COVID-19 related datasets and other research material open access.
- Zenodo also has integration with GitHub which enables easy archiving of key versions of software.
- Generalist repositories such as Zenodo, Figshare and Dryad are unable to accept datasets containing personally identifiable information.
Tell us about your data (and software)
Tell the College where your data/software are archived by creating a record for your data or software in Symplectic or emailing the DOI or repository URL to rdm-enquiries@imperial.ac.uk.
3. Encourage others to discover, access and reuse your data
Licence your data
Publicly accessible data and software should be released under a licence that allows the data to be accessed and reused with as few restrictions as possible.
- We recommend using a Creative Commons CC0 (public domain waiver) or CC BY (attribution only) licence for research data.
- Help with choosing a licence is available from our web pages Licensing your data.
- Creative Commons licences are not suitable for data which contain personal data or commercially sensitive data, or for data which contain third-party copyright material where permission for sharing has not been granted.
Document your data
- Publicly shared research data should be accompanied by sufficient documentation to provide the contextual information necessary for others to be able to understand and reuse the data.
- Examples of data documentation include laboratory notebooks, data dictionaries, code books, and blogs. As a minimum you should include a README file. See our web page Data documentation and metadata for additional information.
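For researchers who prepare deposits with scripts, a README can be generated alongside the data. The sketch below is one possible layout and is not a College template. The function build_readme, its fields and all example values are hypothetical.

```python
def build_readme(title, creators, description, files, licence, contact):
    """Return plain-text README content describing a deposited dataset."""
    lines = [
        title,
        "=" * len(title),
        "",
        "Creators: " + "; ".join(creators),
        "Contact: " + contact,
        "Licence: " + licence,
        "",
        "Description",
        "-----------",
        description,
        "",
        "Files",
        "-----",
    ]
    # One line per file: the file name followed by a short summary of its contents.
    for name, summary in files.items():
        lines.append(f"{name}: {summary}")
    return "\n".join(lines) + "\n"


# Entirely made-up example values.
readme_text = build_readme(
    title="Example COVID-19 survey dataset",
    creators=["Researcher, A.", "Researcher, B."],
    description="Anonymised survey responses collected for an example study.",
    files={
        "responses.csv": "one row per respondent",
        "codebook.csv": "variable names, units and permitted values",
    },
    licence="CC BY 4.0",
    contact="researcher@example.org",
)
print(readme_text)
```

The returned text can then be saved as README.txt in the folder that is uploaded to the repository.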
Link your data to your publications
- Some funding bodies and an increasing number of journal publishers expect or require researchers to include a data access statement in published papers. The Wellcome Trust statement on COVID-19 research mentioned above also recommends that all pre-prints should include ‘a clear statement regarding the availability of underlying data’.
- A data access statement should include the DOI or repository URL that links to the dataset and details of any conditions or restrictions governing access to the data.
- Examples of data access statements can be found on our web page How to write a data access statement.
- Include a preferred citation in your data access statement to encourage others to cite your data. See our web page How to cite data for an example of a data citation format.
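The elements listed above can be assembled programmatically, which helps keep wording consistent across several papers. The following Python sketch is illustrative only: the function names, the sentence wording and the DOI 10.5281/zenodo.0000000 are hypothetical placeholders, and the citation layout (creator, year, title, repository, identifier) is an assumption that may differ from the format on the How to cite data page or the one your publisher requires.

```python
def data_citation(creators, year, title, repository, doi):
    """Format a dataset citation as: Creator (Year). Title. Repository. DOI link."""
    return f"{creators} ({year}). {title}. {repository}. https://doi.org/{doi}"


def data_access_statement(repository, doi, licence, restrictions=None):
    """Compose a statement giving the data location, licence and any access conditions."""
    statement = (
        f"The data supporting this publication are available from {repository} "
        f"at https://doi.org/{doi} under a {licence} licence."
    )
    # Append conditions or restrictions on access only when some are given.
    if restrictions:
        statement += f" {restrictions}"
    return statement


# Placeholder DOI and made-up details, for illustration only.
doi = "10.5281/zenodo.0000000"
statement = data_access_statement(
    repository="Zenodo",
    doi=doi,
    licence="CC BY 4.0",
    restrictions="Interview transcripts are available on request under a data sharing agreement.",
)
citation = data_citation(
    creators="Researcher, A.",
    year=2020,
    title="Example COVID-19 dataset",
    repository="Zenodo",
    doi=doi,
)
print(statement)
print("Please cite as: " + citation)
```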
4. Find and use other people’s data
An increasing number of repositories and aggregate services are providing access to COVID-19 related datasets to assist researchers working on the virus. We have listed some of these resources below and will add to the list as others become available. Additional links to COVID-19 datasets and other related research materials can also be found on the Open Data Watch’s web site Data in the time of COVID-19.
Where possible, repositories and data centres are making their COVID-19 data collections open access, but always check the terms and conditions governing access and reuse of the data as set by the data provider or accompanying user licence.
Data accessed for research purposes should be properly cited and referenced just like any other research output. See our web pages How to cite data and Making research software shareable and reusable.
COVID-19 datasets can be accessed directly from data repositories such as those listed above e.g.
- European Nucleotide Archive (ENA)
- EMBL-EBI
- GenBank
- GISAID
- Zenodo’s COVID-19 community space
- Figshare
- Dryad
Datasets submitted to EMBL-EBI and other biomedical data repositories can also be accessed via the European COVID-19 Data Portal. Other web sites providing links to COVID-19 data and related materials include:
- OpenAIRE COVID-19 Gateway
- FAIRsharing COVID-19 collection
- SAGE Ocean: Coronavirus - The big data response
Web sites offering real time access to COVID-19 data, often accompanied by data visualisation:
- WHO Situation Reports
- COVID Tracking Project
- GitHub Novel Coronavirus disease 2019 (COVID-19) time series
- Johns Hopkins University Center for Systems Science and Engineering’s COVID-19 dashboard
COVID-19 related datasets can also be found by using search engines such as DataCite or Google Dataset Search. Dimensions are also providing access to details of, and links to, all COVID-19 related publications, datasets and clinical trials included in their database.
Additional resources
This document is aimed specifically at helping researchers share data related to the COVID-19 virus in a timely and responsible manner. For help with other aspects of research data management such as data management plans or data storage and security please visit our website or email rdm-enquiries@imperial.ac.uk.
The Scholarly Communications Management team can also help with the following:
Open Access - contact openaccess@imperial.ac.uk
Bibliometrics - contact bibliometrics@imperial.ac.uk
Copyright - contact Ask the library (login required)