Data Management Plan Training

3mE PhD students - Part 3

Bjørn Peare Bartholdy

Overview

Why?

Hopefully this is becoming clearer…

If it’s not clear after part 3, shame on me

Training Schedule

Part 1

  • Intro to research data management
    • policy requirements
    • processing personal data
  • Hands-on experience with DMP(online)

Part 2

  • Applying what you learned
  • Discussing with supervisor (and data steward) and completing the DMP

Part 3

  • Summary of part 1
  • Archiving and publishing data and code
  • Re-evaluation of DMPs

Plan

Lecture (right now)

  • Reminder of important bits
  • Archiving/publishing data and code

Interactive session

  • Presentation on workflow and DMP reflection
  • Q & A

During the project
(the important bits)

This illustration is created by Scriberia with The Turing Way community.
Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807

Data storage

Secure and backed-up (e.g. Project Data U:)

Or cloud solutions

  • SURFDrive: 500GB
  • OneDrive: 1TB

Data stored in cloud solutions are deleted after your contract ends

Send encrypted files via SURFfilesender

Data backup

3 copies of data (including active work)

2 media

1 off-site
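The 3-2-1 rule can be written down as a simple check. The sketch below is only an illustration: the `Copy` class, its fields, and the example copies are hypothetical and do not describe any TU Delft tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data (hypothetical description for illustration)."""
    medium: str      # e.g. "laptop", "network drive", "cloud"
    off_site: bool   # True if stored away from your usual workplace


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """Return True if there are >= 3 copies, on >= 2 media, with >= 1 off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_off_site = any(c.off_site for c in copies)
    return enough_copies and enough_media and has_off_site


example_copies = [
    Copy(medium="laptop", off_site=False),         # active working copy
    Copy(medium="network drive", off_site=False),
    Copy(medium="cloud", off_site=True),
]
print(satisfies_3_2_1(example_copies))
```

The active working copy counts as one of the three, which is why a laptop plus a single backup is not enough.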

Data organisation

Think about it. Can someone else take over without your help?

File naming should be consistent and understandable (to humans and machines)

  • include the date created (where applicable)
  • no special characters or spaces (hyphens and underscores are fine)
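As a rough illustration of these naming rules, the sketch below builds a file name from a free-text description. The function `make_filename` and the date_description_version pattern are assumptions chosen for the example, not a prescribed convention.

```python
import re
from datetime import date


def make_filename(description: str, created: date, version: int, ext: str) -> str:
    """Build a name such as 2024-03-01_tensile-test_v02.csv (hypothetical convention)."""
    # Lowercase, then collapse every run of non-alphanumeric characters into a hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    # ISO 8601 dates (YYYY-MM-DD) sort chronologically when sorted alphabetically.
    return f"{created.isoformat()}_{slug}_v{version:02d}.{ext}"


print(make_filename("Tensile test (sample A)", date(2024, 3, 1), 2, "csv"))
```

Putting the date first in ISO format keeps files in chronological order in any file browser, and zero-padded version numbers keep v10 after v09.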

Use version control.

Documentation

Best if done during the project!

README file(s)

Paper/electronic lab notebooks

Annotated scripts or code notebooks (e.g. R Notebook and Jupyter Notebook)

Personal and confidential data

Working with human participants requires an application to the Human Research Ethics Committee (HREC).

After the Project

Archiving data/code

As a PhD student you are responsible for:

Ensuring that all data and code underlying completed PhD theses are appropriately documented and accessible for at least 10 years from the end of the research project, in accordance with the FAIR principles (Findable, Accessible, Interoperable and Reusable), unless there are valid reasons which make research data unsuitable for sharing.

- 3mE RDM Policy

Minimal requirement:

  • deposition of processed data underlying figures and conclusions in published papers and dissertations.

Encouraged:

  • deposition of raw data, software, data analysis scripts, protocols, etc.

Publishing data

Data are available upon request to corresponding author.

Publishing data

Considerations

As open as possible; as closed as necessary.

  • what to share
    • validate and reproduce results
    • confidentiality, intellectual property, ethics and privacy, patent
    • journal requirements for data availability
  • how to share
    • certified repositories
    • license

Publishing data

Not suitable for sharing:

  • identifiable personal data (name, email, BSN, biometrics, etc.)
  • confidential commercial data
  • third-party data (no need to re-share if already published)
  • other confidential data (nuclear research, military research)
  • large datasets (think terabytes)

If in doubt, contact supervisor, owner of data, faculty contract manager, privacy officer, data steward

Publishing data

Personal data

Data relating to an identifiable person

  • Can publish:
    • anonymised or aggregated data
  • Can archive (restricted access):
    • pseudonymised data
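The difference between pseudonymised and aggregated data can be sketched as follows. All records, names, and the key are invented for illustration, and the keyed-hash approach is one possible technique rather than a recommended procedure; real anonymisation needs more care (very small groups, for example, can still identify people), so check with your data steward or privacy officer.

```python
import hashlib
import hmac
from statistics import mean

# Hypothetical records; names and scores are invented for illustration.
records = [
    {"name": "Alice Example", "group": "A", "score": 7.0},
    {"name": "Bob Example", "group": "A", "score": 9.0},
    {"name": "Carol Example", "group": "B", "score": 6.0},
    {"name": "Dave Example", "group": "B", "score": 4.0},
]


def pseudonymise(records, secret_key: bytes):
    """Replace names with keyed hashes. Whoever holds the key can re-link people,
    so the result is still personal data (restricted access only)."""
    out = []
    for r in records:
        digest = hmac.new(secret_key, r["name"].encode("utf-8"), hashlib.sha256).hexdigest()
        out.append({"participant": digest[:12], "group": r["group"], "score": r["score"]})
    return out


def aggregate(records):
    """Report only group means; no individual rows remain."""
    groups = {}
    for r in records:
        groups.setdefault(r["group"], []).append(r["score"])
    return {g: mean(scores) for g, scores in sorted(groups.items())}


pseudonymised = pseudonymise(records, secret_key=b"store-this-key-separately")
print(aggregate(records))
```

The pseudonymised rows still describe individuals, which is why they belong in a restricted-access archive, while the aggregated means no longer contain one row per person.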

Consent from participants needed for publishing and archiving

Can make empty informed consent form available

Data involved in patent

  • Can publish:
    • all data under temporary embargo (until IP is protected)
    • be careful with metadata

Publishing data

Be FAIR

Findable - persistent identifier (e.g. DOI) and detailed metadata
Accessible - long-term accessibility of data (or just metadata if restricted)
Interoperable - non-proprietary file formats
Reusable - proper documentation and clear license
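To make the Interoperable and Reusable points a little more concrete, the sketch below serialises a small table as CSV (an open, plain-text format) and pairs it with a metadata record. The measurements, field names, and metadata keys are placeholders, not the metadata schema of any particular repository.

```python
import csv
import io
import json

# Hypothetical measurements and metadata; every value here is a placeholder.
rows = [{"sample": "A1", "force_N": 12.5}, {"sample": "A2", "force_N": 13.1}]
metadata = {
    "title": "Example tensile measurements",
    "identifier": "DOI assigned by the repository on publication",
    "license": "CC-BY-4.0",
    "format": "text/csv",
    "description": "Column force_N is the peak force in newtons.",
}


def to_csv(rows) -> str:
    """Serialise rows as CSV, a plain-text format readable without special software."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(rows)
metadata_text = json.dumps(metadata, indent=2)
print(csv_text)
print(metadata_text)
```

In practice the repository's upload form collects most of this metadata; the point is that units, license, and column meanings are written down next to the data.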

Image: https://book.fosteropenscience.eu/

Publishing data

and CARE

Collective benefit - inclusive development and equitable outcomes
Authority to control - rights, interests, and governance
Responsibility - respect, reciprocity, and trust
Ethics - minimising harm and maximising benefit

https://www.gida-global.org/care

Publishing data

Licenses

JoKalliauer; foter, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons

Publishing data

  • Zenodo
  • 4TU.ResearchData
  • OSF
  • DANS
  • IDR
  • NOMAD

Registry of Research data Repositories

NOT legitimate repositories

  • GitHub/GitLab (can be connected to 4TU and Zenodo)
  • ResearchGate/Academia-not-edu
  • Personal website


(You can of course use these in addition to a certified repository)

Publishing data

4TU.ResearchData

  • 4TU.ResearchData consortium (includes TUDelft)
  • free storage up to 1TB/year (for TUDelft researchers)
  • international repository
  • publish/archive code (15+ years)
    • open access
    • embargo
    • restricted access
  • DOI + citation statement
  • track usage (downloads, views, citations)
  • figshare in a TUDelft wrapper (for now…)

Publishing software

TU Delft policy on research software

  • Can it be made open source?
    • If yes, TU Delft transfers copyright to you
    • If no, contact your data steward
  • Apply pre-approved open source license
  • Publish the software (e.g. GitHub/GitLab + 4TU for DOI)
  • Register software with PURE
    • If published in 4TU.ResearchData, this is done automatically

Publishing software

License compatibility

Bazuine, Merlijn. (2021). TU Delft Guidelines on Research Software: Licensing, Registration and Commercialisation. Zenodo. https://doi.org/10.5281/zenodo.4629635

Publishing software

Commercial vs. open source

Can co-exist (e.g. RStudio, NextCloud, ownCloud, Linux distros, WordPress)

  • Software is free and open source, maintenance and support are paid
  • Free for individuals, paid licenses for commercial use
  • Free basic model, proprietary advanced usage

Publishing protocols

protocols.io

  • publish detailed protocols
    • experiments
    • computations
  • assign DOI for publication
  • update with new version

Reproducibility

Reproducible: Reproducing results using the same methods and data

Replicable: Reproducing results using the same methods but DIFFERENT data
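One small, practical aspect of computational reproducibility is controlling randomness. The sketch below is an illustration only: the data and the `bootstrap_mean` function are hypothetical. It shows that the same data, method, and seed return exactly the same result on every run.

```python
import random
from statistics import mean

data = [2.1, 2.4, 1.9, 2.8, 2.2]  # hypothetical measurements


def bootstrap_mean(data, n_resamples: int, seed: int) -> float:
    """Average of resampled means; the fixed seed makes repeated runs identical."""
    rng = random.Random(seed)  # local generator, independent of global state
    means = [mean(rng.choices(data, k=len(data))) for _ in range(n_resamples)]
    return mean(means)


first_run = bootstrap_mean(data, n_resamples=1000, seed=42)
second_run = bootstrap_mean(data, n_resamples=1000, seed=42)
print(first_run == second_run)  # same data, same method, same seed
```

Recording the seed together with the code and data lets someone else reproduce the numbers; applying the same function to newly collected data would be a step towards replication.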

Main causes of failure to reproduce research:

  • Selective reporting
  • Methods and code unavailable
  • Data unavailable

https://doi.org/10.1038/533452a

Another barrier to reproducibility is the use of proprietary software and file formats (not all institutions have a MATLAB license, and very few individuals do)

Reproducibility

Going the extra mile for science! 🥼

By Marwick et al. 2017 (https://doi.org/10.31235/osf.io/72n8g)

Steps towards Open Science

NOT everything everywhere all at once - start small

  • transparency
    • thorough documentation (data collection, methodology, analysis steps)
    • analysis outputs (from SPSS, MATLAB, etc.)
    • raw data
  • research compendium
    • documentation (README)
    • code
    • data (raw and processed)
  • executable article
    • documentation (README)
    • code
    • data
    • computational environment (Docker, Binder, etc.)
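A research compendium is mostly a matter of folder structure. The sketch below creates an empty skeleton in a temporary directory; the layout and file names are one possible convention assumed for this example, not a required standard.

```python
import tempfile
from pathlib import Path

# Hypothetical layout; folder and file names are one possible convention.
COMPENDIUM_LAYOUT = [
    "README.md",
    "LICENSE",
    "data/raw/",
    "data/processed/",
    "code/",
    "environment/Dockerfile",
]


def create_compendium(root: Path) -> list[str]:
    """Create empty placeholder files and folders; return the relative paths made."""
    for entry in COMPENDIUM_LAYOUT:
        path = root / entry
        if entry.endswith("/"):
            path.mkdir(parents=True, exist_ok=True)
        else:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    created = create_compendium(Path(tmp) / "my-project")
    print("\n".join(created))
```

Keeping raw and processed data in separate folders makes it clear which files are original and which can be regenerated from the code.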

Removing Barriers to Reproducible Research in Archaeology