Research Data Management Training

ME PhD students - Part 1

Slides are available at https://doi.org/10.5281/zenodo.7298877

Overview

ME Data Stewards

datasteward-ME@tudelft.nl

Why are you here?

Support with Research Data Management Policy requirements:

  • Attend training in Research Data Management ✔️

  • Data management plan (DMP)

    • To be written within the first 12 months
    • To be approved during the go/no-go meeting
  • Data/code archiving requirement

    • Research data and code supporting the PhD thesis must be uploaded to a repository before graduation

No, seriously… why?

Workflow efficiency!

Early organisation and proper documentation of your project and data

Prevent data loss… you never know

Training schedule

Part 1

  • Intro to research data management
    • before, during, and following the research
    • policy requirements
    • (processing personal data)
  • Hands-on experience with DMP(online)

Part 2

  • Discussing with supervisor (and data steward) and completing the DMP

Part 3

  • Summary of part 1
  • Re-evaluation of DMPs
  • Archiving and publishing data and code

Training outcomes

  • The importance of proper data
    • management
    • storage
    • documentation
  • Practical implementation of a DMP
    • using DMPonline
  • Personal data considerations

Before the project

This illustration is created by Scriberia with The Turing Way community.
Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807

The plan!

Research data lifecycle

Data management plan (DMP)

Document describing what will happen with…

  • code
  • data

…during the project.

It also determines what happens to code/data after the project:

  • whether it will be made publicly available
  • how long data will be stored for future (re)use
  • who will have access to the data in the long term

Research data

…. data in the form of facts, observations, images, computer program results, recordings, measurements or experiences on which an argument, theory, test or hypothesis, or other research output is based. It relates to data generated, collected, or used, during research projects, and in some cases may include the research output itself. Data may be numerical, descriptive, visual or tactile. It may be raw, cleaned or processed, and may be held in any format or media. Research data, in many disciplines, may by necessity include the software, algorithm, model and/or parameters, used to arrive at the research outcome, in addition to the raw data that the software, algorithm or model is applied to.

- Queensland University of Technology

DMPonline

DMPonline is an online platform for creating a DMP

You can log in with your NetID to the TU Delft DMPonline

Here, you can:

  • Write and update your plan during research
  • Select from multiple templates (TU Delft, NWO, etc.)
  • Share your plan with collaborators/supervisor
  • Request feedback from your faculty data steward
  • See other plans shared by researchers at TU Delft

For more instructions, see here.

During the project

This illustration is created by Scriberia with The Turing Way community.
Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807

Data storage

Storage options at TU Delft

https://tudelft.topdesk.net/ > ICT services > IT for Researchers

TU Delft ICT Network drives

These are accessible from the TU Delft network (e.g. via Windows File Explorer)

Location               Storage  Access                               Suitable for confidential data?
Personal Drive (H:)    8 GB     Just you                             Yes (but not research data)
Staff Group Data (M:)  50 GB    Department                           No
Project Data (U:)      5+ TB    Managed by drive owner (project PI)  Yes

Need more computing power?

Cloud drives

Location            Storage  Access                              Suitable for confidential data?
SURFDrive           1 TB     Just you (can share files/folders)  Yes
Microsoft OneDrive  1 TB     Just you (can share files/folders)  Yes

Project Drive storage

Recommended for project data

TU Delft ICT Network Drive

  • regular backups
    • standard: 14 days retention
    • high: 14 days + 53 weeks with redundancy (upon request)

Should be managed by the project leader

  • access can be given to internal and external users

Pros:

  • Data remains accessible when a researcher leaves the lab
  • Up to 5 TB storage (more upon reasonable request)
  • Can be requested for multiple projects

Access outside TU Delft

Accessed through eduVPN and via WebDrive/WebDAV web links

Cloud storage

SURFDrive

  • 1 TB storage for TU Delft researchers
  • 30-day backup retention
  • Local desktop application, synchronised with the cloud
  • Flexible sharing with external collaborators
  • Cross-platform (Linux, macOS, Windows)

OneDrive

  • 1 TB storage for TU Delft researchers
  • 30-day backup retention
  • Local desktop application, synchronised with the cloud
  • Flexible sharing with external collaborators
  • Automatic version control

Other cloud solutions

  • Dropbox, Google Drive, etc.
  • Just don’t…

Pros-cons

TU Delft ICT Network drives

  • ✔️ Regular backups (3-2-1 rule)
  • ✔️ Secure storage
  • ✔️ High volume storage (5TB+)
  • ❌ Needs an internet connection (and eduVPN off-campus)
  • ❌ Can’t share single files/folders (all or nothing)

Cloud drives

  • ✔️ Synchronisation with local device
  • ✔️ Version controlled
  • ✔️ Advanced sharing
  • ❌ Not secure; not appropriate for sensitive/personal data
  • ❌ Not compliant with the 3-2-1 backup rule
  • ❌ Account is deleted shortly after the researcher leaves

Backups

3-2-1 backup rule-of-thumb

3 copies of the data (1 primary, 2 backups)

2 different storage media (e.g. external hard drive and laptop)

1 copy stored offsite (different geographical location)

Backup types

Two-way (synchronisation): changes, including deletions, propagate in both directions

One-way (backup): changes are copied from the primary location to the backup only

Backup types

Full backups

+ Easy to restore data
- Uses a lot of storage

Incremental backups

+ Uses less storage
- More complicated to restore data
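As a rough illustration of the incremental idea, the sketch below (Python; the source and backup folder paths are invented placeholders) copies only files that are new or modified since the last run. A full backup would simply copy everything every time, and real backup tools additionally keep older versions rather than only mirroring the latest state.

# Sketch: one-way, incremental backup: copy only new or modified files.
# Paths are invented placeholders for illustration.
import shutil
from pathlib import Path

def incremental_backup(source: Path, backup: Path) -> None:
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        dest_file = backup / src_file.relative_to(source)
        # Copy when the backup copy is missing or older than the source file
        if not dest_file.exists() or dest_file.stat().st_mtime < src_file.stat().st_mtime:
            dest_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, dest_file)  # copy2 preserves timestamps

incremental_backup(Path("~/project_name/data").expanduser(),
                   Path("/media/backup-drive/project_name/data"))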

Data sharing

SURFfilesender

  • up to 1 TB
  • End-to-end encryption up to 2 GB

Project/data organisation

Look familiar?

Can you walk away from your project for days, weeks, or months, and come back and know what everything is?

Spend some time thinking about how you will organise yourself.

Your future self will thank you…

Project structure

Projects should be contained within folders in a meaningful place

📁 project_name
    📄 README
    📁 data
      📄 raw-data_exp01.csv
      📄 raw-data_exp02.csv
    📁 analysis
      📄 analysis-script.R
    📁 reports
      📄 results-of-analysis.Rmd
    📁 publication
      📄 manuscript_v1.docx
  • ✔️ Could be in home directory (~/Documents/Project_name)
  • ✔️ Could be in the cloud (OneDrive/Project_name)
  • ❌ NOT on the Desktop

Project templates

There are many pre-existing templates out there that can help you

https://github.com/djnavarro/newproject/

Project templates

There are many pre-existing templates out there that can help you

https://github.com/bvreede/good-enough-project

.
├── .gitignore
├── CITATION.md
├── LICENSE.md
├── README.md
├── requirements.txt
├── bin                <- Compiled and external code, ignored by git (PG)
│   └── external       <- Any external source code, ignored by git (RO)
├── config             <- Configuration files (HW)
├── data               <- All project data, ignored by git
│   ├── processed      <- The final, canonical data sets for modeling. (PG)
│   ├── raw            <- The original, immutable data dump. (RO)
│   └── temp           <- Intermediate data that has been transformed. (PG)
├── docs               <- Documentation notebook for users (HW)
│   ├── manuscript     <- Manuscript source, e.g., LaTeX, Markdown, etc. (HW)
│   └── reports        <- Other project reports and notebooks (e.g. Jupyter, .Rmd) (HW)
├── results
│   ├── figures        <- Figures for the manuscript or reports (PG)
│   └── output         <- Other output for the manuscript or reports (PG)
└── src                <- Source code for this project (HW)

Project templates

There are many pre-existing templates out there that can help you

https://github.com/paleobiotechnology/analysis-project-structure

README.md
conda_environment.yml
.gitignore
01-documentation/
├──document_1.txt
└──document_2.tsv
02-scripts
├──ANA-script.sh
├──ANA-notebook.Rmd
├──QUAL-script.sh
└──QUAL-notebook.Rmd
03-data/
├──raw_data
├──published_data
├──reference_genomes
└──databases/
    └──<database_1>/
04-analysis/
├──analysis_1/
│  ├──sub-step
│  └──sub-step
└──analysis_2/
   ├──sub-step
   └──sub-step
05-results/
├──ANA-final_file.tsv
├──ANA-final_file.Rdata
├──QUAL-tool_output.csv
└──QUAL-tool_output.Rdata
06-reports/
├──ANA/
│  ├──final_rmarkdown_figures/
│  ├──final_rmarkdown.Rmd
│  └──final_rmarkdown.html
└──QUAL/
   ├──final_rmarkdown_figures/
   ├──final_rmarkdown.Rmd
   └──final_rmarkdown.html
07-publication/
├──figures
├──supplementary_figures/
├──supplementary_files/
├──sequencingdata_upload/
└──final_paper.Rmd

Project templates

There are many pre-existing templates out there that can help you

from DATACC

Project templates

There are many pre-existing templates out there that can help you

Just be consistent and transparent!

File naming

I know, I know, could there BE a more boring topic…

It is pretty essential, though. Follow these rules and it’ll be right 👍:

  • Use dates where applicable: YYYY-MM-DD (ISO 8601 format)
    • Use them at the beginning of the file name so it arranges by date
  • Be descriptive, but brief
  • Use a version number (not ‘final’ - there’s no such thing as a final version…)
  • Avoid spaces and special characters (exceptions: _ and -)
    • separate related words with - and chunks with _
  • Avoid case-sensitivity (SomE operAting SystEms cARE, some don’t)

File naming continued

Good examples:

analysis01_descriptive-statistics.R
analysis02_preregistered-analysis.py
2009-01-01_original-analysis.R

Bad examples:

essay "romeo and juliet" draft01(1).docx
1-April-2012 supervisor comments on final draft.docx

From https://djnavarro.net/slides-project-structure/#1
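If files are produced by scripts, these conventions can be applied automatically. Below is a minimal sketch (Python; the description, version number, and extension are invented examples) that assembles an ISO 8601-dated, versioned file name following the rules above.

# Sketch: build a file name that follows the conventions above
# (ISO 8601 date first, descriptive but brief, explicit version, no spaces).
from datetime import date

def make_filename(description: str, version: int, extension: str) -> str:
    today = date.today().isoformat()              # YYYY-MM-DD
    slug = description.lower().replace(" ", "-")  # related words joined with '-'
    return f"{today}_{slug}_v{version:02d}.{extension}"

print(make_filename("descriptive statistics", 1, "csv"))
# e.g. '2025-06-01_descriptive-statistics_v01.csv'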

Version control

By using a version control system (VCS; git is the most widely used), you can:

  • record changes to file(s) over time
  • revert selected files back to a previous state
  • revert the entire project back to a previous state
  • compare changes over time
  • see who last modified something, when, and more

This illustration is created by Scriberia with The Turing Way community.
Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807

Version control hosting services

GitHub

  • Popular host for developing open-source (and closed-source) projects
  • share with nobody (Microsoft), collaborators, everybody
  • Free account for individuals


GitLab

  • Similar interface to GitHub
  • TU Delft provides its own server (secure for sensitive data)
  • Share with nobody, collaborators, all TU Delft researchers, everybody
  • Free account for individuals

VCS hosting services

They are NOT a certified repository for long-term storage

They do NOT assign DOIs

They CAN be connected with certified repositories

Snapshots of the repo will be taken and assigned a DOI

Documentation

What to document

  • How to navigate the project
  • How to (re-)use code and data

Ask yourself: can someone with access to your project folder reproduce your findings exactly?

That someone may be your future self!

In practice

  • Metadata embedded in files (instrument-specific)
  • Paper or (preferably) electronic lab notebooks
  • README file(s) in project folder
  • Code annotation/software instructions
  • Upload your data to repositories with discovery metadata
  • Use disciplinary metadata standards (may or may not exist for your discipline)

Documentation: Lab notebooks

Conventionally kept in physical notebooks in the lab or the PI’s office

This has some limitations

  • does not relocate easily
  • depends on handwriting legibility
  • often not standardised

Documentation: Electronic lab notebooks

Several beneficial functionalities

  • text editor
  • spreadsheet tools
  • protocol templates
  • lab inventories
  • sharing/collaboration options
  • lab equipment/workflow managers
  • integration with other resources (e.g. SURFDrive, OneDrive, and GitHub)
  • version controlled

TU Delft has software licenses for RSpace and eLABJournal

Illustrated by Connie Clare

Documentation: README

File(s) (.txt, .md, .pdf) that are stored at the root of your project or data directory

Contain:

  • Introductory information
    • context of project
  • Data-specific information
    • download/access instructions
    • re-use and citation (license)
    • variable names and descriptions
    • units of measurement
    • codes and abbreviations
    • Methodology
      • methods for generating the data
      • methods for processing the data
      • or link to where this information can be found
  • Code-specific information
    • installation/download instructions
    • re-use and citation (license)
    • Issue/bug reporting and contributing guidelines

Metadata

Data about data

Embedded metadata

FASTQ files - a text-based format used in the life sciences (bioinformatics in particular) that stores information about nucleotide sequences

TIFF files - an image format that often contains additional information about the image and how it was recorded

FITS files - a file standard widely used in astronomy to store images and tables; FITS files contain headers with metadata describing the data

Examples from https://www.tudelft.nl/en/library/research-data-management/r/manage/collect-and-document
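Embedded metadata can usually be read programmatically. As a small sketch (Python, assuming the Pillow imaging library is installed; image.tiff is a hypothetical file name), the snippet below prints the TIFF tags stored inside an image:

# Sketch: inspect metadata embedded in a TIFF file using Pillow (assumed installed).
from PIL import Image
from PIL.TiffTags import TAGS

with Image.open("image.tiff") as img:           # hypothetical file name
    for tag_id, value in img.tag_v2.items():    # tag_v2 holds the embedded TIFF tags
        print(TAGS.get(tag_id, tag_id), value)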

Spreadsheet organisation

Raw data… DO NOT TOUCH

Make a copy of the raw data to perform calculations and analysis

  • or, ideally, use a scripting language and export derived data

Spreadsheet organisation

One row per case

One column per variable

One cell per observation
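These rules correspond to what is often called a ‘tidy’ layout. As a brief sketch (Python with pandas; the measurement values and column names are invented), the snippet below reshapes a wide table, with one column per measurement day, into one observation per row:

# Sketch: reshape a wide table (one column per time point) into a tidy layout
# with one column per variable and one row per observation.
# All values and column names are invented for illustration.
import pandas as pd

wide = pd.DataFrame({
    "participant_id": [1, 2, 3],
    "day_01": [5.1, 4.8, 5.6],
    "day_02": [5.3, 4.9, 5.4],
})

tidy = wide.melt(id_vars="participant_id",
                 var_name="day",
                 value_name="measurement")
print(tidy)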

Variable naming

  • ✔️ Use snake_case or camelCase
  • ✔️ Meaningful but short names
  • ❌ No special characters or spaces
  • ❌ Don’t start with a number (start with a letter)
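Applied to a spreadsheet loaded in code, this might look like the short sketch below (Python with pandas; the original column names are invented examples of what to avoid):

# Sketch: rename awkward spreadsheet columns to short, meaningful snake_case names.
# The column names and values are invented for illustration.
import pandas as pd

df = pd.DataFrame({"Temp (°C)": [21.3], "2nd measurement": [21.5]})

df = df.rename(columns={
    "Temp (°C)": "temperature_c",         # no spaces or special characters
    "2nd measurement": "measurement_02",  # don't start with a number
})
print(df.columns.tolist())   # ['temperature_c', 'measurement_02']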

Code notebooks

R Notebook/R Markdown

Combine your analysis code and output in a single document!

https://rmarkdown.rstudio.com/

Code notebooks

Jupyter notebook

For Python users! 🐍

https://jupyter.org/

Code notebooks

Quarto

If you just can’t choose!

See courses and workshops here

Summary

Project organisation - give it some thought

  • documentation
  • code notebooks
  • lab notebooks
  • version control

Data storage

  • security
  • backups
  • metadata standards for your field

Personal data

What is considered personal data?

  • “Personal Data” (GDPR, Article 4): any information relating to an identified or identifiable natural person

  • such as a name, an identification number, location data, an online identifier, or one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person

Direct vs. indirect identifiers

Direct identifiers: information that relates specifically to an individual, such as the individual’s residence.

  • name, address, BSN or other identifying number or code, telephone number, e-mail address, biometric record

Indirect identifiers: information that can be combined with other information to identify specific individuals, e.g. a combination of gender, birth date, geographic indicator, and other descriptors such as place of birth, race, religion, weight, activities, employment information, medical information, education information, and financial information.

Working with personal data

Anyone working with human participants will have to submit an application to the Human Research Ethics Committee (HREC).

For the application, you will need to address the points that follow.

Working with personal data

Collect only what you need (and what you informed participants you would collect)

Access to personal data is restricted to only those who need to process them

Data should be stored in a secure location (e.g. Project Drive)

Informed consent forms should be securely stored

  • paper forms: locked storage
  • digital forms: encrypted and separated from the other personal data

Working with personal data

Anonymisation vs. pseudonymisation

Pseudonymisation: assign a unique participant number to each participant on the corresponding informed consent form or a separate key document. Use the participant number (not their name) during data collection and analysis. This is not anonymisation, since it is still possible to trace each unique participant number back to the corresponding participant.

  • Key document needs to be stored in a separate and secure location (e.g. with the informed consent forms).
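A minimal sketch of such a workflow (Python; the participant names, ID format, and file name are invented for illustration), where the key document is written to a separate file that must live in the secure location:

# Sketch: pseudonymise participants by replacing names with participant numbers.
# The key file linking numbers to names must be stored separately and securely
# (e.g. with the informed consent forms), away from the research data.
import csv

participants = ["Alice Janssen", "Bob de Vries"]          # invented example names

key = {name: f"P{i:03d}" for i, name in enumerate(participants, start=1)}

# Key document: participant number <-> identity (store in the secure location)
with open("participant_key.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["participant_id", "name"])
    writer.writerows((pid, name) for name, pid in key.items())

# From here on, use only participant numbers during data collection and analysis
print(list(key.values()))   # ['P001', 'P002']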

Anonymisation: full anonymisation is often difficult to achieve, because it might still be possible to identify a specific individual by combining indirect identifiers. It is easier to achieve through data aggregation.

After the Project

This illustration is created by Scriberia with The Turing Way community.
Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807

Archiving data/code

As a PhD student you are responsible for:

Ensuring that all data and code underlying completed PhD theses are appropriately documented and accessible for at least 10 years from the end of the research project, in accordance with the FAIR principles (Findable, Accessible, Interoperable and Reusable), unless there are valid reasons which make research data unsuitable for sharing.

- 3mE RDM Policy

Minimal requirement:

  • deposition of processed data underlying figures and conclusions in published papers and dissertations.

Encouraged:

  • deposition of raw data, software, data analysis scripts, protocols, etc.

Publishing data/code

Data are available upon request to corresponding author.

Publishing data

Be FAIR

Findable - persistent identifier (e.g. DOI) and detailed metadata
Accessible - long-term accessibility of data (or just metadata if restricted)
Interoperable - non-proprietary file formats
Reusable - proper documentation and clear license

Image: https://book.fosteropenscience.eu/

Publishing data

and CARE

Collective benefit - inclusive development and equitable outcomes
Authority to control - Rights, interests, and governance
Responsibility - respect, reciprocity, and trust
Ethics - minimising harm and maximising benefit

https://www.gida-global.org/care

Publishing data

As open as possible; as closed as necessary.

Are the data suitable for sharing?

  • Personal data
  • Commercially confidential data
  • Data belonging to third parties
  • Other types of confidential data
  • Data too large to be published online in a repository

Publishing data

Licenses

JoKalliauer; foter, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons

Data ownership

As a general rule, TU Delft owns all research data generated by employees at TU Delft.

But the funder of the project (either public or commercial) might impose ownership conditions.

Check whether relevant documents, such as the grant/consortium agreement, specify:

  • Who is allowed to use the data?
  • Can the data be made publicly available?
  • What is the publication procedure?

The faculty contract managers can advise on these agreements.

Publishing software

TU Delft policy on research software

  • Can it be made open source?
    • If yes, TU Delft transfers copyright to you
    • If no, contact your data steward
  • Apply pre-approved open source license
  • Publish the software (e.g. GitHub/GitLab + 4TU.ResearchData for a DOI)
  • Register software with PURE
    • If published in 4TU.ResearchData, this is done automatically

Bazuine, Merlijn. (2021). TU Delft Guidelines on Research Software: Licensing, Registration and Commercialisation. Zenodo. https://doi.org/10.5281/zenodo.4629635

Publishing software

Commercial vs. open source

Can co-exist (e.g. RStudio, NextCloud, ownCloud, Linux distros)

  • Software is open source; maintenance and support are paid
  • Free for individuals; commercial licences for organisations
  • Free basic version; proprietary advanced features

What’s Next?

Part 1

  • Intro to research data management
    • policy requirements
    • processing personal data
  • Hands-on experience with DMP(online)

Part 2

  • Applying what you learned
  • Discussing with supervisor (and data steward) and completing the DMP

Part 3

  • Summary of part 1
  • Re-evaluation of DMPs
  • Archiving and publishing data and code