”I go through all my data and remove unnecessary duplicates” – Sanja Hakala’s Data Cleaning Week challenge

University of Helsinki researcher Sanja Hakala challenges herself and all her fellow PhD researchers to join Data Cleaning Week by reviewing their data management practices. UH Data Support’s Data Cleaning Week will be held from 16 to 20 December 2019.

PhD researcher Sanja Hakala (TUHAT) works in the Evolution, Sociality & Behaviour research group at the University of Helsinki. She is taking up the challenge and participating in Data Cleaning Week. In this interview, Hakala talks about data management in her research and about her Data Cleaning Week challenge.

What kind of research do you do? What kind of data do you work with, and how much data do you have?

*”These buggers are what my data is made of, in one form or another…”*

SH: ”I am just finishing a PhD project in evolutionary biology. My questions deal with the interplay of dispersal evolution and social evolution in ants, and my methods are rather variable – leaving me with plenty of different types of data. Although all my final datasets are rather small and mostly stored as CSV files organized neatly in folders, managing the raw data poses bigger problems. My raw data ranges from electron microscopy images to field and lab notebooks, and from sequencing and genotyping files to behavioral observations, stored both as videos and as special files of the scored behaviors. Videos take the most space: I have a few hundred gigabytes of those.”

What kind of challenges have you faced in data management? How do they affect your work?

SH: ”To be honest, my biggest problems start in the field. Managing field notes is the hardest part – mine, at least, are super detailed but often messy, and going through them later is a pain. In the field, I store the GPS coordinates electronically but make notes by hand, and I have not always combined these two sources of information immediately. Noticing such sloppiness a year later does not make a researcher happy… I also wish I had named my study populations and samples more systematically, because it would really help the later steps of data management.”

*”…and this bugger is responsible for the messy notebooks and files!”*

”Although I am super careful with my electronic data, and it is well-organized in general, I know I have too many duplicate files of the exact same data in too many different locations. Backing up would make more sense if it were more systematic! Additionally, sharing data with several co-authors has sometimes been a bit problematic. We have not been using cloud services as much as we should have, and as a result different co-authors have sometimes ended up with different versions of the same data.”

How would you challenge yourself to improve your data management during Data Cleaning Week?

SH: ”I challenge myself to go through all my data and to remove unnecessary duplicates. I am soon leaving the University of Helsinki and moving elsewhere, so this is a good time to make sure I leave all my data for my supervisors in a clear format with proper metadata. I should also scan all my field notes for the university so that I can take my original notebooks with me.”

Whom do you challenge to improve their data management during Data Cleaning Week?

SH: ”I challenge all fellow PhD researchers. I feel that we are often a bit clueless about good data management practices, and I have heard some horror stories about losing data at a crucial moment of the PhD project. Thinking about how to leave the data behind when we move on is also important. Poke your supervisors and make a plan together, if you do not have one!”

Participate in Data Cleaning Week!

Take the challenge and participate in Data Cleaning Week by posting a picture of your cleaning effort on Twitter with the hashtag #5sdata. The picture can show a folder structure, a new file naming system for your group, space freed up by cleaning – you name it! Remember to challenge your colleagues too!

You can also participate in Data Cleaning Week by sending an email to Data Support: datasupport@helsinki.fi. We will share your message through the Think Open blog, Twitter and the Helsinki.fi website.