Research with integrity – all the fun of the FAIR - Cancer Research UK


Data, data everywhere… it’s testament to the incredible work you do, but is it making research life harder? Maybe, say Andrew Porter and Katarzyna Kamieniecka, but with careful application of the right data principles it’s also making science better. Here, they take us through the whys and wherefores of FAIR data…

This entry is part 9 of 17 in the series Research Integrity

There’s an online version of a document written in 1989 by Sir Tim Berners-Lee when he was trying to help researchers at the physics institute CERN keep track of their data while dealing with high researcher turnover.

In it, he states: “When two years is a typical length of stay, information is constantly being lost… The technical details of past projects are sometimes lost forever, or only recovered after a detective investigation in an emergency. Often, the information has been recorded, it just cannot be found.”

If that still sounds like the typical research institute of today, it does suggest Sir Tim wasn’t entirely successful in solving this issue. Can it be that the problem of managing research data is harder than inventing the World Wide Web – the ultimate outcome of the ‘hyperlinks’ solution he proposed?

FAIR enough?

If anything, the problem has become harder over time. Due to the rapid pace of change in information technology, most cancer researchers – indeed most scientists – are now data scientists in a way undreamt of 40 years ago.

Results and experimental details can be spread across handwritten notes, emails, presentations, spreadsheets, documents with file names ending in something like ‘FinalFINAL.doc’, R scripts and more. This data is often locked away in systems and formats that aren’t easily shared, can’t be accessed by others and are impossible to understand for anyone except those who created them, and even then only while they retain important linking details in their memories.

These challenges are at the heart of the concept of FAIR data. You’ll likely know it’s an acronym for data that follows these four foundational principles – Findability, Accessibility, Interoperability, and Reusability (…see what they did there?). By adhering to FAIR principles, researchers can increase data reproducibility, transparency, and collaboration in science, helping ensure that research data is managed efficiently and in compliance with best practices. This is very much in line with the commitments in the Concordat to Support Research Integrity.

FAIR’s fair

What is FAIR data?
The FAIR (Findable, Accessible, Interoperable, Reusable) data principles emphasise machine-actionability: data should be described well enough that software, as well as people, can find and use it. The main objective of FAIR is to increase data reuse by researchers. The core concepts of the FAIR principles are based on good scientific practice and are intuitively grounded.
Why do we need it?
To ensure fairness, inclusivity, and transparency in research, promoting better insights and avoiding bias.

FAIR was first outlined in a 2016 paper that identifies a problem very similar to that faced at CERN in 1989:

“We often need several weeks (or months) of specialist technical effort to gather the data (because) we do not pay our valuable digital objects the careful attention they deserve when we create and preserve them.”

The FAIR principles were quickly adopted by funders like CRUK and research institutions as a framework for ensuring data is well-managed. Many researchers – particularly those who specialise in managing extremely large data sets – are well-versed in FAIRification (the process of making data FAIR).

So why are we still seeing so many of the same issues with managing data?

All’s FAIR

While researchers are often positive about the concepts of FAIRness, there are barriers to its application. The first barrier is learning about FAIR. Only a third of respondents to ‘The State of Open Data 2022’ survey were familiar with FAIR, with another third saying they hadn’t heard of it at all. Part of the reason for this article is to raise awareness. It can be easy for those well-acquainted with FAIR to skip over the basics, or rush into deeper complexities, when explaining FAIR and the many new terms and concepts associated with it. Finding the right starting point is key to avoiding being overwhelmed at the very start (see “What next with FAIR?” below for suggestions for beginners).

Another barrier is finding accessible ways to engage with FAIR principles when there are so many other demands on researchers’ time. If you’re just starting, don’t feel you have to read everything about FAIR data. Even a little knowledge can reveal something practical and applicable that can be useful to you right now. Taking steps to make your data more organised and well-annotated – perhaps using a template for recording your experiments that reminds you to record all the important details each time – is likely to benefit ‘future you’ when you access and reuse your own data, without having to rely on memory to find and understand everything.
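As a rough illustration of the template idea above, here is a minimal Python sketch of an experiment record that flags any important details left blank. The field names (experiment_id, protocol, sample_ids and so on) are hypothetical and chosen only for this example; they are not part of the FAIR principles or any CRUK standard, so swap in whatever details matter in your own lab.

```python
import json
from dataclasses import dataclass, asdict
from typing import List

# Hypothetical template: the field names below are assumptions for
# illustration, not a FAIR or CRUK requirement.
@dataclass
class ExperimentRecord:
    experiment_id: str
    date: str              # ISO 8601 date, e.g. "2024-03-15"
    researcher: str
    protocol: str          # name or version of the protocol followed
    sample_ids: List[str]
    data_files: List[str]  # where the raw data actually lives
    notes: str = ""        # optional free text

REQUIRED_FIELDS = [
    "experiment_id", "date", "researcher",
    "protocol", "sample_ids", "data_files",
]

def missing_fields(record: ExperimentRecord) -> List[str]:
    """Return the names of required fields that were left empty."""
    return [name for name in REQUIRED_FIELDS if not getattr(record, name)]

def to_json(record: ExperimentRecord) -> str:
    """Serialise the record to JSON so software can read it as easily as people."""
    return json.dumps(asdict(record), indent=2)

# Example: the protocol was forgotten, so the template points it out.
record = ExperimentRecord(
    experiment_id="EXP-001",
    date="2024-03-15",
    researcher="A. Researcher",
    protocol="",
    sample_ids=["S1", "S2"],
    data_files=["plate_reader_output.csv"],
)
print(missing_fields(record))  # prints ['protocol']
```

Saving each record as a small JSON file next to the data it describes is one simple way to keep the ‘linking details’ out of memory and on disk. A spreadsheet or paper template with the same headings would serve the same purpose.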

For those more immersed in FAIRification, perfectionism can be a barrier. Some people start crafting plans to redesign whole systems, reformat workflows and retrain their teams in emerging best practice. While these are positive directions and make great long-term objectives, they also require lots of time and energy to fulfil. Breaking these down to more accessible elements with a mixture of short- and long-term goals – including sharing what you have learnt with others – can be helpful.

What next with FAIR?

Beginner: Take a look at FAIR in a nutshell for a 10-minute overview, and the great introduction – complete with brief videos – here. Investigate your institutional support around FAIRness – try searching for “data stewardship” or “open data” resources, or approach your library if you are part of a university.

Intermediate: Find detailed resources on becoming more FAIR here, get hands-on with the training available or take the FAIR Pointers course here.

Advanced: For guidance on how to engage with FAIR as a researcher, developer, trainer, publisher and more, see https://fairsharing.org/. Want to share your knowledge? Help someone else navigate being more FAIR here or consider running the 90-minute “Coffee, Biscuits and Data” course.

If you are interested in enhancing research data management skills, take a look at the ELIXIR-UK Fellowship – everyone is welcome to join and explore.

FAIR play

Given the time commitment and effort involved in even relatively small changes, it can be helpful to remember that FAIRification of data is something valued in the wider research community and by funders. Indeed, CRUK have identified the need for cancer data to be more FAIR in their latest data strategy. Making data FAIR is often a condition of grant funding, and could possibly contribute to reporting on People, Culture and Environment for REF2028.

Taking a look at your institution’s policies on data management may also reveal resources, research data management training and other support to help make data FAIRer – often coordinated by libraries in UK universities. You may even find your institution has pots of funding to run FAIR and Open data projects which you might be able to tap into to run training events or develop new tools. Funders like CRUK have put FAIR data at the heart of their research data strategies, so incorporating FAIR principles into funding applications is a positive way of showing alignment with the funder’s principles.

Ultimately, FAIR provides a framework for good research data management with a wide range of benefits.

Author

Dr Andrew Porter is Research Integrity and Training Adviser at Cancer Research UK Manchester Institute

Author

Kate Kamieniecka is a former CRUK Manchester Institute bioinformatician who recently joined the University of Bradford and holds the position of Lead Data Steward Trainer in the ELIXIR-UK Fellowship
