Research with Integrity – the pitfalls and potential of generative AI

by Cancer Research UK | Analysis

26 March 2025



A few years into the generative AI revolution, Andrew Porter looks at where we are now with these new tools, and what researchers need to think about before diving in…


This entry is part 15 of 15 in the series Research Integrity

As universities wrestle with the impact of generative AI at every level, what does it mean for an individual postdoc or PhD student in a lab who is interested in using generative AI to support their work?

If this is you, perhaps one simple question to ask yourself is: do you want to lean into generative AI or keep your distance from it?

Benefits and limits

To lean in, you’ll want to invest your time in understanding how best to use these new tools – thinking deeply about the implications in areas such as copyright and plagiarism, confidentiality and data security, ethical practices and environmental sustainability – and figuring out how the concerns and opportunities in those areas interact with your research.

Or, perhaps, now is not the time for you. This needn’t be simple scepticism or distrust of the technology, but rather a weighing of the costs and benefits at this particular time, for you, in your work.

There’s no doubt that some people are benefiting from using generative AI. I’ve personally heard from people using it to help with coding and bioinformatics, as a starting point for learning about new subject areas and for cutting down large amounts of text to make it more digestible and readable. There are also people who are embedding generative AI directly into their research processes, for instance to help with analysing large data sets.

If you do see an opportunity for generative AI to benefit your research, then I would suggest it’s hard to do this well without going into the deeper ethical and practical questions. If you are leaning in, it’ll be worthwhile – maybe even essential – to really understand the mechanistic basis of generative AI and large language models (LLMs): their strengths, but also their limitations.

If in two years’ time your thesis or paper will describe research methods that used generative AI today, your work will need to follow the commitments in the Concordat to Support Research Integrity and the codes of practice of UK research organisations that flow from it. It will therefore be your responsibility to keep your use of generative AI under careful consideration, to use it according to the highest standards in your field, and to seek ethical approval where necessary. That is the sort of package of evidence you will likely need when questioned by an examiner in your viva, or when signing declarations with a publisher.

However, doing all that well is a considerable amount of work. And while generative AI is a fascinating topic, if it’s not essential to your work then maybe you are better off investing that time, energy and thought in something else.


If not now, when?

I’m not saying generative AI will never be useful to you, but the question is: do you need to use this now?

There may well be a point where generative AI tools reach market saturation and most of us use them daily in the way we use spreadsheets and email. But we are not at that point. If an organisation like Apple cannot make its generative AI news summaries work properly (requiring a very public recall); if people reviewing CVs and cover letters report on social media that they can very much tell which ones were written using AI; and if lawyers in criminal cases are still citing fake cases hallucinated by generative AI, then I think it’s clear that this technology has not yet reached a final, stable form.

I am labouring this point because one of the biggest issues I believe researchers are facing with respect to generative AI is FOMO – the fear of missing out. The hype level in the generative AI news cycle is extremely high. Most of it is driven by the companies who have invested enormous sums in making these products. It’s in their interests to contribute to a sense of urgency in getting people on board with using generative AI.

Other contributing – but possibly linked – factors include government-level calls for incorporating generative AI, along with examples of researchers themselves sharing how they are using these tools. But these are mainly people who are ‘leaning in’ to generative AI: spending time learning the software, adapting it to their own needs and using it in creative ways to receive that benefit. And that benefit comes at a cost in terms of time, energy and learning.

Therefore, I suggest that if you have tried to use generative AI but still see a long road ahead to beneficial proficiency, stepping back from using it at this time may be a completely rational approach.

Questioning your place on the AI spectrum

I’m conscious that I’m presenting this as a dichotomy – that, clearly, is an oversimplification. So, if you fall somewhere in the middle or you’re not sure where you sit, let me ask you five broad questions to help you find a way through. Most universities are developing or have developed guidance for their researchers, including training and frameworks for responsible use of generative AI, so I would always advise researchers to be aware of and align with their local guidance. But hopefully these questions are complementary and can help you think through the process.

  1. Unknown unknowns

Given that generative AI can create content that is out of date, out of context, or simply made up, if you want to use generative AI in an area outside your current expertise, on what basis will you judge whether or not its responses are accurate, complete, sensible, useful or ethical?

  2. Reversion to the mean

I asked a version of ChatGPT to tell me a joke, and ran the same prompt 20 times in fresh chat windows. Instead of producing 20 different jokes, it alternated between just two (for those interested, they were “Why don’t skeletons fight each other? They don’t have the guts” and “Why don’t scientists trust atoms? They make up everything”). I could have written a more complex prompt to create more variety, but the point of the illustration is that what may look original and interesting if you do it once looks less interesting when you see it over and over (a sketch of how you might run this repeat-prompt test yourself appears below).

What’s often happening is that the statistical properties of the large language models are creating a sort of “reversion to the mean” in terms of writing tone. Sometimes, this is the desired effect, perhaps to produce a business-like, impersonal tone for a complaint email. But in instances where you are writing to convey something of your unique approach and talent – perhaps a grant application or covering letter – how can you ensure your distinctive voice is not being overwritten by a generic one?
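If you’d like to try this repeat-prompt test yourself, here is a minimal sketch in Python. It assumes the openai client library (version 1 or later) with an API key already set in your environment; the model name "gpt-4o-mini" is illustrative only, and whichever chatbot API you actually use, its default sampling settings will shape how much variety you see.

# A minimal sketch of the repeat-prompt test described above.
# Assumes the `openai` Python library (v1+) and an OPENAI_API_KEY in the
# environment; the model name "gpt-4o-mini" is illustrative only.
from collections import Counter

from openai import OpenAI

client = OpenAI()
jokes = Counter()

for _ in range(20):
    # Each call is a brand-new conversation, mirroring "fresh chat windows".
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Tell me a joke"}],
    )
    jokes[response.choices[0].message.content.strip()] += 1

# If the outputs "revert to the mean", this prints only a handful of
# distinct jokes rather than 20 different ones.
for joke, count in jokes.most_common():
    print(f"{count:2d}x  {joke}")

Because each call starts a fresh conversation, any repetition you see reflects the model’s own sampling distribution rather than accumulated chat history.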

  3. Long-term consequences

Large language models are created from huge training sets, which give rise to ethical and legal concerns about the sources of the information within them. There are multiple legal challenges from writers and publishers whose content has been used for training generative AI, and if successful these challenges could have a big impact on the tools that are available, or on the way we use them in future. Given that many chatbot models are closed – i.e. the training set is not declared, and the weightings and refinements used to generate responses are not disclosed – it’s hard to know how these legal challenges may affect users. Recent EU legislation (the EU AI Act), which the UK may follow or adapt, is changing the responsibilities of those who use generative AI. If access to generative AI changes due to new legislation or legal challenges, how might that compromise your research if you have already embedded a large language model into your research processes?

  4. Representation

An additional issue with the training sets used to create large language models is that they are biased towards content that is online, available in English, and from the relatively recent past. This means outputs may be incomplete or compromised, and may under-represent attitudes or ways of thinking that are important to particular people or groups. The question becomes: whose voice, learning, perspectives and values are not present in the training set, and what difference might that make to the outputs?

  5. Security

The National Centre for AI suggests that “users should never put personal or private information into systems that aren’t provided by their institution”. The launch of the Chinese chatbot DeepSeek, and the many headlines it generated, showed that people will go and try out new tools even when the privacy policies specifically state that the company will use the inputs to train its system, and that data will be stored in a country outside UK regulatory frameworks and oversight. New questions are also emerging about whether generative AI inputs and prompts fall under freedom of information requests, such as the recent release of a government minister’s prompts. This highlights the point that using the technology responsibly requires a lot of work: producing risk assessments, talking to trusted research and information governance teams at your organisation, reading the legislation and keeping up to date with changes. Before getting into all that, perhaps a much simpler question to ask is: what would be the cost of compromising your ideas and data, or those of your colleagues and collaborators?

Hopefully these sorts of questions can help you do a relatively quick triage to determine whether you are going to ‘lean in’ to generative AI or keep your distance for the time being. If you are going to ‘lean in’, then do so in a wholehearted way, seeking out the information, training and support needed to use it responsibly. And I am looking forward to seeing all the fantastic use cases that come from thoughtful, dedicated researchers adopting this new technology.

But if the concerns and questions here are big enough, I think it’s a perfectly sensible choice to actively choose not to use generative AI at this time and spend your energy on some other important part of your research.

Author

Dr Andrew Porter

Andrew is Research Integrity and Training Adviser at the Cancer Research UK Manchester Institute.

    Comments

  • Naomi
    26 March 2025

    This is a really balanced piece with a lot of good advice. My concern is that if you are a researcher in academia choosing not to use GenAI, that will have an impact on career progression in a hyper-metricised system that is output-focused. Using it means being able to create more journal papers, more grant applications, more conference presentations etc in less time, and those outputs are rewarded.
