Text and Data Mining and its value to the UK economy


Commissioned by Microsoft

Executive Summary

The current debate around text and data mining (TDM) has focused on the relationship between rights holders and technology companies. However, beyond these two important groups there is a need to better understand how TDM is being used across the wider economy.

Public First was asked by Microsoft to examine current levels of TDM activity in the UK. To do this we have surveyed UK businesses and modelled the potential economic impact that different policy scenarios could have over the next decade.

Our research combines a nationally representative survey of 1,000 UK businesses, including deep dives into two critical sectors for the UK economy (financial services and life sciences), with a task-based economic model that estimates how four different TDM policy scenarios could affect AI adoption and its contribution to GDP growth by 2035. Here is what we found:

Text & Data Mining (TDM)

Over 1 million UK businesses using specialised TDM tools

One in five businesses across the economy (19%) now use specialised TDM tools to gather insights, develop new products, and drive growth. That is equivalent to over 1 million UK businesses using specialised TDM tools, more than the total number of businesses in London (983,000).¹

Businesses engaged in TDM

57% of TDM users say AI is critical to their ability to compete

There is a relationship between the use of TDM and a growth mindset. Our survey shows that businesses using TDM are more likely than those that do not to prioritise all forms of business investment, including in new AI technology. 57% of TDM users say AI is critical to their ability to compete.

In business sectors important to the UK’s Industrial Strategy, TDM use is significantly higher than average. Over a third of firms in life sciences (34%) and financial services (33%) are actively mining and analysing data to stay competitive.

TDM users rely on a blend of

74% of businesses performing TDM said access to external data was essential for their business

Financial services firms use TDM to monitor transactions, detect fraud, and analyse market insights—drawing on customer communications, Know Your Customer (KYC) documents, market feeds, and a range of media sources. Life sciences companies apply TDM to accelerate clinical trials, support regulatory submissions, and extract insights from clinical notes and genomic datasets.

External data is a critical input for TDM users. 74% of businesses performing TDM said access to external data was essential for their business and was seen as more important than internal data (59%).

The next 2-3 years will be critical

56% of businesses want to move from basic to advanced integration of AI and cloud technologies

AI use across businesses is high: 75% of small businesses, 87% of medium-sized businesses and 91% of large businesses have used AI tools at least once. However, much of this is still only ad hoc or basic integration.

56% of businesses want to move from basic to advanced integration of AI and cloud technologies over the next 2-3 years. But there are risks to this progress; across the economy AI investments are seen as riskier than other forms of investment, and 39% of TDM users cited legal risks as a barrier to further AI adoption.

British businesses want access

77% of businesses using TDM said they expect a positive impact on their competitiveness if the UK keeps pace with the USA

77% of businesses using TDM said they expect a positive impact on their competitiveness if the UK can keep pace with leading markets like the USA when it comes to access to data and new AI technologies. This view was consistent across sectors: 81% of life sciences firms and 77% of financial services firms using TDM expressed optimism about matching US-level AI innovation.

In contrast, sentiment dropped sharply when businesses were asked about scenarios where the UK only kept pace with the EU or fell behind both the US and EU. Well over a third (41%) of TDM users across the economy expected a negative impact on their competitiveness if the UK does not keep up.

The economic impact of the different choices

Public First’s economic model draws on precedent from recent academic and industry research to estimate how changes in TDM regulation could affect AI adoption and its contribution to GDP growth by 2035.

Our model considers the impacts on access to, and the development of, new AI products and services; on access to data for training and fine-tuning; and on AI inference.

Four plausible policy scenarios were modelled:

Scenario 1: A full commercial TDM exemption without the introduction of a rights reservation or opt-out

Scenario 2: A commercial TDM exemption where an opt-out based on an industry code of practice is introduced

Scenario 3: A commercial TDM exemption with a rights reservation and transparency requirements

Scenario 4: An opt-in model requiring licences for all copyrighted content

Our model predicts that by 2035 AI adoption could be contributing £510bn to the economy under the most innovation-forward scenario (scenario 1). However, this falls to £290bn under the most restrictive (scenario 4), representing a loss of £220bn, or 43% of the gain that AI could bring to the economy by 2035.

£220 billion is around the same size as the GDP of Scotland in 2024, including oil and gas revenues,² and is equivalent to total NHS spending in 2024/25.³

Even under scenario 3, which includes a full commercial TDM exception with a rights reservation and transparency requirements, the UK loses £60bn. In this scenario the UK would match the EU’s approach but would be less competitive than the US and Japan.
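The scenario figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in this summary (£290bn under scenario 4, a £220bn loss relative to scenario 1, and a £60bn loss under scenario 3). The variable names are illustrative, and the scenario 1 and scenario 3 totals are derived here by addition and subtraction rather than taken from the model's own outputs.

```python
# Figures quoted in the summary, in £bn (AI's contribution to GDP by 2035).
scenario_4 = 290      # most restrictive scenario
loss_1_to_4 = 220     # stated loss relative to scenario 1
loss_1_to_3 = 60      # stated loss under scenario 3

# Derived for illustration: scenario 1 is scenario 4 plus the stated loss.
scenario_1 = scenario_4 + loss_1_to_4
scenario_3 = scenario_1 - loss_1_to_3

# Share of the scenario 1 gain that is lost under scenario 4.
share_lost = loss_1_to_4 / scenario_1

print(f"Scenario 1: £{scenario_1}bn")
print(f"Scenario 3: £{scenario_3}bn")
print(f"Share of gain lost under scenario 4: {share_lost:.0%}")
```

The derived share rounds to the 43% quoted in the text, which suggests the quoted figures are internally consistent.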

Public First’s economic modelling

[Table: for the total economy and for each of five business sectors (professional, scientific and technical activities; education; human health and social work; financial and insurance activities; manufacturing), AI’s contribution to sector GDP in 2035 under the most optimistic TDM outcome (scenario 1), the loss of AI-driven GDP growth under the most pessimistic TDM outcome (scenario 4), and the percentage loss in GDP contribution.]

To maximise the potential economic growth and competitiveness that AI and TDM can deliver, Public First recommends the UK Government should:

1. Adopt a pro-data availability approach to commercial text and data mining, with policies that align with scenario 1 or 2 in our economic model, using Japan and Singapore as examples of best practice.

2. Introduce enabling regulation quickly, particularly as the next 2-3 years will see companies across the economy move from basic to advanced AI and cloud integration.

3. Champion pro-data availability policies, such as incentivising data availability, supporting experimentation through tried and tested policies like regulatory sandboxes, and encouraging regulators to have a pro-growth mindset.

What is Text and Data Mining?

Introduction: the UK’s AI opportunity

AI will bring major changes to the UK and the global economy. The UK Government’s Industrial Strategy (Invest 2035) identifies AI and digital technologies as key to boosting productivity—by up to 1.5% annually—across strategic sectors like life sciences, financial services, and advanced manufacturing.

Every major sector of the UK economy now has a plan for growth that depends, in part, on greater use of AI and digital technologies.

In manufacturing, companies are using AI to improve quality control and reduce waste. In life sciences, it’s helping to design better clinical trials and identify new uses for existing medicines. In financial services, AI is being used to monitor risk and analyse changing market conditions, and in the public sector AI is seen by the Government as one of the most important tools available to help deliver better and more efficient public services, without resulting in an unsustainable tax burden.

5:1 return on investment

We estimate that, with more investment in digital skills and infrastructure, AI deployment across the economy could bring about an average societal return on investment of over 5:1 in the next decade.

As part of its aim to support AI innovation and diffusion, the UK Government is proposing to clarify the law around text and data mining (TDM). This policy discussion has attracted fierce debate between rights holders and technology companies, and there exist differing views on the best course of action for the UK.

The Government has called for more evidence to understand the impacts that different policy choices could have on the economy, UK AI companies and on rights holders.

The choice the Government makes over TDM will have consequences for the whole economy as businesses continue to adopt new general purpose AI technologies and specialised digital and AI applications specific to their sectors and needs.

Within this context Public First was asked by Microsoft to examine current levels of TDM activity across the UK economy and to estimate the impact that different policy choices could have on economic growth over the next decade to 2035.

Our research combines economic modelling and a survey of 1,000 UK businesses, aiming to better inform this policy debate by exploring how different choices over TDM policy could impact the whole of the UK economy.

How does Text and Data Mining relate to modern AI models?

The performance of modern AI models — and therefore their value to users and the wider economy — depends on access to data. To function effectively and to improve, these systems must analyse wide‑ranging material — from internal documents to public sources such as journal articles, technical manuals, commercial records and online content.

Learning for AI models means being able to derive patterns, structures, and relationships from across a broad body of data so that the models can predict and generate original content in response to user queries.

In other words, AI models generate responses based on statistical relationships that they have learned from training and optimisation processes. This helps a model determine the most contextually appropriate response when it is asked to perform a task by its user. The larger the scope of data used to train an AI model, the more accurate and robust the response.

The process of responding to a task is called inferencing. Inferencing describes the process by which an AI model receives an input and produces an output, such as being asked to predict a trend, analyse data or answer a query. During inferencing the model’s internal parameters are not updated based on this new information. Instead, the AI model applies its learned patterns alongside new information to make decisions in real time.

Additional data is often key for inferencing. While it is not required for inferencing to take place, the model will frequently either be given extra input data by its user, or it will seek to gather it from the internet or other sources it has access to.

During training the AI model is fed data by the developer to help it learn language and patterns.

This provides the basis from which the model will respond to tasks.

When asked to perform a task the model begins inference.

During inference the model can reach out to access new data across different sources.

Using its learned knowledge as well as external data, the model analyses all the information it has access to in order to respond to the task it was asked to perform.

Better access to data during inference can improve the response the AI provides its user. The volume and quality of data that the model has access to at both the training and at the inference stages will determine how well it performs, and therefore its value to its user and the economy.
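The distinction between training and inference described above can be illustrated with a deliberately tiny example. The sketch below is not how modern AI models are built (they are large neural networks). It is a toy next-word predictor, with hypothetical function names and sample sentences, that shows two points from the text: the learned parameters are not updated during inference, and extra data supplied at inference time can still change the response.

```python
from collections import Counter, defaultdict


def train(corpus):
    """Training: learn which word tends to follow which (the 'parameters')."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def infer(params, prompt, extra_context=()):
    """Inference: predict the next word for a non-empty prompt.

    The learned parameters are only read, never updated. extra_context
    stands in for data supplied by the user or gathered at inference time,
    and it affects this one request only.
    """
    last_word = prompt.lower().split()[-1]
    # Work on a copy so the learned counts stay frozen.
    candidates = Counter(params.get(last_word, Counter()))
    for sentence in extra_context:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            if current == last_word:
                candidates[following] += 1
    return candidates.most_common(1)[0][0] if candidates else None


# Hypothetical training data and inference-time context.
corpus = ["interest rates rose", "interest rates rose again", "rates fell"]
params = train(corpus)
fresh_data = ["rates fell sharply", "rates fell today"]

print(infer(params, "interest rates"))              # learned patterns only
print(infer(params, "interest rates", fresh_data))  # learned patterns plus new data
```

With only its training data the toy model continues "interest rates" with "rose"; given the fresher context it answers "fell", while the stored counts in `params` remain unchanged.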

So where does text and data mining come in? Text and data mining is the process by which data is gathered and analysed.

TDM is important at the different stages of an AI’s life cycle and use. It is used to analyse and gather information when training new AI models as well as to fine-tune and improve existing ones. TDM can also be performed by AI. Here AI rapidly speeds up the process of mining and analysing data.

Case study: AI in Education – Northern Ireland’s Education Authority

The Education Authority of Northern Ireland (EANI) is using AI to help teachers save time and improve lessons. With AI tools, teachers can quickly create lesson plans, generate classroom resources, and prepare assessments, reducing paperwork and freeing up time for students.

Teachers can also use these tools to scan the web and other data sources to bring up-to-date examples into lessons, making learning more relevant and engaging. AI also helps tailor content for different needs, including pupils with special educational requirements.

EANI has focused on training staff to use AI responsibly, ensuring data security and avoiding bias. Early trials show strong results, and adoption is expanding across schools. The aim is simple: give teachers more time to teach and make learning more interactive and inclusive.

Text and Data Mining also Goes Beyond AI

TDM is a powerful tool used by many organisations to make sense of large amounts of information — even when AI isn’t involved. TDM helps turn messy, unstructured text into useful insights. Whether it’s used to improve patient care, guide investment decisions, engage in price comparison activity, perform fraud checks, or streamline operations, it’s a key part of modern data analysis.

TDM usually runs on cloud platforms, which allow businesses to process huge volumes of data quickly and securely. In place of AI models, they can use rule-based systems or statistical techniques to find trends, detect risks, or improve services.

Business gathers large quantities of data via text and data mining – internal, private and public data sources

Statistical techniques, via cloud computing (or now with AI) help spot and detect patterns in the data

Insights from the data help the business produce new products, strategise better, enter new markets
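As a minimal sketch of the non-AI route described above, the example below applies a small rule-based system to a handful of documents and counts how many match each rule. The documents, rule labels and regular expressions are all hypothetical and chosen purely for illustration. A real deployment would run far richer rules over much larger volumes of data, typically on cloud infrastructure.

```python
import re
from collections import Counter

# Hypothetical documents standing in for mined text.
documents = [
    "Customer reports an unauthorised transaction on their card.",
    "Quarterly revenue grew; no issues reported.",
    "Second unauthorised transaction flagged, possible fraud.",
]

# Hypothetical rule set: each label maps to a regular expression.
RULES = {
    "fraud_risk": re.compile(r"\b(fraud|unauthorised)\b", re.IGNORECASE),
    "growth": re.compile(r"\b(grew|growth)\b", re.IGNORECASE),
}


def mine(docs):
    """Count how many documents match each rule."""
    hits = Counter()
    for doc in docs:
        for label, pattern in RULES.items():
            if pattern.search(doc):
                hits[label] += 1
    return hits


summary = mine(documents)
print(summary)
```

Here two of the three documents trigger the fraud rule and one triggers the growth rule. The same pattern of gathering text, applying rules or statistics, and summarising the results scales up to the use cases described in this section.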

While AI is not required for TDM, it is increasingly being used to enhance traditional data analysis. It allows businesses to analyse larger quantities of data more quickly, and to more easily combine external data from the web or large databases with their own internal data to get more detailed and complete insights.

Soon, more advanced AI models will be able to automate this process, for example by automatically developing and testing hypotheses in scientific research. This would allow for a wide range of commercial applications, such as more rapid discovery of new medicines, materials and scientific advances.

Case study: Advanced Cloud and AI Tools in Financial Services – London Stock Exchange Group

London Stock Exchange Group has moved to cloud-based virtual desktops to give its global teams secure, flexible access to work systems. This change means developers can work from anywhere while keeping sensitive data safe. It also makes managing devices easier and reduces IT complexity, freeing up time for innovation.

By adopting cloud tools, the organisation can speed up software development, improve collaboration across regions, and meet strict compliance standards. These improvements help LSEG respond faster to market demands and maintain resilience in a highly regulated sector.

This example reflects a wider trend in financial services: using cloud and AI technologies to cut costs, boost security, and deliver quicker, more reliable services in a competitive market.

What does the law say?

The law around text and data mining in the UK is not clear cut. However, a widely used reference point comes from the European Union’s Digital Single Market Directive. It defines TDM as “any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations”.

This captures the core function of TDM for AI: enabling systems to extract useful information from large volumes of data for training, fine-tuning and for inference.

It is important to be clear about how this process works, and what it does not do. AI models do not store or memorise the content they are trained on. Instead, they learn patterns by analysing vast amounts of information from training data and breaking it down into tokens. Tokens are small units (such as words, sub-words, or even characters) that the model can process.

Each token is assigned a numerical representation (referred to as an embedding) that captures its meaning within a structured vocabulary. Embeddings are a way of transforming complex, human-readable data into numbers that a computer can process. In large language models (LLMs), words are mapped in a multi-dimensional space, and their place within that space is defined by a set of numbers called a vector.

This allows AI models to generate new responses based on the trends and structures they have internalised through learning. It is a bit like how someone who is very good at cooking can turn their hand to making multiple dishes based on a few ingredients, without the need to memorise and keep replicating a specific recipe found in a cookbook.

As such the model does not retain entire documents or reproduce full works. Rather, it builds a generalised understanding by processing many examples during training and then responds dynamically when asked to perform a task.
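The ideas of tokens and embeddings described above can be made concrete with a toy example. In the sketch below the vocabulary and the three-dimensional vectors are hand-picked assumptions for illustration only. Real models learn their vectors during training, use many more dimensions, and often split text into sub-words rather than whole words.

```python
import math

# Hand-picked toy vectors (assumed values, not taken from any real model).
EMBEDDINGS = {
    "bank": [0.9, 0.1, 0.0],
    "loan": [0.8, 0.2, 0.1],
    "river": [0.0, 0.9, 0.3],
}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return text.lower().split()


def embed(tokens):
    """Look up the vector for each token that is in the toy vocabulary."""
    return [EMBEDDINGS[token] for token in tokens if token in EMBEDDINGS]


def cosine(a, b):
    """Cosine similarity: values closer to 1 mean the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


tokens = tokenize("Bank loan river")
bank, loan, river = embed(tokens)

print(tokens)
print(cosine(bank, loan) > cosine(bank, river))
```

In this toy space "bank" sits closer to "loan" than to "river". What the model works with is the set of numbers, not a stored copy of the text the words came from.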

Material that AI models use for training and inference can be protected by copyright. However, in many use cases copyright rules are not clear and there is an ongoing debate over how the rules around text and data mining should be structured.

The UK’s Legal Framework on Copyright and TDM

In the UK, the rules on text and data mining (TDM) come from section 29A of the Copyright, Designs and Patents Act 1988. They allow people to copy material for computational analysis, but only if they already have lawful access to it, and only for non-commercial research.

This means that using copyrighted material for commercial purposes, such as training AI systems or developing new products, usually requires permission from rights holders. However, what counts as “lawful access,” “research,” or “non-commercial” isn’t clearly defined, which makes it difficult to know what’s allowed. Companies are often very risk averse, and as a result this stalls the use of TDM for and by AI applications in the UK.

The Government is looking at updating the rules to make them clearer and better suited to modern technology; however, no final changes have been agreed. For now, the UK’s legal position on TDM remains narrow and hard to navigate.

The lack of clarity in UK law around TDM has several knock-on effects.

Firstly, it means that very few AI models are trained in the UK, with the majority being trained in the United States. Research from CCIA, together with concerns raised by the Startup Coalition, suggests that under the current legal framework UK companies can be tempted to relocate to the USA if they want to engage in heavy AI training.

A proportion of respondents to a CCIA survey of 500 UK AI developers, investors, and others working in AI in the UK say that if the UK does not seek to create a competitive regime for text and data mining on publicly available data then they would likely reconsider the UK as a destination for AI investments.

When it comes to TDM and inference being performed by AI models, the lack of clear rules can make it difficult for AI and TDM users to know exactly what they can do when creating new commercial products and applications.

Some in this debate have argued that even during routine inference copyright rules should be more heavily enforced.

This could reduce the usefulness of AI services in the UK as even for routine everyday tasks models would need to navigate strict access rights.

The effect of this would be that companies using routine inference on common AI products in the UK (such as Copilot, ChatGPT, Gemini, Claude, Perplexity) would see worse results than businesses using the same products in other countries with more permissive rules.

What is Happening in the Rest of the World?

The importance of TDM is already being realised across the world. Many other countries are taking active steps to ensure that their legal frameworks enable businesses across their economies to get the most out of AI tools.

The move by other countries to create more permissive rules for TDM creates a competitiveness risk for the UK. If the UK does not keep pace, businesses in other jurisdictions (US, Japan, Singapore) are likely to have greater access to a wider variety of AI models and much greater legal clarity on the use of TDM for commercial purposes.

In more extreme scenarios the performance of off-the-shelf AI models available in the UK could be lower than that of the same products in other countries, because models elsewhere would have access to a wider range of data when performing routine inference as well as TDM.

Jurisdiction: United States

TDM framework: Evaluated under fair use doctrine.

Implications for AI: Fair use has permitted technologies like search engines, digitising and making books searchable, text analysis software, tools for detecting plagiarism, and much more. In these and other cases, courts have found that the alleged infringing activity is highly transformative – rather than communicating the original work to the public, these uses create new tools and merely copy the work as an intermediate step. While there is extensive ongoing litigation of how fair use applies to generative AI, two federal courts have ruled that AI developers engaged in fair use when training their models.

Jurisdiction: Japan

TDM framework: Article 30-4 allows “information analysis” without permission.

Implications for AI: Allows for broad legal certainty for TDM provided the purpose is to extract information and not to enjoy the expressive content of the work. Outputs must not replicate or “clone” original works. This framework has led many to see Japan as a pro-innovation jurisdiction.

Jurisdiction: Singapore

Default copyright position: Exclusive rights over reproduction, publication, performance.

TDM framework: Singapore Copyright Act allows TDM.

Implications for AI: It is permissible to make a copy of or communicate a work or a recording of a protected performance for the purposes of computational data analysis or preparing the work or recording for computational data analysis.

Jurisdiction: European Union

Default copyright position: Automatic copyright protection under harmonised directives.

TDM framework: Articles 3 & 4 of the Copyright and Related Rights in the Digital Single Market (CDSM) Directive (2019) govern TDM.

Implications for AI: This opt-out gives rights holders the opportunity to reserve their rights from commercially based reproductions being made for the purpose of TDM activity.

What Options Does the UK Have?

The UK Government undertook a consultation on whether and how it should change its rules around TDM. This provided a detailed illustration of where different stakeholders stand on various policy positions. The economic impact assessment and broader report are due out in mid-March 2026.

To help inform this debate Public First has undertaken detailed surveys of UK businesses and run economic modelling to better inform the UK Government and stakeholders about the potential choices the UK could make.

For this analysis we have surveyed over 1,000 UK businesses, exploring their current text and data mining practices, how these support their business, and their investment plans for AI and new technologies in the future.

Our survey included samples of 264 UK financial services sector companies and 264 life sciences companies to understand the role that TDM plays in these two important strategic sectors for the UK, as well as a sample of 526 businesses across the whole economy.

We have created four plausible scenarios, covering the range of options that the UK could take given the ongoing consultation and policy debate, and with reference to the choices other countries have made.

Using a task-based economic model we have estimated the potential impact of each of these options and the effect they could have on AI adoption across the economy. This work aims to explore and show the wider economic implications of the different choices the Government could make and what that means for the UK’s future growth.

Four scenarios that could play out in the UK for Text and Data Mining Policy

Scenario 1:

A commercial TDM exemption is introduced without an accompanying rights reservation or opt-out.

Policy environment:

TDM is allowed for all purposes, including commercial use.
No opt-out or rights reservation would apply.
Once content is accessible, it would be available for AI training as well as analysis via specialised TDM tools or inference.

Effects on UK businesses’ access to AI and use of TDM:

UK businesses would gain access to the latest AI models at the same rate as leading markets such as the US.
Businesses would have the greatest opportunity to build domestic models. UK companies would also have the greatest opportunity to fine-tune existing models on UK data.
More generally, UK companies would be able to engage in the type of modern data analysis (whether using ML or AI applications) performed by their peer companies in certain countries (US, Japan, etc).
Available AI and TDM products would have maximum access to data for commercial purposes, supporting the development of new products and innovations.

Scenario 2:

A commercial TDM exemption is introduced alongside the creation of an opt-out based on an industry code of practice

Policy environment:

TDM is allowed for all purposes, including commercial use.
A widely recognised opt-out standard is created and backed by an industry code of practice.
This creates a standardised machine readable opt-out for content holders to express their preferences on whether they want their content to be used in AI training or other types of AI use cases.
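To give a sense of what a machine-readable opt-out could look like in practice, the sketch below uses the long-established robots.txt convention and Python's standard `urllib.robotparser` module as a stand-in. This is an assumption made purely for illustration: this report does not specify the format an industry code of practice would adopt, and the crawler name `ExampleAIBot`, the policy lines and the URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy published by a content holder.
# "ExampleAIBot" is an invented crawler name used only for this sketch.
policy_lines = [
    "User-agent: ExampleAIBot",
    "Disallow: /articles/",
    "",
    "User-agent: *",
    "Allow: /",
]

parser = RobotFileParser()
parser.parse(policy_lines)


def may_mine(url, agent="ExampleAIBot"):
    """Return True if the published policy lets this agent access the URL."""
    return parser.can_fetch(agent, url)


print(may_mine("https://example.com/articles/1"))
print(may_mine("https://example.com/about"))
print(may_mine("https://example.com/articles/1", agent="OtherBot"))
```

In this sketch the content holder reserves its articles from one named crawler while leaving the rest of the site, and other agents, unaffected. A standardised opt-out of the kind envisaged in this scenario would need to express similar preferences in a form that tools can check automatically.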

Effects on UK businesses’ access to AI and use of TDM:

UK businesses would gain access to the latest AI models at the same rate as leading markets such as the US.
Businesses would have a globally competitive environment to build domestic models. UK companies would also have a conducive environment to fine-tune existing models on UK data.
Available AI and TDM products would have significant access to data for commercial purposes, supporting the development of new products and innovations.
However, there may exist some barriers to access to diverse data compared with Scenario 1 because of the creation of a widely recognised opt-out.

Scenario 3:

A TDM exemption with an opt-out and transparency requirements

Policy environment:

TDM is allowed for all purposes, including commercial use.
Authors and IP owners would be given a “legally enforceable” right to opt out of the exception for the reproduction of works in connection with TDM activity.
New obligations would require developers to publish information about how their practices comply with opt-outs in connection with TDM activity.
These could include summaries of training data sources, explanations of how opt-outs are respected, or information about downstream outputs.

Effects on UK businesses’ access to AI and use of TDM:

UK businesses would gain access to the latest AI models at the same rate as businesses in the EU.
Businesses would have just as competitive an environment to train new domestic models as businesses in the EU, but a less competitive one than in the US, Japan and other jurisdictions.
UK businesses would need to navigate opt-outs when fine-tuning existing models for their particular use cases. Compliance with an opt-out framework under the law may be brittle and not keep pace with the development of new technologies, thus presenting legal uncertainty about how compliance would take place in the future.
Available AI and TDM products would need to navigate opt-outs in ways that would limit access to data for commercial purposes. In turn, this may hinder the development of new products and innovations versus previous scenarios.
Transparency requirements that risk exposing confidential information and trade secrets could delay the availability of certain AI models in the UK market, reducing timely access compared to other jurisdictions.

Scenario 4:

An opt-in model requiring licences for all copyrighted content

Policy environment:

Under this model, TDM for commercial purposes would only be lawful where rights holders have given explicit permission, typically via a licence, or where some other narrow exception applies.
Access to content for AI inference would also be affected, as an opt-in model creates a stronger legal basis for rights holders to challenge the use of content for analysis by off-the-shelf AI products.
UK firms may also avoid certain tasks with off-the-shelf models (such as uploading content for analysis) given legal uncertainty.

Effects on UK businesses’ access to AI and use of TDM:

UK businesses would gain access to the latest AI models at a delayed rate, slower than the EU and other countries.
UK firms would find it economically unviable to build domestic models involving the use of external data in the UK, especially large foundational models.
There would be significant restrictions on access to data for fine-tuning existing models given the cost to use data and an uncertain legal environment.
Available AI and TDM products would have almost no access to data for commercial purposes, unless access had been negotiated in advance, such as via a licence. This would hinder the development of new products and innovations via TDM and AI use for all but the largest and best-resourced firms.
Routine inference tasks performed by off-the-shelf AI models would also be affected, reducing the performance of off-the-shelf models in the UK when compared with other jurisdictions.

Public First’s methodology

This research combined a survey of over 1,000 UK businesses—including targeted samples in financial services and life sciences—with a task-based economic model to estimate the impact of different text and data mining (TDM) policy scenarios on AI adoption and GDP growth by 2035.

To ensure realistic estimates of TDM usage, the survey applied a three-step filter assessing business activities, relevance, and tool usage.

The economic modelling draws on precedent from academic and industry studies and incorporates assumptions about legal uncertainty, regulatory friction, and delayed AI rollout under restrictive scenarios.

Economic modelling:

Our main model exploring the economic impact of AI is a task-based model, following the precedent of Eloundou et al (2023), Microsoft / Public First (2024), and Felten, Raj and Seamans (2021).

As part of this modelling, we created new classifications that looked at the extent to which task augmentation in different sectors and occupations relies on fine-tuning, inference, RAG, and proprietary models, trained on both internal and external data.

We then built new stylised scenarios to explore how additional TDM regulation, cost and legal uncertainty would hold back AI use cases. In the most restrictive scenarios, we also took account of legal chilling effects that delay the roll-out of new features in off-the-shelf AI tools and assistants.
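As an illustration only, the general shape of a task-based calculation with a scenario adjustment might be sketched as follows. This is not Public First's model: the sector names, output figures, exposure shares, productivity uplift and scenario multipliers are all invented placeholders, chosen only to show how restricting TDM-dependent use cases would scale down an estimate.

```python
from dataclasses import dataclass


@dataclass
class Sector:
    name: str
    gva_bn: float          # hypothetical sector output, in £bn
    exposed_share: float   # hypothetical share of tasks AI could augment
    tdm_dependence: float  # hypothetical share of those tasks needing fine-tuning, RAG or external data


# Hypothetical fraction of TDM-dependent use cases that remain viable in each scenario.
SCENARIO_ACCESS = {1: 1.0, 2: 0.9, 3: 0.7, 4: 0.3}

PRODUCTIVITY_UPLIFT = 0.2  # hypothetical gain on an augmented task


def ai_contribution(sectors, scenario):
    """Sum each sector's stylised AI contribution (in £bn) under a scenario."""
    access = SCENARIO_ACCESS[scenario]
    total = 0.0
    for s in sectors:
        # Tasks that need no TDM are unaffected; TDM-dependent tasks scale with access.
        usable_share = s.exposed_share * ((1 - s.tdm_dependence) + s.tdm_dependence * access)
        total += s.gva_bn * usable_share * PRODUCTIVITY_UPLIFT
    return total


# Invented sectors, for illustration only.
sectors = [
    Sector("sector A", gva_bn=200.0, exposed_share=0.5, tdm_dependence=0.6),
    Sector("sector B", gva_bn=100.0, exposed_share=0.3, tdm_dependence=0.2),
]

for scenario in sorted(SCENARIO_ACCESS):
    print(scenario, round(ai_contribution(sectors, scenario), 2))
```

In this toy version, sectors with a higher `tdm_dependence` lose proportionally more as the scenario becomes more restrictive, which is the qualitative pattern the scenarios are designed to explore.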

A full methodology note is available to download from our website here.

Business surveys:

Public First conducted an online survey of over 1,000 businesses with responses sourced from a mixture of online survey panels specialising in B2B sample provision. To qualify, participants were required to be:

based in the UK
working in private sector organisations
working in organisations with at least one employee other than themselves
holding a job title equivalent to or more senior than “director”

Exceptions were made for the life-sciences booster sample where we allowed participants at senior manager level to take part as long as they met other quality criteria e.g. have decision-making responsibility or involvement in at least one of a range of relevant functions within their business.

Quotas were employed for company size (with an approximately even split of micro, small, medium and large businesses in the general sample), region and industry.

Industry quotas were based on Standard Industrial Classification (SIC) codes for manufacturing, accommodation, professional, scientific and technical, business administration and support services, health, and information and communication with natural fallout on other sectors.

Overall, the general business sample consists of 526 respondents.

The financial services boost sample is based on participants whose organisation falls into the “financial and insurance activities” SIC code and consists of 264 respondents, 54 of whom were part of the general business sample and 210 coming from the boost.

The life sciences boost sample is based on participants who said that their organisation falls into one of the following categories:

Pharmaceuticals manufacturing
Biotechnology
Medical devices and equipment manufacturing
Diagnostic technologies and services
Clinical trials and contract research services
Healthcare service provision (e.g. hospitals, clinics)
Digital health and health data analytics
Laboratory services and testing
Life sciences consultancy or specialist advisory services

This sample consists of 48 qualifying participants from the general business sample and a further 303 in the life sciences boost.

To improve the base size for some life sciences specific questions, a further boost of 153 respondents was added.

Estimating TDM

Text and data mining is a technical subject with a high risk of overclaiming. We therefore filtered results through three questions in order to produce a result that is realistic.

Question T1 asks whether the business conducts any of a wide range of data analysis related tasks, from the very complex to the very basic. 91% of participants in the general sample said that their business conducted at least one of these tasks.

Question T2 presented the following definition of text and data mining and asked the extent of any use or applicability within their business:

“Text and data mining means using software to automatically find patterns, trends, or insights in large sets of text or other data.”

Almost half (47%) of businesses answered that this was something their business did, with 26% saying it was used systematically and 21% on an ad hoc basis. A further 31% said that this was relevant but not currently used, while 20% said that it was not relevant to their business.

Question T3 asked the 47% of businesses that said they did TDM which tools they use for it, ranging from everyday spreadsheet software like Excel to custom machine learning tools trained on company data.

To qualify for our “uses TDM” definition, participants had to select at least one of the following answers:

Text mining and natural language processing tools (e.g. RapidMiner, KNIME, MonkeyLearn)
Data Mining and Machine Learning tools (e.g. WEKA, Orange Data Mining, H2O.ai)

A total of 19% of businesses in the main sample qualified for this definition, compared with 33% in the financial services sample and 34% in the life science sample.
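The three-step filter described above can be sketched in code. This is a minimal illustration, not the survey's actual processing: the field names and answer codes (`t1_tasks`, `t2_usage`, `t3_tools` and their values) are hypothetical, the respondents are invented, and it assumes that a respondent must pass each question in turn.

```python
# Hypothetical answer codes; the survey's actual coding is not given in the text.
SPECIALISED_TOOLS = {"text_mining_nlp", "data_mining_ml"}
USES_TDM_ANSWERS = {"systematic", "ad_hoc"}


def uses_tdm(respondent):
    """Apply the three-step filter (T1, T2, T3) to one respondent's answers."""
    # T1: the business carries out at least one data analysis task.
    if not respondent.get("t1_tasks"):
        return False
    # T2: TDM, as defined in the survey, is something the business currently does.
    if respondent.get("t2_usage") not in USES_TDM_ANSWERS:
        return False
    # T3: at least one specialised TDM tool is selected.
    return bool(SPECIALISED_TOOLS & set(respondent.get("t3_tools", [])))


def tdm_share(respondents):
    """Share of respondents that pass all three steps."""
    if not respondents:
        return 0.0
    return sum(uses_tdm(r) for r in respondents) / len(respondents)


# Invented respondents, for illustration only.
sample = [
    {"t1_tasks": ["reporting"], "t2_usage": "systematic", "t3_tools": ["text_mining_nlp"]},
    {"t1_tasks": ["reporting"], "t2_usage": "ad_hoc", "t3_tools": ["spreadsheet"]},
    {"t1_tasks": ["forecasting"], "t2_usage": "relevant_not_used", "t3_tools": []},
    {"t1_tasks": [], "t2_usage": "not_relevant", "t3_tools": []},
]
print(tdm_share(sample))
```

The second invented respondent shows why the filter is conservative: a business that says it does TDM but only uses spreadsheet software does not count towards the “uses TDM” figure.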

How do UK businesses
|

Businesses will begin using TDM via one of two main routes: either they start using off-the-shelf AI products that benefit from TDM as part of their regular operations, or they seek out specialised tools designed to perform TDM.

Below we explore the extent of these activities across the economy from general AI use to the use of specialised TDM tools.

Text and Data Mining Through General AI Use:

Across the economy AI use is widespread. However, the intensity and level of integration vary significantly. Most businesses have used AI, but far fewer have integrated it into their business functions in advanced ways.

Over a third of micro businesses do not use AI at all, while 38% use AI only ad hoc or have basic integration of the technology. Among small businesses, 53% report ad hoc or basic use, and only among large businesses do over half say they have advanced integration of the technology.

Current AI use by business size

However, businesses of all sizes are planning to increase their investment in AI. Medium-sized and large businesses are the most motivated, with 81% and 79% saying that investing in AI is either their top or one of their top priorities.

Plans to invest in AI by company size

Over the course of the next 2-3 years UK businesses want to advance their AI and cloud computing integration, moving from ad hoc and basic integration to advanced integration.

58% of businesses want to move to advanced integration of cloud computing technology and 56% want to move to advanced AI integration.

How does your business currently use this and how would you want it to use this in 2-3 years’ time

While there is enthusiasm for AI, AI investments are seen as riskier than other forms of business investment.

How safe or risky is it to invest in each of these areas

This means that businesses could be more easily put off if conditions are not right. Legal and regulatory risks were cited by 28% of businesses as likely to hold back investment in AI.⁴

Text and Data Mining Through Specialised Tools:

As well as using TDM through off-the-shelf AI products, many UK businesses are using more specialised tools to perform TDM.

In our survey we defined TDM as “using software to automatically find patterns, trends, or insights in large sets of text or other data.”

We asked our sample if they performed text and data mining and then further broke this down by businesses who reported using specialised tools to help them perform TDM. This has helped us identify the most advanced TDM users and what they use it for.

Almost 1 in 5 (19%) of UK businesses perform TDM using specialised tools. That is equivalent to over 1 million UK businesses using specialised TDM tools. For context, that is more than the total number of businesses in London (983,000).

As a business grows it is more likely to perform text and data mining with specialised tools: 13% of small businesses say they do this compared to 26% of medium sized businesses and 35% of large businesses.

TDM is highest among information and communication services, life sciences and financial services

37% information and communication services
34% life sciences
33% financial services

This gives us a good picture of current TDM use. Given that over half of businesses in our wider survey said they wanted to move to advanced AI and cloud integration over the next 2-3 years, we can expect this number to grow.

Businesses use a blend of data when it comes to text and data mining. This includes internal, public information and other sources.

Data sources used by businesses using TDM

Among businesses that use internal or external data sources for TDM, the overwhelming majority say that this data is important, if not essential, to their business. External data is seen as more important than internal data.

How important are different kinds of data to your business' ability to perform TDM.

Businesses who use TDM are much more likely to perform data analysis to support a wide range of activities, giving them greater insight than other businesses.

Types of data analysis tasks done

Case study: How Jaguar Land Rover is using AI to Redefine Modern Luxury

Generative AI is increasingly being adopted in the automotive sector to improve customer experience and operational efficiency. Jaguar Land Rover (JLR) has integrated AI into several areas of its business. For example, AI-driven interfaces are being used to personalise the customer journey, enabling vehicles to respond to driver preferences through natural language systems.

AI is also being applied to product development and marketing. Partnerships with technology firms have allowed JLR to use advanced simulation platforms to create digital twins and photorealistic models, reducing reliance on physical prototypes and shortening development cycles. These tools support predictive maintenance and the introduction of software-defined features, which are becoming a key differentiator for premium brands.

Internally, AI is helping to automate processes such as finance, procurement, and legal workflows. The company has implemented a structured approach to assessing AI use cases, focusing on those with clear commercial impact. This reflects a broader trend toward digital maturity in the automotive industry, where AI is seen as a driver of innovation and competitiveness.

TDM users are also growth minded; they are more likely to want to invest across their business and more highly prioritise AI investments than other businesses.

Top investment priorities

Onboarding/ increasing our use of AI tools is a top investment priority:

Businesses who do TDM:

0 %

Other businesses:

0 %

Onboarding/ Increasing our use of AI tools is crucial to my business' ability to compete:

Businesses who do TDM:

0 %

Other businesses:

0 %

Case study: AI in Financial Services – Trade Ledger

Trade Ledger uses AI to make business lending faster, easier, and more accurate. Its platform analyses huge amounts of financial data to help banks judge risk better and approve loans more quickly.

By automating checks and cutting manual work, the technology speeds up credit decisions and reduces delays for customers. This means lenders can offer finance to more businesses, helping firms access the capital they need to grow. Smarter lending doesn’t just improve efficiency—it unlocks new opportunities for expansion and investment across the economy.

This reflects a wider trend in financial services: using AI to streamline processes, improve customer experience, and drive growth in a competitive market.

0 %

of TDM-using UK businesses foresee a positive impact on their competitiveness if the UK can keep pace with US AI innovation.

We asked our TDM users to consider a range of scenarios when it comes to access to future AI technology. Businesses were shown a range of scenarios from an optimistic scenario where the UK keeps pace with the US and leading markets for new AI product availability, data availability and performance, to less optimistic scenarios where the UK only keeps pace with the EU or falls behind the EU.

For each scenario, businesses were asked whether they expected it to have a positive or negative effect on their ability to compete.

In all scenarios there was some optimism about the prospect of new AI technology becoming available. However, only in the case of keeping pace with the US did businesses see a clear opportunity.

Economic |

Our analysis looks at the economic effect that different text and data mining policy options could have on UK AI adoption over the next decade.

The model considers a range of factors likely to be affected by the different TDM scenarios, such as the availability of new and updated products, the use cases in which TDM and fine-tuning matter most for augmentation, and the extent to which access to external data is needed.

Using this analysis, we can predict the likely contribution AI technologies will make to annual GDP by 2035 and test what effects each of our scenarios will have.

Impacts on AI adoption in each of the four scenarios

Scenario 1:

A TDM exemption exists for commercial use without the creation of an opt-out or rights reservation

Expected economic effect

UK businesses would gain access to the latest AI models at the same rate as leading markets such as the US.
Businesses would have the greatest opportunity to build domestic models. UK companies would also have the greatest opportunity to fine-tune existing models on UK data.
More generally, UK companies would be able to engage in the type of modern data analysis (whether using ML or AI applications) performed by their peer companies in certain countries (US, Japan, etc).
Available AI and TDM products would have maximum access to data for commercial purposes, supporting the development of new products and innovations.

Scenario 2:

A commercial TDM exemption with an opt-out based on an industry code of practice

Policy environment:

UK businesses would gain access to the latest AI models at the same rate as leading markets such as the US.
Businesses would have a globally competitive environment to build domestic models. UK companies would also have a conducive environment to fine-tune existing models on UK data.
Available AI and TDM products would have significant access to data for commercial purposes, supporting the development of new products and innovations.
However, there may exist some barriers to access to diverse data compared with Scenario 1 because of the creation of a widely recognised opt-out.

Scenario 3:

A TDM exemption with an opt-out and transparency requirements

Expected economic effect

UK businesses would gain access to the latest AI models at the same rate as businesses in the EU.
Businesses would have an environment for training new domestic models that is as competitive as the EU's, but less competitive than that of the US, Japan and other jurisdictions.
UK businesses would need to navigate opt-outs when fine-tuning existing models for their particular use cases. Compliance with an opt-out framework under the law may be brittle and not keep pace with the development of new technologies, thus presenting legal uncertainty about how compliance would take place in the future.
Available AI and TDM products would need to navigate opt-outs in ways that would limit access to data for commercial purposes. In turn, this may hinder the development of new products and innovations versus previous scenarios.
Transparency requirements that risk exposing confidential information and trade secrets could delay the availability of certain AI models in the UK market, reducing timely access compared to other jurisdictions.

Scenario 4:

An opt-in model requiring licences for all copyrighted content

Policy environment:

UK businesses would gain access to the latest AI models at a delayed rate, slower than the EU and other countries.
UK firms would find it economically unviable to build domestic models involving the use of external data in the UK, especially large foundational models.
There would be significant restrictions on access to data for fine-tuning existing models given the cost to use data and an uncertain legal environment.
Available AI and TDM products would have almost no access to data for commercial purposes unless access had been negotiated in advance, for example via a licence. This would hinder the development of new products and innovations via TDM and AI use for all but the largest and best-resourced firms.
Routine inference tasks performed by off-the-shelf AI models would also be affected, reducing the performance of off-the-shelf models in the UK compared with other jurisdictions.

The model predicts that by 2035 AI adoption has the potential to contribute up to £510bn to UK GDP in the most optimistic scenario.

AI’s contribution to GDP in 2035, in billions of £ for each scenario

Moving from scenario one to scenario four, access to data decreases, model availability is delayed and barriers to AI adoption rise. As a result, the potential contribution of AI to the UK economy falls.

Overall, we predict that the UK could lose up to £220bn in annual GDP contributions from AI technologies between our most optimistic and most pessimistic scenarios.

To put this in context, were the UK to benchmark itself against the EU (scenario 3) £60bn in GDP would be lost, equivalent to UK defence spending in 2025/26.⁵

In scenario four the UK loses £220bn, the same size as Scotland’s GDP, including oil and gas revenues.
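The headline figures above imply a level for scenarios 3 and 4, which can be worked out with simple subtraction. The sketch below is arithmetic on the quoted numbers only, not output from the model, and it assumes that both losses are measured against scenario 1.

```python
# Headline figures quoted in the text, in £bn of annual GDP contribution by 2035.
SCENARIO_1 = 510     # most optimistic scenario
LOSS_S1_TO_S3 = 60   # shortfall if the UK benchmarks itself against the EU (assumed relative to scenario 1)
LOSS_S1_TO_S4 = 220  # shortfall in the most restrictive scenario

implied_s3 = SCENARIO_1 - LOSS_S1_TO_S3
implied_s4 = SCENARIO_1 - LOSS_S1_TO_S4

print(f"Implied scenario 3 level: £{implied_s3}bn")
print(f"Implied scenario 4 level: £{implied_s4}bn")
```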

Change in AI’s contribution to UK GDP between scenarios 1 and 4

Scenario 1:

A commercial TDM exemption is introduced without an accompanying rights reservation or opt-out

AI’s contribution to GDP by 2035:

£510bn

% change in contribution versus previous scenario:

N/A

Overall change in contribution from Scenario 1 to 4

- 0 %

-£220bn

Scenario 2:

A commercial TDM exemption is introduced alongside the creation of an opt-out based on an industry code of practice

AI’s contribution to GDP by 2035:

£ 0 bn

% change in contribution versus previous scenario:

- 0 %

Scenario 3:

A Commercial TDM exemption with an opt-out and transparency requirements

AI’s contribution to GDP by 2035:

£ 0 bn

% change in contribution versus previous scenario:

- 0 %

Scenario 4:

An opt-in model requiring licences for all copyrighted content

AI’s contribution to GDP by 2035:

£ 0 bn

% change in contribution versus previous scenario:

- 0 %

Looking across the economy, we see these effects broken down by sector. Sectors with high potential for AI augmentation—particularly those that rely on custom fine-tuning, UK-specific proprietary models, or rapid, low-latency TDM—see the greatest impact as you move from scenario 1 to 4.

In areas like health and education, for example, the high levels of sensitive data are likely to require more control and auditability than off-the-shelf tools can give, requiring newly trained models. By contrast, low-latency applications are likely to be more critical in sectors such as financial services or manufacturing.

In all scenarios AI will continue to contribute to economic growth, through general AI product availability. However, our modelling suggests a quarter of the potential benefit (25%) could be lost without a supportive policy environment. The UK is likely to lose further growth benefits by becoming a less competitive destination for companies to train and fine-tune new models, as shown in CCIA’s survey.

Our modelling does not take account of other forms of growth that might stem from each of the scenarios (for example we would expect rights holders to see a gradual increase in licensing revenues, as you move from scenario 1 to scenario 4).

However, the size of the impact on GDP, together with the spread of economic losses across business sectors, is unlikely to be offset by other economic effects.

Loss in AI's GDP contribution in 2035 between S1 and S4 broken down by sector (£ bn)

Life |

To understand the role of text and data mining (TDM) in one of the UK’s most strategically important sectors, Public First conducted a dedicated survey of 250 life sciences businesses alongside economic modelling of four policy scenarios.

This deep dive reveals that a third of UK life sciences firms already use specialised TDM tools, applying them to accelerate clinical trials, improve regulatory submissions, and extract insights from complex datasets. These TDM users are significantly more likely to prioritise AI investment and foresee competitive gains if the UK keeps pace with global innovation.

34%

A third of UK life sciences firms conduct TDM using specialised tools

Modelling the economic impact on life sciences is often challenging due to the inclusion of diverse activities such as drug discovery, regulatory compliance and clinical trials. The closest UK industrial classification is Professional Scientific and Technical activities.

Within this classification, AI is projected to generate £56 billion in GDP growth by 2035 under scenario 1.

However, this figure drops to £30 billion under scenario 4, representing a loss of 46% of the potential value.
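The percentage quoted above follows directly from the two figures. As a quick check, a small helper (the name `share_lost` is ours, for illustration) reproduces it:

```python
def share_lost(optimistic_bn, restrictive_bn):
    """Return the percentage of the optimistic figure lost in the restrictive case."""
    return (optimistic_bn - restrictive_bn) / optimistic_bn * 100


# Figures quoted in the text for Professional, Scientific and Technical activities (£bn).
loss_pct = share_lost(56, 30)
print(round(loss_pct))
```

The same helper applied to the financial services figures quoted later in the report (£49 billion, with up to £23 billion at risk, so `share_lost(49, 26)`) gives roughly 47%.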

A more detailed analysis of specific life sciences occupations, such as biological scientists, biochemists and biomedical scientists, laboratory technicians and pharmaceutical technicians, indicates that these AI-enabled professions could contribute £9.1 billion annually to UK GDP by 2035.

Under scenario 4 up to £4.4 billion, or 49% of this benefit, could be put at risk.

Life sciences use cases for text and data mining (TDM), AI, or data analytics

A wide range of both internal and external data sources are used in life sciences TDM, including a significant amount of public research, websites and forums, as well as paid-for content.

Data sources used by Life Sciences Businesses

Case study: AI in Cancer Care – Accelerating Diagnosis and Drug Discovery

The UK Government is backing a new wave of AI-driven projects to transform cancer care and speed up drug development. These initiatives aim to cut diagnosis times, improve treatment accuracy, and reduce the cost of research—areas where text and data mining (TDM) plays a critical role.

Cancer research generates vast volumes of data: clinical notes, imaging scans, genomic sequences, trial registries, and regulatory submissions. TDM enables researchers to process this information quickly, identify patterns, and extract insights that would be impossible to achieve manually. By combining internal datasets with external sources—such as medical journals, preprint repositories, and live news coverage—AI systems can detect emerging biomarkers, flag safety signals, and optimise trial design in real time.

This approach is already delivering results. UK-backed projects are using AI to analyse millions of patient records and research papers to identify new drug targets and personalise treatment pathways. For example, models trained on genomic and imaging data can predict tumour behaviour, while mining global literature helps researchers spot promising compounds earlier in the development cycle.

These initiatives show how better access to diverse data, supported by TDM, can accelerate innovation in life sciences—reducing trial start-up times, improving patient outcomes, and strengthening the UK’s position as a leader in cancer research.

Life sciences companies face a number of barriers to increasing their use of TDM. The most significant of these include: access to skills, unclear regulatory guidance, and poor model explainability.

What would do the most to increase your company's use of TDM?

Life sciences companies are also eager to innovate.

Over the next 2-3 years 65% want to move to advanced integration of cloud and AI technologies.

How does your Life Science business currently use this and how would you want to use this in 2-3 years’ time?

81%

of UK life sciences businesses using TDM foresee a positive impact on their competitiveness if the UK can keep pace with US AI innovation.

We asked advanced TDM users in life sciences to consider three scenarios for future AI access. The optimistic scenario—where UK companies gain access to AI products and services at the same time as leading markets like the USA—was viewed most favourably, with 81% citing a positive impact, while just 6% said they foresaw a negative impact.

Negative sentiment rose sharply in the less optimistic scenarios, with 34% predicting negative impacts if innovation only kept pace with the EU, while 40% saw a negative impact from falling behind both the US and the EU.

Impact on Economic Potential from Core Life Sciences Occupations

Financial |

Financial services firms are at the forefront of using text and data mining (TDM) to strengthen operational resilience and regulatory compliance. Drawing on a dedicated survey of 264 UK financial services businesses and economic modelling across four policy scenarios, Public First’s research shows that one in three firms already deploy specialised TDM tools—particularly for fraud detection, transaction monitoring, and market surveillance.

Economic modelling conducted alongside the survey shows that policy choices around TDM could significantly affect the sector’s future contribution to UK GDP.

In the most optimistic scenario, AI adoption in financial services could add up to £49 billion annually by 2035. However, under more restrictive regimes, up to £23 billion—or 47%—of that benefit could be lost.

33%

A third of UK financial services companies conduct TDM using specialised tools

Use cases across financial services companies include market surveillance, transaction monitoring, fraud detection and prevention, and claims processing.

Financial Services use cases for text and data mining (TDM), AI, or data analytics

Financial services companies use a wide range of both external and internal data for TDM:

Main data source used for analysis

There is a wide range of barriers to TDM and AI use in financial services, including restrictions on data access and use, and regulatory and assurance concerns.

All FS businesses

FS businesses using TDM

Improving access to data through higher-quality labelling, easier access to a wider range of sources, and lowering the cost were some of the factors seen by financial services companies as likely to increase their use of TDM and AI over the next 2–3 years.

What would increase your use of text and data mining and AI over the next 2-3 years?

Case study: The Future of Fraud – Why Advanced Data Analysis Will Be Critical
Stop Scams UK and PwC

Fraud is evolving fast, and the financial services sector faces a future where traditional controls will no longer be enough. Since the pandemic, online fraud has proliferated significantly, exploiting the surge in digital transactions and remote interactions. This shift has created more complex and fragmented data trails, making pattern and trend spotting essential to effective prevention.

To counter this, firms must move beyond siloed systems and embrace advanced text and data mining techniques. That means integrating transactional ledgers, Know Your Customer documentation, customer communications, and market feeds with external datasets such as adverse media, sanctions lists, and open economic data. Crucially, this includes public and unlicensed sources, which often provide the earliest signals of emerging threats that internal systems alone cannot detect.

Machine learning and AI will play a central role, enabling real-time anomaly detection and predictive modelling to identify suspicious behaviours before they escalate. These tools thrive on diverse, high-quality data: without it, even the most advanced models cannot learn or adapt. With it, firms can automate checks, reduce false positives, and strengthen compliance.

Stop Scams UK and PwC’s report predicts that the next decade will see this become the norm. As fraudsters exploit digital channels and deploy increasingly sophisticated tactics, advanced analytics will be essential to protect customers and maintain trust in an ever more digital economy. Within this, text and data mining will be the foundation of fraud prevention.

77%

of UK financial services businesses using TDM foresee a positive impact on their competitiveness if the UK can keep pace with US AI innovation.

We asked TDM users in financial services to consider three scenarios for future AI access. The optimistic scenario—where UK companies gain access to AI products and services at the same time as leading markets like the USA—was viewed most favourably, with 77% citing a positive impact, while just 10% said they foresaw a negative impact. Negative sentiment rose sharply in the less optimistic scenarios where access to AI technology trailed the US and EU.

By 2035, AI has the potential to drive £49 billion of the annual contribution that financial and insurance activities make to GDP. However, up to £23 billion, or 47%, of this benefit could be put at risk.

Impact on economic potential from financial and insurance activities


Our research shows that TDM use is established and growing in the UK economy, and that it is a key tool for UK businesses. Companies using TDM show a greater focus on and ambition for growth than non-TDM users. TDM use is also more prevalent within strategic sectors of the UK economy, like life sciences and financial services. TDM users also see AI as critically important to their ability to compete.

Among the UK business population AI use is common, but this usage is largely ad hoc and skin-deep across many companies. This, however, will change, with over half of UK businesses seeing the next 2-3 years as a major inflection point for their use of AI and cloud computing.

As these companies seek more advanced use cases for AI and cloud technologies, we would expect TDM use to grow. Access to a wide range of data sources will be critical for new TDM users, as will clarity on the rules. Almost half of companies currently using TDM worry about legal risks, which holds them back.

Our modelling shows that a commercial TDM exemption could underpin substantial rewards, but when the use of TDM for commercial purposes is not permitted the economic benefit drops significantly. £220bn of GDP could be lost, an economic hit comparable to the size of Scotland’s entire economy in 2024.

This negative impact comes from restricted access to data for AI workflows as well as reduced access to the latest technologies, which TDM users also see as curtailing their competitiveness.

Whether the UK should introduce a TDM exemption for commercial purposes has become a significant point of debate. However, not doing so would be a major competitive risk, particularly as countries in Europe and Asia have already moved to clarify their position.

While there are many facets to this policy decision, our research shows that a commercial TDM exemption is in the best interests of the whole of the UK economy and is vital to the strategic industries that drive growth and underpin the UK’s Industrial Strategy.

To maximise the potential economic growth and competitiveness that AI and TDM can deliver, Public First recommends the UK Government should:

1. Adopt a pro-data availability approach to commercial text and data mining, with policies that align with scenario 1 or 2 in our economic model, using Japan and Singapore as examples of best practice.

2. Introduce enabling regulation quickly, particularly as the next 2-3 years will see companies across the economy move from basic to advanced AI and cloud integration.

3. Champion pro-data availability policies, such as supporting experimentation through tried and tested policies like regulatory sandboxes and encouraging regulators to have a pro-growth mindset.

