Text and Data Mining

The proposed mandatory exception would only permit text and data mining for scientific research purposes by research organisations. Innovators, journalists and everyone else would need to get permission from rightsholders, even when they have legal access to the materials in question. As a result, only a small privileged minority of researchers will be able to conduct text and data mining without restriction in the EU.

Text and Data Mining (TDM) is the automated process of selecting and analyzing large amounts of text or data resources for purposes such as searching, finding patterns, discovering relationships, semantic analysis and learning how content relates to ideas and needs in a way that can provide valuable information for studies and research.

There are opposing views as to whether the permission of rightsholders is needed to perform TDM on texts that are protected by copyright. Some say that mining is not different from reading and, thus, should be free and unrestricted. Others believe that TDM falls under the scope of protection of copyright. If TDM is a copyright-protected activity, then a mandatory EU-wide copyright exception needs to be in place to facilitate research and innovation across the EU. Such an exception should permit TDM by anyone, for any purpose, thus ensuring that “the right to read is the right to mine.”

The negotiators have agreed to leave the scope of the original exception largely unchanged. It allows TDM only “for the purposes of scientific research.” In addition to research organisations, cultural heritage institutions have been added as beneficiaries. All other users will need to rely on a more limited additional exception, which would allow TDM only if rightsholders of the underlying works don’t object to it.

The result is hostile to both innovation and users. Instead of a clear rule (“the right to read is the right to mine”), anyone wanting to engage in TDM in the EU will need to navigate a complicated set of rules to find out whether they are entitled to do so or whether a license needs to be sought. This will likely deter many users from engaging in TDM.


Status Quo

Text and data mining is not directly addressed in the InfoSoc Directive. However, Art 5(1) provides a mandatory exception for “Temporary acts of reproduction,” and arguably this would include some types of acts relevant to the process of TDM. In addition, Art 5(3)(a) provides for an exception “for the sole purpose of illustration for teaching or scientific research,” but this exception is optional, thus not harmonised among all EU Member States.


The European Commission recognised that researchers encounter legal uncertainty about whether, and how, they may engage in text and data mining, and was concerned that publishers’ contractual agreements may exclude TDM activities. In addition, the Commission observed that the optional nature of existing exceptions could negatively impact the functioning of the internal market. The Commission introduced a mandatory exception for reproductions and extractions made by research organisations in order to carry out text and data mining of works to which they have lawful access for scientific research purposes. The proposal forbade contractual provisions that would act contrary to the exception.

Parliament and Council

The Council text expanded the beneficiaries of the exception to include cultural heritage institutions. It required that copies of works made in relation to TDM should “not be retained for longer than necessary for achieving the purposes of scientific research.” It also included an optional exception (3a) that would permit TDM by anyone for “temporary reproductions and extractions of lawfully accessible works and other subject-matter that form a part of the process of text and data mining” but only if such use “has not been expressly reserved by their rightholders including by technical means.”

The Parliament text expanded the beneficiaries to include cultural heritage institutions. It said that reproductions of works made for TDM “shall be stored in a secure manner, for example by trusted bodies appointed for this purpose.” It also included an optional exception (3a) that would permit TDM by anyone for “works and other subject-matter that form a part of the process of text and data mining, provided that the use of works and other subject matter.”

The final trilogue text makes 3a mandatory, but retains the provision that the exception “shall apply provided that the use of works and other subject matter referred to therein has not been expressly reserved by their rightsholders.”
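The two-tier outcome described above can be sketched as a small decision function. This is only an illustration of the rules as summarised on this page, not a statement of the legal text: the class, field and function names are all hypothetical, and the sketch assumes that lawful access is a precondition for both exceptions.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    RESEARCH_EXCEPTION = "covered by the scientific research exception"
    GENERAL_EXCEPTION = "covered by the additional exception (3a)"
    PERMISSION_NEEDED = "permission or license needed"


@dataclass
class MiningRequest:
    # All field names are hypothetical, chosen for this illustration only.
    is_research_org_or_heritage_institution: bool
    for_scientific_research: bool
    has_lawful_access: bool
    rightsholder_reserved_tdm: bool


def assess(req: MiningRequest) -> Outcome:
    """Apply the two-tier rule as summarised in the text above."""
    # Assumption: both exceptions only cover lawfully accessible works.
    if not req.has_lawful_access:
        return Outcome.PERMISSION_NEEDED
    # Tier 1: research organisations and cultural heritage institutions,
    # mining for scientific research purposes.
    if req.is_research_org_or_heritage_institution and req.for_scientific_research:
        return Outcome.RESEARCH_EXCEPTION
    # Tier 2: everyone else, unless the rightsholder has expressly reserved the use.
    if not req.rightsholder_reserved_tdm:
        return Outcome.GENERAL_EXCEPTION
    return Outcome.PERMISSION_NEEDED


# A journalist with lawful access, mining works whose rightsholder reserved TDM.
journalist = MiningRequest(False, False, True, True)
print(assess(journalist).value)
```

Even in this simplified form, a user outside the research tier has to establish two facts about every work (lawful access and the absence of a reservation) before mining it, which is the practical burden the “right to read is the right to mine” rule would avoid.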

License information

This site is hosted by Communia, the International Association On the Digital Public Domain. We release all our documents, reports, infographics and research under the Creative Commons Public Domain Dedication (CC0). Unless otherwise stated, images are released under CC0 as well. Please feel free to download and reuse.
