MONOLITH LAW MAGAZINE

Is Crawling Images on the Internet a Violation of Copyright Law? Explaining the Legal Issues of Machine Learning

In recent years, AI (artificial intelligence) technology has advanced remarkably, and image generation AIs such as ‘Stable Diffusion’ and ‘Midjourney’ and text generation AIs such as ‘ChatGPT’ are attracting attention. Crawling data on the internet and having AI learn from it has made many new applications possible. While the accuracy of machine learning is improving, there are also concerns about the risk of copyright infringement.

Does it violate copyright to crawl data such as images and illustrations published on the internet, collect them without permission, process them, and use them for AI machine learning?

In this article, we will explain the legal issues involved in using images and illustrations published on the internet for machine learning.

What is Machine Learning?

Machine Learning (ML) refers to the process by which machines learn from data, much like how humans learn from experience. In the process of machine learning, it is necessary to collect data, select and process this data, and create a dataset for learning.

Crawling refers to the process where a program called a crawler travels around websites, duplicating and saving information such as text and images found on web pages.
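
As a rough illustration of the "finding images on web pages" step described above, the following is a minimal sketch using only the Python standard library. It does not download anything: the page content is a hard-coded, hypothetical string, and the names ImageLinkParser and extract_image_urls and the example.com addresses are assumptions made for this example, not part of any particular crawler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collects the src attribute of every <img> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value:
                # Resolve relative paths against the page URL.
                self.image_urls.append(urljoin(self.base_url, value))


def extract_image_urls(html, base_url):
    """Return absolute URLs of the images referenced by one HTML page."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.image_urls


# Hypothetical page content; a real crawler would download this over HTTP.
page = (
    '<html><body><img src="/img/cat.png">'
    '<img src="https://cdn.example.com/dog.jpg"></body></html>'
)
urls = extract_image_urls(page, "https://example.com/gallery/")
```

A real crawler would repeat this for every page it visits and then save the files behind the collected URLs, which is the "duplicating and saving" that raises the copyright questions discussed in this article.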

Related article: What is scraping? Explaining the legal issues of this popular data collection method

Copyright Issues in Machine Learning

“Copyright” is, in simple terms, the right that legally protects a work. The “work” that is protected is defined in Article 2, Paragraph 1 of the Japanese Copyright Law as follows:

(Definition)

Article 2 In this law, the meanings of the terms listed in the following items shall be as prescribed in each of those items.

1. Work: Something that creatively expresses thoughts or feelings and belongs to the realm of literature, academia, art, or music.

Background of the 2018 (Heisei 30) Copyright Law Amendment

The amended Copyright Law was enacted in 2018 (Heisei 30) and came into effect on January 1, 2019 (Heisei 31).

In order to utilize technologies such as IoT, Big Data, and AI (Artificial Intelligence), it is necessary to enable the accumulation, combination, and analysis of a large amount of information, including copyrighted works. Therefore, this amendment allows the use of copyrighted works in certain cases, such as when they are not used for the purpose of appreciation.

What is the permitted use under Article 30-4 of the Copyright Law?

Article 30-4 of the Copyright Law, amended in 2018, allows the use of copyrighted works to the extent necessary, regardless of the method, for “uses that do not aim to enjoy the thoughts or feelings expressed in the work”.

(Use not intended for the enjoyment of thoughts or feelings expressed in the work)

Article 30-4 A work may be used, regardless of the method, to the extent necessary in the following cases and other cases where the purpose is not to enjoy the thoughts or feelings expressed in the work oneself or to allow others to enjoy them. However, this does not apply if it would unfairly harm the interests of the copyright holder in light of the type and purpose of the work and the manner of its use.

1. When used for testing for the development or practical application of technology related to the recording, filming, or other use of a work

2. When used for information analysis (extracting information related to the language, sound, image, or other elements that make up the information from a large number of works or other large amounts of information, and conducting comparison, classification, or other analysis. The same applies in item 2 of paragraph 1 of Article 47-5.)

3. In addition to the cases listed in the preceding two items, when the work is used in the course of computer information processing, or otherwise used, without human perception of the expression of the work (in the case of a program work, excluding the execution of that work on a computer).

Specifically, the use of copyrighted works is permitted in the following cases:

・Act of experimentally reproducing artworks for the development of cameras or printers suitable for reproducing artworks

・Act of collecting and using copyrighted works as learning data for the development of artificial intelligence, or providing (transferring or publicly transmitting, etc.) the collected learning data to third parties for the purpose of developing artificial intelligence

・Act of copying copyrighted works in the backend of computer information processing and using that data without any human perception

・Act of using a program work for the purpose of investigating and analyzing the program (so-called “reverse engineering”)

Source: Agency for Cultural Affairs|About the Law to Amend Part of the Copyright Law (Law No. 30 of Heisei 30)

Cases Where Using Copyrighted Works for Machine Learning Could Violate Copyright Law

As described above, Article 30-4, Item 2 of the Japanese Copyright Law permits not only collecting, processing, and using images (copyrighted works) for machine learning, but also providing (selling, transferring, etc.) the collected training data to third parties. However, such use of copyrighted works could still lead to legal trouble.

Here, let’s consider the legal issues that could arise when using images published on the internet for machine learning.

Related article: How much of the information on the internet can be used? Explaining copyright on the internet

When it Unfairly Harms the Interests of the Copyright Holder

While Article 30-4 of the Japanese Copyright Law allows for “use not intended for the enjoyment of thoughts or feelings expressed in the work”, it does not permit the use of the work if it unfairly harms the interests of the copyright holder.

What specific cases can be considered? According to the Q&A of the Agency for Cultural Affairs, the following cases are considered to “unfairly harm the interests of the copyright holder”.

Although the specific judgment is ultimately made in court, for example, if a database of works that organizes a large amount of information in a form that can be easily used for information analysis is being sold, the act of reproducing the database for the purpose of information analysis is considered to “unfairly harm the interests of the copyright holder” as it conflicts with the market for selling the database.

Source: Copyright Division, Agency for Cultural Affairs | “Basic Thoughts on Flexible Rights Limitation Provisions in Response to the Advancement of Digitalization and Networking”

When an Agreement Different from the Provisions of Copyright Law is Made

While the Japanese Copyright Law allows for the use of copyrighted works for machine learning, it is also possible for the parties involved to make an agreement that differs from this. If such an agreement is established, there is a possibility of being pursued for damages, etc., if the agreement is violated.

For example, there are sites that explicitly prohibit the collection and extraction of data for machine learning and information analysis in their terms of use or license agreements. When collecting data, it is necessary to check the terms of use and license agreements of the site.

Generally, to “agree” to the terms of use of a site, some action is required. For example, you may be asked to register or press an agreement button along with a display such as “By creating an account, you are considered to have agreed to the terms of use and privacy policy”. By clicking on the registration or agreement button, an “agreement” is established.

On the other hand, if the terms of use that prohibit the collection and extraction of data are posted on a page separate from the download page of the site, and it is possible to download images without agreeing to them, it is considered that an “agreement” has not been established. In this case, the provisions of the Copyright Law apply, and you are allowed to use the copyrighted work.

However, to prevent trouble, it is better to refrain from collecting data from sites that explicitly prohibit the collection and extraction of data in their terms of use.
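
One way to put this caution into practice is to record the outcome of a terms-of-use review for each site and collect only from sites that were reviewed and found not to prohibit collection. The following is a minimal sketch under that assumption; the table TERMS_PROHIBIT_COLLECTION, the function may_collect, and the example hostnames are hypothetical, and the review itself still has to be done by a person reading the terms.

```python
from urllib.parse import urlparse

# Hypothetical result of a manual review of each site's terms of use.
# True means the terms explicitly prohibit collecting data for machine learning.
TERMS_PROHIBIT_COLLECTION = {
    "photos.example.com": True,
    "open-images.example.org": False,
}


def may_collect(url, review=TERMS_PROHIBIT_COLLECTION):
    """Return True only when the site was reviewed and does not prohibit collection."""
    host = urlparse(url).hostname
    # Sites that have not been reviewed are skipped, which is the cautious default.
    return review.get(host) is False


candidates = [
    "https://photos.example.com/a.jpg",
    "https://open-images.example.org/b.png",
    "https://unknown.example.net/c.gif",
]
allowed = [u for u in candidates if may_collect(u)]
```

Treating unreviewed sites as off limits is a design choice for this sketch, not a legal requirement stated in the article; it simply keeps the collection step from running ahead of the review.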

Related article: What is scraping? Explaining the legal issues of this popular data collection method

Does Image Synthesis by Machine Learning Violate Copyright Law?

So far, we have explained that the use of copyrighted works for machine learning is recognized under copyright law. But does the creation of synthetic images by AI through machine learning infringe on the copyrights of the original images (photos, illustrations, paintings, etc.) used for learning?

Here, we will explain using the case of AI generating images through GAN (Generative Adversarial Networks).

How Image Generation by Machine Learning Works

A GAN (Generative Adversarial Network) is a type of generative model that, by learning from data, can generate data that does not exist or transform existing data according to its features. Image generation by GAN is used, for example, in services that analyze actual photos or drawings of rooms and synthesize images showing furniture that matches the budget and room size as if it were actually placed there.

Can AI Infringe on the Copyright of the Original Images Used for Machine Learning?

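A GAN consists of two neural networks: a generator and a discriminator. Through training, the generator captures the features of the original images in numerical form; given certain input variables, it outputs adjusted numerical values that form a synthetic image. The discriminator is trained to distinguish synthetic images from original ones, and the generator is trained to produce images that the discriminator cannot tell apart from the originals.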
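
To make the roles of the two networks concrete, the following is a minimal sketch of a single forward pass, written in plain Python. It is not a working image generator: each network is reduced to one layer, the parameters are random and untrained, and the sizes and names (LATENT_DIM, NUM_PIXELS, g_weights, d_weights) are assumptions chosen for illustration only.

```python
import math
import random


def generator(z, weights):
    """Map input variables z to pixel values in [-1, 1] (one linear layer + tanh)."""
    return [math.tanh(sum(w * zi for w, zi in zip(row, z))) for row in weights]


def discriminator(pixels, weights, bias=0.0):
    """Return a score between 0 and 1: the estimated probability the image is real."""
    logit = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
LATENT_DIM, NUM_PIXELS = 4, 16  # a 4x4 grayscale "image"; sizes are illustrative

# Untrained, randomly initialised parameters. Training would adjust them so that
# the discriminator separates real from generated images while the generator
# learns to produce images the discriminator scores as real.
g_weights = [[rng.gauss(0, 1) for _ in range(LATENT_DIM)] for _ in range(NUM_PIXELS)]
d_weights = [rng.gauss(0, 1) for _ in range(NUM_PIXELS)]

z = [rng.gauss(0, 1) for _ in range(LATENT_DIM)]  # the input variables
fake_image = generator(z, g_weights)
score = discriminator(fake_image, d_weights)
```

The point of the sketch is that the generator's only input at generation time is the list of variables z; the original images influence the output solely through the numerical parameters (here g_weights) that training would have adjusted.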

In other words, the synthetic image is a newly generated image as a result of inputting variables into the function during the synthesis process, and can be said to be completely different from the original image data (photos, illustrations, paintings, etc.). Even if a similar image to the original is synthesized as a result of machine learning, it is considered not to be a reproduction, adaptation, or modification of the original learning data.

Therefore, it can be said that synthetic images generated by AI through machine learning do not infringe on the copyright of the original images used for learning.

Related article: How are Intellectual Property Rights Protected in AI Development? Organizing the Issues of Copyright and Patent Rights

Summary: Consult a Lawyer for Issues Regarding AI Machine Learning and Copyright

In this article, we have discussed the copyright issues associated with using images published on the internet for AI machine learning.

Using copyrighted works for machine learning is permitted under the Japanese Copyright Law (Article 30-4 of the Japanese Copyright Law). However, there are exceptions. For instance, if the use of the copyrighted work unfairly harms the interests of the copyright holder, or if the parties involved have agreed to terms that differ from the provisions of the copyright law, the use of the copyrighted work may not be permitted.

With the growing attention on AI such as “Midjourney”, “Stable Diffusion”, and “ChatGPT”, and the rapid increase in companies embarking on further AI development, it can sometimes be difficult to determine whether the use of copyrighted works as essential learning data for AI development is permitted. Therefore, if you are conducting a business that utilizes AI and machine learning, we recommend consulting with a lawyer who is knowledgeable in the IT field.

Introduction to Our Firm’s Measures

Monolith Law Office is a law firm with extensive experience in both law and IT, particularly the internet.

AI businesses come with many legal risks, and support from lawyers well-versed in legal issues related to AI is indispensable. Our firm, with a team of lawyers and engineers who are proficient in AI, provides advanced legal support for AI businesses, including ChatGPT. This includes drafting contracts, examining the legality of business models, protecting intellectual property rights, and handling privacy issues. Details are provided in the article below.

Areas of expertise at Monolith Law Office: AI (ChatGPT, etc.) Legal Affairs

Editor in Chief: Managing Attorney Toki Kawase

An expert in IT-related legal affairs in Japan who established MONOLITH LAW OFFICE and serves as its managing attorney. Formerly an IT engineer, he has been involved in the management of IT companies. He has served as legal counsel to more than 100 companies, ranging from top-tier organizations to seed-stage startups.