Stone Junction Ltd

Intellectual property rights versus AI training needs

Author: Zohar Kantor, QualiSense

28 August 2024

Access to copyrighted material will be essential to developing effective AI systems for manufacturing, argues Zohar Kantor, chief revenue and customer success officer at quality inspection specialist QualiSense. 

OpenAI, the company behind ChatGPT, has been hit by a series of lawsuits for having trained its models on material available on the internet. These cases will be a major legal landmark in how copyright law is applied to AI training. In this article, Kantor explains how QualiSense has managed to access copyrighted production data, and why the AI systems the company has developed would not be effective without that data.

In April 2024, a group of eight US newspapers, including The New York Daily News, Chicago Tribune and Denver Post, sued OpenAI and Microsoft for allegedly using their copyrighted articles without permission to train AI models. The New York Times had already sued both OpenAI and Microsoft in December 2023, on similar grounds.

These legal challenges reflect broader issues in the field of generative AI, particularly concerning the ethical sourcing of training data and the accuracy and reliability of AI-generated content. But, when training an AI model, there is no escaping the need for vast quantities of data. For an application like ChatGPT, which is designed to provide information about any topic, the volume of data required is extraordinary. Putting aside the legal rights and wrongs of these cases, you cannot train such a model without this data.

The relevance of the data is as crucial as its quantity. If you want to build an application for a manufacturing environment, for example, you need manufacturing data. Unlike the data that is freely available on the internet, the relevant data here is closely guarded by manufacturing companies. They have a dilemma: they want to unlock the power of AI, but they won’t easily give away their data.

Model training for defect detection
If you want to build an AI model for a specific use case, for example quality inspection, you need data that is highly relevant to that specific use case. However, to achieve the end goal of a deployable model, there are different routes. If you are starting from zero every time, the process of building the model for your production line will take much longer, require more images, and necessitate greater input from the quality manager.

The alternative route, which achieves the same outcome, but in less time and with less hassle, is to develop an AI backbone, essentially pre-training a model with relevant data. In the same way that a human being would recognise a new car they had never seen before as belonging to the category of “car” based on their prior knowledge of cars, so too an AI model, trained on vast quantities of relevant manufacturing data, can recognise a “crack” or a “watermark” on a metal surface, based on its pre-training data.

This will not get you to the end goal, but it will give you a significant head start. Whereas ChatGPT can afford to make mistakes, the KPIs for defect detection typically allow an error rate close to zero. It’s a different game, with a much higher standard. The only way you can achieve a deployable model that will meet this KPI is to tailor its training to the specific use case, by giving it data from your production line and feedback from the quality manager.

However, if the pre-training data is voluminous and relevant, you have a powerful backbone in place. With this backbone, you have already completed half the job of training a fully deployable AI model. The problem with building this model is that, as the case of ChatGPT shows, you need a lot of data.
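
To make the backbone idea concrete, here is a minimal Python sketch of the two-stage approach described above. It is an illustration under stated assumptions, not QualiSense's implementation: the `backbone` function is a hand-written stand-in for a pre-trained network, the one-dimensional "images" and their labels are invented, and the small trainable "head" is a simple perceptron chosen only for brevity. The point it shows is that the backbone stays frozen while only the head is fitted to a handful of samples from one production line.

```python
# Toy sketch only: a hand-written stand-in for a pre-trained backbone plus a
# small head trained on a handful of labelled samples from one production line.

def backbone(image):
    """Frozen feature extractor (hypothetical stand-in for a pre-trained network).

    Returns [largest jump between neighbouring pixels, mean brightness].
    Nothing in here is updated during training.
    """
    jumps = [abs(a - b) for a, b in zip(image, image[1:])]
    return [max(jumps), sum(image) / len(image)]


def predict(weights, bias, image):
    """Return +1 (defect) or -1 (good) from the head applied to backbone features."""
    score = sum(w * f for w, f in zip(weights, backbone(image))) + bias
    return 1 if score > 0 else -1


def train_head(samples, max_epochs=100):
    """Fit a perceptron head; only the head's weights and bias change."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for image, label in samples:
            if predict(weights, bias, image) != label:
                features = backbone(image)
                weights = [w + label * f for w, f in zip(weights, features)]
                bias += label
                mistakes += 1
        if mistakes == 0:  # every line sample is classified correctly
            break
    return weights, bias


def error_rate(weights, bias, samples):
    """Fraction of samples the model gets wrong (the KPI discussed above)."""
    wrong = sum(1 for image, label in samples if predict(weights, bias, image) != label)
    return wrong / len(samples)


# Invented 1-D "images" (pixel brightness 0..1); +1 = defect, -1 = good.
line_samples = [
    ([0.8, 0.8, 0.79, 0.8], -1),
    ([0.6, 0.61, 0.6, 0.6], -1),
    ([0.8, 0.2, 0.8, 0.8], 1),
    ([0.7, 0.7, 0.1, 0.7], 1),
]
unseen_samples = [
    ([0.75, 0.75, 0.3, 0.75], 1),
    ([0.7, 0.7, 0.71, 0.7], -1),
]

weights, bias = train_head(line_samples)
print("error rate on line samples:", error_rate(weights, bias, line_samples))
print("error rate on unseen samples:", error_rate(weights, bias, unseen_samples))
```

In a real system the backbone would be a large network pre-trained on manufacturing images, and the quality manager's feedback would supply the labels. The division of labour is the same: the expensive, data-hungry part is done once, and each production line needs only a small, fast training step.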

At QualiSense, for example, our solution has been to partner with Johnson Electric. We’ve secured an agreement with them that provides access to vast troves of manufacturing data from eighteen manufacturers across the world. This means that our model has a powerful backbone allowing it to recognise different types of defects on metal surfaces. 

QualiSense is a fast-growing start-up developing AI software for manufacturing use cases. Find out more at https://qualisense.ai/.

