Apple reportedly used videos without permission from late night hosts and others to train AI (Apple replies)
UPDATE: Apple denies using data obtained without permission from content creators to train Apple Intelligence. However, the company did admit to using the YouTube Subtitles dataset to train its open-source OpenELM models, released last April. OpenELM does not power Apple Intelligence or any of Apple's other AI and machine learning features.
Artificial Intelligence platforms don't come out of the box ready to go. Like puppies, they have to be trained. This is done by "feeding" select data to the algorithms so that the system can deliver accurate answers. For example, we told you back in April that Apple was thinking about forking over $50 million to license content from media companies like NBC News, Condé Nast (publisher of Vogue and The New Yorker), and IAC (publisher of People, Better Homes and Gardens, and The Daily Beast) for AI training.
Today, word has come out that Apple and other companies have used content from YouTube videos to train AI models without the permission of the videos' creators. According to this new report, a third party created a file of subtitles taken from over 170,000 videos. These videos include content from long-time tech reviewer Marques Brownlee (MKBHD) and late-night comics Stephen Colbert and Jimmy Kimmel.
As reported by WIRED, subtitles from 173,536 YouTube videos were used by Silicon Valley firms including Anthropic, Nvidia, Apple, and Salesforce. The downloads were supposedly done by a firm named EleutherAI that helps developers train AI models. The goal, according to the report, was to create training materials for small developers and academics.
"Technology companies have run roughshod. People are concerned about the fact that they didn't have a choice in the matter," said Amy Keller, a partner at the law firm DiCello Levitt. "I think that's what's really problematic."
However, large companies like Apple were also using this dataset, which was created by EleutherAI and is called YouTube Subtitles. It doesn't include imagery but does feature the plain text of the videos' subtitles, along with translations into languages such as Japanese, German, and Arabic. YouTube Subtitles contains content from over 12,000 videos, some of which have been deleted from YouTube. One unnamed creator deleted all of his videos that were online and discovered that his work was still included in some AI models.
The problem is that none of the YouTube creators were asked for permission to use their videos to train AI models. While there have been lawsuits against members of the AI community for using content without permission, companies like OpenAI and Meta have defended themselves by saying that their actions are covered by the fair use doctrine, which allows the unlicensed use of copyrighted material in certain situations.