One Million Hours of YouTube Videos Captured for GPT-4 Training by OpenAI
How was GPT-4 trained by OpenAI?
The story begins with OpenAI, which reportedly developed its Whisper audio transcription model partly out of a shortage of training data, then used it to transcribe more than a million hours of YouTube videos to train GPT-4, its most capable large language model.
According to The New York Times, the company was aware of the legal concerns but believed the practice qualified as fair use. The Times also reports that OpenAI president Greg Brockman was personally involved in collecting the videos that were used.
How did the company respond?
OpenAI spokesperson Lindsay Held told The Verge via email that the company curates “unique” datasets for each of its models in order to “help their understanding of the world” and keep its research globally competitive.
Held added that the company uses “many sources, including publicly available data and partnerships for non-public data,” and is considering generating its own synthetic data.
Why did the company turn to YouTube footage?
According to the Times, the company had exhausted its supplies of useful text data by 2021 and, with other sources depleted, considered transcribing podcasts, audiobooks, and YouTube videos.
By that point, it had already trained its models on data from Quizlet homework assignments, chess-move databases, and computer code from GitHub.
By MunafekiDeal