Google admits to editing the video of its new AI model
Google is under fire from AI experts for releasing a misleading video on Wednesday that purports to demonstrate its new Gemini AI model. The video appears to show the model recognizing objects and talking to a human in real time. However, as Parmy Olson reported for Bloomberg, Google has admitted that the video was staged: the researchers prompted the model with still images and text, then selected the best responses, creating a false impression of the model's real-time abilities.
A Google spokesperson explained to Olson that “We created the demo by capturing footage in order to test Gemini’s capabilities on a wide range of challenges. Then we prompted Gemini using still image frames from the footage, & prompting via text.” Olson noted that Google recorded a pair of human hands performing various tasks, then fed still images to Gemini Ultra, one at a time. The researchers communicated with the model through text, not voice, and then stitched together the best interactions with voice synthesis to make the video.
AI experts found the video suspicious because processing still images and text with large language models is computationally expensive, which makes the kind of real-time video analysis shown in the demo unrealistic. That was one of the signs that tipped them off that the video was not genuine.
Olson wrote in a tweet that “Google’s video made it look like you could show different things to Gemini Ultra in real time and talk to it. You can’t.” A Google spokesperson said that “the user’s voiceover is all real excerpts from the actual prompts used to produce the Gemini output that follows.”
Google tries to catch up with OpenAI
Google has been lagging behind OpenAI, a rival AI company, in the field of generative AI technology. Some of the innovations that OpenAI has developed originated from Google’s own research labs. Google has been trying to close the gap since early this year, investing heavily in Bard, a competitor to OpenAI’s ChatGPT, and PaLM 2, a large language model. Google presented Gemini as the first serious challenger to OpenAI’s GPT-4, which is still regarded as the leader in large language models.
Google’s plan seemed to work at first. After Google announced Gemini on Wednesday, the company’s stock rose by 5 percent. But soon, AI experts began to question Google’s exaggerated claims of “sophisticated reasoning capabilities,” including benchmarks that might not be meaningful, and eventually zeroed in on the manipulated Gemini demo video.
The video, titled “Hands-on with Gemini: Interacting with multimodal AI,” shows what the model supposedly sees, with the model’s responses displayed on the right side of the screen. The researcher draws squiggly lines and a duck and asks Gemini what it can see. The viewer hears a voice, presumably Gemini Ultra’s, answering the questions.
Olson pointed out in her Bloomberg article that the video also does not mention that the recognition demo probably uses Gemini Ultra, which is not available yet. “Fudging such details points to the broader marketing effort here: Google wants us [to] remember that it’s got one of the largest teams of AI researchers in the world and access to more data than anyone else,” Olson wrote.
Gemini’s image recognition skills are impressive on their own, and if they had been portrayed more honestly (as they are on this Google blog page), they would still be commendable. They appear to be comparable to the abilities of OpenAI’s multimodal GPT-4V (GPT-4 with vision) model, which can also identify the content of still images. But when the responses were edited together seamlessly for marketing purposes, they made Google’s Gemini model look more capable than it is, and that fooled many people.
TED organizer falls for Google’s Gemini AI video
One of the people fooled by Google’s Gemini AI video was Chris Anderson, the organizer of the popular TED conference series. On Thursday, he tweeted that he was amazed by the implications of the demo video Google released on Wednesday, which appeared to show the model recognizing objects and talking to a human in real time.
Anderson speculated that Gemini 2.0, a possible future version of the model, could participate in a board meeting and make intelligent contributions. He asked his followers if that would qualify as artificial general intelligence (AGI), a term that refers to AI systems that can perform any intellectual task that humans can.
However, Anderson’s tweet was met with skepticism and criticism from some AI experts. Grady Booch, a renowned software engineer and chief scientist of software engineering at IBM, replied to Anderson and pointed out that the video was not real: it had been heavily edited to make Gemini look more capable than it is. He accused Google of deceiving Anderson and the public, and voiced his disappointment and disapproval of the company’s behavior.