TL;DR
Last week saw a thrilling showdown between OpenAI and Google, each unveiling its latest multimodal AI advancements to the public. The highlight was OpenAI’s GPT-4o, shown in an unscripted live demo that captured attention for its transparency and real-world application. Google’s Gemini 1.5 Pro presentation was more polished, but most of what it showed was limited to developer previews.
This face-off not only underlined the increasing significance of “Multimodal” and “Tokens” in our AI vocabulary but also showcased a preference for authenticity and immediate availability in technology demonstrations. OpenAI’s approach resonated more with the audience, emphasizing genuine utility over numerical superiority. As multimodal AI continues to evolve, it’s set to redefine small business landscapes worldwide, making the future an exciting prospect.
The Battle for Multimodal AI Supremacy
Last week, both OpenAI and Google made big announcements about their multimodal AI capabilities at their respective events, Spring Update and Google I/O 2024. While both companies focused on making AI more accessible and human-like, I couldn’t help but feel that OpenAI came out on top.
What really impressed me about OpenAI’s presentation was the live demo. Sure, it didn’t go exactly as planned, but that’s what made it so authentic. It was a real demonstration of their technology in action, not some overly polished, pre-recorded showcase. The fact that it went viral only proves that people appreciate transparency and genuineness.
On the other hand, Google’s update was bigger, brighter, and more polished—exactly what you’d expect from a tech giant like Google. However, when you look past the glitz and glamour, you realize that most of the announced features are only available as private previews for developers, not the general public. It’s a bit disappointing, especially when we’ve grown accustomed to announcements being followed by immediate availability, like in the good old days of Steve Jobs’ Apple keynotes.
The Token Wars
Now, let’s talk about tokens, or more precisely the context window: how much text a model can take in at once. Google’s Gemini 1.5 Pro boasts a context window of up to 2 million tokens, while OpenAI’s GPT-4o offers 128,000.
It’s a significant difference, and it reminds me of the early days of PCs and camera megapixels. Everyone was obsessed with the numbers, but eventually, they became less important. For now, tokens are still a relevant consideration when choosing an LLM, but I believe that the real value lies in how accessible and usable the technology is for end-users.
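To make the numbers concrete, here is a minimal Python sketch that estimates whether a document would fit in each context window quoted above. It assumes a rule of thumb of roughly 4 characters per token for English text, which is only an approximation and not either vendor's actual tokenizer. The function names are hypothetical, chosen for illustration.

```python
# Context window sizes as quoted in this post (in tokens).
CONTEXT_WINDOWS = {
    "GPT-4o": 128_000,
    "Gemini 1.5 Pro": 2_000_000,
}

# Assumption: about 4 characters per token for English text.
# Real tokenizers differ by model and by language.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str) -> dict[str, bool]:
    """Report, per model, whether the estimated token count fits."""
    needed = estimate_tokens(text)
    return {model: needed <= limit for model, limit in CONTEXT_WINDOWS.items()}


if __name__ == "__main__":
    # Hypothetical document of 1,000,000 characters.
    document = "x" * 1_000_000
    print(estimate_tokens(document))
    print(fits_in_context(document))
```

Under this rough estimate, a document of that size would exceed the smaller window and fit comfortably in the larger one. For most everyday prompts, though, either window is far more than enough, which is why the headline number matters less than it first appears.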
The Verdict
In my opinion, OpenAI won this round. They delivered a real, honest presentation that showcased their multimodal AI capabilities in action. While Google’s announcement was impressive, it felt more like a tease for developers than a game-changer for small businesses like mine.
As we move forward, I’m excited to see how these advancements in multimodal AI will shape the future of small businesses in Australia and beyond.
The Facts
For those who want the detail, here is a comparison table of the key announcements from OpenAI’s Spring Update event on May 13, 2024, and Google’s I/O developer conference on May 14, 2024:
Category | OpenAI Spring Update | Google I/O 2024 |
Flagship AI Model | GPT-4o: Combines GPT-4 level intelligence with improved speed and expanded multimodal capabilities across text, vision, and audio | Gemini 1.5 Pro: State-of-the-art multimodal model with a 1 million token context window, and up to 2 million tokens in a limited preview |
Lightweight / Optimized Models | No major announcement | Gemini 1.5 Flash: Lighter-weight model optimized for high-frequency tasks where low latency and cost matter most |
Availability | GPT-4o is being rolled out to all users, including ChatGPT free tier, over the next few weeks | Gemini 1.5 Pro and Flash available in private preview via Google AI Studio and Vertex AI |
Cost | 50% cheaper in the API compared to GPT-4 Turbo | |
Developer Tools | New Assistants API for building AI apps with goals that can call models and tools | JSON mode for structured data extraction, improved function calling in Gemini API |
Image Generation | DALL·E 3 API with enhanced image quality and more aspect ratio options | No major update to Imagen announced |
Video Generation | No update on Sora video model | Veo: generative AI tool that can create 1080p videos from text prompts |
Speech Recognition | Whisper large-v2 model available via API at $0.006/minute | Native speech recognition capability in Gemini 1.5 Pro |
Search & Knowledge | No announcement of rumoured search engine | AI Overviews in Search to summarize complex topics, Ask Photos feature in Google Photos |
Other Announcements | | |