
Mid-May AI Essential News: Announcements from OpenAI and Google  

Updated: May 23

Image: two boxers square off in the battle for multimodal AI supremacy; one represents OpenAI, wearing blue shorts with the OpenAI logo.
Image generated by DALL-E, prompted by Janey Treleaven of Intelligence Assist

TL;DR

Last week was a thrilling showdown between OpenAI and Google, each unveiling their latest multimodal AI advancements to the public. The highlight was OpenAI's GPT-4o—an unscripted, live demo that captured attention for its transparency and real-world application, contrasting with Google's Gemini 1.5 Pro presentation, which, despite its polish, was largely limited to developer previews.


This face-off not only underlined the increasing significance of "Multimodal" and "Tokens" in our AI vocabulary but also showcased a preference for authenticity and immediate availability in technology demonstrations. OpenAI's approach resonated more with the audience, emphasizing genuine utility over numerical superiority. As multimodal AI continues to evolve, it's set to redefine small business landscapes worldwide, making the future an exciting prospect.

 

The Battle for Multimodal AI Supremacy


Last week both OpenAI and Google made big announcements about their multimodal AI capabilities at their respective events, Spring Update and Google I/O 2024. While both companies focused on making AI more accessible and human-like, I couldn't help but feel that OpenAI came out on top.


What really impressed me about OpenAI's presentation was the live demo. Sure, it didn't go exactly as planned, but that's what made it so authentic. It was a real demonstration of their technology in action, not some overly polished, pre-recorded showcase. The fact that it went viral only proves that people appreciate transparency and genuineness. 


On the other hand, Google's update was bigger, brighter, and more polished—exactly what you'd expect from a tech giant like Google. However, when you look past the glitz and glamour, you realize that most of the announced features are only available as private previews for developers, not the general public. It's a bit disappointing, especially when we've grown accustomed to announcements being followed by immediate availability, like in the good old days of Steve Jobs' Apple keynotes. 

Infographic: "What is Multimodal AI?"
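To make "multimodal" concrete, here is a minimal sketch of what a combined text-and-image request to GPT-4o looks like with the OpenAI Python SDK. The image URL and prompt are illustrative placeholders, not part of either announcement, and the call assumes an OPENAI_API_KEY is set in the environment.

```python
# Minimal sketch: sending text and an image to GPT-4o in one request.
# Assumes the openai Python package (v1+) and an OPENAI_API_KEY;
# the image URL and prompt below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What product is shown in this photo?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/storefront.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The point of "multimodal" is that the text question and the image travel in the same request, so the model can answer about both together rather than needing separate vision and language tools.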


 

The Token Wars 


Now, let's talk about tokens. Google's Gemini 1.5 Pro boasts a context window of up to 2 million tokens (announced in private preview, with 1 million generally available), while OpenAI's GPT-4o offers 128,000.


It's a significant difference, and it reminds me of the early spec races over PC processor speeds and camera megapixels. Everyone was obsessed with the numbers, but eventually they became less important. For now, tokens are still a relevant consideration when choosing an LLM, but I believe the real value lies in how accessible and usable the technology is for end users.

Infographic: "What are AI Tokens?"
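As a rough illustration of what a "token" actually is, the sketch below counts tokens for a short sentence using OpenAI's tiktoken library. It assumes a recent tiktoken release that includes the o200k_base encoding used by GPT-4o; other models use different encodings, so the counts are indicative only.

```python
# Rough sketch: counting tokens the way GPT-4o's tokenizer would.
# Assumes `pip install tiktoken` and a version that includes o200k_base.
import tiktoken

encoding = tiktoken.get_encoding("o200k_base")  # encoding used by GPT-4o

text = "Multimodal AI can reason over text, images and audio together."
tokens = encoding.encode(text)

print(f"{len(tokens)} tokens")          # roughly a dozen tokens for this sentence
print(encoding.decode(tokens) == text)  # True: tokens round-trip back to the text
```

Context windows are measured in these units, so a 128,000 or 2 million token window is really a cap on how much text, transcript or code the model can consider in a single request.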

 

The Verdict 


In my opinion, OpenAI won this round. They delivered a real, honest presentation that showcased their multimodal AI capabilities in action. While Google's announcement was impressive, it felt more like a tease for developers rather than a game-changer for small businesses like mine. 


As we move forward, I'm excited to see how these advancements in multimodal AI will shape the future of small businesses in Australia and beyond.  

 

The Facts 

For those who want the detail, here is a side-by-side comparison of the key announcements from OpenAI's Spring Update event on May 13, 2024, and Google's I/O developer conference on May 14, 2024:

Flagship AI Model
• OpenAI Spring Update: GPT-4o, which combines GPT-4 level intelligence with improved speed and expanded multimodal capabilities across text, vision and audio.
• Google I/O 2024: Gemini 1.5 Pro, a state-of-the-art multimodal model with a 1 million token context window and a 2 million token window announced in limited preview.

Lightweight / Optimized Models
• OpenAI Spring Update: No major announcement.
• Google I/O 2024: Gemini 1.5 Flash, a lighter-weight model optimized for high-frequency tasks where low latency and cost matter most.

Availability
• OpenAI Spring Update: GPT-4o is being rolled out to all users, including the ChatGPT free tier, over the next few weeks.
• Google I/O 2024: Gemini 1.5 Pro and Flash available in preview for developers via Google AI Studio and Vertex AI.

Cost
• OpenAI Spring Update: GPT-4o is 50% cheaper in the API compared to GPT-4 Turbo.
• Google I/O 2024: No pricing announcement highlighted.

Developer Tools
• OpenAI Spring Update: Assistants API for building AI assistants that can call models and tools.
• Google I/O 2024: JSON mode for structured data extraction and improved function calling in the Gemini API (sketched in the example after this comparison).

Image Generation
• OpenAI Spring Update: DALL·E 3 API with enhanced image quality and more aspect ratio options.
• Google I/O 2024: No major update to Imagen announced.

Video Generation
• OpenAI Spring Update: No update on the Sora video model.
• Google I/O 2024: Veo, a generative AI tool that can create 1080p videos from text prompts.

Speech Recognition
• OpenAI Spring Update: Whisper large-v2 model available via API at $0.006/minute.
• Google I/O 2024: Native speech recognition capability in Gemini 1.5 Pro.

Search & Knowledge
• OpenAI Spring Update: No announcement of the rumoured search engine.
• Google I/O 2024: AI Overviews in Search to summarize complex topics, plus the Ask Photos feature in Google Photos.

Other Announcements
• OpenAI Spring Update: No other major announcements.
• Google I/O 2024:
  • Workspace Integration: Gemini AI helper integrated into the side panel of Workspace apps like Gmail, Docs and Sheets.
  • Open Models: Gemma 2, the next generation of open models, including PaliGemma for multimodal vision-language tasks.
  • Android Integration: new on-device AI experiences powered by Gemini Nano, such as call screening and scam detection.
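The JSON mode mentioned under Developer Tools is, in rough terms, a request setting that forces Gemini to return valid JSON instead of free-form prose. Here is a minimal sketch using the google-generativeai Python package; the model name, prompt and API-key handling are my own assumptions for illustration, not details from either announcement.

```python
# Minimal sketch of the Gemini API's JSON mode for structured output.
# Assumes `pip install google-generativeai` and a GOOGLE_API_KEY;
# the model name and prompt are illustrative placeholders.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel(
    "gemini-1.5-flash",
    generation_config={"response_mime_type": "application/json"},  # JSON mode
)

response = model.generate_content(
    "List two announcements from Google I/O 2024 as JSON objects "
    "with 'name' and 'summary' fields."
)

print(response.text)  # a JSON string the calling code can parse
```

For a small business automating admin tasks, this kind of structured output is often more useful than raw chat text, because the result can go straight into a spreadsheet or database.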

