
Mid-May AI Essential News: Announcements from OpenAI and Google  

Updated: May 23

Image: two boxers square off in the battle for multimodal AI supremacy; one represents OpenAI, wearing blue shorts with the OpenAI logo.
Image generated by DALL-E, prompted by Janey Treleaven of Intelligence Assist

TL;DR

Last week was a thrilling showdown between OpenAI and Google, each unveiling their latest multimodal AI advancements to the public. The highlight was OpenAI's GPT-4o—an unscripted, live demo that captured attention for its transparency and real-world application, contrasting with Google's Gemini 1.5 Pro presentation, which, despite its polish, was largely limited to developer previews.


This face-off not only underlined the increasing significance of "Multimodal" and "Tokens" in our AI vocabulary but also showcased a preference for authenticity and immediate availability in technology demonstrations. OpenAI's approach resonated more with the audience, emphasizing genuine utility over numerical superiority. As multimodal AI continues to evolve, it's set to redefine small business landscapes worldwide, making the future an exciting prospect.

 

The Battle for Multimodal AI Supremacy


Last week both OpenAI and Google made big announcements about their multimodal AI capabilities at their respective events, Spring Update and Google I/O 2024. While both companies focused on making AI more accessible and human-like, I couldn't help but feel that OpenAI came out on top.


What really impressed me about OpenAI's presentation was the live demo. Sure, it didn't go exactly as planned, but that's what made it so authentic. It was a real demonstration of their technology in action, not some overly polished, pre-recorded showcase. The fact that it went viral only proves that people appreciate transparency and genuineness. 


On the other hand, Google's update was bigger, brighter, and more polished—exactly what you'd expect from a tech giant like Google. However, when you look past the glitz and glamour, you realize that most of the announced features are only available as private previews for developers, not the general public. It's a bit disappointing, especially when we've grown accustomed to announcements being followed by immediate availability, like in the good old days of Steve Jobs' Apple keynotes. 

Infographic: "What is Multimodal AI?"
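To make "multimodal" concrete, here is a minimal sketch of what a combined text-and-image request to GPT-4o looks like with the OpenAI Python SDK. The image URL and prompt are illustrative placeholders, not part of either announcement, and the call assumes an OPENAI_API_KEY is set in the environment.

```python
# Minimal sketch: sending text and an image to GPT-4o in one request.
# Assumes the openai Python package (v1+) and an OPENAI_API_KEY;
# the image URL and prompt below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What product is shown in this photo?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/storefront.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The point of "multimodal" is that the text question and the image travel in the same request, so the model can answer about both together rather than needing separate vision and language tools.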


 

The Token Wars 


Now, let's talk about tokens. Google's Gemini 1.5 Pro boasts a context window of up to 2 million tokens (announced in private preview, with 1 million generally available), while OpenAI's GPT-4o offers 128,000.


It's a significant difference, and it reminds me of the early spec races over PC processor speeds and camera megapixels. Everyone was obsessed with the numbers, but eventually they became less important. For now, tokens are still a relevant consideration when choosing an LLM, but I believe the real value lies in how accessible and usable the technology is for end users.

Infographic: "What are AI Tokens?"
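As a rough illustration of what a "token" actually is, the sketch below counts tokens for a short sentence using OpenAI's tiktoken library. It assumes a recent tiktoken release that includes the o200k_base encoding used by GPT-4o; other models use different encodings, so the counts are indicative only.

```python
# Rough sketch: counting tokens the way GPT-4o's tokenizer would.
# Assumes `pip install tiktoken` and a version that includes o200k_base.
import tiktoken

encoding = tiktoken.get_encoding("o200k_base")  # encoding used by GPT-4o

text = "Multimodal AI can reason over text, images and audio together."
tokens = encoding.encode(text)

print(f"{len(tokens)} tokens")          # roughly a dozen tokens for this sentence
print(encoding.decode(tokens) == text)  # True: tokens round-trip back to the text
```

Context windows are measured in these units, so a 128,000 or 2 million token window is really a cap on how much text, transcript or code the model can consider in a single request.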

 

The Verdict 


In my opinion, OpenAI won this round. They delivered a real, honest presentation that showcased their multimodal AI capabilities in action. While Google's announcement was impressive, it felt more like a tease for developers rather than a game-changer for small businesses like mine. 


As we move forward, I'm excited to see how these advancements in multimodal AI will shape the future of small businesses in Australia and beyond.  

 

The Facts 

For those who want the detail, here is a side-by-side comparison of the key announcements from OpenAI's Spring Update event on May 13, 2024, and Google's I/O developer conference on May 14, 2024:

Flagship AI Model
• OpenAI Spring Update: GPT-4o, which combines GPT-4 level intelligence with improved speed and expanded multimodal capabilities across text, vision and audio.
• Google I/O 2024: Gemini 1.5 Pro, a state-of-the-art multimodal model with a 1 million token context window and a 2 million token window announced in limited preview.

Lightweight / Optimized Models
• OpenAI Spring Update: No major announcement.
• Google I/O 2024: Gemini 1.5 Flash, a lighter-weight model optimized for high-frequency tasks where low latency and cost matter most.

Availability
• OpenAI Spring Update: GPT-4o is being rolled out to all users, including the ChatGPT free tier, over the next few weeks.
• Google I/O 2024: Gemini 1.5 Pro and Flash available in preview for developers via Google AI Studio and Vertex AI.

Cost
• OpenAI Spring Update: GPT-4o is 50% cheaper in the API compared to GPT-4 Turbo.
• Google I/O 2024: No pricing announcement highlighted.

Developer Tools
• OpenAI Spring Update: Assistants API for building AI assistants that can call models and tools.
• Google I/O 2024: JSON mode for structured data extraction and improved function calling in the Gemini API (sketched in the example after this comparison).

Image Generation
• OpenAI Spring Update: DALL·E 3 API with enhanced image quality and more aspect ratio options.
• Google I/O 2024: No major update to Imagen announced.

Video Generation
• OpenAI Spring Update: No update on the Sora video model.
• Google I/O 2024: Veo, a generative AI tool that can create 1080p videos from text prompts.

Speech Recognition
• OpenAI Spring Update: Whisper large-v2 model available via API at $0.006/minute.
• Google I/O 2024: Native speech recognition capability in Gemini 1.5 Pro.

Search & Knowledge
• OpenAI Spring Update: No announcement of the rumoured search engine.
• Google I/O 2024: AI Overviews in Search to summarize complex topics, plus the Ask Photos feature in Google Photos.

Other Announcements
• OpenAI Spring Update: No other major announcements.
• Google I/O 2024:
  • Workspace Integration: Gemini AI helper integrated into the side panel of Workspace apps like Gmail, Docs and Sheets.
  • Open Models: Gemma 2, the next generation of open models, including PaliGemma for multimodal vision-language tasks.
  • Android Integration: new on-device AI experiences powered by Gemini Nano, such as call screening and scam detection.
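The JSON mode mentioned under Developer Tools is, in rough terms, a request setting that forces Gemini to return valid JSON instead of free-form prose. Here is a minimal sketch using the google-generativeai Python package; the model name, prompt and API-key handling are my own assumptions for illustration, not details from either announcement.

```python
# Minimal sketch of the Gemini API's JSON mode for structured output.
# Assumes `pip install google-generativeai` and a GOOGLE_API_KEY;
# the model name and prompt are illustrative placeholders.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel(
    "gemini-1.5-flash",
    generation_config={"response_mime_type": "application/json"},  # JSON mode
)

response = model.generate_content(
    "List two announcements from Google I/O 2024 as JSON objects "
    "with 'name' and 'summary' fields."
)

print(response.text)  # a JSON string the calling code can parse
```

For a small business automating admin tasks, this kind of structured output is often more useful than raw chat text, because the result can go straight into a spreadsheet or database.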

