GPT-4o Pushes Current AI Technologies to the Max

The AI world is abuzz with OpenAI’s latest offering.

On May 13th, OpenAI announced GPT-4o (the ‘o’ stands for omni) to the world. GPT-4o is a multimodal model that processes text, audio, images, and video. The model can also generate any combination of audio, image, and text in response to a prompt.

Multimodal means an AI model can process more than one kind of input. In the good ole days of GPT-3, the model could only handle text. Fast forward to today, and even rudimentary models can process images alongside text, while the most powerful models handle images, text, audio, and more.
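To make that concrete, here's a minimal sketch of what a multimodal prompt looks like through the OpenAI Python SDK: a single request that mixes text and an image. The image URL is just a placeholder, and this assumes you have an API key set in your environment.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# One request, two modalities: a text question plus an image to look at.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this photo?"},
                # Placeholder URL; swap in your own hosted image.
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

That's the whole trick from a developer's point of view: you don't call a separate vision service and a separate text service, you hand everything to one model in one message.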

World’s Most Adept Model

According to OpenAI, GPT-4o can respond to audio input in an average of just 320 milliseconds. How fast is 320 milliseconds? It’s about the same length as the small pauses two human beings leave between sentences in conversation. This means you can have a legit conversation with GPT-4o.
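If you want a rough feel for the model's responsiveness yourself, you can time a simple request. Fair warning: this sketch measures the full API round trip, including network overhead, so it won't match the 320 millisecond audio figure OpenAI quotes for voice responses; it's just an easy way to see how snappy the model feels.

```python
import time
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

start = time.perf_counter()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
    max_tokens=30,  # keep the reply tiny so timing reflects latency, not generation length
)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"Round trip: {elapsed_ms:.0f} ms")
print(response.choices[0].message.content)
```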

Before GPT-4o, audio, image, and text processing were split across different neural networks, each designed specifically for audio, image, or text. GPT-4o does all of this with one single neural network.

GPT-4o is brand spanking new. OpenAI has released the model without fully exploring what’s possible with it. This creates an interesting prospect for application developers and device manufacturers. It harkens back to the days when Apple and Microsoft would release APIs to the world and let developers define the applications. GPT-4o makes things that were once impossible possible.

All of this power isn’t without danger, however.

OpenAI is considering giving users the ability to create adult content. The company also claims GPT-4o doesn’t score above medium risk for bias and social harm, according to its own internal (and not exactly transparent) red team of experts.

We’ll just have to wait and see. Yours truly will be actively exploring the model for everything it’s capable of.