Just as we thought the Generative AI hype was starting to settle, OpenAI stirred the waters on May 13, 2024, by announcing a new flagship model for ChatGPT: GPT-4o (“o” for “omni”). But did you hear the shocking truth about it?
No, we aren’t talking about its new voice, which closely resembled Scarlett Johansson’s. That voice was pulled after the actor made her discontent very clear in a public statement.
We’re referring to a host of new features that GPT-4o brings, including real-time reasoning across audio, vision, and text. OpenAI claims GPT-4o is faster than GPT-4 and significantly better at translation and coding, boasting a more human-like interaction style than its predecessor.
But is it truly as impressive as they make it out to be? Is it the first step towards the multi-modal AI technology we’ve long been promised? Read on as we put OpenAI’s claims to the test and uncover the truth about GPT-4o’s new features and functionalities.
Overview of GPT-4o
GPT-4o is OpenAI’s newest large language model, developed to deliver a more natural human-computer interaction. The model accepts any combination of text, audio, image, and video as input and generates outputs in any combination of text, audio, and image.
While GPT-4 could also handle audio and image inputs, GPT-4o is supposed to be better and faster at the task. According to OpenAI:
“GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API.”
How to Access GPT-4o?
GPT-4o is available to all ChatGPT users, but free users only get a limited number of prompts. If you’re on the free ChatGPT plan, you can upgrade to ChatGPT Plus to get a GPT-4o message limit up to five times higher. You can access GPT-4o via the ChatGPT web and mobile apps.
Once you’ve upgraded to ChatGPT Plus, GPT-4o will be set as your default language model. If it doesn’t appear, you can manually select it from the drop-down menu in the chat interface. You can also switch back to GPT-4 or GPT-3.5 if you prefer.
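If you’d rather use GPT-4o outside the ChatGPT apps, it’s also available through OpenAI’s API. Here’s a minimal sketch using the official openai Python package; it assumes the package is installed and an OPENAI_API_KEY environment variable is set:

```python
# Minimal sketch: calling GPT-4o through OpenAI's official Python SDK.
# Assumes `pip install openai` and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize GPT-4o in one sentence."}],
)

print(response.choices[0].message.content)
```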
Key Features of GPT-4o
1. Multimodal Capabilities
GPT-4o accepts input in the form of text, audio, images, or video, and can generate output in any combination of text, audio, and image. (For a rough idea of what this looks like through the API, see the sketch after this feature list.)
2. Real-Time Interaction
With an average response time of 320 milliseconds, GPT-4o’s responsiveness is comparable to human response times in conversation.
3. Enhanced Vision Abilities
GPT-4o is much better at analyzing visual inputs, such as images and videos. This allows the model to understand and generate more accurate textual and visual results based on visual inputs.
4. Multilingual Support
GPT-4o supports 50+ languages and comes with significant advancements in text processing for non-English languages.
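As promised above, here’s a rough sketch of what GPT-4o’s multimodal input looks like through the API: a single request that mixes text with an image. It assumes the official openai Python package and an OPENAI_API_KEY environment variable; the image URL is a placeholder.

```python
# Minimal sketch: mixing text and image input in one GPT-4o request.
# Assumes the `openai` package and OPENAI_API_KEY; the URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what's happening in this photo."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```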
GPT-4o Vs. GPT-4: Exploring ChatGPT’s New Capabilities
Let’s put GPT-4o to the test and see how it fares against its predecessor, GPT-4.
1. GPT-4o Vs. GPT-4: Analyzing Text Input
To test the two GPT models’ text-analyzing capacity, we entered a simple prompt instructing them to generate a poem. We wanted to assess the speed at which they analyzed the text and generated a result.
Here’s a side-by-side comparison of the output produced by GPT-4o and GPT-4:
GPT-4o clearly outpaces its predecessor: not only did it produce a longer poem from our prompt, but it also took less time than GPT-4. It also added depth and detail to its depiction of the characters and setting.
2. GPT-4o Vs. GPT-4: Analyzing Image Input
Next, we wanted to test how good GPT-4o is at analyzing images and generating output based on them. We also tested its speed and accuracy against the older GPT version. Here are the results:
Again, GPT-4o leaves its predecessor far behind in terms of speed. It analyzed the image and generated a text output based on the prompt accompanying the picture. However, the two outputs were broadly similar in wording, phrasing, and overall quality.
3. GPT-4o Vs. GPT-4: Analyzing Video Input
Until now, ChatGPT users could only enter text, images, and documents as input, since GPT-4 and earlier models didn’t support video. With GPT-4o, OpenAI has enabled this functionality.
We can attach video files with our prompts, and ChatGPT will generate the requested output after analyzing the video. To test this feature, we asked ChatGPT to analyze a video of our GPT-4o vs. GPT-4 comparison from the previous section.
Responding to our initial prompt, GPT-4o provided details like the video’s title, duration, and resolution. Then, when given a follow-up prompt, it returned a detailed analysis of the introduction and the responses from the two GPT models.
4. GPT-4o Vs. GPT-4: Translation
OpenAI especially emphasized GPT-4o’s performance with non-English text. So, let’s test how good it is at translation. We asked ChatGPT to translate an excerpt from Mandarin to English using GPT-4 and GPT-4o. See the comparison below:
Both translations are nearly identical, but GPT-4o was significantly faster, as it has been in every comparison so far. We also ran the excerpt through Google Translate to verify the accuracy of both translations; its result matched, with only minor differences.
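If you want to reproduce this kind of test yourself, here’s a minimal sketch of a translation prompt sent to GPT-4o via the API. It assumes the official openai Python package and an OPENAI_API_KEY environment variable; the Mandarin excerpt below is a placeholder, not the passage we used.

```python
# Minimal sketch: prompting GPT-4o for Mandarin-to-English translation.
# Assumes the `openai` package and OPENAI_API_KEY; the excerpt is a placeholder.
from openai import OpenAI

client = OpenAI()

excerpt = "你好，世界。今天天气很好。"  # placeholder Mandarin text

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": f"Translate the following Mandarin text to English:\n\n{excerpt}",
        }
    ],
)

print(response.choices[0].message.content)
```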
5. GPT-4o Vs. GPT-4: Generating Images
Image generation with ChatGPT has been a hit-or-miss affair, especially when the image has to include text. Users frequently run into issues like misspelled words, uneven spacing, and blurred letters. So, is GPT-4o any better? Let’s find out.
We used the same prompt to generate an image using GPT-4o and GPT-4. See the comparison below:
In our brief testing, we found only marginal differences in the images generated by the two GPT models. There was hardly any improvement in the quality or accuracy of text in images created using GPT-4o compared to results from GPT-4.
Wrapping Up
GPT-4o is a welcome addition to ChatGPT, primarily because it’s twice as fast and, in the API, 50% cheaper. It can analyze text, images, and videos in prompts and respond to queries almost in real time. While there’s still room for improvement in the quality of its output, we definitely noticed how little time it takes to produce it.
But where GPT-4o really shines is in its ability to understand video and voice inputs. With human-like response times and better accuracy, ChatGPT has taken the lead over competitors as arguably the first true multi-modal gen AI platform.