We’re introducing three audio models in the API that unlock a new class of voice apps for developers. With these models, developers can build voice experiences that feel more natural, respond more intelligently, and take action in real time:
• GPT‑Realtime‑2, our first voice model with GPT‑5‑class reasoning that can handle harder requests and carry the conversation forward naturally.
• GPT‑Realtime‑Translate, a new live translation model that translates speech from 70+ input languages into 13 output languages while keeping pace with the speaker.
• GPT‑Realtime‑Whisper, a new streaming speech-to-text model that transcribes speech live as the speaker talks.
Being the Developer of this, I’m proud of the video.
So we've reached universal translator Star Trek levels. Thanks, OpenAI.
OpenAI and Anthropic going back and forth with the releases and updates like it's a rap battle atp gahhdayum 😂
moment of silence for human language barrier. it had a great run. rip. 🙏
RIP human translators
I’ll have to admit this is really impressive. Also, we got speaking agents and live interpreters before GTA 6 😂😂
When is this coming to Codex and ChatGPT?
Very cool, your new live translation model looks really interesting.
10 years ago this was considered science-fiction.
Let's prepare to voicemaxx😺🔊
OpenAI always knows exactly what's needed. Really hope this voice mode makes its way to the app.
GPT translates better than me 😅. Impressed 👍.
ChatGPT voice translation is the best. It never fails to translate every word correctly. Just wish you guys brought voice input to Codex.
Extremely useful for trilingual families!!!
"ChatGPT’s new audio models feel like we finally replaced the goblins and gremlins inside voice AI with actual intelligence." -ChatGPT
This has the power to actually end borders...Wow
Live translation is really useful for traveling
Nice to see, at the end of the video, that ChatGPT glazes its own developers too. 😊
Wait, it has proactive audio now?! (The ability to wait until an appropriate time to respond rather than always responding to every input/sound.) The previous version of Gemini Flash had that, and it was wonderful, but the new version doesn't. It's a feature that's *so important* to making speech-to-speech interactions/conversations feel natural, and I've wanted it in AI models for a long time, but no one other than the previous Gemini has ever had it. This is amazing!
This is good. Now we just need the same, but open source