We’re introducing three audio models in the API

Tech

We’re introducing three audio models in the API that unlock a new class of voice apps for developers. With these models, developers can build voice experiences that feel more natural, respond more intelligently, and take action in real time:

• GPT‑Realtime‑2, our first voice model with GPT‑5‑class reasoning, which can handle harder requests and carry the conversation forward naturally.
• GPT‑Realtime‑Translate, a new live translation model that translates speech from 70+ input languages into 13 output languages while keeping pace with the speaker.
• GPT‑Realtime‑Whisper, a new streaming speech-to-text model that transcribes speech live as the speaker talks.

Comments 100

christopher_moon 1 month, 3 weeks ago

Being the Developer of this, I’m proud of the video.

carrie.chambers 1 month, 3 weeks ago

So we've reached universal translator Star Trek levels. Thanks, OpenAI

babyberry 1 month, 3 weeks ago

open ai and anthropic going back and forth with the releases and updates like its a rap battle atp gahhdayum 😂

lilia_medrano 1 month, 3 weeks ago

moment of silence for human language barrier. it had a great run. rip. 🙏

steven.gonzalez 1 month, 3 weeks ago

RIP human translators

lilia_medrano 1 month, 3 weeks ago

I’ll have to admit this is really impressive. Also, we got speaking agents and live interpreters before GTA 6 😂😂

christopher_moon 1 month, 3 weeks ago

When is this coming to Codex and ChatGPT?

bryanmyst97 1 month, 3 weeks ago

Very cool, your new live translation model looks really interesting.

andrew_aguilar 1 month, 3 weeks ago

10 years ago this was considered science-fiction.

michael.campbell 1 month, 3 weeks ago

Let's prepare to voicemaxx😺🔊

georgesnight77 1 month, 3 weeks ago

OpenAI always knows exactly what's needed. Really hope this voice mode makes its way to the app

brittany.gutierrez 1 month, 3 weeks ago

GPT translates better than me 😅. Impressed 👍.

robin_eaton 1 month, 3 weeks ago

Chat gpt voice translation is the best. It never fails to translate every word correctly. Just wish you guys brought voice input to codex.

genaro.chavarría 1 month, 3 weeks ago

Extremely useful for trilingual families!!!

naksh_chaudhry 1 month, 3 weeks ago

"ChatGPT’s new audio models feel like we finally replaced the goblins and gremlins inside voice AI with actual intelligence." -ChatGPT

silvia_garcía 1 month, 3 weeks ago

This has the power to actually end borders...Wow

maríadelcarmenuribe801 1 month, 3 weeks ago

Live translation is really useful for traveling

aliciabloom16 1 month, 3 weeks ago

Nice to see, at the end of the video, that ChatGPT glazes its own developers too. 😊

vasudha_khanna 1 month, 3 weeks ago

Wait, it has proactive audio now?! (The ability to wait until an appropriate time to respond rather than always responding to every input/sound.) The previous version of Gemini Flash had that, and it was wonderful, but the new version doesn't. It's a feature that's *so important* to making speech-to-speech interactions/conversations feel natural, and I've wanted it in AI models for a long time, but no one other than the previous Gemini has ever had it. This is amazing!

ray_lewis 1 month, 3 weeks ago

This is good. Now we just need the same, but open source
