Google introduces AudioPaLM, a new multimodal language model that combines the capabilities of PaLM-2 and AudioLM.

AudioPaLM processes and generates both text and speech, which makes it well suited to speech recognition and to speech translation that preserves the original speaker's voice.

PaLM-2 contributes linguistic knowledge learned from text, while AudioLM preserves paralinguistic information such as speaker identity and intonation.

By integrating PaLM-2 and AudioLM, AudioPaLM combines text-based linguistic knowledge with voice characteristics in a single model, so it can capture both what is said and how it is said.

AudioPaLM can be used for real-time multilingual communication and for carrying a speaker's voice over into other languages, using only a short spoken prompt as the voice sample.

AudioPaLM has achieved state-of-the-art results on speech translation benchmarks and competitive performance on speech recognition benchmarks.

