Research Staff, Voice AI Foundations

Website: Deepgram

Job Description

Deepgram is the primary infrastructure provider for the trillion-dollar Voice AI economy, offering real-time APIs for speech-to-text, text-to-speech, and scalable voice agents. With a foundation built on processing over 50,000 years of audio and transcribing 1 trillion words, we empower over 200,000 developers and 1,300 organizations, including Twilio, Cloudflare, and NVIDIA. We operate with an AI-first mindset, moving at the rapid pace of the industry to develop voice-native foundation models that deliver unmatched accuracy and low latency.


It’s Important to Us That You Have

  • Strong mathematical foundation in statistical learning theory, particularly in areas relevant to self-supervised and multimodal learning

  • Deep expertise in foundation model architectures, with an understanding of how to scale training across multiple modalities

  • Proven ability to bridge theory and practice, both deriving novel mathematical formulations and implementing them efficiently

  • Demonstrated ability to build data pipelines that can process and curate massive datasets while maintaining quality and diversity

  • Track record of designing controlled experiments that isolate the impact of architectural innovations and validate theoretical insights

  • Experience optimizing models for real-world deployment, including knowledge of hardware constraints and efficiency techniques

  • History of open-source contributions or research publications that have advanced the state of the art in speech/language AI


About the Role

As a Member of the Research Staff, you will pioneer Latent Space Models (LSMs) to overcome the fundamental data and cost barriers of universal voice AI. Your work will involve building next-generation neural audio codecs, developing steerable generative models for diverse human speech, and creating embedding systems that factorize audio into interpretable dimensions like style and environment. By leveraging “latent recombination” to generate synthetic data at an unprecedented scale, you will help design architectures capable of powering real-time inference for hundreds of millions of concurrent, empathic conversations.

To apply, visit the company website: jobs.ashbyhq.com