Microsoft Teams helps students and professionals worldwide follow along to online meetings with AI-generated live captions and real-time transcription, features that are getting a boost from NVIDIA AI computing technologies for training and NVIDIA Triton Inference Server for inference of speech recognition models. Teams conversations are captioned and transcribed in 28 languages using Microsoft Azure Cognitive Services, a process that will soon run crucial compute-intensive neural network inference on NVIDIA GPUs.

Teams enables communication and collaboration worldwide for nearly 250 million monthly active users. The live captions feature helps attendees follow the conversation in real time, while transcription provides an easy way to later revisit good ideas or catch up on missed meetings. Real-time captioning can be especially useful for attendees who are deaf or hard of hearing, or who are non-native speakers of the language used in a meeting.

"AI models like these are incredibly complex, requiring tens of millions of neural network parameters to deliver accurate results across dozens of different languages," said Shalendra Chhabra, principal PM manager for Teams Calling and Meetings and Devices at Microsoft. "But the bigger a model is, the harder it is to run cost-effectively in real time."

Teams uses Cognitive Services to optimize the speech recognition models with NVIDIA Triton open-source inference serving software. Triton enables Cognitive Services to support highly advanced language models, delivering highly accurate, personalized speech-to-text results in real time with very low latency. Adopting Triton ensures that the NVIDIA GPUs running these speech-to-text models are used to their full potential, reducing costs by giving customers higher throughput with fewer computational resources.

Teams' transcriptions and captions, generated by Cognitive Services, convert speech to text and identify the speaker of each statement. The model recognizes jargon, names and other meeting context to improve caption accuracy. The underlying speech recognition technology is also available as an API in Cognitive Services, which developers can use to customize and run their own applications for customer service call transcription, smart home controls or AI assistants for first responders.

Trifecta of Triton Features Drives Efficiency

Using NVIDIA GPUs and Triton software helps Microsoft achieve high accuracy with powerful neural networks without sacrificing low latency: the speech-to-text conversion still streams in real time. And when transcription is enabled, individuals can easily catch up on missed material after a meeting has concluded.

NVIDIA Triton helps streamline AI model deployment and unlock high-performance inference. Some of its key capabilities that enable the Microsoft Teams captions and transcription features to scale to more meetings and users include:

- Streaming inference: NVIDIA and Azure Cognitive Services worked together to customize the speech-to-text application with a novel stateful streaming inference feature that keeps track of prior speech context to improve the accuracy of latency-sensitive captions.
- Dynamic batching: Batch size is the number of input samples a neural network processes simultaneously. With dynamic batching in Triton, single inference requests are automatically combined into a batch, making better use of GPU resources without impacting model latency.
- Concurrent model execution: Real-time captions and transcriptions require running multiple deep learning models at once. Triton lets developers run them concurrently on a single GPU, even when the models use different deep learning frameworks.

Users can even develop custom backends tailored to their applications.

Get started using speech-to-text features in your applications with Azure Cognitive Services, and learn more about how NVIDIA Triton Inference Server software helps teams deploy AI models at scale.
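Triton's dynamic batching and concurrent model execution, described above, are typically enabled declaratively in a model's `config.pbtxt` file. The sketch below is illustrative only: the model name and backend are assumptions, not details of Microsoft's deployment, and the stateful streaming models mentioned in the article would use Triton's `sequence_batching` feature instead of `dynamic_batching`.

```protobuf
name: "speech_to_text"          # hypothetical model name
platform: "onnxruntime_onnx"    # assumption; any Triton-supported backend works
max_batch_size: 32

# Dynamic batching: combine individual inference requests into a batch,
# waiting at most 100 microseconds for the batch to fill.
dynamic_batching {
  preferred_batch_size: [ 8, 16 ]
  max_queue_delay_microseconds: 100
}

# Concurrent model execution: two instances of this model share one GPU.
instance_group [
  { count: 2, kind: KIND_GPU }
]
```

The short queue delay is the key trade-off: long enough to accumulate a batch that keeps the GPU busy, short enough that real-time captions are not delayed.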
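The dynamic batching idea above can be sketched in a few lines: queue single requests and flush them as one batch once either a preferred batch size is reached or a maximum queue delay expires. This is a toy illustration of the concept only, not Triton's implementation; all names here are made up.

```python
import time
from dataclasses import dataclass, field

@dataclass
class DynamicBatcher:
    """Toy dynamic batcher: accumulates single requests and releases
    them as one batch when full, or when the oldest queued request
    has waited longer than max_queue_delay_s."""
    preferred_batch_size: int = 8
    max_queue_delay_s: float = 0.001
    _queue: list = field(default_factory=list)
    _oldest: float = 0.0

    def submit(self, request):
        if not self._queue:
            self._oldest = time.monotonic()  # start the delay clock
        self._queue.append(request)
        waited = time.monotonic() - self._oldest
        if len(self._queue) >= self.preferred_batch_size or waited >= self.max_queue_delay_s:
            batch, self._queue = self._queue, []
            return batch  # this batch would go to the GPU as one inference call
        return None       # still waiting for more requests

batcher = DynamicBatcher(preferred_batch_size=3, max_queue_delay_s=10.0)
print(batcher.submit("req-a"))  # None: batch not full yet
print(batcher.submit("req-b"))  # None
print(batcher.submit("req-c"))  # ['req-a', 'req-b', 'req-c']
```

In a real server the flush-on-timeout path runs on a background thread so a lone request is never stranded; that detail is omitted here for brevity.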
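The stateful streaming inference mentioned above can likewise be sketched: each audio chunk is decoded together with context carried over from earlier chunks, so prior speech can help disambiguate the current words. This is a minimal sketch of the idea, not the NVIDIA/Azure feature itself; `recognize` is a stand-in for a real neural decoder, and all names are illustrative.

```python
class StreamingTranscriber:
    """Toy stateful streaming recognizer: carries a rolling window of
    previously decoded tokens across chunks of the audio stream."""

    def __init__(self, context_window: int = 5):
        self.context = []                  # tokens decoded from earlier chunks
        self.context_window = context_window

    def recognize(self, chunk: str, context: list) -> list:
        # Stand-in for a neural decoder conditioned on prior context;
        # here we just split the "audio" chunk into tokens.
        return chunk.split()

    def feed(self, chunk: str) -> str:
        tokens = self.recognize(chunk, self.context)
        # Keep only the most recent tokens as context for the next chunk.
        self.context = (self.context + tokens)[-self.context_window:]
        return " ".join(tokens)

t = StreamingTranscriber(context_window=5)
print(t.feed("hello world"))        # hello world
print(t.feed("good morning team"))  # good morning team
print(t.context)                    # ['hello', 'world', 'good', 'morning', 'team']
```

Keeping the state server-side, as Triton's sequence handling does, means each incoming chunk is a small request while accuracy still benefits from everything said before it.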