embedded award 2024: Artificial intelligence nominees
Since 2023, Artificial intelligence (AI) has been one of two new categories of the embedded award. The jury has selected the most promising solutions in this exciting field from the submissions.
A near-real-time speech transcription system, energy-efficient hardware-based AI engines, and an AI model proficient in electronics
WhisperFusion
Exhibitor: Collabora
Hall/Booth: 4/4-404
Our product, incorporating WhisperLive and WhisperSpeech, addresses the critical problem of latency in human-AI chatbot interactions. Traditional chatbot communication suffers from significant delays due to a sequential process involving audio capture, transcription, response generation, and speech synthesis. This latency diminishes the naturalness and immediacy of conversations, making interactions feel less engaging and more mechanical. WhisperLive drastically reduces transcription latency to just 50ms by employing a near-real-time speech transcription system that uses voice activity detection for efficient processing. This allows for quicker detection and processing of speech, ensuring that responses are generated and delivered with minimal delay.
Furthermore, WhisperSpeech enhances the responsiveness of text-to-speech synthesis, delivering the first token of a response in only 60ms. It achieves high-quality audio output by modeling the audio waveform with EnCodec and enhancing it with Vocos, a vocoder pre-trained on EnCodec tokens. This system ensures that the synthesized speech is not only delivered rapidly but also maintains a high level of clarity and naturalness. By integrating these technologies into a parallel processing pipeline, we minimize the interaction delays inherent in traditional chatbot systems. This innovation significantly improves the user experience by making conversations with AI feel more like natural human-to-human interactions.
Our product's innovation lies in significantly reducing latency in AI chatbot communications, facilitating near-real-time interactions. By integrating WhisperLive and WhisperSpeech into a parallel processing architecture, we've achieved groundbreaking improvements in both speech recognition and synthesis speeds, setting our technology apart from traditional, sequentially processed chatbot systems. WhisperLive harnesses the power of the Whisper model for speech transcription with an impressively low latency of 50ms, utilizing voice activity detection to optimize efficiency.
This approach allows for rapid processing of speech, sending data for transcription only when actual speech is detected, minimizing delays. WhisperSpeech revolutionizes text-to-speech technology by delivering the first token of synthesized speech within 60ms. It employs EnCodec for advanced audio waveform modeling, achieving reasonable audio quality at just 1.5kbps. The integration of Vocos, a vocoder pre-trained on EnCodec tokens, elevates speech quality to near-human levels, ensuring clear and natural output across multiple languages. Together, these innovations address the critical issue of unnatural delays in chatbot interactions, offering a seamless and engaging conversational experience. This not only represents a significant leap in minimizing interaction latencies but also in enhancing the quality and naturalness of AI-generated speech, paving the way for more intuitive human-AI communication.
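The VAD-gated, parallel flow described above can be sketched in a few lines. This is a minimal illustration only, not the actual WhisperLive/WhisperSpeech code: `is_speech` and `transcribe` are stand-in stubs for a real voice-activity detector and a streaming Whisper call.

```python
import queue
import threading

def is_speech(chunk):
    # Placeholder VAD: a real system uses an acoustic voice-activity detector.
    # Here a chunk counts as "speech" if it is a non-empty string.
    return bool(chunk.strip())

def transcribe(chunk):
    # Placeholder for a streaming Whisper transcription call.
    return chunk.upper()

def pipeline(audio_chunks):
    """VAD-gated pipeline: only chunks flagged as speech reach transcription."""
    work = queue.Queue()
    results = []

    def worker():
        # Transcription runs in parallel with audio capture, instead of
        # sequentially after the whole utterance has been recorded.
        while True:
            chunk = work.get()
            if chunk is None:  # sentinel: end of stream
                break
            results.append(transcribe(chunk))

    t = threading.Thread(target=worker)
    t.start()
    for chunk in audio_chunks:
        if is_speech(chunk):  # VAD gate: silence never enters the queue
            work.put(chunk)
    work.put(None)
    t.join()
    return results

# Silence ("", "   ") is dropped before transcription, saving compute.
print(pipeline(["hello", "", "world", "   "]))  # ['HELLO', 'WORLD']
```

The point of the gate is that silent chunks consume no transcription compute, which is one way a streaming system keeps end-to-end latency low.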
Would you like to delve deeper into the topic? At embedded world Exhibition&Conference 2025, from March 11 to 13, 2025, you will have the opportunity to exchange ideas with industry experts.
Cadence Neo NPU and NeuroWeave AI solution
Exhibitor: Cadence
Hall/Booth: 4/4-219
Innovations around hyper-parameter-based smart search algorithms for model compression and model tuning are unique to our product. The product architecture natively supports the processing required by many network topologies and operators, allowing a complete or near-complete offload from the host processor. The Neo NPUs provide performance scalability from 256 up to 32K 8×8-bit MACs per cycle with a single core, suiting an extensive range of processing needs. Capacity configurations are available in power-of-two increments, allowing the NPU to be sized correctly in an SoC for the target applications.
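As a rough illustration of the stated scaling range, the sketch below enumerates the power-of-two MAC configurations from 256 up to 32K and computes a peak-throughput figure for each. The range and the power-of-two steps come from the text; the 1 GHz clock and the convention of counting one MAC as two operations are assumptions for illustration, not Cadence figures.

```python
def neo_configs(clock_hz=1.0e9):
    """List (MACs/cycle, peak TOPS) for power-of-two sizes from 256 to 32K."""
    configs = []
    macs = 256
    while macs <= 32 * 1024:
        # Peak ops/s: each MAC does a multiply and an add per cycle (2 ops).
        tops = 2 * macs * clock_hz / 1e12
        configs.append((macs, tops))
        macs *= 2  # capacity grows in power-of-two increments
    return configs

for macs, tops in neo_configs():
    print(f"{macs:6d} MACs/cycle -> {tops:7.3f} peak TOPS @ 1 GHz")
```

This yields eight configurations (256, 512, ..., 32768), spanning roughly 0.5 to 65 peak TOPS at the assumed clock, which is how a single IP can be right-sized across very different SoC targets.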
Voltai
Exhibitor: Voltai
Hall/Booth: 3A/3A-335
“Read the freaking manual” – a phrase every engineer in the electronics industry is familiar with. Engineers today are swimming in an ocean of documents. An average microcontroller comes with 10,000–50,000 pages of documentation, and that is just one chip; an engineer typically has to integrate several such chips into a system, multiplying the documentation roughly tenfold. On top of that, these documents are often written and read by non-native English speakers and are riddled with mistakes. On the other side, semiconductor companies have to hire hundreds of field application engineers (FAEs) to support their customers.
Even then, they can only support the top 5–10% of their customer base. Semiconductor companies are losing business because they cannot support their long-tail customers. When an engineer reaches out for support, it can take hours, if not days, to receive an answer that is mostly along the lines of “Read the user manual”.
The entire hardware industry’s development speed is capped by how fast engineers can read thousands of pages of PDFs. This is the sad state of our industry, and this is the problem we’re after!
We have built foundation models like GPT, specialized in electronics and semiconductors. Our models cannot write poetry – they are trained for hard-core engineering tasks, not for writing emails or poems. Our models today can:
- help engineers answer questions they have about a chip – no more scrolling through thousands of pages
- write firmware code
- help select the best part for a project
- point out errors and help debug firmware
We support multiple languages, show sources for every answer we produce, and have zero hallucinations.
Our model ingests a wide range of resources, including user manuals, architecture manuals, datasheets, errata sheets, forum discussions, internal databases, code samples, circuit diagrams, and more. Through extensive training on this data, we create an AI model proficient in electronics and well-versed in the intricacies of specific chips. Started less than two years ago, Voltai is already answering engineers’ questions under the hood at some of the top semiconductor companies in the world.
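The claim of showing sources for every answer suggests a grounded question-answering flow over the ingested documents. The sketch below illustrates the general idea only; Voltai's actual architecture is not public, so every name here is hypothetical, and the keyword-overlap retrieval is a deliberate simplification of whatever retrieval a production system would use.

```python
# Hypothetical corpus: each passage carries the citation that will be
# attached to any answer derived from it.
DOCS = [
    {"source": "datasheet.pdf p.42",
     "text": "UART0 supports baud rates up to 3 Mbaud"},
    {"source": "errata.pdf p.3",
     "text": "SPI DMA transfers above 8 MHz may drop bytes"},
]

def answer(question):
    """Return the best-matching passage together with its citation."""
    q_words = set(question.lower().split())
    # Toy retrieval: rank passages by word overlap with the question.
    best = max(DOCS, key=lambda d: len(q_words & set(d["text"].lower().split())))
    # Citing the source lets the engineer verify the claim instead of
    # trusting the model blindly.
    return f'{best["text"]} [{best["source"]}]'

print(answer("what is the max baud rate"))
```

The design point being illustrated is that the citation travels with the retrieved text, so an answer can always be traced back to a specific page of a specific document.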