embedded award 2024: Artificial intelligence nominees
13.03.2024 | Autonomous & Intelligent Systems | Expert knowledge | embedded world


Since 2023, artificial intelligence (AI) has been one of the two new categories of the embedded award. The jury has selected the most promising solutions in this exciting field from the submissions.

AI has been a new category of the embedded award since 2023

A near-real-time speech transcription system, energy-efficient hardware-based AI engines and an AI model proficient in electronics

 

WhisperFusion

Exhibitor: Collabora
Hall/Booth: 4/4-404

Our product, incorporating WhisperLive and WhisperSpeech, addresses the critical problem of latency in human-AI chatbot interactions. Traditional chatbot communication suffers from significant delays due to a sequential process involving audio capture, transcription, response generation, and speech synthesis. This latency diminishes the naturalness and immediacy of conversations, making interactions feel less engaging and more mechanical. WhisperLive drastically reduces transcription latency to just 50ms by employing a near-real-time speech transcription system that uses voice activity detection for efficient processing. This allows for quicker detection and processing of speech, ensuring that responses are generated and delivered with minimal delay.
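To illustrate the VAD-gated approach described above, the sketch below buffers microphone audio and only hands completed speech segments to a Whisper-style transcriber. It is a simplified approximation, not Collabora's actual WhisperLive code; the use of the webrtcvad and faster-whisper packages and the 16 kHz mono 16-bit PCM framing are assumptions made for this example.

```python
# Minimal sketch of VAD-gated streaming transcription (not the actual WhisperLive code).
# Assumes: `pip install webrtcvad faster-whisper`, 16 kHz mono 16-bit PCM input frames.
import numpy as np
import webrtcvad
from faster_whisper import WhisperModel

SAMPLE_RATE = 16000
FRAME_MS = 30                                      # webrtcvad accepts 10/20/30 ms frames
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2   # bytes per frame of 16-bit PCM

vad = webrtcvad.Vad(2)                             # aggressiveness 0 (lenient) .. 3 (strict)
model = WhisperModel("small", compute_type="int8")

def transcribe_stream(pcm_frames):
    """Yield transcribed text as soon as a speech segment ends."""
    speech = bytearray()
    silence_frames = 0
    for frame in pcm_frames:                       # each frame: FRAME_BYTES of raw PCM
        if vad.is_speech(frame, SAMPLE_RATE):
            speech.extend(frame)                   # only buffer audio that contains speech
            silence_frames = 0
        elif speech:
            silence_frames += 1
            if silence_frames * FRAME_MS >= 300:   # ~300 ms of silence ends the segment
                audio = np.frombuffer(bytes(speech), np.int16).astype(np.float32) / 32768.0
                segments, _ = model.transcribe(audio, language="en")
                yield "".join(s.text for s in segments)
                speech.clear()
```

Because silence is never sent to the model, the transcriber only works on actual speech, which is what keeps the end-to-end delay low.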

Furthermore, WhisperSpeech enhances the responsiveness of text-to-speech synthesis, delivering the first token of a response in only 60ms. It achieves high-quality audio output by modeling the audio waveform with EnCodec and enhancing it with Vocos, a vocoder pre-trained on EnCodec tokens. This system ensures that the synthesized speech is not only delivered rapidly but also maintains a high level of clarity and naturalness. By integrating these technologies into a parallel processing pipeline, we minimize the interaction delays inherent in traditional chatbot systems. This innovation significantly improves the user experience by making conversations with AI feel more like natural human-to-human interactions. 
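The EnCodec-plus-Vocos combination mentioned above can be sketched roughly as follows: EnCodec compresses a waveform into discrete tokens at a low bitrate, and Vocos reconstructs audio from those tokens. The snippet follows the publicly documented encodec and vocos Python packages and is only an illustration of the building blocks, not WhisperSpeech's internal pipeline; the checkpoint name and the 1.5 kbps setting mirror the figures quoted in the text.

```python
# Sketch: compress audio to EnCodec tokens at 1.5 kbps, then reconstruct it with Vocos.
# Assumes `pip install encodec vocos torchaudio`; not WhisperSpeech's actual pipeline.
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio
from vocos import Vocos

encodec = EncodecModel.encodec_model_24khz()
encodec.set_target_bandwidth(1.5)                         # 1.5 kbps, as quoted in the text
vocos = Vocos.from_pretrained("charactr/vocos-encodec-24khz")

wav, sr = torchaudio.load("prompt.wav")                   # any short audio clip
wav = convert_audio(wav, sr, encodec.sample_rate, encodec.channels).unsqueeze(0)

with torch.no_grad():
    frames = encodec.encode(wav)                          # list of (codes, scale) tuples
    codes = torch.cat([c for c, _ in frames], dim=-1)[0]  # (n_codebooks, n_frames)
    features = vocos.codes_to_features(codes)             # tokens -> vocoder features
    audio = vocos.decode(features, bandwidth_id=torch.tensor([0]))  # id 0 -> 1.5 kbps

torchaudio.save("reconstructed.wav", audio, 24000)
```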

Our product's innovation lies in significantly reducing latency in AI chatbot communications, facilitating near-real-time interactions. By integrating WhisperLive and WhisperSpeech into a parallel processing architecture, we've achieved groundbreaking improvements in both speech recognition and synthesis speeds, setting our technology apart from traditional, sequentially processed chatbot systems. WhisperLive harnesses the power of the Whisper model for speech transcription with an impressively low latency of 50ms, utilizing voice activity detection to optimize efficiency.

This approach allows for rapid processing of speech, sending data for transcription only when actual speech is detected, minimizing delays. WhisperSpeech revolutionizes text-to-speech technology by delivering the first token of synthesized speech within 60ms. It employs EnCodec for advanced audio waveform modeling, achieving reasonable audio quality at just 1.5kbps. The integration of Vocos, a vocoder pre-trained on EnCodec tokens, elevates speech quality to near-human levels, ensuring clear and natural output across multiple languages. Together, these innovations address the critical issue of unnatural delays in chatbot interactions, offering a seamless and engaging conversational experience. This not only represents a significant leap in minimizing interaction latencies but also in enhancing the quality and naturalness of AI-generated speech, paving the way for more intuitive human-AI communication.
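The parallel architecture described here can be illustrated with a generic producer/consumer pipeline in which transcription, response generation and speech synthesis run as overlapping stages rather than one sequential chain. The sketch below uses only the Python standard library, and the stage functions are made-up stand-ins for WhisperLive, the language model and WhisperSpeech; it shows the shape of the design, not WhisperFusion's actual code.

```python
# Generic sketch of a parallel chatbot pipeline: each stage runs in its own thread and
# streams results to the next via queues, so synthesis can start before the full reply
# has been generated. The stage bodies are placeholders, not WhisperFusion components.
import itertools
import queue
import threading
import time

def transcribe_stream():                      # placeholder for a VAD-gated transcriber
    for i in itertools.count():
        time.sleep(1.0)
        yield f"user utterance {i}"

def generate_reply_tokens(text):              # placeholder for a streaming LLM
    for word in f"echoing: {text}".split():
        time.sleep(0.05)
        yield word

def synthesize_and_play(token):               # placeholder for TTS + audio playback
    print(token, end=" ", flush=True)

transcripts = queue.Queue()
reply_tokens = queue.Queue()

def transcription_stage():
    for text in transcribe_stream():
        transcripts.put(text)

def response_stage():
    while True:
        for token in generate_reply_tokens(transcripts.get()):
            reply_tokens.put(token)           # tokens flow onward as soon as they exist

def synthesis_stage():
    while True:
        synthesize_and_play(reply_tokens.get())

for stage in (transcription_stage, response_stage, synthesis_stage):
    threading.Thread(target=stage, daemon=True).start()
time.sleep(5)                                 # let the demo pipeline run briefly
```

Because each queue hands work downstream as soon as it is produced, the first synthesized tokens can be played while the rest of the reply is still being generated, which is the essence of the latency reduction claimed above.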


Would you like to explore this topic in more depth?
At the embedded world Exhibition & Conference 2025,
from 11 to 13 March 2025, you will have the opportunity
to exchange ideas with industry experts.

Cadence Neo NPU and NeuroWeave AI solution


Exhibitor: Cadence
Hall/Booth: 4/4-219

 
The AI/ML industry is evolving at a very fast pace, and SoC designers need to adjust to constant design changes. Not only are AI/ML compute requirements changing; an AI/ML IP solution that offers design configurability, scalability and flexibility is also critical to meet time-to-market demands and to contain the high cost of iterations when taping out an SoC. Cadence Neo NPUs offer energy-efficient hardware-based AI engines that can be paired with any host processor (including application processors, general-purpose microcontrollers and DSPs) for offloading AI/ML processing.
 
The AI solution also includes a zero-touch, end-to-end machine-learning compiler that can scale to any Neo NPU configuration. The data flow graph compiler can ingest models from various open-source AI frameworks and map any model to the NPU to extract the best metrics, e.g. end-to-end latency, inferences per second and bandwidth consumption. Since software is a critical part of any AI solution, Cadence also upgraded its common software toolchain with the introduction of the NeuroWeave SDK. Providing customers with a uniform, scalable and configurable software stack across Tensilica DSPs, controllers and Neo NPUs to address all target applications, the NeuroWeave SDK streamlines product development and enables an easy migration as design requirements evolve.
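As context for the "ingest models from open-source frameworks" step, the snippet below shows the kind of framework-level export (here PyTorch to ONNX) that a data flow graph compiler typically takes as its input. It is a generic illustration with a made-up toy model; it does not use or represent the NeuroWeave SDK's own interfaces.

```python
# Generic example of exporting a model from an open-source framework (PyTorch -> ONNX),
# the kind of graph a data flow compiler such as the one described above would ingest.
# Illustrative only; no NeuroWeave SDK APIs are shown here.
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
        )

    def forward(self, x):
        return self.body(x)

model = TinyClassifier().eval()
dummy = torch.randn(1, 3, 64, 64)                     # example input shape
torch.onnx.export(model, dummy, "tiny_classifier.onnx",
                  input_names=["input"], output_names=["logits"],
                  opset_version=17)
```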

Innovations around hyperparameter-based smart search algorithms for model compression and model tuning are unique to our product. The product architecture natively supports the processing required for many network topologies and operators, allowing for a complete or near-complete offload from the host processor. The Neo NPUs provide performance scalability from 256 up to 32K 8x8-bit MACs per cycle with a single core, suiting an extensive range of processing needs. Capacity configurations are available in power-of-two increments, allowing for right-sizing within an SoC for the target applications.
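To put the 256-to-32K MACs-per-cycle range in perspective, a rough back-of-the-envelope calculation of peak throughput is shown below; the 1 GHz clock is an assumed figure for illustration, not a Cadence specification.

```python
# Back-of-the-envelope peak throughput for the quoted MAC-per-cycle range.
# The 1 GHz clock is an assumption for illustration, not a Cadence figure;
# one MAC is counted as two operations (multiply + accumulate).
CLOCK_HZ = 1e9

for macs_per_cycle in (256, 1024, 4096, 32 * 1024):
    tops = macs_per_cycle * 2 * CLOCK_HZ / 1e12
    print(f"{macs_per_cycle:>6} MACs/cycle -> ~{tops:.1f} TOPS at 1 GHz")
```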
 
Int4, Int8, Int16 and FP16 are all natively supported data types, with mixed precision supported by the hardware and the associated software tools, allowing for the best performance and accuracy trade-offs. Additional features of the Neo NPUs include compression/decompression to minimize system memory space and bandwidth consumption for a network, and energy-optimized compute hardware to exploit network sparsity. To extract maximal performance, innovations in IP design and software compilers are critical. Our AI solution achieves these metrics across a broad IP portfolio, which is its unique strength.

Voltai

Exhibitor: Voltai
Hall/Booth: 3A/3A-335

“Read the freaking manual” – a phrase every engineer in the electronics industry is familiar with. Engineers today are swimming in an ocean of documents. An average microcontroller comes with 10,000 – 50,000 pages of documentation. And that is just one chip; an engineer typically has to integrate several such chips to build a system, multiplying the documentation to handle roughly tenfold. Add to that the fact that these documents are often written, and read, by non-native English speakers and are riddled with mistakes. On the other hand, semiconductor companies have to hire hundreds of field application engineers (FAEs) to support their customers.

Even then, they can only support the top 5-10% of their customer base. Semiconductor companies are losing business and are unable to support their long-tail customers. When an engineer reaches out for support, it can take hours, if not days, to receive an answer – which is mostly along the lines of “read the user manual”.

The entire hardware industry’s development speed is capped by how fast engineers can read thousands of pages of PDFs. This is the sad state of our industry – and this is the problem we’re after!

We have built foundation models, comparable to GPT, specialized in electronics and semiconductors. Our models cannot write poetry: they are trained for hard-core engineering tasks, not for writing emails or poems. Today, our models can:

- help engineers answer questions about a chip – no more scrolling through thousands of pages
- write firmware code
- help select the best part for a project
- point out errors and help debug firmware

We support multiple languages, show sources for every answer we produce, and have zero hallucinations.

Our model ingests a wide range of resources, including user manuals, architecture manuals, datasheets, errata sheets, forum discussions, internal databases, code samples, circuit diagrams and more. Through extensive training on this data, we create an AI model proficient in electronics and well-versed in the intricacies of specific chips. Founded less than two years ago, Voltai is already answering engineers’ questions behind the scenes at some of the world’s top semiconductor companies.
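As a generic illustration of what "showing sources for every answer" can look like at the retrieval level (and explicitly not a description of Voltai's proprietary models), the sketch below ranks documentation snippets against a question with TF-IDF and returns the matching passages together with their source references; the snippets and page numbers are invented examples.

```python
# Generic sketch of source-grounded retrieval over documentation snippets.
# This illustrates the idea of citing sources for an answer; it is NOT Voltai's
# architecture, and the snippets below are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    ("MCU datasheet, p. 1021",
     "UART1 baud rate is derived from the peripheral clock via the BRR register."),
    ("MCU reference manual, p. 344",
     "The watchdog must be refreshed before the downcounter reaches zero."),
    ("Errata sheet, item 12",
     "SPI in slave mode may miss the first clock edge after wake-up from stop mode."),
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(text for _, text in corpus)

def retrieve(question, k=2):
    """Return the k most relevant snippets with their source references."""
    scores = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]
    ranked = sorted(zip(scores, corpus), key=lambda p: p[0], reverse=True)[:k]
    return [(source, text) for _, (source, text) in ranked]

for source, text in retrieve("How do I configure the UART baud rate?"):
    print(f"[{source}] {text}")
```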