NetApp and NVIDIA: Delivering breakthroughs in conversational AI

Mike McNamara

2021-01-13


Conversational artificial intelligence is revolutionizing business operations in every industry with applications like virtual agents, chatbots, and assistants. A conversational AI system engages in humanlike dialog, understands context, and offers intelligent responses by recognizing speech and text, understanding intent, deciphering language, and responding in a way that mimics human conversation. However, these AI models can be massive and highly complex.

For a high-quality conversation between a human and a machine, responses have to be quick, intelligent, and natural sounding. The larger a model is, the longer the lag between a user’s question and the AI’s response. Gaps longer than two-tenths of a second sound unnatural. Therefore, all the necessary computation must take place in a 200-millisecond window.
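
To make the arithmetic concrete, the following minimal Python sketch checks whether a single spoken turn fits the 200-millisecond window. The stage names and per-stage latencies are hypothetical placeholders chosen for illustration, not measured figures for any NVIDIA or NetApp component.

```python
# Illustrative per-stage latencies (milliseconds) for one spoken turn.
# These numbers are hypothetical placeholders, not measurements of Jarvis or ONTAP AI.
LATENCY_BUDGET_MS = 200.0


def remaining_budget(stage_latencies_ms: dict[str, float],
                     budget_ms: float = LATENCY_BUDGET_MS) -> float:
    """Milliseconds left after every stage has run (negative means over budget)."""
    return budget_ms - sum(stage_latencies_ms.values())


def slowest_stage(stage_latencies_ms: dict[str, float]) -> str:
    """Name of the stage that consumes the largest share of the budget."""
    return max(stage_latencies_ms, key=stage_latencies_ms.get)


turn = {"asr": 60.0, "nlp": 25.0, "dialog_and_fulfillment": 45.0, "tts": 50.0}
print(remaining_budget(turn))  # 20.0
print(slowest_stage(turn))     # asr
```

In this sketch the stages run one after another, so their latencies add up; a larger, slower model in any single stage eats directly into the shared budget.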

With such a tight latency budget, developers of conversational AI have to make tradeoffs. A high-quality, complex model could power a chatbot, where latency matters less than it does in a voice interface. Or developers can use a smaller language-processing model that delivers results quickly but lacks nuanced responses. Most of us are familiar with voice assistants that stall during a conversation with a response like “let me look that up for you” before answering a question. The ideal conversational AI is complex enough to understand a person’s queries accurately and fast enough to respond in seamless natural language.

NetApp and NVIDIA collaborate for conversational AI architecture

NetApp and NVIDIA are collaborating to create a conversational AI architecture that delivers the required response times. With NetApp® ONTAP® AI, powered by NVIDIA DGX systems and NetApp cloud-connected storage, state-of-the-art language models can be trained and optimized for rapid inference.

To demonstrate the capabilities of this architecture, NetApp has used this framework to create NARA, a simple virtual assistant for retail. NARA consists of the components illustrated in the following figure.

Major elements of the framework include the following.

NVIDIA Jarvis. Jarvis provides GPU-accelerated services for conversational AI through an end-to-end deep learning pipeline optimized for low latency.

Jarvis comes with pretrained conversational AI models for speech, vision, and natural language understanding tasks, all available from the NVIDIA GPU Cloud (NGC).
In addition to AI services, Jarvis allows you to fuse vision, audio, and other sensor input.
With NVIDIA NeMo, you can easily fine-tune existing models with your own data to achieve better accuracy for specific needs.

NetApp ONTAP AI. This proven architecture combines NVIDIA DGX systems and NetApp all-flash storage. ONTAP AI reliably streamlines the flow of data, so the system can train and run complex conversational models without exceeding the latency budget.

Incorporates the latest NVIDIA DGX A100 for unprecedented compute density, performance, and flexibility.
Uses NVIDIA Mellanox high-performance Ethernet switches to unify AI workloads, simplify deployment, and accelerate ROI.
NetApp AFF systems keep data flowing to deep learning processes with fast, flexible all-flash storage, using end-to-end NVMe. The AFF A800 is capable of feeding data to NVIDIA DGX systems up to 4 times faster than competing solutions.

NVIDIA NeMo. NeMo is a Python toolkit for building, training, and fine-tuning GPU-accelerated conversational AI models. Its easy-to-use APIs support models for real-time automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS), which power applications such as video call transcription, intelligent video assistants, and automated call center support. Pretrained, customizable models can be downloaded from NGC.

The NARA framework uses several example fulfillment engines, enabling it to answer questions about weather, find retail locations, and provide some pricing information. You can view a demonstration of NARA in action with text-based input. It also works with voice.

How the architecture works

Training

The framework starts with a pretrained model from NGC. The chosen model can be fine-tuned with archived and generated text data, such as customer queries and dialog transcripts. This data teaches the system to recognize intents for specific use cases and to connect to the appropriate fulfillment resources, so it can answer domain-specific questions. Because they start from a pretrained model rather than from scratch, developers can deliver an improved conversational AI experience with fast time to market.

Inference

When the framework receives spoken input, Jarvis uses automatic speech recognition (ASR) to transcribe it into text. The text is routed to the Dialog Manager, which keeps track of the state of the conversation. The Jarvis natural language processing (NLP) service determines the speaker’s intent, enabling the Dialog Manager to request specific actions from the Fulfillment Engine.
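
The routing step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a keyword matcher stands in for the Jarvis NLP service, and `classify_intent`, `DialogManager`, and the handler functions are hypothetical names, not part of the Jarvis API. The sketch shows the two jobs the Dialog Manager performs here: remembering conversation state and dispatching each intent to a fulfillment action.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for the Jarvis NLP intent service: simple keyword matching.
# A real deployment would call a trained intent model instead.
INTENT_KEYWORDS = {
    "weather": ("weather", "rain", "forecast"),
    "store_location": ("store", "location", "nearest"),
    "price": ("price", "cost", "how much"),
}


def classify_intent(text: str) -> str:
    """Return the first intent whose keywords appear in the text, else 'unknown'."""
    lowered = text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return intent
    return "unknown"


@dataclass
class DialogManager:
    """Remembers conversation state and routes intents to fulfillment handlers."""

    handlers: dict[str, Callable[[str], str]]
    # Each turn is stored as (user_text, intent, reply).
    history: list[tuple[str, str, str]] = field(default_factory=list)

    def handle(self, text: str) -> str:
        intent = classify_intent(text)
        if intent == "unknown" and self.history:
            # Treat an unclassified follow-up ("And tomorrow?") as continuing the last topic.
            intent = self.history[-1][1]
        handler = self.handlers.get(intent)
        reply = handler(text) if handler else "Sorry, I can't help with that yet."
        self.history.append((text, intent, reply))
        return reply


dm = DialogManager(handlers={
    "weather": lambda text: "Sunny today.",
    "store_location": lambda text: "The nearest store is on Main Street.",
})
print(dm.handle("What's the weather like?"))  # Sunny today.
print(dm.handle("And tomorrow?"))             # Sunny today. (follow-up reuses "weather")
```

The stored history is what lets the second, ambiguous question be answered in context; without conversation state, "And tomorrow?" would have no intent at all.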

The Fulfillment Engine uses third-party APIs, SQL databases, or other data sources to perform the requested action and returns the results to the Dialog Manager. If an audio response is needed, the resulting text response is routed to the Jarvis text-to-speech (TTS) module.
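
A fulfillment action backed by a SQL database might look like the following sketch, which uses Python’s built-in `sqlite3` module with an in-memory database. The `stores` table, its rows, and the function names are hypothetical examples for a retail store locator, not part of NARA.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical store table and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stores (city TEXT, address TEXT)")
    conn.executemany(
        "INSERT INTO stores VALUES (?, ?)",
        [
            ("Austin", "100 Congress Ave"),
            ("Austin", "2500 Lamar Blvd"),
            ("Denver", "1600 Market St"),
        ],
    )
    return conn


def find_stores(conn: sqlite3.Connection, city: str) -> list[str]:
    """Fulfillment action: look up store addresses for a city with a parameterized query."""
    rows = conn.execute(
        "SELECT address FROM stores WHERE city = ? ORDER BY address", (city,)
    )
    return [address for (address,) in rows]


def fulfill_store_location(conn: sqlite3.Connection, city: str) -> str:
    """Turn query results into the text reply returned to the Dialog Manager."""
    addresses = find_stores(conn, city)
    if not addresses:
        return f"I couldn't find a store in {city}."
    return f"Stores in {city}: " + "; ".join(addresses)


conn = build_demo_db()
print(fulfill_store_location(conn, "Austin"))  # Stores in Austin: 100 Congress Ave; 2500 Lamar Blvd
```

Using a parameterized query (`?` placeholders) rather than string formatting keeps text derived from user speech from being interpreted as SQL. The reply is plain text, so the same string can be shown in a chat window or passed on to TTS.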

The history of each conversation can be fed into ongoing NeMo training, so the service continues to improve as users interact with it.
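
One way to turn logged conversations into fine-tuning data is sketched below. It assumes each turn was logged as a (user text, intent, reply) tuple, and the JSON Lines record layout is a hypothetical example rather than a format NeMo requires; the NeMo documentation describes the dataset format for each model type.

```python
import json


def history_to_training_lines(history: list[tuple[str, str, str]]) -> list[str]:
    """Convert (user_text, intent, reply) turns into JSON Lines records.

    Turns labeled 'unknown' are skipped, and case-insensitive duplicates are dropped.
    The {"text", "intent"} layout is a hypothetical example, not NeMo's required format.
    """
    seen = set()
    lines = []
    for user_text, intent, _reply in history:
        key = (user_text.strip().lower(), intent)
        if intent == "unknown" or key in seen:
            continue
        seen.add(key)
        lines.append(json.dumps({"text": user_text.strip(), "intent": intent}))
    return lines


history = [
    ("Where is the nearest store?", "store_location", "The nearest store is on Main Street."),
    ("where is the nearest store?", "store_location", "The nearest store is on Main Street."),
    ("Tell me a joke", "unknown", "Sorry, I can't help with that yet."),
]
for line in history_to_training_lines(history):
    print(line)
```

Skipping turns labeled `unknown` is a deliberate choice in this sketch: those are the queries the current model failed to classify, so they are better sent for human labeling than fed back automatically.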

Planned enhancements

NetApp and NVIDIA are continuing to enhance this conversational AI framework. In particular, we are working on integrating NVIDIA Merlin, a framework for building deep learning recommender systems, to enable development of more nuanced and intelligent recommendation systems.

We are also working on a solution for edge inferencing that combines the capabilities of NetApp HCI and NVIDIA Triton Inference Server. This solution also incorporates NetApp CloudSync and Trident capabilities to simplify edge data management tasks.

Find out more

Although we chose a retail use case to demonstrate the capabilities of this conversational AI framework, the approach applies well beyond retail, in industries such as financial services, insurance, and healthcare. This framework enables conversational services that eliminate the frustration of the voice-activated menu trees of past systems.

For full details, read our recent white paper, NetApp Conversational AI Using NVIDIA Jarvis. Or watch this on-demand webinar, originally presented at NetApp INSIGHT® 2020.

More information and resources

To learn more about the full range of NetApp AI solutions, visit netapp.com/ai. And check out these resources to learn more about NetApp AI solutions:

Mike McNamara

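Mike McNamara is a senior leader of product and solution marketing at NetApp with more than 25 years of data management and cloud storage marketing experience. Before joining NetApp ten years ago, Mike worked at Adaptec, Dell EMC, and HPE. Mike was a key team leader for the launch of the first-party cloud storage offering and the industry’s first cloud-connected AI/ML solution (NetApp), unified scale-out and hybrid cloud storage systems and software (NetApp), iSCSI and SAS storage systems and software (Adaptec), and Fibre Channel storage systems (EMC CLARiiON). He previously served as marketing chair of the Fibre Channel Industry Association (FCIA) and is a member of the Ethernet Technology Summit Conference Advisory Board and the Ethernet Alliance. He is a regular contributor to industry journals and a frequent event speaker. Mike also published a book through FriesenPress titled “Scale-Out Storage - The Next Frontier in Enterprise Data Management” and was listed by Kapos as one of the top 50 B2B product marketers to watch.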
