NVIDIA CEO Jensen Huang outlined his vision for AI, Robotics, and Omniverse Avatars in his 2021 GTC Keynote. Huang keynoted the company’s virtual GTC gathering Tuesday by introducing NVIDIA Omniverse Avatar and NVIDIA Omniverse Replicator, among a host of announcements, demos, and new initiatives.
Huang showed how NVIDIA’s Omniverse virtual simulation and collaboration platform for 3D workflows brings together key technologies for digital humans and avatars. He shared three examples of Omniverse Avatar: Project Tokkio for customer support, NVIDIA DRIVE Concierge for always-on, intelligent services in vehicles, and Project Maxine for video conferencing.
Omniverse Avatar connects the company’s technologies in speech AI, computer vision, natural language understanding, recommendation engines, and simulation. Avatars created on the platform are interactive characters with ray-traced 3D graphics that can see, speak, converse on a wide range of subjects, and reasonably understand naturally spoken intent.
In the first demonstration of Project Tokkio, Huang showed colleagues engaging in a real-time conversation with an avatar crafted as a toy replica of himself, conversing on such topics as biology and climate science.
Omniverse Avatar Key Elements
Omniverse Avatar uses elements from speech AI, computer vision, natural language understanding, recommendation engines, facial animation, and graphics delivered through the following technologies:
● The speech recognition is based on NVIDIA Riva, a software development kit that recognizes speech across multiple languages. Riva is also used to generate human-like speech responses using text-to-speech capabilities.
● The natural language understanding is based on Megatron 530B, a large language model that can recognize, understand, and generate human language. Megatron 530B is a pre-trained model that can, with little or no additional training, complete sentences, answer questions across a broad range of subjects, summarize long, complex stories, translate into other languages, and handle many tasks it was not specifically trained to do.
● The recommendation engine is provided by NVIDIA Merlin, a framework that allows businesses to build deep learning recommender systems capable of handling large amounts of data to make intelligent suggestions.
● The perception capabilities are enabled by NVIDIA Metropolis, a computer vision framework for video analytics.
● The avatar animation is powered by NVIDIA Video2Face and Audio2Face, 2D and 3D AI-driven facial animation and rendering technologies.
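The components listed above form, in effect, a single interaction loop: hear the user, understand and answer, then speak and animate the reply. The sketch below illustrates that flow with stand-in stubs; all function and class names here are hypothetical placeholders, not the actual NVIDIA SDK APIs.

```python
# Illustrative sketch of the avatar interaction loop described above.
# Every name below is a hypothetical stand-in, NOT a real NVIDIA API.

from dataclasses import dataclass


@dataclass
class AvatarResponse:
    text: str            # generated reply (language-model stage)
    audio: bytes         # synthesized speech (text-to-speech stage)
    face_frames: list    # facial-animation frames driven by the audio


def recognize_speech(audio_in: bytes) -> str:
    """Stand-in for the Riva-style speech-to-text stage."""
    return "what is a veggie burger"


def generate_reply(prompt: str) -> str:
    """Stand-in for the Megatron-style language-model stage."""
    return f"Here is what I know about: {prompt}"


def synthesize_speech(text: str) -> bytes:
    """Stand-in for the Riva-style text-to-speech stage."""
    return text.encode("utf-8")


def animate_face(audio: bytes) -> list:
    """Stand-in for the Audio2Face-style animation stage."""
    return [f"frame_{i}" for i in range(3)]


def avatar_turn(audio_in: bytes) -> AvatarResponse:
    """One conversational turn: hear, understand, answer, speak, animate."""
    text_in = recognize_speech(audio_in)
    reply = generate_reply(text_in)
    audio_out = synthesize_speech(reply)
    frames = animate_face(audio_out)
    return AvatarResponse(text=reply, audio=audio_out, face_frames=frames)
```

The point of the sketch is the ordering: perception and speech recognition feed the language model, whose output drives both text-to-speech and the audio-driven facial animation.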
“The dawn of intelligent virtual assistants has arrived,” Huang said. “Omniverse Avatar combines NVIDIA’s foundational graphics, simulation, and AI technologies to make some of the most complex real-time applications ever created. The use cases of collaborative robots and virtual assistants are incredible and far-reaching.” Users will soon be able to download a reference demo and test NVIDIA avatars for themselves. NVIDIA also announced “Showroom,” a new training and demo space for Omniverse projects.
In a second Project Tokkio demo, Huang highlighted a customer-service avatar in a restaurant kiosk, able to see, converse with, and understand two customers as they ordered veggie burgers, fries, and drinks. The demonstrations were powered by NVIDIA AI software and Megatron 530B, currently the world’s largest customizable language model.
Project Maxine for Video Conferencing
Project Maxine adds state-of-the-art video and audio features to virtual collaboration and content creation applications. A demo showed a woman speaking English on a video call in a noisy cafe; with Maxine’s noise removal, she could be heard clearly despite the background noise. As she spoke, her words were transcribed and translated in real time into French, German, and Spanish. Via Omniverse, the demo showed the translated output spoken by an avatar able to carry on the conversation in the speaker’s own voice and intonation.
Huang also introduced NeMo Megatron, a framework for training large language models. Such models “will be the biggest mainstream HPC application ever,” he said. To help developers create the huge amounts of data needed to train AI, NVIDIA announced Omniverse Replicator, an engine for generating synthetic data to train deep neural networks.
With Omniverse, “we now have the technology to create new 3D worlds or model our physical world,” Huang said. “A constant theme you’ll see is how Omniverse is used to simulate digital twins of warehouses, plants, and factories, of physical and biological systems, the 5G edge, robots, self-driving cars, and even avatars,” he commented.
NVIDIA DRIVE Concierge for Autonomous Vehicles
NVIDIA boldly predicts that everything that moves will eventually be fully or partly autonomous. In Huang’s view, “by 2024, the vast majority of new EVs will have substantial AV capability.” In a demo of the DRIVE Concierge AI platform, a digital assistant on the center dashboard screen helped a driver select the best driving mode to reach his destination on time, and then followed his request to set a reminder once the car’s range dropped below 100 miles.
NVIDIA DRIVE is NVIDIA’s open platform for autonomous vehicles, and Hyperion 8 is its latest complete hardware and software architecture. Huang detailed several new technologies built into Hyperion, including Omniverse Replicator for DRIVE Sim, a synthetic data generator for autonomous vehicles built on Omniverse. Hyperion 8 now runs the full sensor suite with 4D perception, deep learning-based multisensor fusion, feature tracking, and a new planning engine. AI will revolutionize the inside of the car too: NVIDIA Maxine is expected to be deployed within vehicles. “With Maxine, your car will become a concierge,” Huang said.
Huang announced that the NVIDIA Isaac robotics platform can now be easily integrated with the Robot Operating System (ROS), a widely used set of software libraries and tools for robot applications. Isaac Sim, built on Omniverse, is the most realistic robotics simulator ever created, Huang explained. “The goal is for the robot to not know whether it is inside a simulation or the real world,” he said. To aid this process, Isaac Sim Replicator can generate synthetic data to train robots: it simulates the sensors, generates automatically labeled data, and, with a domain randomization engine, creates rich and diverse training data sets.
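The domain randomization idea described above can be sketched in a few lines: randomize scene parameters (lighting, textures, object placement) on each sample, and since the generator placed the objects itself, ground-truth labels come for free. This is a framework-agnostic illustration of the concept, not the Isaac Sim Replicator API; all names and parameter choices are hypothetical.

```python
# Conceptual sketch of domain randomization for synthetic training data.
# NOT the Isaac Sim Replicator API; purely an illustration of the idea.

import random

LIGHTING = ["noon", "dusk", "indoor"]
TEXTURES = ["wood", "metal", "concrete"]
OBJECTS = ["box", "pallet", "forklift"]


def randomize_scene(rng: random.Random) -> dict:
    """Pick random scene parameters (the 'domain randomization' step)."""
    return {
        "lighting": rng.choice(LIGHTING),
        "texture": rng.choice(TEXTURES),
        "object": rng.choice(OBJECTS),
        # Object placement is randomized too; the generator knows exactly
        # where it put the object, so the bounding-box label is free.
        "bbox": [rng.uniform(0.0, 1.0) for _ in range(4)],
    }


def generate_dataset(n: int, seed: int = 0) -> list:
    """Generate n automatically labeled synthetic samples."""
    rng = random.Random(seed)
    samples = []
    for i in range(n):
        scene = randomize_scene(rng)
        samples.append({
            "image_id": i,
            "label": scene["object"],  # ground truth known by construction
            "bbox": scene["bbox"],
            "conditions": {
                "lighting": scene["lighting"],
                "texture": scene["texture"],
            },
        })
    return samples


dataset = generate_dataset(5)
```

Varying the irrelevant factors (lighting, textures) across many samples is what lets a network trained on such data transfer to the messier real world, which is the point of Huang’s "the robot should not know whether it is inside a simulation" remark.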
Huang ended by announcing that NVIDIA will build Earth-Two (E-2), a digital twin to simulate and predict climate change.