Soumya Prakash Pradhan

The world of technology is evolving rapidly, with AI leading the charge. Until recently, generative AI tools were best known for producing text, but they are now venturing into video creation.

Recently, a team of AI researchers at Microsoft Research Asia developed an AI application that can turn a still image of a person and an audio track into an animated video.

This groundbreaking application not only animates still images but also accurately portrays the people in the images speaking or singing, complete with appropriate facial expressions.

A demonstration video of the application has become immensely popular after being shared on social media.

The AI model behind this innovation, called 'VASA,' is designed to create lifelike talking faces of virtual characters from a single image and an audio clip.

The researchers behind VASA-1, the first model built on this framework, have focused on creating lip movements that synchronise seamlessly with the audio, as well as capturing a wide range of facial expressions and natural head movements to enhance authenticity and liveliness.

The key to this innovation is a model that generates facial dynamics and head movements together, working in a face latent space.

The researchers have successfully developed a system that can generate high-quality videos at up to 40 frames per second with minimal latency.
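
To put the 40 frames per second figure in perspective, the short Python sketch below works out how much time is available to produce each frame, and how many frames a clip of a given length needs. The function names and the 10-second clip length are illustrative assumptions and are not taken from the VASA-1 paper or its code.

```python
import math

# Illustrative arithmetic only; not part of VASA-1.
FPS = 40  # upper frame rate reported for the system


def frame_budget_ms(fps: float) -> float:
    """Time available to generate one frame, in milliseconds."""
    return 1000.0 / fps


def frames_needed(audio_seconds: float, fps: float) -> int:
    """Number of frames required to cover an audio clip of the given length."""
    return math.ceil(audio_seconds * fps)


if __name__ == "__main__":
    # Hypothetical 10-second audio clip, chosen only for the example.
    print(frame_budget_ms(FPS))      # 25.0
    print(frames_needed(10.0, FPS))  # 400
```

At 40 frames per second, each frame must be ready within 25 milliseconds for playback to keep up with the audio.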

The Microsoft team claims that their AI-generated videos not only synchronise lip movements with audio but also accurately convey a variety of facial expressions and natural head movements. 

With VASA-1, static images can come to life, appearing to talk, sing, and express emotions in sync with the supplied audio track.

How does it work?

The development of VASA-1 involved extensive training of the AI system on a vast dataset, allowing it to learn and reproduce the nuances of human emotions and speech patterns.

Rendering these realistic animations typically takes about two minutes on a desktop-grade Nvidia RTX 4090 GPU.

Although there is no specific release date mentioned in the research paper, the team believes that VASA-1 brings them closer to a future where AI avatars can engage in natural interactions.
