Soumya Prakash Pradhan

In the AI race, tech giants like Google, Apple, Microsoft, and OpenAI are competing to make their chatbots more advanced.

A new competitor, 'Moshi,' has now entered the market to take on OpenAI's ChatGPT with advanced voice features. Moshi was developed by a French AI company called Kyutai.

According to reports, Moshi is built on a 7-billion-parameter language model called Helium. It offers features similar to the upcoming 'Advanced Voice Mode' for GPT-4o in ChatGPT, the rollout of which has been delayed.

Moshi can understand and interpret the tone of voice and can operate offline. It supports various accents and can mimic 70 different emotional and speaking styles. 

Remarkably, Moshi can handle two audio streams simultaneously, enabling it to listen and respond at the same time.

Named after the Japanese phone greeting 'moshi moshi', Moshi boasts a response time of just 200 milliseconds, faster than GPT-4o's Advanced Voice Mode, which typically takes between 232 and 320 milliseconds.

Moshi is a much smaller model than GPT-4o and was developed from scratch in six months by a team of eight researchers.

OpenAI's GPT-4o voice model, on the other hand, is a newly developed text-to-speech model that brings improvements in quality and speed across more than 50 languages.

The voice mode on GPT-4o efficiently recognises speakers' voices and integrates transcription, intelligence, and text-to-speech capabilities. 

OpenAI’s GPT-4o also introduces speech and video capabilities, enabling users to interact with the model through voice and video inputs.

While Moshi may not directly compete with ChatGPT, it represents a significant step in developing open-source models in the AI space.
