OpenAI collaborated with professional voice actors and used its open-source speech recognition system, Whisper, to enable a seamless experience for ChatGPT users.
OpenAI, the leading generative artificial intelligence (AI) company backed by Microsoft Corporation (NASDAQ: MSFT), has announced new features for paid users of its ChatGPT platform. According to the announcement, ChatGPT can now see, hear, and speak, enabling more interactive experiences for users. The new features are expected to roll out over the next two weeks. Previously, ChatGPT supported only text-based interaction; it is now complemented by image understanding and voice features.
The company noted that the voice feature will be available to iOS and Android users, while image understanding will be rolled out on all platforms.
“You can now use voice to engage in a back-and-forth conversation with your assistant. Speak with it on the go, request a bedtime story for your family, or settle a dinner table debate,” OpenAI noted in the announcement.
Users can choose from five different voices, and the company highlighted that the feature can generate human-like audio from text. The voice feature is powered by a new text-to-speech model, while the company’s Whisper system transcribes users’ spoken words into text.
ChatGPT Gets More Personal with New Features
As for image understanding in ChatGPT, users can now get detailed information about an image and can pair this with the drawing tool in the mobile app. For instance, a user can circle the part of an image where more detail is needed, helping the AI generate more targeted results.
“Image understanding is powered by multimodal GPT-3.5 and GPT-4. These models apply their language reasoning skills to a wide range of images, such as photographs, screenshots, and documents containing both text and images,” the company added.
The company is focused on rolling out the new capabilities where they are needed most, including supporting people living with disabilities, helping manufacturing companies scale up their operations, and breaking down language barriers. For instance, Spotify is using this technology in a pilot program that helps podcasters expand the reach of their storytelling by translating podcasts into additional languages in the podcasters’ own voices.
OpenAI and Market Outlook
The launch of voice and image features in ChatGPT is expected to help OpenAI stay competitive in a fast-moving market crowded with rivals. On Monday, Amazon.com Inc (NASDAQ: AMZN) announced a strategic investment of up to $4 billion in the AI startup Anthropic. Nonetheless, global demand for AI products remains high, leaving room for even more startups.