ChatGPT can now See, Hear & Speak!
You read that correctly. Over this past weekend, OpenAI announced that their popular chatbot ChatGPT can now interact with users by seeing, hearing and speaking with them. I got goosebumps the minute I read that. I’m imagining you did too. This means that users will no longer need to type prompts into ChatGPT - they’ll be able to interact with it much like speaking to another human. Earlier this month OpenAI announced DALL-E 3 - an AI tool that can generate almost any artwork you describe to it. You want to see Han Solo conduct a mariachi band at Carnegie Hall? Just ask! I only had a few days to digest this before reading about the See, Hear & Speak feature that is coming very soon to ChatGPT Plus ($20/month) and Enterprise - the paid versions of the wildly popular ChatGPT.
This new version of ChatGPT is similar to Amazon’s Alexa and Apple’s Siri. Users will be able to talk to ChatGPT and it will talk back. ChatGPT will also be able to respond to images that you upload. You will be able to upload a picture of a trumpet with a stuck mouthpiece, and ChatGPT should be able to recognize the issue and give you step-by-step instructions on how to repair it. Teachers and students will be living in the future VERY soon.
So how does ChatGPT do it? To be honest, I think it’s a little bit of magic and a WHOLE lot of data. GPT stands for generative pre-trained transformer. “Generative” means the chatbot can produce new content, “pre-trained” means it first learned patterns from a huge dataset of text, and “transformer” is the type of neural network that does the learning. Together they make up the LLM (large language model) that ChatGPT draws on to sound convincingly human. Building on this, and on conversational AI (which uses NLP (natural language processing) and ML (machine learning)), ChatGPT can speak back to you using voice software similar in spirit to what Alexa and Siri use. The image recognition uses VFMs (visual foundation models), which are trained on huge collections of images. Have you ever wondered why some verification systems ask you to click on all of the traffic lights? You’ve probably been helping image-recognition systems like these learn from human-labeled examples. Combine GPT, LLM, NLP, ML, and VFMs and you have an extremely powerful set of AI tools that are the engine behind this new version of ChatGPT. You also have a LOT of acronyms to become familiar with.
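For readers who like to see ideas as code, here is a rough Python sketch of how the "hear, think, speak" pieces described above could be chained together. Everything in it is an assumption made for illustration: the VoiceChat class and the three fake_ functions are hypothetical stand-ins, not OpenAI's actual code or API, and the "audio" is just text stored as bytes so the example can run anywhere.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceChat:
    """Chains three stages: hear -> think -> speak (all names are hypothetical)."""
    speech_to_text: Callable[[bytes], str]   # turns spoken audio into a text prompt
    language_model: Callable[[str], str]     # generates a text reply to the prompt
    text_to_speech: Callable[[str], bytes]   # turns the reply back into audio

    def respond(self, audio_in: bytes) -> bytes:
        heard = self.speech_to_text(audio_in)
        reply = self.language_model(heard)
        return self.text_to_speech(reply)


# Stand-in stages so the sketch runs without any real AI models.
def fake_speech_to_text(audio: bytes) -> str:
    # Pretend the audio bytes are simply UTF-8 text.
    return audio.decode("utf-8")


def fake_language_model(prompt: str) -> str:
    # A real LLM would generate an answer; this one just echoes the prompt.
    return f"You said: {prompt}"


def fake_text_to_speech(text: str) -> bytes:
    # Pretend that encoding the text is the same as synthesizing speech.
    return text.encode("utf-8")


chat = VoiceChat(fake_speech_to_text, fake_language_model, fake_text_to_speech)
reply_audio = chat.respond(b"how do I free a stuck mouthpiece?")
print(reply_audio.decode("utf-8"))
```

The point of the sketch is only the shape of the pipeline: each stage hands its output to the next, so a better listening, thinking, or speaking component could be swapped in without changing the rest.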
I will be writing a post soon on what I believe this new set of features means for education. Obviously it will have an impact, but I need time to think it through before I approach that question. In the meantime, know that this new feature is only available to paid ChatGPT subscribers, so if you don’t subscribe, you’ll have to watch all of the articles, news stories and YouTube clips over the coming days to see it in action. Buckle up everyone!