Again for those in the back.
Mapp’s Modus on Machines
- Machines can’t think.
- Machines don’t think.
- Machines are trained on human frailty.
- Machines simulate human frailty.
After being ridiculed in the AI press, Google answers with PaLM-E. PaLM-E is a visual language model (VLM), a type of model that combines computer vision with language. That lets someone control a robot using plain English commands instead of low-level programming.
How Does It Work?
PaLM-E has 562 billion parameters. Its training incorporates motor control as well, which means it can respond to a task such as “bring me the rice chips from the drawer,” according to Google. PaLM-E does this by analyzing video input, interpreting the prompt, and rendering it as a series of commands for the robot.
The model is also resilient: it’s designed to overcome obstacles and execute tasks to completion.
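For the curious, here is a rough Python sketch of that observe, plan, act loop. This is not Google's code, and everything in it is hypothetical: `plan_next_step` is a hand-written stand-in for the model, the "world" is a toy dictionary, and the command names are invented for illustration. It only shows the shape of the idea, including retrying when a step fails.

```python
# Toy sketch of an instruction -> commands loop. All names are hypothetical;
# plan_next_step is a hand-written stand-in for the model, not PaLM-E itself.

def plan_next_step(instruction, observation):
    """Pick the next low-level command from what the robot currently sees.

    The instruction is ignored in this toy; a real model would condition on it.
    """
    if not observation["drawer_open"]:
        return "open drawer"
    if not observation["holding_chips"]:
        return "pick up rice chips"
    if not observation["delivered"]:
        return "bring chips to user"
    return "done"


def execute(command, world, pending_failures):
    """Apply a command to the toy world; a listed command fails once."""
    if command in pending_failures:
        pending_failures.remove(command)  # this attempt fails, the next succeeds
        return
    if command == "open drawer":
        world["drawer_open"] = True
    elif command == "pick up rice chips":
        world["holding_chips"] = True
    elif command == "bring chips to user":
        world["delivered"] = True


def run(instruction, disturbances=(), max_steps=10):
    """Observe, plan, act, and repeat until done or out of steps."""
    world = {"drawer_open": False, "holding_chips": False, "delivered": False}
    pending_failures = list(disturbances)
    log = []
    for _ in range(max_steps):
        observation = dict(world)  # stand-in for analyzing camera input
        command = plan_next_step(instruction, observation)
        if command == "done":
            break
        log.append(command)
        execute(command, world, pending_failures)
    return log, world["delivered"]


if __name__ == "__main__":
    commands, ok = run("bring me the rice chips from the drawer",
                       disturbances=["pick up rice chips"])
    print(commands, ok)
```

In the example run, the failed grasp shows up as a repeated "pick up rice chips" command: the loop looks again, sees the chips are still not in hand, and tries again. That is the toy version of executing a task to completion.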
PaLM-E is the largest VLM reported to date. We observe emergent capabilities like multimodal chain of thought reasoning, and multi-image inference, despite being trained on only single-image prompts. Though not the focus of our work, PaLM-E sets a new SOTA on OK-VQA benchmark. pic.twitter.com/9FHug25tOF
— Danny Driess (@DannyDriess) March 7, 2023
What Does Multimodal Mean?
Multimodal means a model can work with more than one kind of input, such as images and text. It matters because AI researchers and engineers are scrambling to be the first to achieve Artificial General Intelligence, or AGI. AGI means that an AI can perform well in solving problems across multiple domains. Today’s AIs perform very well in single-purpose, special-use cases. AGIs could theoretically perform as well as humans across the board: fetching the rice chips from the drawer and rewriting your resume.
I am really curious!
I’ve spent the last two months fooling with GPT technologies and I’m a true believer.
I don’t think these technologies will replace humans, but we should think about their best applications and how we can protect humanity from them.
The best is yet to come.
And think: after watching Bard crash and burn, Microsoft is pushing the envelope and letting Bing say anything.