We like to imagine a world of autonomous robots that take care of tedious tasks so that we don’t have to. Chris Fenton likes to imagine robots made of garbage that roam around his backyard, and Grasso was born:
Who says AGI has to be super intelligent just to be A, G, and I? Grasso is driven by a kind of Python ‘madlib’ wrapped around two LLMs (one multimodal, one text-only). The outer loop takes a photo with the robot’s webcam and feeds it into the multimodal LLM to generate a scene description. That scene description is then inserted into a prompt (“This is what you currently see with your robot eyes…”) that ends with “Choose your next action” and presents a list of actions the robot can take. Some of these are ‘direct’ commands, while others are ‘open-ended’ and let Grasso finish the action prompt however it chooses.
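A minimal sketch of what that loop might look like, assuming hypothetical `query_vision_llm()` and `query_text_llm()` helpers standing in for whichever multimodal and text-only models Grasso actually calls (the action list here is made up for illustration, too):

```python
import cv2  # webcam capture via OpenCV


# Hypothetical stand-ins for the two LLMs; the real models and
# client calls are assumptions, not Grasso's actual code.
def query_vision_llm(image) -> str:
    """Return a text description of the scene in the image."""
    raise NotImplementedError("plug in your multimodal model client here")


def query_text_llm(prompt: str) -> str:
    """Return the model's completion for a text prompt."""
    raise NotImplementedError("plug in your text model client here")


# Example action menu: the first three are 'direct' commands,
# the last is 'open-ended' and lets the model finish the line.
ACTIONS = [
    "1. Drive forward",
    "2. Turn left",
    "3. Turn right",
    "4. Say something out loud: <finish this sentence yourself>",
]


def step(camera) -> str:
    """One pass of the outer loop: look, describe, choose an action."""
    ok, frame = camera.read()
    if not ok:
        raise RuntimeError("webcam capture failed")

    scene = query_vision_llm(frame)

    # The 'madlib': drop the scene description into a fixed template
    # that ends by asking the model to pick its next action.
    prompt = (
        f"This is what you currently see with your robot eyes: {scene}\n"
        "Choose your next action:\n" + "\n".join(ACTIONS)
    )
    return query_text_llm(prompt)


if __name__ == "__main__":
    cam = cv2.VideoCapture(0)
    try:
        while True:
            action = step(cam)
            print(action)  # dispatching to motors/speech would go here
    finally:
        cam.release()
```

The key design trick is that the model never has to plan anything long-term: each trip around the loop is a fresh fill-in-the-blank, with the webcam snapshot supplying the only state.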