Large Language Models (LLMs) such as GPT-4, GPT-3.5, and Llama2 have demonstrated powerful reasoning capabilities in robotics, allowing robots to understand user intentions, generate task workflows, and adapt to complex scenarios. However, while LLMs can produce detailed steps, these steps are often not directly executable without task-specific operational knowledge. To address this, our framework employs a Sub-action Prediction Model (SubActPM) for task-specific precision. SubActPM is customized for specific tasks using a BiLSTM-MHAE architecture, which gives higher accuracy in specialized scenarios. This method is integrated with DATRN to enable trajectory learning, allowing robots to capture, store, and reuse precise movements. The combined approach enables robots to interpret both verbal and visual inputs, execute tasks with greater autonomy, and adapt to dynamic environments, providing a versatile solution for advanced robotic systems.
Overall framework: SubActPM provides a list of identified sub-actions and the objects extracted from the user's input command. The environment analyzer then checks whether the target object is present in the scene. If it is, the robot execution model retrieves the target object's coordinates, which the DATRN library then uses to execute the sub-actions.
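The control flow described above can be sketched roughly as follows. This is only an illustrative outline, not the project's implementation: `Prediction`, `run_command`, the `scene` dictionary standing in for the environment analyzer, and the `skills` mapping standing in for the DATRN library are all hypothetical names, and treating the first extracted object as the target is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Coordinates = Tuple[float, float, float]


@dataclass
class Prediction:
    """Hypothetical container for what a SubActPM-like model returns."""
    sub_actions: List[str]
    objects: List[str]


def run_command(
    command: str,
    predict: Callable[[str], Prediction],
    scene: Dict[str, Coordinates],
    skills: Dict[str, Callable[[Coordinates], None]],
) -> List[str]:
    """Predict sub-actions, check for the target, then execute in order.

    Returns the names of the sub-actions that were executed; the list is
    empty when the target object is not found in the scene.
    """
    prediction = predict(command)
    if not prediction.objects:
        return []
    target = prediction.objects[0]  # assumption: first object is the target
    coords = scene.get(target)      # stand-in for the environment analyzer
    if coords is None:
        return []                   # target not available, nothing to execute
    executed = []
    for name in prediction.sub_actions:
        skills[name](coords)        # stand-in for a stored trajectory skill
        executed.append(name)
    return executed


# Toy usage with stub components (all values are made up).
def toy_predict(command: str) -> Prediction:
    return Prediction(sub_actions=["reach", "grasp"], objects=["bottle"])


calls = []
skills = {
    "reach": lambda c: calls.append(("reach", c)),
    "grasp": lambda c: calls.append(("grasp", c)),
}
scene = {"bottle": (0.4, 0.1, 0.2)}
print(run_command("pick up the bottle", toy_predict, scene, skills))
```

Keeping the prediction, scene lookup, and skill execution behind plain callables is just one way to show that each stage can be swapped independently.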
In the video above, a human operator trains the robot using the Dynamic Movement Primitives (DMP) framework to learn a trajectory. The operator manually guides the manipulator from one position to another, and the robot acquires the trajectory from this demonstration.
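To make the idea of learning a trajectory from a demonstration more concrete, here is a minimal one-dimensional sketch of a discrete DMP in its common textbook form (a spring-damper system pulled toward the goal plus a learned forcing term). It is not the project's DATRN code: the class name `DMP1D`, the gains, the number of basis functions, and the synthetic demonstration are all assumptions chosen for illustration.

```python
import math


class DMP1D:
    """One-dimensional discrete DMP fitted to a single demonstration."""

    def __init__(self, n_basis=20, alpha=25.0, alpha_x=4.0):
        self.alpha = alpha
        self.beta = alpha / 4.0  # critically damped spring-damper
        self.alpha_x = alpha_x   # decay rate of the phase variable x
        # Basis centers are spaced evenly in time, expressed in phase x.
        self.centers = [math.exp(-alpha_x * i / (n_basis - 1))
                        for i in range(n_basis)]
        self.widths = []
        for i in range(n_basis):
            j = min(i, n_basis - 2)
            gap = self.centers[j] - self.centers[j + 1]
            self.widths.append(1.0 / gap ** 2)
        self.weights = [0.0] * n_basis
        self.tau = 1.0

    def _basis(self, x):
        return [math.exp(-h * (x - c) ** 2)
                for h, c in zip(self.widths, self.centers)]

    def fit(self, demo, dt):
        """Fit forcing-term weights by locally weighted regression."""
        n = len(demo)
        y0, g = demo[0], demo[-1]
        self.tau = (n - 1) * dt
        num = [0.0] * len(self.weights)
        den = [1e-10] * len(self.weights)
        for k in range(n):
            lo, hi = max(k - 1, 0), min(k + 1, n - 1)
            dy = (demo[hi] - demo[lo]) / ((hi - lo) * dt)
            # Assumes the demonstration starts and ends at rest.
            ddy = ((demo[k + 1] - 2 * demo[k] + demo[k - 1]) / dt ** 2
                   if 0 < k < n - 1 else 0.0)
            # Forcing needed for the spring-damper to reproduce the demo.
            f_target = (self.tau ** 2 * ddy
                        - self.alpha * (self.beta * (g - demo[k]) - self.tau * dy))
            x = math.exp(-self.alpha_x * k * dt / self.tau)
            s = x * (g - y0)
            for i, psi in enumerate(self._basis(x)):
                num[i] += psi * s * f_target
                den[i] += psi * s * s
        self.weights = [a / b for a, b in zip(num, den)]

    def rollout(self, y0, g, dt):
        """Integrate the DMP with Euler steps from y0 toward goal g."""
        y, v, x = y0, 0.0, 1.0
        path = [y]
        for _ in range(int(round(self.tau / dt))):
            psi = self._basis(x)
            f = (sum(p * w for p, w in zip(psi, self.weights))
                 / (sum(psi) + 1e-10) * x * (g - y0))
            dv = (self.alpha * (self.beta * (g - y) - v) + f) / self.tau
            v += dv * dt
            y += v / self.tau * dt
            x += -self.alpha_x * x / self.tau * dt
            path.append(y)
        return path


# Synthetic smooth demonstration from 0 to 1 over one second (made-up data).
demo = [10 * t ** 3 - 15 * t ** 4 + 6 * t ** 5
        for t in (k * 0.01 for k in range(101))]
dmp = DMP1D()
dmp.fit(demo, dt=0.01)
same_goal = dmp.rollout(0.0, 1.0, dt=0.01)  # replay toward the demonstrated goal
new_goal = dmp.rollout(0.0, 2.0, dt=0.01)   # reuse the same shape for a new goal
```

Because the forcing term is scaled by the distance between start and goal, the stored shape can be replayed toward a different target without a new demonstration, which is the property that makes this kind of representation convenient for reuse.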
In this task, the robot picks up the bottle and places it in the bowl according to the human's instructions.
In this task, the robot picks up the bottle and gives it to the person according to the human's instructions.
In this task, the robot picks up the bottle and pours the water into the cup according to the human's instructions.
In this task, the robot grasps the cloth on the table and cleans the table according to the human's instructions.
Multiple Tasks Video Demonstration.