SubActPM + RAN: A Modular Framework for Fine-Grained Instruction-to-Action Learning and Execution by Robotic Systems

Introduction

Large Language Models (LLMs), such as GPT-4, GPT-3.5, and Llama 2, have demonstrated powerful reasoning capabilities in robotics, allowing robots to understand user intentions, generate task workflows, and adapt to complex scenarios. However, while LLMs can produce detailed steps, these steps are often not directly executable without the integration of task-specific operational knowledge. To address this, our framework employs two complementary components: the Sub-action Prediction Model (SubActPM) for task-specific precision, and DATRN for trajectory learning. SubActPM is customized for specific tasks using a BiLSTM-MHAE architecture, ensuring higher accuracy in specialized scenarios. It is integrated with DATRN, which allows robots to capture, store, and reuse precise movements. This combined approach enables robots to interpret both verbal and visual inputs, execute tasks with greater autonomy, and adapt seamlessly to dynamic environments, providing a versatile and powerful solution for advanced robotic systems.
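To make the BiLSTM-MHAE idea concrete, the following is a minimal sketch of a bidirectional LSTM encoder followed by multi-head self-attention, used to tag each token of a command with a sub-action label. All layer sizes, the label count, and the class name are illustrative assumptions, not the actual SubActPM configuration.

```python
import torch
import torch.nn as nn

class SubActionTagger(nn.Module):
    """Illustrative BiLSTM + multi-head attention encoder (BiLSTM-MHAE) that
    assigns a sub-action label to every token of a command. All hyperparameters
    here are assumptions for demonstration, not the paper's configuration."""

    def __init__(self, vocab_size, num_labels, emb_dim=64, hidden=64, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_labels)

    def forward(self, token_ids):
        h, _ = self.bilstm(self.embed(token_ids))  # (batch, seq, 2*hidden)
        ctx, _ = self.attn(h, h, h)                # self-attention over the sequence
        return self.out(ctx)                       # per-token label logits
```

In practice the per-token logits would be decoded (e.g. with argmax or a CRF layer) into a sub-action sequence such as "pick", "pour", along with the object mentions.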

Full Framework of Subaction Extraction and Execution on the Robot Arm

Overall framework: The SubActPM produces a list of identified sub-actions and the objects extracted from the user's input command. The environment analyzer then checks for the target object; if it is available, the robot execution model retrieves the target object's coordinates, which the DATRN library uses to execute the sub-actions.
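The flow above can be sketched in a few lines. Every name here (run_pipeline, ScriptedSubActPM, LoggingExecutor) is a hypothetical stand-in for illustration, not the project's real API.

```python
# Sketch of the command-to-execution pipeline described above.
# All class and function names are illustrative, not the project's real API.

class ScriptedSubActPM:
    """Stand-in for SubActPM: returns a fixed parse for demonstration."""
    def predict(self, command):
        return ["pick", "place"], "bottle"

class LoggingExecutor:
    """Stand-in for DATRN execution: records calls instead of moving an arm."""
    def __init__(self):
        self.log = []
    def execute(self, action, coords):
        self.log.append((action, coords))

def run_pipeline(command, subactpm, environment, executor):
    """Parse a command into sub-actions, locate the target, then execute."""
    sub_actions, target = subactpm.predict(command)
    if target not in environment:            # environment analyzer check
        return f"'{target}' not found in the scene"
    coords = environment[target]             # target object's coordinates
    for action in sub_actions:
        executor.execute(action, coords)     # trajectory playback (stubbed)
    return "done"
```

Here the environment analyzer is reduced to a dictionary lookup; in the real system it would come from the visual input.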

Trajectory Learning

In the video above, a human operator trains the robot using the Dynamic Movement Primitives (DMP) framework to learn a trajectory. By manually guiding the manipulator from one position to another, the robot is able to acquire the trajectory effectively.
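The learning step can be sketched with a textbook-style one-dimensional discrete DMP: fit the forcing term to a single demonstrated trajectory, then integrate the system to reproduce it. The gains, basis count, and function names below are assumptions for illustration, not the project's actual implementation.

```python
import numpy as np

def learn_dmp(y_demo, dt, n_basis=20, alpha=25.0, beta=6.25, alpha_x=4.0):
    """Fit forcing-term weights of a 1-D discrete DMP from one demonstration.
    A textbook-style sketch; gains and basis placement are assumptions."""
    T = len(y_demo)
    yd = np.gradient(y_demo, dt)                       # demonstrated velocity
    ydd = np.gradient(yd, dt)                          # demonstrated acceleration
    y0, g = y_demo[0], y_demo[-1]
    x = np.exp(-alpha_x * dt * np.arange(T))           # canonical phase x(t)
    # target forcing term implied by the demonstration
    f_target = ydd - alpha * (beta * (g - y_demo) - yd)
    centers = np.exp(-alpha_x * np.linspace(0.0, dt * T, n_basis))
    widths = n_basis ** 1.5 / centers
    psi = np.exp(-widths * (x[:, None] - centers[None, :]) ** 2)
    s = x * (g - y0)                                   # phase-scaled amplitude
    # locally weighted regression: one weight per basis function
    w = (psi * (s * f_target)[:, None]).sum(0) / ((psi * (s ** 2)[:, None]).sum(0) + 1e-10)
    return w, centers, widths, y0, g

def rollout(w, centers, widths, y0, g, dt, T, alpha=25.0, beta=6.25, alpha_x=4.0):
    """Reproduce the learned trajectory by Euler-integrating the DMP."""
    y, yd, x = y0, 0.0, 1.0
    out = np.empty(T)
    for t in range(T):
        psi = np.exp(-widths * (x - centers) ** 2)
        f = (psi @ w) / (psi.sum() + 1e-10) * x * (g - y0)
        ydd = alpha * (beta * (g - y) - yd) + f
        yd += ydd * dt
        y += yd * dt
        x += -alpha_x * x * dt
        out[t] = y
    return out
```

Because the forcing term decays with the canonical phase, the reproduced motion converges to the demonstrated goal position even when started from a perturbed state, which is what makes DMPs attractive for reusing kinesthetically taught trajectories.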

Experimental Results

Pick & Stack Task Demonstration

In this task, the robot picks up the bottle and places it in the bowl according to the human's instructions.

Pick & Give Task Demonstration

In this task, the robot picks up the bottle and gives it to the person according to the human's instructions.

Pick & Pour Task Demonstration

In this task, the robot picks up the bottle and pours the water into the cup according to the human's instructions.

Wiping Task Demonstration

In this task, the robot grasps the cloth on the table and wipes the table clean according to the human's instructions.

Multiple Task Demonstration Video
