LaNE: Accelerating Visual Sparse-Reward Learning with Latent Nearest-Demonstration-Guided Explorations

Ruihan Zhao1        Ufuk Topcu1,†        Sandeep Chinchali1,†        Mariano Phielipp2,†

1The University of Texas at Austin        2Intel AI Lab

Code: [GitHub]        Paper: [OpenReview]

iOS Teleop App: [App Store]        Gym Franka Environment: [GitHub]

LaNE is a data-efficient reinforcement learning (RL) method for solving sparse-reward tasks from image observations. First, LaNE builds on the pre-trained DINOv2 feature extractor to learn an embedding space for forward prediction. Next, it rewards the agent for exploring near the demonstrations, with proximity quantified by quadratic control costs in the embedding space. Our method achieves state-of-the-art sample efficiency in Robosuite simulation and enables under-an-hour RL training from scratch on a Franka Panda robot, using only a few demonstrations.
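To make the phrase concrete: for a latent embedding \(z'\) of a new observation and the nearest demonstration embedding \(z_d\), a quadratic cost in the embedding space has the general form below. The weighting matrix \(Q\) is illustrative here; the squared Euclidean distance is the special case \(Q = I\), and the paper defines the exact cost used.

\[
d(z', z_d) = (z' - z_d)^\top Q \, (z' - z_d).
\]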

The Method

(a) We utilize variable-length demonstrations, each consisting of observations \(o_i^t\) and actions \(a_i^t\). (b) A dense exploration reward \(r_e\) is given when a transition lands sufficiently close to a demonstration in a learned embedding space. The reward is discounted according to how far the matched demonstration state is from the goal. (c) Using the combined reward signal, the RL agent learns to map a sensor observation \(o\) to an action \(a\).
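The sketch below illustrates step (b) in code. It assumes the demonstration observations have already been embedded into latent states, measures proximity with a plain squared Euclidean distance, and uses illustrative values for the threshold and discount; it is a minimal sketch of the idea, not the exact reward from the paper.

```python
import numpy as np

def exploration_reward(z_next, demo_latents, demo_steps_to_goal,
                       eps=0.5, gamma=0.99):
    """Minimal sketch of a nearest-demonstration exploration reward.

    z_next:             latent embedding of the next observation o'
    demo_latents:       (N, d) array of latent states of all demonstration frames
    demo_steps_to_goal: (N,) array of remaining steps from each demonstration
                        frame to the end (goal) of its demonstration
    eps, gamma:         illustrative proximity threshold and discount factor
    """
    # Quadratic (squared Euclidean) distance to every demonstration state.
    dists = np.sum((demo_latents - z_next) ** 2, axis=1)
    nearest = int(np.argmin(dists))

    # No exploration reward if the transition does not land close to any demo.
    if dists[nearest] > eps:
        return 0.0

    # Discount the reward by how far the matched demonstration state is from
    # the goal, so landing near later demonstration states is worth more.
    return gamma ** demo_steps_to_goal[nearest]
```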

LaNE learns a latent space with locally linear dynamics. Given a transition tuple \((o, \, a, \, o')\), the observations are first encoded by a frozen DINOv2 model into \(w\) and \(w'\). Next, the encoder \(E_\phi\) further embeds \(w\) into a low-dimensional latent state \(z\). The forward model \(M_\psi\) predicts the transition matrices \(A, \, B\) and offset \(c\). Finally, the decoder \(D_\theta\) reconstructs \(w'\) from the predicted \(\hat{z}'\), where \(\hat{z}' = Az + Ba + c\). The trainable modules \(E_\phi\), \(M_\psi\), and \(D_\theta\) are colored in orange.
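A minimal PyTorch sketch of this model is shown below. The DINOv2 feature dimension, action dimension, and layer sizes are illustrative assumptions, and the training objective is reduced to a single next-feature reconstruction loss; the paper's actual architecture and losses may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentDynamics(nn.Module):
    """Sketch of a locally linear latent dynamics model over DINOv2 features."""

    def __init__(self, dim_w=768, dim_z=16, dim_a=7, hidden=256):
        super().__init__()
        self.dim_z, self.dim_a = dim_z, dim_a
        # E_phi: DINOv2 feature w -> low-dimensional latent z
        self.encoder = nn.Sequential(
            nn.Linear(dim_w, hidden), nn.ReLU(), nn.Linear(hidden, dim_z))
        # D_theta: latent -> reconstructed DINOv2 feature
        self.decoder = nn.Sequential(
            nn.Linear(dim_z, hidden), nn.ReLU(), nn.Linear(hidden, dim_w))
        # M_psi: latent -> flattened (A, B, c) of the local linear transition
        self.dynamics = nn.Sequential(
            nn.Linear(dim_z, hidden), nn.ReLU(),
            nn.Linear(hidden, dim_z * dim_z + dim_z * dim_a + dim_z))

    def forward(self, w, a, w_next):
        z = self.encoder(w)                      # (batch, dim_z)
        params = self.dynamics(z)
        n_A = self.dim_z * self.dim_z
        n_B = self.dim_z * self.dim_a
        A = params[:, :n_A].view(-1, self.dim_z, self.dim_z)
        B = params[:, n_A:n_A + n_B].view(-1, self.dim_z, self.dim_a)
        c = params[:, n_A + n_B:]
        # Locally linear prediction: z_hat' = A z + B a + c
        z_next_hat = (A @ z.unsqueeze(-1)).squeeze(-1) \
                     + (B @ a.unsqueeze(-1)).squeeze(-1) + c
        # Train all three modules by reconstructing the next DINOv2 feature w'.
        w_next_hat = self.decoder(z_next_hat)
        return F.mse_loss(w_next_hat, w_next)

# Usage (illustrative): w, w_next are DINOv2 features, a are robot actions.
# model = LatentDynamics(); loss = model(w, a, w_next); loss.backward()
```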

Simulation Results

We compare LaNE with optimal control and state-of-the-art RL baselines on four table-top manipulation tasks in the Robosuite simulator. Our method (red) consistently learns faster and converges to higher success rates than all three baselines.

Real Robot Experiments

For each task, we show a demonstration video and an evaluation video of the learned policy. The demonstration recording time (with the number of demonstrations in parentheses) and the total training time are listed below.

Reach a Location: 0:20 (1) demonstration, 20:55 training
Open a Drawer: 0:25 (1) demonstration, 51:40 training
Lift a Block: 2:00 (5) demonstrations, 44:16 training
Insert a Pen: 3:00 (5) demonstrations, 57:58 training

Teleop System

A key feature of our approach is learning from a small collection of human demonstrations. We built an application that lets a user teleoperate the robot by moving and rotating a smartphone. Our iPhone application uses primitives from Apple's ARKit to stream the position and orientation of the device to the PC; these poses are then translated into movements of the robot arm.
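As a rough illustration of the PC side, the sketch below assumes the phone streams each ARKit pose as a small JSON message over UDP and that the robot exposes a relative end-effector command; the message format, port, and `send_ee_delta` interface are all hypothetical, not the app's actual protocol.

```python
import json
import socket

import numpy as np

PORT = 5555  # hypothetical UDP port the phone streams to

def stream_phone_poses():
    """Yield (position, quaternion) pairs sent by the phone, one per ARKit frame."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", PORT))
    while True:
        data, _ = sock.recvfrom(4096)
        msg = json.loads(data.decode())
        # Assumed message format: {"position": [x, y, z], "quaternion": [x, y, z, w]}
        yield np.asarray(msg["position"]), np.asarray(msg["quaternion"])

def teleop_loop(send_ee_delta):
    """Map relative phone motion to relative end-effector motion.

    `send_ee_delta` is a placeholder for the robot interface; orientation
    handling is omitted here for brevity.
    """
    poses = stream_phone_poses()
    prev_pos, _ = next(poses)
    for pos, _quat in poses:
        delta = pos - prev_pos      # phone displacement since the last frame
        prev_pos = pos
        send_ee_delta(delta)        # command the arm to follow the phone
```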
