Awesome-Robotics-Manipulation

✨ About

This repo contains a curated list of papers on robot manipulation.

Please feel free to send pull requests or email me to add papers! This version of the repository may have some typos, so don’t hesitate to contact me for corrections!

🏠 Table of Contents

Awesome Papers
Awesome Benchmarks
Awesome Techniques

📝 Awesome Papers

📄 Survey

Title	Venue	Date	Code	Notes
A Survey of Embodied Learning for Object-Centric Robotic Manipulation	arXiv	2024-08-21	Github	Manipulation
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI	arXiv	2024-07-09	Github	Embodied Agent
A Survey on Vision-Language-Action Models for Embodied AI	arXiv	2024-05-23	-	VLA Models
Survey of Learning-based Approaches for Robotic In-Hand Manipulation	arXiv	2024-01-15	-	In-hand Manipulation
Language-conditioned Learning for Robotic Manipulation: A Survey	arXiv	2023-12-17	Github	Manipulation
Deep Learning Approaches to Grasp Synthesis: A Review	T-RO 2023	2023-07-06	Project	Grasp

🦾 Grasp

Rectangle-based Grasp

Title	Venue	Date	Code
RoboGrasp: A Universal Grasping Policy for Robust Robotic Control	arXiv	2025-02-05	-
HMT-Grasp: A Hybrid Mamba-Transformer Approach for Robot Grasping in Cluttered Environments	arXiv	2024-10-04	-
LLGD: Lightweight Language-driven Grasp Detection using Conditional Consistency Model	IROS 2024	2024-07-25	Github
grasp_det_seg_cnn: End-to-end Trainable Deep Neural Network for Robotic Grasp Detection and Semantic Segmentation from RGB	ICRA 2021	2021-07-12	Github
GR-ConvNet: Antipodal Robotic Grasping using Generative Residual Convolutional Neural Network	IROS 2020	2019-09-11	Github

6-DoF Grasp

Title	Venue	Date	Code
Real-to-Sim Grasp: Rethinking the Gap between Simulation and Real World in Grasp Detection	CoRL 2024	2024-10-09	Project
OrbitGrasp: SE(3)-Equivariant Grasp Learning	CoRL 2024	2024-07-03	Project
EquiGraspFlow: SE(3)-Equivariant 6-DoF Grasp Pose Generative Flows	CoRL 2024	2024-09-06	Github
EconomicGrasp: An Economic Framework for 6-DoF Grasp Detection	ECCV 2024	2024-07-11	Github
Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge	CVPR 2024	2024-04-02	Github
FlexLoG: Rethinking 6-Dof Grasp Detection: A Flexible Framework for High-Quality Grasping	arXiv	2024-03-22	-
HGGD: Efficient Heatmap-Guided 6-Dof Grasp Detection in Cluttered Scenes	RA-L 2023	2024-03-27	Github
AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains	T-RO 2023	2022-12-16	Github
GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping	CVPR 2020	2020	Github
6-DOF GraspNet: Variational Grasp Generation for Object Manipulation	ICCV 2019	2019-05-25	Github

Grasp with 3D Techniques

Title	Venue	Date	Code
SDF
IGD: Implicit Grasp Diffusion: Bridging the Gap between Dense Prediction and Sampling-based Grasping	CoRL 2024	Github	Gitlab
NeuGraspNet: Learning Any-View 6DoF Robotic Grasping in Cluttered Scenes via Neural Surface Rendering	RSS 2024	2023-06-12	-
NeRF
LERF-TOGO: Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping	CoRL 2023	2023-09-14	Github
GraspNeRF: Multiview-based 6-DoF Grasp Detection for Transparent and Specular Objects Using Generalizable NeRF	ICRA 2023	2022-10-12	Github
3D Gaussian Splatting (3DGS)
SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images	arXiv	2024-12-03	-
GraspSplats: Efficient Manipulation with 3D Feature Splatting	CoRL 2024	2024-09-03	Github
GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping	RA-L 2024	2024-03-14	Github

Language-Driven Grasp

Title	Venue	Date	Code
RoboReflect: Robotic Reflective Reasoning for Grasping Ambiguous-Condition Objects	arXiv	2025-01-16	-
Attribute-Based Robotic Grasping with Data-Efficient Adaptation	T-RO 2024	2024-12-12	Project
RTAGrasp: Learning Task-Oriented Grasping from Human Videos via Retrieval, Transfer, and Alignment	ICRA 2025	2024-09-24	Project
LGrasp6D: Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance	ECCV 2024	2024-07-18	Github
Reasoning Grasping: Reasoning Grasping via Multimodal Large Language Model	CoRL 2024	2024-02-09	Project
ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter	CoRL 2024	2024-07-16	Github
OWG: Towards Open-World Grasping with Large Vision-Language Models	CoRL 2024	2024-06-26	Project
RT-Grasp: Reasoning Tuning Robotic Grasping via Multi-modal Large Language Model	IROS 2024	2024-11-07	Project

Grasp for Transparent Objects

Title	Venue	Date	Code
T²SQNet: A Recognition Model for Manipulating Partially Observed Transparent Tableware Objects	CoRL 2024	2024-09-06	Github
ASGrasp: Generalizable Transparent Object Reconstruction and Grasping from RGB-D Active Stereo Camera	ICRA 2024	2024-05-09	Github
Dex-NeRF: Using a Neural Radiance Field to Grasp Transparent Objects	CoRL 2021	2021-10-27	Github

Dexterous Grasp

Title	Venue	Date	Code
Grasp What You Want: Embodied Dexterous Grasping System Driven by Your Voice	arXiv	2024-12-14	Project
UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping	arXiv	2024-12-03	Github

🤖 Manipulation

Representation Learning with Auxiliary Tasks

Title	Venue	Date	Code
Contrastive Learning (Alignment)
Σ-agent: Contrastive Imitation Learning for Language-guided Multi-Task Robotic Manipulation	CoRL 2024	2024-06-14	Project
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers	RSS 2024	2024-03-19	Project
R3M: A Universal Visual Representation for Robot Manipulation	CoRL 2022	2022-03-23	Github
HULC: What Matters in Language Conditioned Robotic Imitation Learning over Unstructured Data	RA-L 2022	2022-04-13	Github
BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning	CoRL 2021	2022-02-04	Github
Masked Reconstruction
STP: Spatiotemporal Predictive Pre-training for Robotic Motor Control	arXiv	2024-03-08	-
MUTEX: Learning Unified Policies from Multimodal Task Specifications	CoRL 2023	2023-09-25	Github
Robot Learning with Sensorimotor Pre-training	CoRL 2023	2023-06-16	Project
Voltron: Language-Driven Representation Learning for Robotics	RSS 2023	2023-02-24	Github
MVP: Real-World Robot Learning with Masked Visual Pre-training	CoRL 2022	2022-10-06	Github
Text Goal Generation
RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning	ICRA 2025	2024-09-23	Github
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought	NeurIPS 2023	2023-05-24	Github
COTPC: Chain-of-Thought Predictive Control	ICML 2024	2023-04-03	Github
Visual Goal Generation
VIRT: Vision Instructed Transformer for Robotic Manipulation	arXiv	2024-10-09	Github
KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance	CoRL 2024	2024-08-06	Github
GENIMA: Generative Image as Action Models	CoRL 2024	2024-07-10	Github
ATM: Any-point Trajectory Modeling for Policy Learning	RSS 2024	2023-12-28	Github
MPI: Learning Manipulation by Predicting Interaction	RSS 2024	2024-06-01	Github
OCI: Object-Centric Instruction Augmentation for Robotic Manipulation	ICRA 2024	2024-01-05	Project
HOPMan: Towards Generalizable Zero-Shot Manipulation via Translating Human Interaction Plans	ICRA 2024	2023-12-01	Project
CALAMARI: Contact-Aware and Language conditioned spatial Action MApping for contact-RIch manipulation	CoRL 2023	2023	Project
Image / Video Prediction
Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation	ICLR 2025	2024-12-19	Github
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations	arXiv	2024-12-19	Project
GHIL-Glue: Hierarchical Control with Filtered Subgoal Images	arXiv	2024-10-26	Project
FoAM: Foresight-Augmented Multi-Task Imitation Policy for Robotic Manipulation	arXiv	2024-09-29	Project
VideoAgent: Self-Improving Video Generation	arXiv	2024-10-14	Github
GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy	RA-L 2025	2024-08-26	Github
GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation	arXiv	2024-10-08	Project
VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation	RSS 2024	2024-07-13	Github
GR-1: Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation	ICLR 2024	2023-12-20	Github
SuSIE: Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models	ICLR 2024	2023-10-16	Github
VLP: Video Language Planning	ICLR 2024	2023-10-16	Github

Visual Representation Learning

Title	Venue	Date	Code
Rethinking Latent Representations in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation	arXiv	2025-02-05	Github
MCR: Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets	ICLR 2025	2024-10-29	Github
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation	ICLR 2025	2024-10-10	Github
CLOVER: Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation	NeurIPS 2024	2024-09-13	Github
Theia: Distilling Diverse Vision Foundation Models for Robot Learning	CoRL 2024	2024-07-29	Github
MPI: Learning Manipulation by Predicting Interaction	RSS 2024	2024-06-01	Github
VC-1: Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?	NeurIPS 2023	2023-03-31	Github
MVP: Real-World Robot Learning with Masked Visual Pre-training	CoRL 2022	2022-10-06	Github
LIV: Language-Image Representations and Rewards for Robotic Control	ICML 2023	2023-06-01	Github
VIMA: General Robot Manipulation with Multimodal Prompts	ICML 2023	2022-10-06	Github
ACT: Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware	RSS 2023	2023-04-23	Github
Voltron: Language-Driven Representation Learning for Robotics	RSS 2023	2023-02-24	Github
VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training	ICLR 2023	2022-08-30	Github
R3M: A Universal Visual Representation for Robot Manipulation	CoRL 2022	2022-03-23	Github
ZeST: Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?	L4DC 2022	2022-04-23	Project

Multimodal Representation Learning

Title	Venue	Date	Code
VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation	ICLR 2025	2025-01-25	-
MS-Bot: Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation	CoRL 2024	2024-08-02	Github
MUTEX: Learning Unified Policies from Multimodal Task Specifications	CoRL 2023	2023-09-25	Github

Latent Action Learning

Title	Venue	Date	Code
Moto: Latent Motion Token as the Bridging Language for Robot Manipulation	arXiv	2024-12-05	Github
Discrete Policy: Learning Disentangled Action Space for Multi-Task Robotic Manipulation	ICRA 2025	2024-09-27	Project
IGOR: Image-GOal Representations are the Atomic Control Units for Foundation Models in Embodied AI	-	2024	Project
LAPA: Latent Action Pretraining from Videos	ICLR 2025	2024-10-15	Github
GRIF: Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control	CoRL 2023	2023-06-30	Github
MimicPlay: Long-Horizon Imitation Learning by Watching Human Play	CoRL 2023	2023-02-24	Github
KOAP: Imitation Learning with Limited Actions via Diffusion Planners and Deep Koopman Controllers	arXiv	2024-10-24	-
LAPO: Learning to Act without Actions	ICLR 2024	2023-12-17	Github
ILPO: Imitating Latent Policies from Observation	ICML 2019	2018-05-21	Github

World Model

Title	Venue	Date	Code
Sirius-Fleet: Multi-Task Interactive Robot Fleet Learning with Visual World Models	CoRL 2024	2024-10-30	Project
MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning	CoRL 2023	2024-01-06	Project
FOWM: Finetuning Offline World Models in the Real World	CoRL 2023	2023-10-24	Github
SWIM: Structured World Models from Human Videos	RSS 2023	2023-08-23	Project
Surfer: Progressive Reasoning with World Models for Robotic Manipulation	arXiv	2023-06-20	Github

Asynchronous Action Learning

Title	Venue	Date	Code
PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation	NeurIPS 2024	2024-10-14	Project
HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers	CoRL 2024	2024-09-12	-
MResT: Multi-Resolution Sensing for Real-Time Control with Vision-Language Models	CoRL 2023	2024-01-25	Github

Diffusion Policy Learning

Title	Venue	Date	Code
AffordDP: Generalizable Diffusion Policy with Transferable Affordance	arXiv	2024-12-04	Project
Instant Policy: In-Context Imitation Learning via Graph Diffusion	ICLR 2025	2024-11-19	Github
STMDP: Brain-inspired Action Generation with Spiking Transformer Diffusion Policy Model	arXiv	2024-11-15	-
MBA: Motion Before Action: Diffusing Object Motion as Manipulation Condition	arXiv	2024-11-14	Github
DiT Policy: Diffusion Transformer Policy	arXiv	2024-10-21	-
CAGE: Causal Attention Enables Data-Efficient Generalizable Robotic Manipulation	arXiv	2024-10-19	Project
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation	ICLR 2025	2024-10-10	Github
ScaleDP: Scaling Diffusion Policy in Transformer to 1 Billion Parameters for Robotic Manipulation	ICRA 2025	2024-09-22	Project
SDP: Spiking Diffusion Policy for Robotic Manipulation with Learnable Channel-Wise Membrane Thresholds	arXiv	2024-09-17	-
DiT-Block Policy: The Ingredients for Robotic Diffusion Transformers	arXiv	2024-10-14	Github
GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy	CoRL 2024	2024-10-23	Github
EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning	CoRL 2024	2024-07-01	Github
SDP: Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning	CoRL 2024	2024-07-01	Github
RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective	IROS 2024	2024-04-18	Project
MDT: Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals	RSS 2024	2024-07-08	Github
R&D: Render and Diffuse: Aligning Image and Action Spaces for Diffusion-based Behaviour Cloning	RSS 2024	2024-05-28	Github
DP3: 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations	RSS 2024	2024-03-06	Github
PlayFusion: Skill Acquisition via Diffusion from Language-Annotated Play	CoRL 2023	2023-12-07	Project
EquiDiff: Equivariant Diffusion Policy	CoRL 2024	2024-07-01	Code
StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects	RSS 2023	2022-11-08	Github
BESO: Goal-Conditioned Imitation Learning using Score-based Diffusion Policies	RSS 2023	2023-04-05	Github
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion	RSS 2023	2023-03-07	Github

Other Policies

Title	Venue	Date	Code
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation	arXiv	2025-01-03	Project
Score and Distribution Matching Policy: Advanced Accelerated Visuomotor Policies via Matched Distillation	arXiv	2024-12-12	Project
CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction	arXiv	2024-12-09	Github
FlowPolicy: Enabling Fast and Robust 3D Flow-based Policy via Consistency Flow Matching for Robot Manipulation	AAAI 2025	2024-12-06	Github
Autoregressive Action Sequence Learning for Robotic Manipulation	arXiv	2024-10-04	Github
MaIL: Improving Imitation Learning with Selective State Space Models	CoRL 2024	2024-06-12	Github

Vision Language Action Models

Title	Venue	Date	Code
RAD: Action-Free Reasoning for Policy Generalization	arXiv	2025-02-04	Project
VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation	arXiv	2025-02-04	-
UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent	arXiv	2025-01-31	-
SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model	arXiv	2025-01-27	Github
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation	arXiv	2024-12-29	Github
RoboVLMs: Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models	arXiv	2024-12-18	Github
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning	arXiv	2024-12-16	Github
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies	ICLR 2025	2024-12-13	Github
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression	arXiv	2024-12-14	Project
π₀: A Vision-Language-Action Flow Model for General Robot Control	arXiv	2024-10-31	Project
BYOVLA: Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust	arXiv	2024-10-02	Github
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation	RA-L 2025	2024-09-19	Github
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution	NeurIPS 2024	2024-11-04	Github
RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation	NeurIPS 2024	2024-06-06	Github
DP-VLA: A Dual Process VLA: Efficient Robotic Manipulation Leveraging VLM	CoRL 2024	2024-10-21	-
OpenVLA: An Open-Source Vision-Language-Action Model	CoRL 2024	2024-06-13	Github
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning	CoRL 2024	2024-06-17	Github
ECoT: Robotic Control via Embodied Chain-of-Thought Reasoning	CoRL 2024	2024-07-11	Github
3D-VLA: A 3D Vision-Language-Action Generative World Model	ICML 2024	2024-03-14	Github
Octo: An Open-Source Generalist Robot Policy	RSS 2024	2024-05-20	Github
RoboFlamingo: Vision-Language Foundation Models as Effective Robot Imitators	ICLR 2024	2023-11-02	Github
RT-H: Action Hierarchies Using Language	arXiv	2024-03-04	Project
Open X-Embodiment: Robotic Learning Datasets and RT-X Models	ICRA 2024	2023-10-13	Github
MOO: Open-World Object Manipulation using Pre-trained Vision-Language Models	CoRL 2023	2023-03-02	Project
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control	CoRL 2023	2023-07-28	Project
RT-1: Robotics Transformer for Real-World Control at Scale	RSS 2023	2022-12-13	Github

Reinforcement Learning

Title	Venue	Date	Code
Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network	arXiv	2025-02-01	-
Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model	ICLR 2025	2024-12-18	Github
RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning	arXiv	2024-12-13	Project
HIL-SERL: Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning	arXiv	2024-10-29	Project
PointPatchRL - Masked Reconstruction Improves Reinforcement Learning on Point Clouds	CoRL 2024	2024-10-24	Project
SPIRE: Synergistic Planning, Imitation, and Reinforcement for Long-Horizon Manipulation	CoRL 2024	2024-10-23	Project
Maniwhere: Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning	CoRL 2024	2024-07-22	Project
PSL: Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks	ICLR 2024	2024-05-02	Github
TD-MPC2: Scalable, Robust World Models for Continuous Control	ICLR 2024	2023-10-25	Github
VELAP:	CoRL 2023	2023	-
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions	CoRL 2023	2023-09-18	Project
PTR: Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials	RSS 2023	2022-10-11	Project
TD-MPC: Temporal Difference Learning for Model Predictive Control	ICML 2022	2022-03-09	Github

Motion, Trajectory and Flow

Title	Venue	Date	Code
Path Planning
LACO: Language-Conditioned Path Planning	CoRL 2023	2024-08-31	Github
Motion Planning
DiffusionSeeder: Seeding Motion Optimization with Diffusion for Rapid Motion Planning	CoRL 2024	2024-10-22	Project
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation	CoRL 2024	2024-09-03	Github
CoPa: General Robotic Manipulation through Spatial Constraints of Parts with Foundation Models	ICRAW 2024	2024-03-13	Github
Elastic-DS: Task Generalization with Stability Guarantees via Elastic Dynamical System Motion Policies	CoRL 2023	2023-09-05	Github
Trajectory Optimization
ORION: Vision-based Manipulation from Single Human Video with Open-World Object Graphs	arXiv	2024-05-30	Project
PointFlowMatch: Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching	CoRL 2024	2024-09-11	Project
RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation	ICRA 2024	2023-08-30	Github
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models	CoRL 2023	2023-07-12	Github
LATTE: LAnguage Trajectory TransformEr	ICRA 2023	2022-08-04	Github
Trajectory-conditioned Policy
P3-PO: Prescriptive Point Priors for Visuo-Spatial Generalization of Robot Policies	arXiv	2024-12-09	Github
Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation	ECCV 2024	2024-05-02	Github
ATM: Any-point Trajectory Modeling for Policy Learning	RSS 2024	2023-12-28	Github
AWE: Waypoint-Based Imitation Learning for Robotic Manipulation	CoRL 2023	2023-07-26	Github
Flow-conditioned Policy
Im2Flow2Act: Flow as the Cross-Domain Manipulation Interface	CoRL 2024	2024-07-21	Github
AVDC: Learning to Act from Actionless Videos through Dense Correspondences	ICLR 2024	2023-10-12	Github

Data Collection, Selection and Augmentation

Title	Venue	Date	Code
Data Collection
ALPHA-α and Bi-ACT Are All You Need: Importance of Position and Force Information/Control for Imitation Learning of Unimanual and Bimanual Robotic Manipulation with Low-Cost System	arXiv	2024-11-15	Project
SkillMimicGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment	CoRL 2024	2024-10-24	Project
NILS: Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models	CoRL 2024	2024-10-23	Project
SOAR: Autonomous Improvement of Instruction Following Skills via Foundation Models	CoRL 2024	2024-07-30	Github
Manipulate-Anything: Automating Real-World Robots using Vision-Language Models	CoRL 2024	2024-06-27	Project
DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation	CoRL 2024	2024-03-12	Github
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots	RSS 2024	2024-02-15	Github
AirExo: Low-Cost Exoskeletons for Learning Whole-Arm Manipulation in the Wild	ICRA 2024	2023-09-26	Github
SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling	ICRA 2024	2023-06-20	Github
Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition	CoRL 2023	2023-07-26	Github
DIAL: Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models	RSS 2023	2022-11-21	Project
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation	TMLR 2023	2023-06-20	Github
Data Selection
What Matters in Learning from Large-Scale Datasets for Robot Manipulation	ICLR 2025	2025-01-23	Project
AMF: Active Fine-Tuning of Generalist Policies	arXiv	2024-10-07	-
Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning	CoRL 2024	2024-08-26	Github
An Unbiased Look at Datasets for Visuo-Motor Pre-Training	CoRL 2023	2023-10-13	Github
Data Quality in Imitation Learning	NeurIPS 2023	2023-06-04	-
Data Retrieval
STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning	ICLR 2025	2024-12-19	Project
Retrieval-Augmented Embodied Agents	CVPR 2024	2024-04-17	-
Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets	RSS 2023	2023-04-08	Github
Data Augmentation
RoCoDA: Counterfactual Data Augmentation for Data-Efficient Robot Learning from Demonstrations	arXiv	2024-11-25	Project
RoVi-Aug: Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning	CoRL 2024	2024-09-05	Project
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning	CoLLAs 2024	2024-07-30	Project
Diffusion Meets DAgger: Supercharging Eye-in-hand Imitation Learning	RSS 2024	2023-02-27	Github
ROSIE: Scaling Robot Learning with Semantically Imagined Experience	RSS 2023	2023-02-22	Project
GenAug: Retargeting behaviors to unseen situations via Generative Augmentation	RSS 2023	2023-02-13	Github
Evaluation
Contrast Sets for Evaluating Language-Guided Robot Policies	CoRL 2024	2024-06-19	-

Affordance Learning

Title	Venue	Date	Code
Articulated Object Affordance
ManipGPT: Is Affordance Segmentation by Large Vision Models Enough for Articulated Object Manipulation?	arXiv	2024-12-13	-
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models	ICRA 2025	2024-09-16	Project
A3VLM: Actionable Articulation-Aware Vision Language Model	CoRL 2024	2024-06-14	Github
AIC MLLM: Autonomous Interactive Correction MLLM for Robust Robotic Manipulation	CoRL 2024	2024-06-17	Project
SAGE: Bridging Semantic and Actionable Parts for Generalizable Manipulation of Articulated Objects	RSS 2024	2023-12-03	Github
Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs	ICRA 2024	2023-11-06	Github
Ditto: Building Digital Twins of Articulated Objects from Interaction	CVPR 2022	2022-08-16	Github
Part-Based Object Affordance
3DAPNet: Language-Conditioned Affordance-Pose Detection in 3D Point Clouds	ICRA 2024	2023-09-19	Github
CPM: Composable Part-Based Manipulation	CoRL 2023	2024-05-09	Project
PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations	CVPR 2023	2023-03-29	Github
GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts	CVPR 2023	2022-11-10	Github
Spatial Affordance
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics	arXiv	2024-11-25	Project
SpatialBot: Precise Spatial Understanding with Vision Language Models	ICRA 2025	2024-06-19	Github
RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics	CoRL 2024	2024-06-15	Github
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities	CVPR 2024	2024-01-22	Project
Visual Affordance
RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation	CoRL 2024	2024-07-05	Github
MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting	RSS 2024	2024-03-05	Github
SLAP: Spatial-Language Attention Policies	CoRL 2023	2023-04-21	Github
KITE: Keypoint-Conditioned Policies for Semantic Manipulation	CoRL 2023	2023-06-29	Project
HULC++: Grounding Language with Visual Affordances over Unstructured Data	ICRA 2023	2022-10-04	Github
CLIPort: What and Where Pathways for Robotic Manipulation	CoRL 2022	2021-09-24	Github
VAPO: Affordance Learning from Play for Sample-Efficient Policy Learning	ICRA 2022	2022-03-01	Project
Transporter Networks: Rearranging the Visual World for Robotic Manipulation	CoRL 2020	2020-10-27	Github

3D Representation for Manipulation

Title	Venue	Date	Code
G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation	arXiv	2024-11-27	Github
MSGField: A Unified Scene Representation Integrating Motion, Semantics, and Geometry for Robotic Manipulation	arXiv	2024-10-21	Github
Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting	CoRL 2024	2024-05-07	Github
IMAGINATION POLICY: Using Generative Point Cloud Models for Learning Manipulation Policies	CoRL 2024	2024-06-17	Project
Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics	CoRL 2024	2024-06-16	Project
RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation	CoRL 2024	2024-03-28	Github
RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation	CoRL 2024	2024-02-23	Github
D³Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Rearrangement	CoRL 2024	2023-09-28	Github
Object-Aware Gaussian Splatting for Robotic Manipulation	ICRAW 2024	2024-04-24	Project
F3RM: Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation	CoRL 2023	2023-07-27	Github
R-NDF: SE(3)-Equivariant Relational Rearrangement with Neural Descriptor Fields	CoRL 2022	2022-11-17	Github
NDF: Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation	ICRA 2022	2021-12-09	Github

3D Representation Policy Learning

Title	Venue	Date	Code
Diffusion Policy (DP)
GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation	ICLR 2025	2024-09-30	Project
3D Diffuser Actor: Policy Diffusion with 3D Scene Representations	CoRL 2024	2024-02-16	Github
DP3: 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations	RSS 2024	2024-03-06	Github
Reconstruction
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation	arXiv	2024-11-27	Github
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation	ECCV 2024	2024-03-13	Github
SGRv2: Leveraging Locality to Boost Sample Efficiency in Robotic Manipulation	CoRL 2024	2024-06-15	Github
RVT-2: Learning Precise Manipulation from Few Demonstrations	RSS 2024	2024-01-12	Github
GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields	CoRL 2023	2023-08-31	Github
3D4RL: Visual Reinforcement Learning with Self-Supervised 3D Representations	RA-L 2023	2022-10-13	Github
PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation	CoRL 2023	2023-09-27	Github
M2T2: Multi-Task Masked Transformer for Object-centric Pick and Place	CoRL 2023	2023-11-02	Github
PerAct: Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation	CoRL 2022	2022-09-12	Github
Visual Goal Generation
3D-MVP: 3D Multiview Pretraining for Robotic Manipulation	CoRL 2024	2024-06-26	Project
ActAIM2: Discovering Robotic Interaction Modes with Discrete Representation Learning	CoRL 2024	2024-10-26	Project
SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation	ICML 2024	2024-05-30	Github
RVT: Robotic View Transformer for 3D Object Manipulation	CoRL 2023	2023-06-26	Github
GROOT: Learning Generalizable Manipulation Policies with Object-Centric 3D Representations	CoRL 2023	2023-10-22	Github
Others
SPHINX: What's the Move? Hybrid Imitation Learning via Salient Points	ICLR 2025	2024-12-06	Github
SGR: A Universal Semantic-Geometric Representation for Robotic Manipulation	CoRL 2023	2023-06-18	Github

Reasoning, Planning and Code Generation

Title	Venue	Date	Code
Task Planning
MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation	arXiv	2024-11-26	Project
Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following	arXiv	2024-04-21	-
Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models	IROS 2024	2024-08-15	Project
PG-InstructBLIP: Physically Grounded Vision-Language Models for Robotic Manipulation	ICRA 2024	2023-09-05	Project
RoCo: Dialectic Multi-Robot Collaboration with Large Language Models	ICRA 2024	2023-07-10	Github
REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction	CoRL 2023	2023-06-27	Github
SayCan: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances	CoRL 2022	2022-04-04	Github
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency	arXiv	2023-04-22	Github
Inner Monologue: Embodied Reasoning through Planning with Language Models	CoRL 2022	2022-07-12	Project
SHOWTELL: Teaching Robots with Show and Tell: Using Foundation Models to Synthesize Robot Policies from Language and Visual Demonstrations	CoRL 2024	2024-09-06	Project
GIRAF: Gesture-Informed Robot Assistance via Foundation Models	CoRL 2023	2023-09-06	Project
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models	ICCV 2023	2022-12-08	Github
Code Generation
Robotic Programmer: Video Instructed Policy Code Generation for Robotic Manipulation	arXiv	2025-01-08	Project
Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought	NeurIPS 2023	2023-05-26	Project
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model	arXiv	2023-05-18	Github
ProgPrompt: Generating Situated Robot Task Plans using Large Language Models	ICRA 2023	2022-09-22	Github
ChatGPT for Robotics: Design Principles and Model Abilities	IEEE Access 2023	2023-02-20	Github
Code as Policies: Language Model Programs for Embodied Control	ICRA 2023	2022-09-16	Github
TidyBot: Personalized Robot Assistance with Large Language Models	Autonomous Robots 2023	2023-05-09	Github
Statler: State-Maintaining Language Models for Embodied Reasoning	ICRA 2024	2023-06-30	Github
InterPreT: Interactive Predicate Learning from Language Feedback for Generalizable Task Planning	RSS 2024	2024-05-30	Github
Text2Motion: From Natural Language Instructions to Feasible Plans	Autonomous Robots 2023	2023-03-21	Project
Multimodal Reasoning
From Foresight to Forethought: VLM-In-the-Loop Policy Steering via Latent Alignment	arXiv	2025-02-03	-
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection	arXiv	2024-12-05	Project
AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation	ICLR 2025	2024-10-01	Project
λ-Repformer: Task Success Prediction for Open-Vocabulary Manipulation Based on Multi-Level Aligned Representations	CoRL 2024	2024-10-01	Project
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation	CVPR 2024	2023-12-24	Github
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought	NeurIPS 2023	2023-05-24	Github
Matcha: Chat with the Environment: Interactive Multimodal Perception Using Large Language Models	IROS 2023	2023-03-14	Github
PaLM-E: An Embodied Multimodal Language Model	ICML 2023	2023-03-06	Github
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language	ICLR 2023	2022-04-01	Project

Generalization

Title	Venue	Date	Code
Generalization using Data
Mirage: Cross-Embodiment Zero-Shot Policy Transfer with Cross-Painting	RSS 2024	2024-02-29	Github
Decomposing the Generalization Gap in Imitation Learning for Visual Robotic Manipulation	ICRA 2024	2024-02-29	Github
Compositional Generalization
Policy Architectures for Compositional Generalization in Control	NeurIPSW 2022	2022-03-10	Github
PROGRAMPORT: Programmatically Grounded, Compositionally Generalizable Robotic Manipulation	ICLR 2023	2023-04-26	Project
Efficient Data Collection for Robotic Manipulation via Compositional Generalization	RSS 2024	2024-03-08	Project
Sim2Real Generalization
Natural Language Can Help Bridge the Sim2Real Gap	RSS 2024	2024-05-16	Github
RialTo: Reconciling Reality through Simulation: A Real-to-Sim-to-Real Approach for Robust Manipulation	RSS 2024	2024-03-06	Github
Domain Randomization: Sim-to-Real Transfer of Robotic Control with Dynamics Randomization	ICRA 2018	2017-10-18	-
Generalization for Long-horizon and Complex Task
RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation	arXiv	2025-01-11	-
ManipGen: Local Policies Enable Zero-shot Long-horizon Manipulation	CoRLW 2024	2024-10-29	Project
TBBF: A Backbone for Long-Horizon Robot Task Understanding	RA-L 2025	2024-08-02	Project
STAP: Sequencing Task-Agnostic Policies	ICRA 2023	2022-10-21	Github
BOSS: Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance	CoRL 2023	2023-12-16	Github
BLADE: Learning Compositional Behaviors from Demonstration and Language	CoRL 2024	2024	Project
PALO: Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation	CoRL 2024	2024-08-29	Github
Few-shot
You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations	arXiv	2025-01-24	Github
Learning Generalizable 3D Manipulation With 10 Demonstrations	arXiv	2024-11-15	Github

Generalist

Title	Venue	Date	Code
Generalist with Different Embodiment Types
CrossFormer: Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation	CoRL 2024	2024-08-21	Github
ARIO: All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents	arXiv	2024-08-20	Project
HPT: Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers	NeurIPS 2024	2024-09-30	Github
Generalist in Different Embodied Tasks
LEO: An Embodied Generalist Agent in 3D World	ICML 2024	2023-11-18	Github
Manipulation Generalist
Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding	arXiv	2025-01-08	Github
RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning	arXiv	2024-12-13	Project
RoboMM: All-in-One Multimodal Large Model for Robotic Manipulation	arXiv	2024-12-10	Github
RoboDual: Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation	arXiv	2024-10-10	Project
Effective Tuning Strategies for Generalist Robot Manipulation Policies	arXiv	2024-10-02	-
Octo: An Open-Source Generalist Robot Policy	RSS 2024	2024-05-20	Github
V-GPS: Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance	CoRL 2024	2024-10-17	Project
Open X-Embodiment: Robotic Learning Datasets and RT-X Models	ICRA 2024	2023-10-13	Github
RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking	ICRA 2024	2023-09-05	Github
Maniwhere: Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning	CoRL 2024	2024-07-22	Project
CAGE: Causal Attention Enables Data-Efficient Generalizable Robotic Manipulation	arXiv	2024-10-19	Project
Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments	arXiv	2024-09-09	Github
More for VLAs

Human-Robot Interaction and Collaboration

Title	Venue	Date	Code
Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment	arXiv	2024-12-06	Project
Vocal Sandbox: Continual Learning and Adaptation for Situated Human-Robot Collaboration	CoRL 2024	2024-09-06	Project
APRICOT: Active Preference Learning and Constraint-Aware Task Planning with LLMs	CoRL 2024	-	Project
Text2Interaction: Establishing Safe and Preferable Human-Robot Interaction	CoRL 2024	2024-08-12	Github
KNOWNO: Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners	CoRL 2023	2023-07-04	Github
YAY Robot: Yell At Your Robot: Improving On-the-Fly from Language Corrections	arXiv	2024-03-19	Github
LILAC: "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy	HRI 2023	2023-01-06	Github

Mobile Manipulation

Title	Venue	Date	Code
Robi Butler: Remote Multimodal Interactions with Household Robot Assistant	arXiv	2024-09-30	Project
TaMMa: Target-driven Multi-subscene Mobile Manipulation	CoRL 2024	2024-09-06	-
SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning	CoRL 2023	2023-07-12	Project
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation	CoRL 2024	2024-01-04	Github
GAMMA: Graspability-Aware Mobile MAnipulation Policy Learning based on Online Grasping Pose Fusion	ICRA 2024	2023-09-27	Github

Tactile-based Manipulation

Title	Venue	Date	Code
Digitizing Touch with an Artificial Multimodal Fingertip	arXiv	2024-11-04	Github
Sparsh: Self-supervised touch representations for vision-based tactile sensing	CoRL 2024	2024	Github
MimicTouch: Leveraging Multi-modal Human Tactile Demonstrations for Contact-rich Manipulation	CoRL 2024	2023-10-25	Project
Octopi: Object Property Reasoning with Large Tactile-Language Models	RSS 2024	2024-05-05	Github
RoboPack: Learning Tactile-Informed Dynamics Models for Dense Packing	RSS 2024	2024-07-01	Project
RotateIt: General In-Hand Object Rotation with Vision and Touch	CoRL 2023	2023-09-18	Project
T-DEX: Dexterity from Touch: Self-Supervised Pre-Training of Tactile Representations with Robotic Play	CoRL 2023	2023-03-21	Github

Dexterous Manipulation

Title	Venue	Date	Code
D(R, O) Grasp: A Unified Representation of Robot and Object Interaction for Cross-Embodiment Dexterous Grasping	CoRLW 2024	2024-10-02	Github
Bunny-VisionPro: Real-Time Bimanual Dexterous Teleoperation for Imitation Learning	arXiv	2024-07-03	Github
DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes	CoRL 2024	2024	Github
DexGraspNet: A Large-Scale Robotic Dexterous Grasp Dataset for General Objects Based on Simulation	ICRA 2023	2022-10-06	Github
Demonstrating Learning from Humans on Open-Source Dexterous Robot Hands	RSS 2024	2024	-
CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation	CVPR 2024	2024-02-22	Github
Dexterous Functional Grasping	CoRL 2023	2023-12-05	Project
DEFT: Dexterous Fine-Tuning for Real-World Hand Policies	CoRL 2023	2023-10-30	Project
REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation	CoRL 2023	2023-09-06	Project
Sequential Dexterity: Chaining Dexterous Policies for Long-Horizon Manipulation	CoRL 2023	2023-09-02	Github
AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation System	RSS 2023	2023-07-10	Project

Other Applications

Title	Venue	Date	Code
Deformable Object Manipulation
HANDLOOM: Learned Tracing of One-Dimensional Objects for Inspection and Manipulation	CoRL 2023	2023-03-15	Project
Contact-rich Manipulation
FoAR: Force-Aware Reactive Policy for Contact-Rich Robotic Manipulation	arXiv	2024-11-24	Project
ForceMimic: Force-Centric Imitation Learning with Force-Motion Capture System for Contact-Rich Manipulation	arXiv	2024-10-10	Project
Stowing Tasks
Predicting Object Interactions with Behavior Primitives: An Application in Stowing Tasks	CoRL 2023	2023-09-28	Github
Object Rearrangement
PACA: Perspective-Aware Cross-Attention Representation for Zero-Shot Scene Rearrangement	WACV 2025	2024-10-29	-
LGMCTS: Language-Guided Monte-Carlo Tree Search for Executable Semantic Object Rearrangement	IROS 2024	2023-09-27	Github
LLM-GROP: Task and Motion Planning with Large Language Models for Object Rearrangement	IROS 2023	2023-03-10	Colab
DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics	RA-L 2023	2022-10-05	Project
Human-to-Robot Handover
GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation	CVPR 2024	2024-01-01	Github
Cook
RoboCook: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools	CoRL 2023	2023-06-26	Github
Non-prehensile Manipulation
HACMan: Learning Hybrid Actor-Critic Maps for 6D Non-Prehensile Manipulation	CoRL 2023	2023-05-06	Github
Feed
VAPORS: Learning Sequential Acquisition Policies for Robot-Assisted Feeding	CoRL 2023	2023-09-11	Project
Tool Manipulation
Leveraging Language for Accelerated Learning of Tool Manipulation	CoRL 2022	2022-06-27	Github
Responsible Manipulation
How vulnerable is my policy? Adversarial attacks on modern behavior cloning policies	arXiv	2025-02-06	-
Don't Let Your Robot be Harmful: Responsible Robotic Manipulation	arXiv	2024-11-27	Github
TrojanRobot: Backdoor Attacks Against LLM-based Embodied Robots in the Physical World	arXiv	2024-11-18	Project

📊 Awesome Benchmarks

Grasp Datasets

Title	Venue	Date	Code
QDGset: A Large Scale Grasping Dataset Generated with Quality-Diversity	arXiv	2024-10-03	Project
Real-to-Sim Grasp: Rethinking the Gap between Simulation and Real World in Grasp Detection	CoRL 2024	2024-10-09	Project
Grasp-Anything-6D: Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance	ECCV 2024	2024-07-18	Github
Grasp-Anything++: Language-driven Grasp Detection	CVPR 2024	2024-06-13	Github
Grasp-Anything: Large-scale Grasp Dataset from Foundation Models	ICRA 2024	2023-09-18	Github
GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping	CVPR 2020	2020	Github

Manipulation Benchmarks

Title	Venue	Date	Code
Manipulation in Home Environment
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots	RSS 2024	2024-06-04	Github
ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes	ICCV 2023	2023-04-09	Github
HomeRobot: Open-Vocabulary Mobile Manipulation	CoRL 2023	2023-06-20	Github
ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks	CVPR 2020	2019-12-03	Github
Manipulation in On-Table Environment
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks	arXiv	2024-12-24	Github
Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy	ICRA 2025	2024-10-02	Github
OBSBench: Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning	NeurIPS 2024	2024-02-04	Github
GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs	CoRL 2024	2024-10-04	Github
THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation	RSS 2024	2024-02-13	Github
LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning	NeurIPS 2023	2023-06-05	Github
VIMA: General Robot Manipulation with Multimodal Prompts	ICML 2023	2022-10-06	Github
CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks	RA-L 2021	2021-12-06	Github
RLBench: The Robot Learning Benchmark & Learning Environment	RA-L 2020	2019-09-26	Github
KitchenShift: Evaluating Zero-Shot Generalization of Imitation-Based Policy Learning Under Domain Shifts	NeurIPSW 2021	2021	Github
Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning	CoRL 2019	2019-10-24	Github
Franka-Kitchen: Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning	CoRL 2019	2019-10-25	Project
Evaluating Real-World Robot Manipulation Policies in Simulation	CoRL 2024	2024-05-09	Github
LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation	arXiv	2024-10-07	-
ClutterGen: A Cluttered Scene Generator for Robot Learning	CoRL 2024	2024-07-07	Github
Tactile Manipulation
Efficient Tactile Simulation with Differentiability for Robotic Manipulation	CoRL 2022	2022	Github
Functional Manipulation
FMB: a Functional Manipulation Benchmark for Generalizable Robotic Learning	IJRR 2024	2024-01-16	Github
Robot Trajectory Datasets
Open X-Embodiment: Robotic Learning Datasets and RT-X Models	ICRA 2024	2023-10-13	Github
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset	RSS 2024	2024-03-19	Project
BridgeData V2: A Dataset for Robot Learning at Scale	CoRL 2023	2023-08-24	Github
RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot	RSSW 2023	2023-07-02	Project
Embodied QA Datasets
ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models	IROS 2024	2024-03-17	Github
OpenEQA: Embodied Question Answering in the Era of Foundation Models	CVPR 2024	2024	Github

Cross-Embodiment Benchmarks

Title	Venue	Date	Code
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation	arXiv	2024-12-18	Project
GENESIS: A generative world for general-purpose robotics & embodied AI learning	-	-	Github
ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI	arXiv	2024-10-01	Github
All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents	arXiv	2024-08-20	Dataset
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?	NeurIPS 2023	2023-03-31	Github
Isaac Lab: Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments	RA-L 2023	2023-01-10	Github

🛠️ Awesome Techniques

Title	Venue	Date	Code
Awesome-Implicit-NeRF-Robotics: Neural Fields in Robotics: A Survey	-	2024-10-26	Github
Awesome-Video-Robotic-Papers	-	2024	Github
Awesome-Generalist-Robots-via-Foundation-Models: Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis	-	2024	Github
Awesome-Robotics-3D	-	2024	Github
Awesome-Robotics-Foundation-Models: Foundation Models in Robotics: Applications, Challenges, and the Future	-	2023-12-13	Github
Awesome-LLM-Robotics	-	2022	Github

✨ Citation

If you find this repository useful, please consider citing this list:

@misc{bai2024roboticsmanipulation,
    title = {Awesome-Robotics-Manipulation},
    author = {Bai, Shuanghao and Ding, Pengxiang and Zhang, Haoran},
    journal = {GitHub repository},
    url = {https://github.com/BaiShuanghao/Awesome-Robotics-Manipulation},
    year = {2024},
}