Awesome-Robotics-Manipulation

✨ About

This repo contains a curated list of papers on robot manipulation.

Please feel free to send pull requests or email me to add papers! This version of the repository may have some typos, so don’t hesitate to contact me for corrections!

🏠 Table of Contents

πŸ“ Awesome Papers

📄 Survey

(back to top)

🦾 Grasp

Rectangle-based Grasp

(back to top)

6-DoF Grasp

(back to top)

Grasp with 3D Techniques

(back to top)

Language-Driven Grasp

(back to top)

Grasp for Transparent Objects

(back to top)

Dexterous Grasp

(back to top)

🤖 Manipulation

Representation Learning with Auxiliary Tasks

Title | Venue | Date | Code
Contrastive Learning (Alignment)
Σ-agent: Contrastive Imitation Learning for Language-guided Multi-Task Robotic Manipulation | CoRL 2024 | 2024-06-14 | Project
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers | RSS 2024 | 2024-03-19 | Project
R3M: A Universal Visual Representation for Robot Manipulation | CoRL 2022 | 2022-03-23 | Github
HULC: What Matters in Language Conditioned Robotic Imitation Learning over Unstructured Data | RA-L 2022 | 2022-04-13 | Github
BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning | CoRL 2021 | 2022-02-04 | Github
Masked Reconstruction
STP: Spatiotemporal Predictive Pre-training for Robotic Motor Control | arXiv | 2024-03-08 | -
MUTEX: Learning Unified Policies from Multimodal Task Specifications | CoRL 2023 | 2023-09-25 | Github
Robot Learning with Sensorimotor Pre-training | CoRL 2023 | 2023-06-16 | Project
Voltron: Language-Driven Representation Learning for Robotics | RSS 2023 | 2023-02-24 | Github
MVP: Real-World Robot Learning with Masked Visual Pre-training | CoRL 2022 | 2022-10-06 | Github
Text Goal Generation
RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning | ICRA 2025 | 2024-09-23 | Github
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought | NeurIPS 2023 | 2023-05-24 | Github
COTPC: Chain-of-Thought Predictive Control | ICML 2024 | 2023-04-03 | Github
Visual Goal Generation
VIRT: Vision Instructed Transformer for Robotic Manipulation | arXiv | 2024-10-09 | Github
KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance | CoRL 2024 | 2024-08-06 | Github
GENIMA: Generative Image as Action Models | CoRL 2024 | 2024-07-10 | Github
ATM: Any-point Trajectory Modeling for Policy Learning | RSS 2024 | 2023-12-28 | Github
MPI: Learning Manipulation by Predicting Interaction | RSS 2024 | 2024-06-01 | Github
OCI: Object-Centric Instruction Augmentation for Robotic Manipulation | ICRA 2024 | 2024-01-05 | Project
HOPMan: Towards Generalizable Zero-Shot Manipulation via Translating Human Interaction Plans | ICRA 2024 | 2023-12-01 | Project
CALAMARI: Contact-Aware and Language conditioned spatial Action MApping for contact-RIch manipulation | CoRL 2023 | 2023 | Project
Image / Video Prediction
Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation | ICLR 2025 | 2024-12-19 | Github
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations | arXiv | 2024-12-19 | Project
GHIL-Glue: Hierarchical Control with Filtered Subgoal Images | arXiv | 2024-10-26 | Project
FoAM: Foresight-Augmented Multi-Task Imitation Policy for Robotic Manipulation | arXiv | 2024-09-29 | Project
VideoAgent: Self-Improving Video Generation | arXiv | 2024-10-14 | Github
GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy | RA-L 2025 | 2024-08-26 | Github
GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation | arXiv | 2024-10-08 | Project
VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation | RSS 2024 | 2024-07-13 | Github
GR-1: Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation | ICLR 2024 | 2023-12-20 | Github
SuSIE: Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models | ICLR 2024 | 2023-10-16 | Github
VLP: Video Language Planning | ICLR 2024 | 2023-10-16 | Github

(back to top)

Visual Representation Learning

Title | Venue | Date | Code
Rethinking Latent Representations in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation | arXiv | 2025-02-05 | Github
MCR: Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets | ICLR 2025 | 2024-10-29 | Github
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation | ICLR 2025 | 2024-10-10 | Github
CLOVER: Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation | NeurIPS 2024 | 2024-09-13 | Github
Theia: Distilling Diverse Vision Foundation Models for Robot Learning | CoRL 2024 | 2024-07-29 | Github
MPI: Learning Manipulation by Predicting Interaction | RSS 2024 | 2024-06-01 | Github
VC-1: Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence? | NeurIPS 2023 | 2023-03-31 | Github
MVP: Real-World Robot Learning with Masked Visual Pre-training | CoRL 2022 | 2022-10-06 | Github
LIV: Language-Image Representations and Rewards for Robotic Control | ICML 2023 | 2023-06-01 | Github
VIMA: General Robot Manipulation with Multimodal Prompts | ICML 2023 | 2022-10-06 | Github
ACT: Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware | RSS 2023 | 2023-04-23 | Github
Voltron: Language-Driven Representation Learning for Robotics | RSS 2023 | 2023-02-24 | Github
VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training | ICLR 2023 | 2022-08-30 | Github
R3M: A Universal Visual Representation for Robot Manipulation | CoRL 2022 | 2022-03-23 | Github
ZeST: Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation? | L4DC 2022 | 2022-04-23 | Project

(back to top)

Multimodal Representation Learning

(back to top)

Latent Action Learning

(back to top)

World Model

(back to top)

Asynchronous Action Learning

(back to top)

Diffusion Policy Learning

Title | Venue | Date | Code
AffordDP: Generalizable Diffusion Policy with Transferable Affordance | arXiv | 2024-12-04 | Project
Instant Policy: In-Context Imitation Learning via Graph Diffusion | ICLR 2025 | 2024-11-19 | Github
STMDP: Brain-inspired Action Generation with Spiking Transformer Diffusion Policy Model | arXiv | 2024-11-15 | -
MBA: Motion Before Action: Diffusing Object Motion as Manipulation Condition | arXiv | 2024-11-14 | Github
DiT Policy: Diffusion Transformer Policy | arXiv | 2024-10-21 | -
CAGE: Causal Attention Enables Data-Efficient Generalizable Robotic Manipulation | arXiv | 2024-10-19 | Project
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation | ICLR 2025 | 2024-10-10 | Github
ScaleDP: Scaling Diffusion Policy in Transformer to 1 Billion Parameters for Robotic Manipulation | ICRA 2025 | 2024-09-22 | Project
SDP: Spiking Diffusion Policy for Robotic Manipulation with Learnable Channel-Wise Membrane Thresholds | arXiv | 2024-09-17 | -
DiT-Block Policy: The Ingredients for Robotic Diffusion Transformers | arXiv | 2024-10-14 | Github
GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy | CoRL 2024 | 2024-10-23 | Github
EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning | CoRL 2024 | 2024-07-01 | Github
SDP: Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning | CoRL 2024 | 2024-07-01 | Github
RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective | IROS 2024 | 2024-04-18 | Project
MDT: Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals | RSS 2024 | 2024-07-08 | Github
R&D: Render and Diffuse: Aligning Image and Action Spaces for Diffusion-based Behaviour Cloning | RSS 2024 | 2024-05-28 | Github
DP3: 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations | RSS 2024 | 2024-03-06 | Github
PlayFusion: Skill Acquisition via Diffusion from Language-Annotated Play | CoRL 2023 | 2023-12-07 | Project
EquiDiff: Equivariant Diffusion Policy | CoRL 2024 | 2024-07-01 | Code
StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects | RSS 2023 | 2022-11-08 | Github
BESO: Goal-Conditioned Imitation Learning using Score-based Diffusion Policies | RSS 2023 | 2023-04-05 | Github
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion | RSS 2023 | 2023-03-07 | Github

(back to top)

Other Policies

(back to top)

Vision Language Action Models

Title | Venue | Date | Code
RAD: Action-Free Reasoning for Policy Generalization | arXiv | 2025-02-04 | Project
VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation | arXiv | 2025-02-04 | -
UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent | arXiv | 2025-01-31 | -
SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model | arXiv | 2025-01-27 | Github
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation | arXiv | 2024-12-29 | Github
RoboVLMs: Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models | arXiv | 2024-12-18 | Github
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | arXiv | 2024-12-16 | Github
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies | ICLR 2025 | 2024-12-13 | Github
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression | arXiv | 2024-12-14 | Project
π0: A Vision-Language-Action Flow Model for General Robot Control | arXiv | 2024-10-31 | Project
BYOVLA: Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust | arXiv | 2024-10-02 | Github
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation | RA-L 2025 | 2024-09-19 | Github
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | NeurIPS 2024 | 2024-11-04 | Github
RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation | NeurIPS 2024 | 2024-06-06 | Github
DP-VLA: A Dual Process VLA: Efficient Robotic Manipulation Leveraging VLM | CoRL 2024 | 2024-10-21 | -
OpenVLA: An Open-Source Vision-Language-Action Model | CoRL 2024 | 2024-06-13 | Github
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning | CoRL 2024 | 2024-06-17 | Github
ECoT: Robotic Control via Embodied Chain-of-Thought Reasoning | CoRL 2024 | 2024-07-11 | Github
3D-VLA: A 3D Vision-Language-Action Generative World Model | ICML 2024 | 2024-03-14 | Github
Octo: An Open-Source Generalist Robot Policy | RSS 2024 | 2024-05-20 | Github
RoboFlamingo: Vision-Language Foundation Models as Effective Robot Imitators | ICLR 2024 | 2023-11-02 | Github
RT-H: Action Hierarchies Using Language | arXiv | 2024-03-04 | Project
Open X-Embodiment: Robotic Learning Datasets and RT-X Models | ICRA 2024 | 2023-10-13 | Github
MOO: Open-World Object Manipulation using Pre-trained Vision-Language Models | CoRL 2023 | 2023-03-02 | Project
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | CoRL 2023 | 2023-07-28 | Project
RT-1: Robotics Transformer for Real-World Control at Scale | RSS 2023 | 2022-12-13 | Github

(back to top)

Reinforcement Learning

(back to top)

Motion, Trajectory and Flow

Title | Venue | Date | Code
Path Planning
LACO: Language-Conditioned Path Planning | CoRL 2023 | 2023-08-31 | Github
Motion Planning
DiffusionSeeder: Seeding Motion Optimization with Diffusion for Rapid Motion Planning | CoRL 2024 | 2024-10-22 | Project
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation | CoRL 2024 | 2024-09-03 | Github
CoPa: General Robotic Manipulation through Spatial Constraints of Parts with Foundation Models | ICRAW 2024 | 2024-03-13 | Github
Elastic-DS: Task Generalization with Stability Guarantees via Elastic Dynamical System Motion Policies | CoRL 2023 | 2023-09-05 | Github
Trajectory Optimization
ORION: Vision-based Manipulation from Single Human Video with Open-World Object Graphs | arXiv | 2024-05-30 | Project
PointFlowMatch: Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching | CoRL 2024 | 2024-09-11 | Project
RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation | ICRA 2024 | 2023-08-30 | Github
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models | CoRL 2023 | 2023-07-12 | Github
LATTE: LAnguage Trajectory TransformEr | ICRA 2023 | 2022-08-04 | Github
Trajectory-conditioned policy
P3-PO: Prescriptive Point Priors for Visuo-Spatial Generalization of Robot Policies | arXiv | 2024-12-09 | Github
Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation | ECCV 2024 | 2024-05-02 | Github
ATM: Any-point Trajectory Modeling for Policy Learning | RSS 2024 | 2023-12-28 | Github
AWE: Waypoint-Based Imitation Learning for Robotic Manipulation | CoRL 2023 | 2023-07-26 | Github
Flow-conditioned policy
Im2Flow2Act: Flow as the Cross-Domain Manipulation Interface | CoRL 2024 | 2024-07-21 | Github
AVDC: Learning to Act from Actionless Videos through Dense Correspondences | ICLR 2024 | 2023-10-12 | Github

(back to top)

Data Collection, Selection and Augmentation

Title | Venue | Date | Code
Data Collection
ALPHA-α and Bi-ACT Are All You Need: Importance of Position and Force Information/Control for Imitation Learning of Unimanual and Bimanual Robotic Manipulation with Low-Cost System | arXiv | 2024-11-15 | Project
SkillMimicGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment | CoRL 2024 | 2024-10-24 | Project
NILS: Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models | CoRL 2024 | 2024-10-23 | Project
SOAR: Autonomous Improvement of Instruction Following Skills via Foundation Models | CoRL 2024 | 2024-07-30 | Github
Manipulate-Anything: Automating Real-World Robots using Vision-Language Models | CoRL 2024 | 2024-06-27 | Project
DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation | CoRL 2024 | 2024-03-12 | Github
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots | RSS 2024 | 2024-02-15 | Github
AirExo: Low-Cost Exoskeletons for Learning Whole-Arm Manipulation in the Wild | ICRA 2024 | 2023-09-26 | Github
SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling | ICRA 2024 | 2023-06-20 | Github
Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition | CoRL 2023 | 2023-07-26 | Github
DIAL: Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models | RSS 2023 | 2022-11-21 | Project
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation | TMLR 2023 | 2023-06-20 | Github
Data Selection
What Matters in Learning from Large-Scale Datasets for Robot Manipulation | ICLR 2025 | 2025-01-23 | Project
AMF: Active Fine-Tuning of Generalist Policies | arXiv | 2024-10-07 | -
Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning | CoRL 2024 | 2024-08-26 | Github
An Unbiased Look at Datasets for Visuo-Motor Pre-Training | CoRL 2023 | 2023-10-13 | Github
Data Quality in Imitation Learning | NeurIPS 2023 | 2023-06-04 | -
Data Retrieval
STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning | ICLR 2025 | 2024-12-19 | Project
Retrieval-Augmented Embodied Agents | CVPR 2024 | 2024-04-17 | -
Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets | RSS 2023 | 2023-04-08 | Github
Data Augmentation
RoCoDA: Counterfactual Data Augmentation for Data-Efficient Robot Learning from Demonstrations | arXiv | 2024-11-25 | Project
RoVi-Aug: Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning | CoRL 2024 | 2024-09-05 | Project
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning | CoLLAs 2024 | 2024-07-30 | Project
Diffusion Meets DAgger: Supercharging Eye-in-hand Imitation Learning | RSS 2024 | 2024-02-27 | Github
ROSIE: Scaling Robot Learning with Semantically Imagined Experience | RSS 2023 | 2023-02-22 | Project
GenAug: Retargeting behaviors to unseen situations via Generative Augmentation | RSS 2023 | 2023-02-13 | Github
Evaluation
Contrast Sets for Evaluating Language-Guided Robot Policies | CoRL 2024 | 2024-06-19 | -

(back to top)

Affordance Learning

Title | Venue | Date | Code
Articulated Object Affordance
ManipGPT: Is Affordance Segmentation by Large Vision Models Enough for Articulated Object Manipulation? | arXiv | 2024-12-13 | -
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models | ICRA 2025 | 2024-09-16 | Project
A3VLM: Actionable Articulation-Aware Vision Language Model | CoRL 2024 | 2024-06-14 | Github
AIC MLLM: Autonomous Interactive Correction MLLM for Robust Robotic Manipulation | CoRL 2024 | 2024-06-17 | Project
SAGE: Bridging Semantic and Actionable Parts for Generalizable Manipulation of Articulated Objects | RSS 2024 | 2023-12-03 | Github
Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs | ICRA 2024 | 2023-11-06 | Github
Ditto: Building Digital Twins of Articulated Objects from Interaction | CVPR 2022 | 2022-08-16 | Github
Part-Based Object Affordance
3DAPNet: Language-Conditioned Affordance-Pose Detection in 3D Point Clouds | ICRA 2024 | 2023-09-19 | Github
CPM: Composable Part-Based Manipulation | CoRL 2023 | 2024-05-09 | Project
PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations | CVPR 2023 | 2023-03-29 | Github
GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts | CVPR 2023 | 2022-11-10 | Github
Spatial Affordance
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics | arXiv | 2024-11-25 | Project
SpatialBot: Precise Spatial Understanding with Vision Language Models | ICRA 2025 | 2024-06-19 | Github
RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics | CoRL 2024 | 2024-06-15 | Github
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities | CVPR 2024 | 2024-01-22 | Project
Visual Affordance
RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation | CoRL 2024 | 2024-07-05 | Github
MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting | RSS 2024 | 2024-03-05 | Github
SLAP: Spatial-Language Attention Policies | CoRL 2023 | 2023-04-21 | Github
KITE: Keypoint-Conditioned Policies for Semantic Manipulation | CoRL 2023 | 2023-06-29 | Project
HULC++: Grounding Language with Visual Affordances over Unstructured Data | ICRA 2023 | 2022-10-04 | Github
CLIPort: What and Where Pathways for Robotic Manipulation | CoRL 2021 | 2021-09-24 | Github
VAPO: Affordance Learning from Play for Sample-Efficient Policy Learning | ICRA 2022 | 2022-03-01 | Project
Transporter Networks: Rearranging the Visual World for Robotic Manipulation | CoRL 2020 | 2020-10-27 | Github

(back to top)

3D Representation for Manipulation

(back to top)

3D Representation Policy Learning

Title | Venue | Date | Code
Diffusion Policy (DP)
GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation | ICLR 2025 | 2024-09-30 | Project
3D Diffuser Actor: Policy Diffusion with 3D Scene Representations | CoRL 2024 | 2024-02-16 | Github
DP3: 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations | RSS 2024 | 2024-03-06 | Github
Reconstruction
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation | arXiv | 2024-11-27 | Github
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation | ECCV 2024 | 2024-03-13 | Github
SGRv2: Leveraging Locality to Boost Sample Efficiency in Robotic Manipulation | CoRL 2024 | 2024-06-15 | Github
RVT-2: Learning Precise Manipulation from Few Demonstrations | RSS 2024 | 2024-01-12 | Github
GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields | CoRL 2023 | 2023-08-31 | Github
3D4RL: Visual Reinforcement Learning with Self-Supervised 3D Representations | RA-L 2023 | 2022-10-13 | Github
PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation | CoRL 2023 | 2023-09-27 | Github
M2T2: Multi-Task Masked Transformer for Object-centric Pick and Place | CoRL 2023 | 2023-11-02 | Github
PerAct: Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation | CoRL 2022 | 2022-09-12 | Github
Visual Goal Generation
3D-MVP: 3D Multiview Pretraining for Robotic Manipulation | CoRL 2024 | 2024-06-26 | Project
ActAIM2: Discovering Robotic Interaction Modes with Discrete Representation Learning | CoRL 2024 | 2024-10-26 | Project
SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation | ICML 2024 | 2024-05-30 | Github
RVT: Robotic View Transformer for 3D Object Manipulation | CoRL 2023 | 2023-06-26 | Github
GROOT: Learning Generalizable Manipulation Policies with Object-Centric 3D Representations | CoRL 2023 | 2023-10-22 | Github
Others
SPHINX: What's the Move? Hybrid Imitation Learning via Salient Points | ICLR 2025 | 2024-12-06 | Github
SGR: A Universal Semantic-Geometric Representation for Robotic Manipulation | CoRL 2023 | 2023-06-18 | Github

(back to top)

Reasoning, Planning and Code Generation

Title | Venue | Date | Code
Task Planning
MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation | arXiv | 2024-11-26 | Project
Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following | arXiv | 2024-04-21 | -
Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models | IROS 2024 | 2024-08-15 | Project
PG-InstructBLIP: Physically Grounded Vision-Language Models for Robotic Manipulation | ICRA 2024 | 2023-09-05 | Project
RoCo: Dialectic Multi-Robot Collaboration with Large Language Models | ICRA 2024 | 2023-07-10 | Github
REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction | CoRL 2023 | 2023-06-27 | Github
SayCan: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances | CoRL 2022 | 2022-04-04 | Github
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | arXiv | 2023-04-22 | Github
Inner Monologue: Embodied Reasoning through Planning with Language Models | CoRL 2022 | 2022-07-12 | Project
SHOWTELL: Teaching Robots with Show and Tell: Using Foundation Models to Synthesize Robot Policies from Language and Visual Demonstrations | CoRL 2024 | 2024-09-06 | Project
GIRAF: Gesture-Informed Robot Assistance via Foundation Models | CoRL 2023 | 2023-09-06 | Project
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | ICCV 2023 | 2022-12-08 | Github
Code Generation
Robotic Programmer: Video Instructed Policy Code Generation for Robotic Manipulation | arXiv | 2025-01-08 | Project
Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought | NeurIPS 2023 | 2023-05-26 | Project
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model | arXiv | 2023-05-18 | Github
ProgPrompt: Generating Situated Robot Task Plans using Large Language Models | ICRA 2023 | 2022-09-22 | Github
ChatGPT for Robotics: Design Principles and Model Abilities | IEEE Access 2023 | 2023-02-20 | Github
Code as Policies: Language Model Programs for Embodied Control | ICRA 2023 | 2022-09-16 | Github
TidyBot: Personalized Robot Assistance with Large Language Models | Autonomous Robots 2023 | 2023-05-09 | Github
Statler: State-Maintaining Language Models for Embodied Reasoning | ICRA 2024 | 2023-06-30 | Github
InterPreT: Interactive Predicate Learning from Language Feedback for Generalizable Task Planning | RSS 2024 | 2023-05-30 | Github
Text2Motion: From Natural Language Instructions to Feasible Plans | Autonomous Robots 2023 | 2023-03-21 | Project
Multimodal Reasoning
From Foresight to Forethought: VLM-In-the-Loop Policy Steering via Latent Alignment | arXiv | 2025-02-03 | -
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection | arXiv | 2024-12-05 | Project
AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation | ICLR 2025 | 2024-10-01 | Project
λ-Repformer: Task Success Prediction for Open-Vocabulary Manipulation Based on Multi-Level Aligned Representations | CoRL 2024 | 2024-10-01 | Project
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation | CVPR 2024 | 2023-12-24 | Github
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought | NeurIPS 2023 | 2023-05-24 | Github
Matcha: Chat with the Environment: Interactive Multimodal Perception Using Large Language Models | IROS 2023 | 2023-03-14 | Github
PaLM-E: An Embodied Multimodal Language Model | ICML 2023 | 2023-03-06 | Github
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language | ICLR 2023 | 2022-04-01 | Project

(back to top)

Generalization

Title | Venue | Date | Code
Generalization using Data
Mirage: Cross-Embodiment Zero-Shot Policy Transfer with Cross-Painting | RSS 2024 | 2024-02-29 | Github
Decomposing the Generalization Gap in Imitation Learning for Visual Robotic Manipulation | ICRA 2024 | 2024-02-29 | Github
Compositional Generalization
Policy Architectures for Compositional Generalization in Control | NeurIPSW 2022 | 2022-03-10 | Github
PROGRAMPORT: Programmatically Grounded, Compositionally Generalizable Robotic Manipulation | ICLR 2023 | 2023-04-26 | Project
Efficient Data Collection for Robotic Manipulation via Compositional Generalization | RSS 2024 | 2024-03-08 | Project
Sim2Real Generalization
Natural Language Can Help Bridge the Sim2Real Gap | RSS 2024 | 2024-05-16 | Github
RialTo: Reconciling Reality through Simulation: A Real-to-Sim-to-Real Approach for Robust Manipulation | RSS 2024 | 2024-03-06 | Github
Domain Randomization: Sim-to-Real Transfer of Robotic Control with Dynamics Randomization | ICRA 2018 | 2017-10-18 | -
Generalization for Long-horizon and Complex Task
RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation | arXiv | 2025-01-11 | -
ManipGen: Local Policies Enable Zero-shot Long-horizon Manipulation | CoRLW 2024 | 2024-10-29 | Project
TBBF: A Backbone for Long-Horizon Robot Task Understanding | RA-L 2025 | 2024-08-02 | Project
STAP: Sequencing Task-Agnostic Policies | ICRA 2023 | 2022-10-21 | Github
BOSS: Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance | CoRL 2023 | 2023-12-16 | Github
BLADE: Learning Compositional Behaviors from Demonstration and Language | CoRL 2024 | 2024 | Project
PALO: Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation | CoRL 2024 | 2024-08-29 | Github
Few-shot
You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations | arXiv | 2025-01-24 | Github
Learning Generalizable 3D Manipulation With 10 Demonstrations | arXiv | 2024-11-15 | Github

(back to top)

Generalist

Title | Venue | Date | Code
Generalist with Different Embodiment Types
CrossFormer: Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation | CoRL 2024 | 2024-08-21 | Github
ARIO: All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents | arXiv | 2024-08-20 | Project
HPT: Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers | NeurIPS 2024 | 2024-09-30 | Github
Generalist in Different Embodied Tasks
LEO: An Embodied Generalist Agent in 3D World | ICML 2024 | 2023-11-18 | Github
Manipulation Generalist
Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding | arXiv | 2025-01-08 | Github
RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning | arXiv | 2024-12-13 | Project
RoboMM: All-in-One Multimodal Large Model for Robotic Manipulation | arXiv | 2024-12-10 | Github
RoboDual: Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation | arXiv | 2024-10-10 | Project
Effective Tuning Strategies for Generalist Robot Manipulation Policies | arXiv | 2024-10-02 | -
Octo: An Open-Source Generalist Robot Policy | RSS 2024 | 2024-05-20 | Github
V-GPS: Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance | CoRL 2024 | 2024-10-17 | Project
Open X-Embodiment: Robotic Learning Datasets and RT-X Models | ICRA 2024 | 2023-10-13 | Github
RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking | ICRA 2024 | 2023-09-05 | Github
Maniwhere: Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning | CoRL 2024 | 2024-07-22 | Project
CAGE: Causal Attention Enables Data-Efficient Generalizable Robotic Manipulation | arXiv | 2024-10-19 | Project
Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments | arXiv | 2024-09-09 | Github
More for VLAs

(back to top)

Human-Robot Interaction and Collaboration

(back to top)

Mobile Manipulation

(back to top)

Tactile-based Manipulation

(back to top)

Dexterous Manipulation

(back to top)

Other Applications

Title | Venue | Date | Code
Deformable Object Manipulation
HANDLOOM: Learned Tracing of One-Dimensional Objects for Inspection and Manipulation | CoRL 2023 | 2023-03-15 | Project
Contact-rich Manipulation
FoAR: Force-Aware Reactive Policy for Contact-Rich Robotic Manipulation | arXiv | 2024-11-24 | Project
ForceMimic: Force-Centric Imitation Learning with Force-Motion Capture System for Contact-Rich Manipulation | arXiv | 2024-10-10 | Project
Stowing Tasks
Predicting Object Interactions with Behavior Primitives: An Application in Stowing Tasks | CoRL 2023 | 2023-09-28 | Github
Object Rearrangement
PACA: Perspective-Aware Cross-Attention Representation for Zero-Shot Scene Rearrangement | WACV 2025 | 2024-10-29 | -
LGMCTS: Language-Guided Monte-Carlo Tree Search for Executable Semantic Object Rearrangement | IROS 2024 | 2023-09-27 | Github
LLM-GROP: Task and Motion Planning with Large Language Models for Object Rearrangement | IROS 2023 | 2023-03-10 | Colab
DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics | RA-L 2023 | 2022-10-05 | Project
Human-to-Robot Handover
GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation | CVPR 2024 | 2024-01-01 | Github
Cook
RoboCook: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools | CoRL 2023 | 2023-06-26 | Github
Non-prehensile Manipulation
HACMan: Learning Hybrid Actor-Critic Maps for 6D Non-Prehensile Manipulation | CoRL 2023 | 2023-05-06 | Github
Feed
VAPORS: Learning Sequential Acquisition Policies for Robot-Assisted Feeding | CoRL 2023 | 2023-09-11 | Project
Tool Manipulation
Leveraging Language for Accelerated Learning of Tool Manipulation | CoRL 2022 | 2022-06-27 | Github
Responsible Manipulation
How vulnerable is my policy? Adversarial attacks on modern behavior cloning policies | arXiv | 2025-02-06 | -
Don't Let Your Robot be Harmful: Responsible Robotic Manipulation | arXiv | 2024-11-27 | Github
TrojanRobot: Backdoor Attacks Against LLM-based Embodied Robots in the Physical World | arXiv | 2024-11-18 | Project

(back to top)

📊 Awesome Benchmarks

Grasp Datasets

(back to top)

Manipulation Benchmarks

Title | Venue | Date | Code
Manipulation in Home Environment
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots | RSS 2024 | 2024-06-04 | Github
ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes | ICCV 2023 | 2023-04-09 | Github
HomeRobot: Open-Vocabulary Mobile Manipulation | CoRL 2023 | 2023-06-20 | Github
ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks | CVPR 2020 | 2019-12-03 | Github
Manipulation in On-Table Environment
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks | arXiv | 2024-12-24 | Github
Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy | ICRA 2025 | 2024-10-02 | Github
OBSBench: Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning | NeurIPS 2024 | 2024-02-04 | Github
GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs | CoRL 2024 | 2024-10-04 | Github
THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation | RSS 2024 | 2024-02-13 | Github
LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning | NeurIPS 2023 | 2023-06-05 | Github
VIMA: General Robot Manipulation with Multimodal Prompts | ICML 2023 | 2022-10-06 | Github
CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks | RA-L 2021 | 2021-12-06 | Github
RLBench: The Robot Learning Benchmark & Learning Environment | RA-L 2020 | 2019-09-26 | Github
KitchenShift: Evaluating Zero-Shot Generalization of Imitation-Based Policy Learning Under Domain Shifts | NeurIPSW 2021 | 2021 | Github
Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning | CoRL 2019 | 2019-10-24 | Github
Franka-Kitchen: Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning | CoRL 2019 | 2019-10-25 | Project
Evaluating Real-World Robot Manipulation Policies in Simulation | CoRL 2024 | 2024-05-09 | Github
LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation | arXiv | 2024-10-07 | -
ClutterGen: A Cluttered Scene Generator for Robot Learning | CoRL 2024 | 2024-07-07 | Github
Tactile Manipulation
Efficient Tactile Simulation with Differentiability for Robotic Manipulation | CoRL 2022 | 2022 | Github
Functional Manipulation
FMB: a Functional Manipulation Benchmark for Generalizable Robotic Learning | IJRR 2024 | 2024-01-16 | Github
Robot Trajectory Datasets
Open X-Embodiment: Robotic Learning Datasets and RT-X Models | ICRA 2024 | 2023-10-13 | Github
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset | RSS 2024 | 2024-03-19 | Project
BridgeData V2: A Dataset for Robot Learning at Scale | CoRL 2023 | 2023-08-24 | Github
RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot | RSSW 2023 | 2023-07-02 | Project
Embodied QA Datasets
ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models | IROS 2024 | 2024-03-17 | Github
OpenEQA: Embodied Question Answering in the Era of Foundation Models | CVPR 2024 | 2024 | Github

(back to top)

Cross-Embodiment Benchmarks

(back to top)

πŸ› οΈ Awesome Techniques

Title | Venue | Date | Code
Awesome-Implicit-NeRF-Robotics: Neural Fields in Robotics: A Survey | - | 2024-10-26 | Github
Awesome-Video-Robotic-Papers | - | 2024 | Github
Awesome-Generalist-Robots-via-Foundation-Models | - | 2024 | Github
Awesome-Robotics-3D | - | 2024 | Github
Awesome-Robotics-Foundation-Models: Foundation Models in Robotics: Applications, Challenges, and the Future | - | 2023-12-13 | Github
Awesome-LLM-Robotics | - | 2022 | Github

(back to top)

✨ Citation

If you find this repository useful, please consider citing this list:

@misc{bai2024roboticsmanipulation,
    title = {Awesome-Robotics-Manipulation},
    author = {Bai, Shuanghao and Ding, Pengxiang and Zhang, Haoran},
    howpublished = {GitHub repository},
    url = {https://github.com/BaiShuanghao/Awesome-Robotics-Manipulation},
    year = {2024},
}