DexCtrl: Towards Sim-to-Real Dexterity with Adaptive Controller Learning
Abstract
Dexterous manipulation has seen remarkable progress in recent years, with policies capable of executing many complex and contact-rich tasks in simulation. However, transferring these policies from simulation to the real world remains a significant challenge. One important issue is the mismatch in low-level controller dynamics, where identical trajectories can lead to vastly different contact forces and behaviors when control parameters vary. Existing approaches often rely on manual tuning or controller randomization, which is labor-intensive, task-specific, and can introduce significant training difficulty. In this work, we propose a framework that jointly learns actions and controller parameters based on the historical information of both the trajectory and the controller. This adaptive controller adjustment mechanism allows the policy to automatically tune control parameters during execution, thereby mitigating the sim-to-real gap without extensive manual tuning or excessive randomization. Moreover, by explicitly providing controller parameters as part of the observation, our approach facilitates better reasoning over force interactions and improves robustness in real-world scenarios. Experimental results demonstrate that our method achieves improved transfer performance across a variety of dexterous tasks involving variable force conditions.

Keywords: Dexterous Manipulation, Sim-to-real Transfer, Adaptive Control, Reinforcement Learning
1 Introduction
Dexterity is a core component of human manipulation and has long posed a significant challenge in robotics research. Beyond their strong performance in grasping [1, 2, 3, 4], dexterous manipulation policies have shown capabilities in handling various contact-rich manipulation tasks such as rotating objects [5, 6, 7, 8, 9], playing the piano [10, 11], and using various tools [12, 13]. Despite this progress, transferring dexterous policies from simulation to the real world remains a critical challenge. Many manipulation policies address the sim-to-real gap from various angles, such as injecting random noise into the observed proprioceptive information or applying random forces to objects to make the output trajectories more robust.
However, we identify one important issue that has received relatively little attention in previous work: the discrepancy between robot controllers in simulation and the real world. Since the final command sent to the robot is the motor torque computed from both the trajectory and the control parameters, failing to explicitly account for this controller gap still results in a discrepancy between simulated and real-world performance. A common practice is to bridge this gap by manually tuning controller parameters, comparing the robot's trajectory outputs between simulation and the real world to match the final performance [5, 14]. Additionally, mild randomization of controller parameters during training has been widely used to enhance policy robustness against sim-to-real discrepancies [7, 12]. However, both manual tuning and randomization can ultimately lead to task failure in real-world deployments: randomization substantially increases training difficulty, and manually tuned control parameters often fail to achieve the precision needed for successful task execution. Moreover, both require extensive effort in adjusting controller parameters and tuning randomization hyperparameters, significantly increasing human labor. Fundamentally, current solutions rely on extensive trial and error guided by prior human experience rather than a principled and automatic approach.
In our work, as shown in Figure 1, we propose a novel method, DexCtrl, that adaptively adjusts the controller parameters based on historical information, thus narrowing the sim-to-real gap. Concretely, the DexCtrl model learns to output both actions and controller parameters at each time step based on the previous desired and actual joint trajectories, as well as the corresponding controller parameters, within a time window. By doing so, DexCtrl can automatically adjust the controller behavior in a closed-loop manner, bypassing the complicated procedure of manually tuning parameters while retaining strong adaptability to real-world scenarios. In addition, it alleviates the policy exploration difficulty introduced by randomization because DexCtrl directly receives controller information in its observation, leading to better capture of force information. Overall, the contributions of our work are as follows:
• We identify the mismatch of robot controllers as a critical factor in the sim-to-real gap and propose a novel method to adjust the control parameters adaptively.
• We design a simple and elegant framework to jointly obtain actions and controller parameters based on historical information, offering better adaptivity to force variation.
• Extensive experiments on two different tasks, together with thorough analysis, show that our method significantly outperforms baselines in both simulation and the real world.
2 Problem Statements
Manipulation Tasks Our method primarily focuses on enhancing the sim-to-real performance of dexterous manipulation tasks by bridging the control gap. To validate our approach, we implement two challenging dexterous manipulation tasks that involve contact between objects, hands, and environments: in-hand object rotation and flipping. The goal of the in-hand rotation task is to rotate an object using the fingertips along a specific axis without dropping it, while the goal of the flipping task is to flip an object on a table along a designated axis. Both tasks exemplify the contact-rich behaviors characteristic of dexterous manipulation, where the sim-to-real gap significantly impacts real-world performance.
Robot Controller Our method is primarily designed for policies operating under joint torque control. We use the LEAP Hand as an example, which has 16 degrees of freedom (DOF). The torque controller takes in the desired joint positions $q_t^d$ and the current joint positions $q_t$ (with velocities $\dot{q}_t^d$ and $\dot{q}_t$), and computes the corresponding joint torque as follows:

$\tau_t = K_p\,(q_t^d - q_t) + K_d\,(\dot{q}_t^d - \dot{q}_t)$    (1)

We assume $K_p$, $K_d$ are diagonal for simplicity, and define $\{K_p, K_d\}$ as the collection of controller parameters, representing the robot stiffness and damping matrices, respectively. As shown in Eq. 1, the torque output is directly modulated by the choice of $\{K_p, K_d\}$, which necessitates careful tuning of these parameters. Beyond directly determining the actual torque values, these parameters also shape the dynamics: increasing stiffness reduces steady-state error but may induce oscillations, while increasing damping suppresses overshoot but can amplify high-frequency noise. In policies without adaptive control mechanisms, the controller parameters are fixed, and only the desired trajectory $q_t^d$ is predicted by the policy. Different from previous work, DexCtrl also predicts the controller parameters $\{K_p, K_d\}$ along with $q_t^d$ at each time step, enabling simultaneous controller adjustment to meet force requirements and ensure smooth trajectories. Desired velocities $\dot{q}_t^d$ are set to zero in both our method and previous work.
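As a concrete illustration, the sketch below implements the joint-space PD torque law of Eq. 1 in Python/NumPy; the array shapes and the torque clipping bound are illustrative assumptions on our part, not values taken from the paper.

```python
import numpy as np

def pd_torque(q_des, q, qd_des, qd, kp, kd, tau_limit=0.5):
    """Joint-space PD torque law of Eq. 1.

    q_des, q   : desired / current joint positions, shape (16,) for a 16-DOF hand
    qd_des, qd : desired / current joint velocities (qd_des is zero in this work)
    kp, kd     : per-joint stiffness and damping (diagonals of K_p, K_d)
    tau_limit  : illustrative per-joint torque bound, not a value from the paper
    """
    tau = kp * (q_des - q) + kd * (qd_des - qd)
    return np.clip(tau, -tau_limit, tau_limit)

# Example: 16-DOF hand, zero desired velocity as in the paper.
q = np.zeros(16)
q_des = q + 0.05  # small position error on every joint
tau = pd_torque(q_des, q, np.zeros(16), np.zeros(16), kp=3.0, kd=0.1)
```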
3 Methods

The overall framework of our method is shown in Figure 2. During training (Figure 2a), we first collect sufficient data using an oracle policy trained in simulation with diverse object physical parameters. Then we distill two separate models that predict the desired action and the control parameters, respectively, based on historical information extracted from the collected dataset. During inference (Figure 2b), we recursively predict the desired action for the next step given the current and historical observations, and then predict the control parameters for the next step based on the generated desired action. This design provides two main advantages. First, we include control parameters and actions as part of the observation, which better captures force information and narrows the sim-to-real gap. Second, our method explicitly predicts control parameters with a separate module, which not only adaptively adjusts parameters to improve task performance but also reduces overall training difficulty.
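To make the inference procedure concrete, the following minimal sketch assumes two trained callables (`action_module`, `ctrl_module`) and a hypothetical `robot` interface with `read_joint_positions()`, `apply()`, and default-gain accessors; none of these names come from a released implementation.

```python
from collections import deque

def run_inference(action_module, ctrl_module, robot, horizon=300, hist_len=10):
    """Closed-loop rollout: predict action, then controller parameters, then execute."""
    history = deque(maxlen=hist_len)  # each entry: (q, q_des, kp, kd)
    q_des, kp, kd = robot.default_pose(), robot.default_kp(), robot.default_kd()
    for _ in range(horizon):
        q = robot.read_joint_positions()             # actual joints from sensors (closed loop)
        history.append((q, q_des, kp, kd))
        q_des = action_module(list(history))         # next desired joint positions
        kp, kd = ctrl_module(q_des, list(history))   # controller params conditioned on the new action
        robot.apply(q_des, kp, kd)                   # low-level PD torque control (Eq. 1)
```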
3.1 Oracle Policy for Data Collection
We utilize model-free Proximal Policy Optimization (PPO) reinforcement learning to obtain oracle policies in simulation. Specifically, at each time step $t$, the oracle policy takes in the state $s_t$ and outputs the joint action $a_t$ and the controller parameters $\{K_p, K_d\}$ simultaneously. The action is executed using a controller with those parameters, and the desired joint positions $q_t^d$ at time step $t$ are obtained from $a_t$. The detailed task designs for the two contact-rich manipulation tasks are described as follows:
State: The states of both tasks contain observations of the object and the robot over the last three time steps. The robot information for each step includes the current joint positions $q_t$, desired joint positions $q_t^d$, and controller parameters $\{K_p, K_d\}$. The object information contains the object pose $p_t^{obj}$ and an object property vector $\phi^{obj}$, including scale, mass, and friction. In short, the state can be represented as:

$s_t = \{\, q_{t-i},\; q_{t-i}^d,\; K_{p,t-i},\; K_{d,t-i},\; p_{t-i}^{obj} \,\}_{i=0}^{2} \cup \{\phi^{obj}\}$
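As one possible reading of the state definition above, the sketch below flattens the three-step history into a single vector, assuming 16 hand joints and a 7-D object pose (position plus quaternion); the exact ordering and dimensions are illustrative, not the paper's implementation.

```python
import numpy as np

def build_oracle_state(robot_hist, obj_hist, obj_props):
    """robot_hist: list of 3 dicts with 'q', 'q_des', 'kp', 'kd' (each shape (16,));
    obj_hist: list of 3 object poses, shape (7,) [position + quaternion];
    obj_props: (scale, mass, friction), shape (3,)."""
    per_step = [np.concatenate([s["q"], s["q_des"], s["kp"], s["kd"], p])
                for s, p in zip(robot_hist, obj_hist)]
    return np.concatenate(per_step + [obj_props])  # 3 * (4*16 + 7) + 3 = 216 dims
```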
Reward The reward of both tasks mainly contains four parts:

$r_t = r_{\text{rot}} + r_{\text{contact}} + r_{\text{smooth}} + r_{\text{terminal}}$    (2)

The rotation speed reward $r_{\text{rot}}$ encourages the object to rotate faster along a certain axis until it reaches the targeted maximum speed. The contact reward $r_{\text{contact}}$ encourages binary contacts between the object and the fingertips. The smoothness reward $r_{\text{smooth}}$ penalizes sudden changes in robot joint positions and torques. The terminal reward $r_{\text{terminal}}$ penalizes the object falling off the fingertips (Rotation) or moving too far from its initial position (Flipping). Details of the reward parameters can be found in the appendix.
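For illustration only, the snippet below shows one way the four terms of Eq. 2 could be combined; the weights and the exact shaping (speed clipping, binary fingertip contacts) are placeholder assumptions, with the true coefficients deferred to the appendix.

```python
import numpy as np

def compute_reward(omega_axis, target_speed, fingertip_contacts,
                   dq, dtau, terminated,
                   w_rot=1.0, w_contact=0.1, w_smooth=0.01, w_term=10.0):
    """Illustrative combination of the four reward terms in Eq. 2."""
    r_rot = w_rot * min(omega_axis, target_speed)                    # reward rotation up to a max speed
    r_contact = w_contact * float(np.sum(fingertip_contacts) >= 2)   # binary fingertip-contact bonus
    r_smooth = -w_smooth * (np.sum(dq ** 2) + np.sum(dtau ** 2))     # penalize jerky joints / torques
    r_term = -w_term * float(terminated)                             # drop / drift termination penalty
    return r_rot + r_contact + r_smooth + r_term
```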
3.2 Action Prediction and Control Parameters Prediction
We train our student policy as two separate modules, i.e., an action prediction module for trajectory generation and a control parameter prediction module for adapting control parameters. We do so not only because they encode fundamentally different aspects of the task, but also because we want to prevent control parameter prediction from interfering with action prediction.
Historical Information for Distillation Although feasible for task completion, oracle policies cannot be directly transferred to the real world because some privileged information, such as object information, is not directly accessible in real-world settings. To solve this problem, our method uses historical states of robot proprioception to distill the privileged information used by the oracle policy, enabling the estimation of the rotated object's properties [6]. Concretely, we use the last ten steps of the current and desired joint trajectories, along with the corresponding controller parameters, as historical information for the policy input.
Module Design To better leverage historical information, we use self-attention to model the temporal historical input for action prediction. For control parameter prediction, we use cross-attention in which the current action serves as the query and the historical input serves as the key and value, modeling the relationship between the current action and the history. Although the historical input of the two modules is formulated identically, its meaning differs. In action prediction, the input mainly indicates the trend of joint trajectory variation, similar to previous work [7, 8]. In control parameter prediction, it mainly indicates an approximate relation between joint actions and control parameters, analogous to how humans infer their current control decisions from past trajectories and previous parameters.
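The sketch below illustrates the two module designs using PyTorch's built-in multi-head attention; the embedding sizes, head counts, and MLP heads are placeholders rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ActionModule(nn.Module):
    """Self-attention over the 10-step history, followed by an MLP action head."""
    def __init__(self, feat_dim=64, n_heads=4, n_joints=16):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                  nn.Linear(128, n_joints))

    def forward(self, hist):                        # hist: (B, 10, feat_dim)
        h, _ = self.attn(hist, hist, hist)          # self-attention: query = key = value = history
        return self.head(h.mean(dim=1))             # next desired joint positions, (B, n_joints)

class CtrlParamModule(nn.Module):
    """Cross-attention: the current action queries the history to predict Kp and Kd."""
    def __init__(self, feat_dim=64, n_heads=4, n_joints=16):
        super().__init__()
        self.embed_action = nn.Linear(n_joints, feat_dim)
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                  nn.Linear(128, 2 * n_joints))   # per-joint stiffness + damping

    def forward(self, action, hist):                # action: (B, n_joints); hist: (B, 10, feat_dim)
        q = self.embed_action(action).unsqueeze(1)  # query = current action
        h, _ = self.attn(q, hist, hist)             # keys / values = history
        kp, kd = self.head(h.squeeze(1)).chunk(2, dim=-1)
        return torch.sigmoid(kp), torch.sigmoid(kd)  # normalized gains in [0, 1]
```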
Training and Inference The two modules are trained in an open-loop manner, meaning all input data is retrieved directly from the collected simulation dataset. During both simulation and real-world inference, however, our method runs closed-loop, meaning the current joint positions are obtained from the actual robot sensors. We linearly map the control parameters from simulation to their corresponding values on the real-world system, using only rough estimates of the upper and lower bounds rather than careful calibration and tuning. Furthermore, we find that adding Gaussian noise to the current joint positions during student policy training is sufficient for sim-to-real transfer, even though the training dataset is not collected by a teacher policy with input noise randomization. This finding indicates that certain observation randomization can be reduced, easing the teacher policy training procedure.
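The two deployment details above can be pictured as follows; the gain bounds and noise scale are made-up illustrative values, not the calibration used on the real system.

```python
import numpy as np

# Approximate real-motor gain bounds (illustrative values, not the paper's calibration).
KP_REAL_RANGE = (200.0, 800.0)
KD_REAL_RANGE = (10.0, 50.0)

def sim_to_real_gain(g_norm, lo, hi):
    """Linearly map a normalized gain in [0, 1] onto an approximate real-world range."""
    return lo + g_norm * (hi - lo)

def noisy_observation(q, sigma=0.01):
    """Gaussian noise on observed joint positions during student-policy training."""
    return q + np.random.normal(0.0, sigma, size=q.shape)

kp_real = sim_to_real_gain(np.full(16, 0.6), *KP_REAL_RANGE)  # e.g. all joints at normalized gain 0.6
```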
4 Experiments
We conducted experiments in both simulated and real-robot environments to evaluate our proposed method. Specifically, we present results on contact-rich dexterous manipulation tasks, examining two key aspects: in-hand rotation, which emphasizes object-hand interaction, and object flipping, which highlights environment contact. Our investigation primarily addresses the key questions through in-hand rotation experiments, with object flipping serving as a supporting task to provide additional insights: Q1: Does our method improve the original oracle performance? Q2: Does our method narrow the sim-to-real gap? Q3: How does our method perform across objects with varying physical parameters? Q4: How do changes in controller parameters impact the results?
4.1 Experimental Setup
Baselines We compare our method with two main baselines:
• Manual Tuning Similar to [5, 7], this baseline trains a new oracle policy with (1) carefully tuned control parameters based on comparing trajectory outputs between simulation and the real world and (2) a small range of randomization added to the controller, and then uses this new oracle policy for student policy training. Neither the new oracle policy nor the student policy outputs adaptive controller parameters.
• Ours w/o PD This baseline replaces the adaptive controller parameter prediction module in the student policy with the same fixed controller parameters used in Manual Tuning, and trains only the action prediction module using the dataset collected by our oracle policy.
Metrics We quantitatively evaluate the performance of all methods with four metrics [6, 7]; a small computation sketch follows this list:
• Rotation Reward/Radians (RotR) This metric represents the rotation speed of the target object around the desired axis. In simulation, it is calculated as the average rotation reward over a trajectory. In the real world, it is calculated as the net rotation of the object in radians over a trajectory.
• Time To Fail (TTF) This metric represents the average trajectory length before the object falls off the hand (rotation) or moves too far from its initial position (flipping).
• Object Linear Velocity (ObjVel) This metric represents the average magnitude of the object's linear velocity per action step and reflects the stability of the target object. It is computed only in simulation, since real-world object velocity cannot be easily measured, and only for the rotation task, since object translation is inevitable in flipping.
• Torque Penalty (Torque) This metric computes the torque penalty reward per time step to measure the energy cost; it is measured only in simulation.
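For concreteness, here is a minimal sketch of how the directly measurable metrics could be computed from logged per-step quantities (object rotation about the target axis, a per-step success mask, and object positions); the helper names are our own, not part of the paper's evaluation code.

```python
import numpy as np

def rotation_radians(object_yaw):        # RotR (real world): net rotation about the target axis
    return float(np.abs(object_yaw[-1] - object_yaw[0]))

def time_to_fail(success_mask):          # TTF: steps until the first failure (drop / drift)
    fails = np.flatnonzero(~success_mask)
    return int(fails[0]) if fails.size else len(success_mask)

def avg_object_velocity(obj_positions, dt):   # ObjVel (simulation only)
    vel = np.linalg.norm(np.diff(obj_positions, axis=0), axis=1) / dt
    return float(vel.mean())
```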
4.2 Does our Method Improve the Original Oracle Performance?
Table 1: Simulation results on the in-hand rotation task.
| Setting | Method | RotR | TTF | Torque | ObjVel |
| With Disturbance | Manual Tuning | 35.05 | 239.4 | 0.398 | 0.154 |
| With Disturbance | Ours w/o PD | 41.60 | 252.3 | 0.152 | 0.140 |
| With Disturbance | Ours | 43.51 | 255.9 | 0.099 | 0.144 |
| Without Disturbance | Manual Tuning | 37.64 | 247.5 | 0.264 | 0.148 |
| Without Disturbance | Ours w/o PD | 47.87 | 275.8 | 0.144 | 0.131 |
| Without Disturbance | Ours | 52.33 | 287.7 | 0.092 | 0.132 |
Table 2: Simulation results on the object flipping task.
| Setting | Method | RotR | TTF | Torque |
| With Disturbance | Manual Tuning | 91.07 | 295.2 | 0.376 |
| With Disturbance | Ours w/o PD | 82.23 | 295.6 | 0.299 |
| With Disturbance | Ours | 172.50 | 296.9 | 0.140 |
| Without Disturbance | Manual Tuning | 92.24 | 295.0 | 0.446 |
| Without Disturbance | Ours w/o PD | 82.90 | 295.4 | 0.328 |
| Without Disturbance | Ours | 184.00 | 296.9 | 0.127 |
We first test DexCtrl in a simulation environment with and without randomly applied force disturbances on objects. During validation, we randomize 1024 different initial robot poses and set the simulation controllers exactly the same as those used during training, meaning no controller gap is involved. This experiment aims to show whether performance can be improved by simultaneously adjusting action and control parameters even under identical controllers. Table 1 and Table 2 present the quantitative results of the baselines and DexCtrl in simulation. Compared to the manual tuning baseline, DexCtrl improves performance significantly, especially in RotR and TTF, demonstrating that our method can effectively stabilize and accelerate the task with or without disturbance. It is worth noting that Ours w/o PD also reaches relatively good performance, indicating the robustness of trajectories generated by our method. However, in the flipping task (Table 2), Ours w/o PD does not outperform the baseline. We believe this is because flipping involves rich contact with both the floor and the dexterous hand, making it more sensitive to controller parameter variation. In conclusion, our method outperforms the baseline in both task performance and training speed, even in the absence of a controller gap.


Table 3: Real-world in-hand rotation results across objects.
| Method | Metric | Cube (94g) | Bottle (150g) | Apple (221g) | Yogurt (164g) | Baseball (144g) | Average |
| Manual Tuning | RotR | 1.963 | 2.875 | 1.914 | 2.943 | 2.745 | 2.431 |
| Manual Tuning | TTF | 286.5 | 266.9 | 242.7 | 300 | 239.6 | 272.4 |
| Ours w/o PD | RotR | 5.498 | 4.285 | 4.492 | 5.424 | 4.681 | 4.986 |
| Ours w/o PD | TTF | 297.7 | 281.7 | 291.6 | 300 | 271.6 | 287.2 |
| Ours | RotR | 9.386 | 14.006 | 9.676 | 15.017 | 11.342 | 11.041 |
| Ours | TTF | 289.3 | 300 | 300 | 300 | 292.2 | 292.6 |
4.3 Does our Method Narrow the Sim-to-real Gap?
We directly deploy DexCtrl in real-world scenarios. For the in-hand rotation task, we use twelve real-world objects, unseen during training and with varying masses and friction, and evaluate rotation performance by averaging the metrics over ten randomly sampled initial robot poses per object. Table 3 and Figure 4 present the quantitative results and visualizations of DexCtrl and the baselines across different objects, respectively. Compared to the simulation results, the performance gap among methods is more pronounced in the real world, where DexCtrl significantly outperforms the baseline under zero-shot sim-to-real transfer. Ours w/o PD also achieves relatively strong performance, demonstrating that trajectories generated by DexCtrl are more robust when transferred to real-world scenarios. It is worth noting that the performance gap between DexCtrl and Ours w/o PD is much larger in the real world than in simulation. This finding highlights the necessity of adaptively adjusting control parameters at every step on real robots, further emphasizing the importance of adaptive controller prediction in tackling sim-to-real issues. We also conduct real-world experiments on the flipping task, with visualizations shown in Figure 3, demonstrating the real-world generalizability of our method.

Table 4: Real-world in-hand rotation results across objects with varying mass and friction.
| Method | Metric | Mass: Light | Mass: Medium | Mass: Heavy | Friction: Small | Friction: Medium | Friction: Large |
| Manual Tuning | RotR | 2.042 | 1.963 | 0.628 | 1.963 | 1.374 | 2.231 |
| Manual Tuning | TTF | 286.5 | 286.5 | 243 | 286.5 | 239.9 | 288 |
| Ours w/o PD | RotR | 4.998 | 5.498 | 2.356 | 5.498 | 4.355 | 4.855 |
| Ours w/o PD | TTF | 298.8 | 297.7 | 264.9 | 297.7 | 278.6 | 286.6 |
| Ours | RotR | 10.414 | 9.386 | 5.998 | 9.386 | 10.211 | 10.681 |
| Ours | TTF | 258.2 | 289.3 | 300 | 289.3 | 300 | 290.4 |
4.4 How does our Method Perform across Objects with Varying Physical Parameters?
To further evaluate our policy, we perform additional tests on objects with different masses and friction coefficients. To ensure better control over confounding factors, we use a hollow cube for mass experiments and vary its mass by inserting different internal objects, and use cubes of different textures with the same mass for friction experiments. As shown in Table 4 and Figure 5, our method significantly outperforms both baselines, especially on heavy objects, indicating that it can better adapt to objects with different physical parameters. Also, the pattern of our method broadly aligns with our force-based predictions, namely, the lightest and smoothest objects exhibit the highest speed and lowest stability, respectively.
4.5 How do Changes in Controller Parameters Impact the Results?


In this section, we investigate the fundamental question: how do the learned controller parameters affect dexterous manipulation performance? To simplify the analysis, we focus solely on the pattern of learned stiffness, as it contributes more than damping to the task performance. Theoretically, changes in stiffness directly influence the resulting joint torques, thereby modulating the contact forces between the robot and the manipulated object. Given that objects with different physical properties (e.g., mass and friction) require distinct contact force profiles, we hypothesize that variations in stiffness are closely related to object-specific configurations, aiming to provide better adaptation to varying force requirements.
To validate this hypothesis, we first analyze data collected in the simulation. Figure 6 illustrates the relationship between the average stiffness observed across trajectories and object mass as well as surface friction. The results reveal that stiffness increases monotonically with mass (left of Figure 6), consistent with the intuitive physical principle that heavier objects require greater force for manipulation. However, the relationship between stiffness and friction is more nuanced: in some cases, stiffness increases with friction (middle of Figure 6), while in others it decreases (right of Figure 6). We attribute this inconsistency to task-dependent dynamics. For instance, in rotation tasks, friction may primarily resist angular motion at certain joints, while at others it may act more like a supporting or pushing force. This suggests that the role of friction—and consequently the required stiffness—depends on both the object’s properties and the specific joint-task interaction.
To further investigate this relationship in real-world settings, we isolate the controller parameters prediction module and provide ground-truth actions to only predict controller parameters. This allows us to observe the influence of object properties without the confounding effects of policy action noise. Figure 7 presents the trends across objects with varying mass and friction. Two consistent patterns emerge: (1) for heavier objects, stiffness tends to increase at specific time steps or remain at its maximum value for longer durations; (2) for smoother objects, stiffness exhibits similar patterns at some joints but opposite trends at others, again indicating task- and joint-specific behavior.
In summary, our results show that the learned stiffness adapts systematically to object mass and friction, validating our hypothesis that controller parameters encode variations in required contact forces and thereby enhance manipulation performance through better adjustment to force requirements.
5 Related Work
Sim-to-real Transfer in Dexterous Manipulation Dexterous manipulation tasks typically involve complex interactions between robots and objects through contact [15, 16, 17, 18, 19]. Simulation has proven to be an effective way to learn these behaviors [20, 21, 4, 22], as teleoperation [23, 24, 25, 26, 27, 28] is often not feasible due to the embodiment gap and the delicate nature of the tasks. However, the sim-to-real gap remains a significant challenge [20, 29]. To bridge this gap, various approaches have been explored, including system identification, policy fine-tuning, and domain randomization [8, 30]. However, previous methods have largely overlooked the controller gap; our work focuses on narrowing this controller-level sim-to-real gap and significantly improves task performance.
Learning Adaptive Force Control Learning adaptive force control has been shown to be beneficial for contact-rich manipulation tasks, as varying control parameters can regulate the robot's behavior during interaction. Several works in the literature demonstrate its effectiveness for various contact-rich tasks, such as robotic table wiping [31, 32], object pivoting [33, 34], and assembly [35, 34, 36]. However, this approach has received relatively little attention in the context of dexterous manipulation, and the question of whether and how such a method influences dexterous hand manipulation has not been specifically answered. In our work, we apply the idea of adaptive control to contact-rich dexterous manipulation and demonstrate its effectiveness in improving performance with ample quantitative results, visualization, and discussion.
6 Conclusion
In this work, we address the challenge of narrowing the sim-to-real gap in dexterous manipulation by applying adaptive control parameters, and propose a novel method that jointly outputs actions and control parameters based on historical information. To validate our approach, we conduct comprehensive experiments in both simulation and real-world scenarios, and analyze how control parameters contribute to performance improvements through in-hand rotation tasks. We also apply our method to the flipping task, demonstrating its generalizability across dexterous manipulation settings. In the future, we plan to expand the applicability of our policy so that multiple dexterous tasks can share a single control parameters prediction module. Additionally, if supported by hardware, another promising direction is to perform online fine-tuning based on real-time force feedback.
7 Limitations
Limitations of our method arise primarily in two aspects. First, our approach currently does not incorporate real-world force or tactile sensing due to hardware limitations. As a result, it relies solely on proprioceptive information, which may not be sufficient to fully recover the system state, particularly in contact-rich scenarios. Future work could explore integrating our method with high-fidelity force feedback to further improve real-world performance. Second, the real-world evaluation is limited to the LEAP Hand platform, constrained by the available hardware. In future work, we aim to extend our method to other dexterous hand platforms to assess its generalizability across different robotic embodiments.
References
- Zhang et al. [2024] J. Zhang, H. Liu, D. Li, X. Yu, H. Geng, Y. Ding, J. Chen, and H. Wang. Dexgraspnet 2.0: Learning generative dexterous grasping in large-scale synthetic cluttered scenes. In 8th Annual Conference on Robot Learning, 2024.
- Yin et al. [2025] Z.-H. Yin, C. Wang, L. Pineda, K. Bodduluri, T. Wu, P. Abbeel, and M. Mukadam. Geometric retargeting: A principled, ultrafast neural hand retargeting algorithm. arXiv preprint arXiv:2503.07541, 2025.
- Ye et al. [2023] J. Ye, J. Wang, B. Huang, Y. Qin, and X. Wang. Learning continuous grasping function with a dexterous hand from human demonstrations. IEEE Robotics and Automation Letters, 8(5):2882–2889, 2023.
- Yang et al. [2024] M. Yang, C. Lu, A. Church, Y. Lin, C. Ford, H. Li, E. Psomopoulou, D. A. Barton, and N. F. Lepora. Anyrotate: Gravity-invariant in-hand object rotation with sim-to-real touch. arXiv preprint arXiv:2405.07391, 2024.
- Chen et al. [2023] T. Chen, M. Tippur, S. Wu, V. Kumar, E. Adelson, and P. Agrawal. Visual dexterity: In-hand reorientation of novel and complex object shapes. Science Robotics, 8(84):eadc9244, 2023. doi:10.1126/scirobotics.adc9244. URL https://www.science.org/doi/abs/10.1126/scirobotics.adc9244.
- Qi et al. [2023a] H. Qi, A. Kumar, R. Calandra, Y. Ma, and J. Malik. In-hand object rotation via rapid motor adaptation. In Conference on Robot Learning, pages 1722–1732. PMLR, 2023a.
- Qi et al. [2023b] H. Qi, B. Yi, S. Suresh, M. Lambeta, Y. Ma, R. Calandra, and J. Malik. General in-hand object rotation with vision and touch. In Conference on Robot Learning, pages 2549–2564. PMLR, 2023b.
- Wang et al. [2024] J. Wang, Y. Yuan, H. Che, H. Qi, Y. Ma, J. Malik, and X. Wang. Lessons from learning to spin "pens". arXiv preprint arXiv:2407.18902, 2024.
- Qi et al. [2025] H. Qi, B. Yi, M. Lambeta, Y. Ma, R. Calandra, and J. Malik. From simple to complex skills: The case of in-hand object reorientation. arXiv preprint arXiv:2501.05439, 2025.
- Zakka et al. [2023] K. Zakka, P. Wu, L. Smith, N. Gileadi, T. Howell, X. B. Peng, S. Singh, Y. Tassa, P. Florence, A. Zeng, et al. Robopianist: Dexterous piano playing with deep reinforcement learning. arXiv preprint arXiv:2304.04150, 2023.
- Qian et al. [2024] C. Qian, J. Urain, K. Zakka, and J. Peters. Pianomime: Learning a generalist, dexterous piano player from internet demonstrations. arXiv preprint arXiv:2407.18178, 2024.
- Yin et al. [2025] Z.-H. Yin, C. Wang, L. Pineda, F. Hogan, K. Bodduluri, A. Sharma, P. Lancaster, I. Prasad, M. Kalakrishnan, J. Malik, et al. Dexteritygen: Foundation controller for unprecedented dexterity. arXiv preprint arXiv:2502.04307, 2025.
- Liu et al. [2025] X. Liu, J. Adalibieke, Q. Han, Y. Qin, and L. Yi. Dextrack: Towards generalizable neural tracking control for dexterous manipulation from human references. arXiv preprint arXiv:2502.09614, 2025.
- Yu et al. [2024] M. Yu, B. Liang, X. Zhang, X. Zhu, L. Sun, C. Wang, S. Song, X. Li, and M. Tomizuka. In-hand following of deformable linear objects using dexterous fingers with tactile sensing. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 13518–13524. IEEE, 2024.
- Lin et al. [2024] T. Lin, Z.-H. Yin, H. Qi, P. Abbeel, and J. Malik. Twisting lids off with two hands. arXiv preprint arXiv:2403.02338, 2024.
- Akkaya et al. [2019] I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, R. Ribas, et al. Solving rubik’s cube with a robot hand. arXiv preprint arXiv:1910.07113, 2019.
- Andrychowicz et al. [2020] O. M. Andrychowicz, B. Baker, M. Chociej, R. Jozefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, et al. Learning dexterous in-hand manipulation. The International Journal of Robotics Research, 39(1):3–20, 2020.
- Teeple et al. [2022] C. B. Teeple, B. Aktaş, M. C. Yuen, G. R. Kim, R. D. Howe, and R. J. Wood. Controlling palm-object interactions via friction for enhanced in-hand manipulation. IEEE Robotics and Automation Letters, 7(2):2258–2265, 2022.
- Yin et al. [2023] Z.-H. Yin, B. Huang, Y. Qin, Q. Chen, and X. Wang. Rotating without seeing: Towards in-hand dexterity through touch. arXiv preprint arXiv:2303.10880, 2023.
- Guo et al. [2024] D. Guo, Y. Xiang, S. Zhao, X. Zhu, M. Tomizuka, M. Ding, and W. Zhan. Phygrasp: generalizing robotic grasping with physics-informed large multimodal models. arXiv preprint arXiv:2402.16836, 2024.
- Zhao et al. [2024] S. Zhao, X. Zhu, Y. Chen, C. Li, X. Zhang, M. Ding, and M. Tomizuka. Dexh2r: Task-oriented dexterous manipulation from human to robots. arXiv preprint arXiv:2411.04428, 2024.
- Lan et al. [2023] F. Lan, S. Wang, Y. Zhang, H. Xu, O. Oseni, Z. Zhang, Y. Gao, and T. Zhang. Dexcatch: Learning to catch arbitrary objects with dexterous hands. arXiv preprint arXiv:2310.08809, 2023.
- Wang et al. [2024] C. Wang, H. Shi, W. Wang, R. Zhang, L. Fei-Fei, and C. K. Liu. Dexcap: Scalable and portable mocap data collection system for dexterous manipulation. arXiv preprint arXiv:2403.07788, 2024.
- Shaw et al. [2024] K. Shaw, Y. Li, J. Yang, M. K. Srirama, R. Liu, H. Xiong, R. Mendonca, and D. Pathak. Bimanual dexterity for complex tasks. arXiv preprint arXiv:2411.13677, 2024.
- Fu et al. [2024] Z. Fu, T. Z. Zhao, and C. Finn. Mobile aloha: Learning bimanual mobile manipulation with low-cost whole-body teleoperation. arXiv preprint arXiv:2401.02117, 2024.
- Arunachalam et al. [2023] S. P. Arunachalam, S. Silwal, B. Evans, and L. Pinto. Dexterous imitation made easy: A learning-based framework for efficient dexterous manipulation. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 5954–5961. IEEE, 2023.
- Qin et al. [2022] Y. Qin, Y.-H. Wu, S. Liu, H. Jiang, R. Yang, Y. Fu, and X. Wang. Dexmv: Imitation learning for dexterous manipulation from human videos. In European Conference on Computer Vision, pages 570–587. Springer, 2022.
- Zhao et al. [2024] T. Z. Zhao, J. Tompson, D. Driess, P. Florence, K. Ghasemipour, C. Finn, and A. Wahid. Aloha unleashed: A simple recipe for robot dexterity. arXiv preprint arXiv:2410.13126, 2024.
- Xu et al. [2023] K. Xu, S. Zhao, Z. Zhou, Z. Li, H. Pi, Y. Zhu, Y. Wang, and R. Xiong. A joint modeling of vision-language-action for target-oriented grasping in clutter. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 11597–11604. IEEE, 2023.
- Chen et al. [2021] X. Chen, J. Hu, C. Jin, L. Li, and L. Wang. Understanding domain randomization for sim-to-real transfer. arXiv preprint arXiv:2110.03239, 2021.
- Martín-Martín et al. [2019] R. Martín-Martín, M. A. Lee, R. Gardner, S. Savarese, J. Bohg, and A. Garg. Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In 2019 IEEE/RSJ international conference on intelligent robots and systems (IROS), pages 1010–1017. IEEE, 2019.
- Wang et al. [2022] C. Wang, X. Zhang, Z. Kuang, and M. Tomizuka. Safe online gain optimization for cartesian space variable impedance control. In 2022 IEEE 18th International Conference on Automation Science and Engineering (CASE), pages 751–757. IEEE, 2022.
- Buchli et al. [2011] J. Buchli, F. Stulp, E. Theodorou, and S. Schaal. Learning variable impedance control. The International Journal of Robotics Research, 30(7):820–833, 2011.
- Zhang et al. [2023] X. Zhang, C. Wang, L. Sun, Z. Wu, X. Zhu, and M. Tomizuka. Efficient sim-to-real transfer of contact-rich manipulation skills with online admittance residual learning. In Conference on Robot Learning, pages 1621–1639. PMLR, 2023.
- Beltran-Hernandez et al. [2020] C. C. Beltran-Hernandez, D. Petit, I. G. Ramirez-Alpizar, and K. Harada. Variable compliance control for robotic peg-in-hole assembly: A deep-reinforcement-learning approach. Applied Sciences, 10(19):6923, 2020.
- Zhang et al. [2024] X. Zhang, M. Tomizuka, and H. Li. Bridging the sim-to-real gap with dynamic compliance tuning for industrial insertion. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 4356–4363. IEEE, 2024.