Joint Control vs End Effector Control
Which control mode should we use for manipulation?
Introduction
A short experiment I ran to compare the two different control modes. The policy is RL (specifically PPO) trained on the ManiSkill PickCube-v1 environment (Franka robot). These two control modes are the most common for manipulation in robotics. End Effector Control (ee control) is when the model outputs motion relative to the position of the gripper/hand: intuitively, things like up/down/left/right. Joint Control (q control or joint control) is when the model outputs the desired joint angles for the robot arm directly. Either control mode can also operate in `delta` mode, meaning the model outputs a small delta with respect to the current state rather than an absolute value. Joint delta control is also known as velocity control. There is also a long tail of more exotic control modes, such as impedance control, which uses forces/torques.
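
To make the distinction concrete, here is a minimal sketch (assuming ManiSkill 3 and gymnasium; the exact import path and defaults may differ across versions) showing that the same task exposes a different action space depending on the control mode:

```python
import gymnasium as gym
import mani_skill.envs  # noqa: F401  (registers the ManiSkill environments)

# pd_joint_delta_pos: one position delta per arm joint (+ gripper)
# pd_ee_delta_pose:   a 6-DoF delta pose of the end effector (+ gripper)
for control_mode in ["pd_joint_delta_pos", "pd_ee_delta_pose"]:
    env = gym.make("PickCube-v1", control_mode=control_mode)
    print(control_mode, env.action_space)
    env.close()
```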

Teleop using a desired pose (e.g., a VR controller location) uses ee control. Teleop using a leader/follower arm setup (e.g., ALOHA) uses joint control. Theoretically, ee control is "easier" to learn, but it also requires IK in the loop, since you need to convert the desired ee pose into actual joint angles to send to the robot motors. Joint control is more direct, but requires the model to learn the IK implicitly. In practice, ee control is more common: you can scroll through this config file describing robots used in the Open X-Embodiment Dataset and see the different observation spaces (called proprio in that repo) and action spaces (aka control modes) used by various manipulation projects over the past couple of years. EE control is also more embodiment-agnostic: two different robot arms, say one 6-DoF and one 7-DoF, can share the same ee action space.
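
As an illustration of the IK step that ee control puts in the loop, here is a hedged numpy sketch of one common approach, a damped least-squares step on the manipulator Jacobian. The function name and the source of the Jacobian are hypothetical; real stacks usually call an IK solver provided by the simulator or a kinematics library.

```python
import numpy as np

def ee_delta_to_joint_delta(jacobian: np.ndarray, ee_delta: np.ndarray,
                            damping: float = 1e-2) -> np.ndarray:
    """One damped least-squares IK step.

    jacobian: (6, n_joints) geometric Jacobian at the current configuration.
    ee_delta: (6,) desired end-effector motion [dx, dy, dz, droll, dpitch, dyaw].
    Returns a (n_joints,) joint-angle delta to send to the motors.
    """
    J = jacobian
    # Damping keeps the solve well-conditioned near kinematic singularities.
    JJt = J @ J.T + (damping ** 2) * np.eye(6)
    return J.T @ np.linalg.solve(JJt, ee_delta)
```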

Depending on the project, K-Scale has used different control modes. The VR teleop was ee control. The mocap walking was a kind of ee control. The Isaac Gym walking was joint control. In a hypothetical future, I think joint control will win, since it simplifies the control problem and frees us from depending on IK solvers. The biggest problem we will need to solve on this path is how to collect data for this control mode. The experiment below compares control modes in the ManiSkill PickCube-v1 environment using PPO policies. Thanks for reading. I realize that for those of you with robotics experience this is all review; my hope is that it helps those who are new and starts a larger conversation about which control modes we should be using in our projects.
Experiments
As you can see, this is a relatively simple task, so both control modes quickly reach a 100% eval success rate. I kept the other hyperparameters the same to make direct comparison easy. I also ablated solving the task from vision (the `_rgb` in the run names means a camera RGB image was used as the observation space). You can see how vision makes the task harder to solve. Here are the exact commands to run these:
```bash
export NUM_TRAIN_STEPS=10_000_000

python ppo_rgb.py --env_id="PickCube-v1" \
  --control_mode="pd_joint_delta_pos" \
  --num_envs=128 --update_epochs=8 --num_minibatches=8 \
  --total_timesteps=$NUM_TRAIN_STEPS

python ppo_rgb.py --env_id="PickCube-v1" \
  --control_mode="pd_ee_delta_pose" \
  --num_envs=128 --update_epochs=8 --num_minibatches=8 \
  --total_timesteps=$NUM_TRAIN_STEPS

python ppo.py --env_id="PickCube-v1" \
  --control_mode="pd_joint_delta_pos" \
  --num_envs=1024 --update_epochs=8 --num_minibatches=32 \
  --total_timesteps=$NUM_TRAIN_STEPS

python ppo.py --env_id="PickCube-v1" \
  --control_mode="pd_ee_delta_pose" \
  --num_envs=1024 --update_epochs=8 --num_minibatches=32 \
  --total_timesteps=$NUM_TRAIN_STEPS
```
If you want to run these yourself, use the kscalelabs/ManiSkill fork. These experiments run rather quickly thanks to the vectorized environments in ManiSkill, which allow collection of experience in parallel, greatly speeding up PPO.
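
For a sense of what that vectorization looks like, here is a rough sketch (assuming ManiSkill 3's GPU-parallel simulation; the `num_envs` argument and the batched spaces may differ across versions):

```python
import gymnasium as gym
import mani_skill.envs  # registers the ManiSkill environments

# One GPU-simulated environment object that steps 1024 scenes in a single batch.
env = gym.make("PickCube-v1", num_envs=1024,
               obs_mode="state", control_mode="pd_joint_delta_pos")
obs, _ = env.reset(seed=0)
action = env.action_space.sample()               # batched: (1024, action_dim)
obs, reward, terminated, truncated, info = env.step(action)
print(reward.shape)                              # one reward per parallel env
env.close()
```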