PyTorch Robot Kinematics

Parallel and differentiable forward kinematics (FK), Jacobian calculation, and damped least squares inverse kinematics (IK)
Load robot description from URDF, SDF, and MJCF formats
SDF queries batched across configurations and points via pytorch-volumetric

Installation

pip install pytorch-kinematics

For development, clone repository somewhere, then pip3 install -e . to install in editable mode.

Reference

If you use this package in your research, consider citing

@software{Zhong_PyTorch_Kinematics_2024,
author = {Zhong, Sheng and Power, Thomas and Gupta, Ashwin and Mitrano, Peter},
doi = {10.5281/zenodo.7700587},
month = feb,
title = {{PyTorch Kinematics}},
version = {v0.7.1},
year = {2024}
}

Usage

See tests for code samples; some are also shown here.

Loading Robots

import pytorch_kinematics as pk

urdf = "widowx/wx250s.urdf"
# there are multiple natural end effector links so it's not a serial chain
chain = pk.build_chain_from_urdf(open(urdf, mode="rb").read())
# visualize the frames (the string is also returned)
chain.print_tree()
"""
base_link
└── shoulder_link
    └── upper_arm_link
        └── upper_forearm_link
            └── lower_forearm_link
                └── wrist_link
                    └── gripper_link
                        └── ee_arm_link
                            ├── gripper_prop_link
                            └── gripper_bar_link
                                └── fingers_link
                                    ├── left_finger_link
                                    ├── right_finger_link
                                    └── ee_gripper_link
"""

# extract a specific serial chain such as for inverse kinematics
serial_chain = pk.SerialChain(chain, "ee_gripper_link", "base_link")
serial_chain.print_tree()
"""
base_link
└── shoulder_link
    └── upper_arm_link
        └── upper_forearm_link
            └── lower_forearm_link
                └── wrist_link
                    └── gripper_link
                        └── ee_arm_link
                            └── gripper_bar_link
                                └── fingers_link
                                    └── ee_gripper_link
"""

# you can also extract a serial chain with a different root than the original chain
serial_chain = pk.SerialChain(chain, "ee_gripper_link", "gripper_link")
serial_chain.print_tree()
"""
 gripper_link
└── ee_arm_link
    └── gripper_bar_link
        └── fingers_link
            └── ee_gripper_link
"""

Forward Kinematics (FK)

import math
import pytorch_kinematics as pk

# load robot description from URDF and specify end effector link
chain = pk.build_serial_chain_from_urdf(open("kuka_iiwa.urdf").read(), "lbr_iiwa_link_7")
# prints out the (nested) tree of links
print(chain)
# prints out list of joint names
print(chain.get_joint_parameter_names())

# specify joint values (can do so in many forms)
th = [0.0, -math.pi / 4.0, 0.0, math.pi / 2.0, 0.0, math.pi / 4.0, 0.0]
# do forward kinematics and get transform objects; end_only=False gives a dictionary of transforms for all links
ret = chain.forward_kinematics(th, end_only=False)
# look up the transform for a specific link
tg = ret['lbr_iiwa_link_7']
# get transform matrix (1,4,4), then convert to separate position and unit quaternion
m = tg.get_matrix()
pos = m[:, :3, 3]
rot = pk.matrix_to_quaternion(m[:, :3, :3])

We can parallelize FK by passing in 2D joint values, and also use CUDA if available

import torch
import pytorch_kinematics as pk

d = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float64

chain = pk.build_serial_chain_from_urdf(open("kuka_iiwa.urdf").read(), "lbr_iiwa_link_7")
chain = chain.to(dtype=dtype, device=d)

N = 1000
th_batch = torch.rand(N, len(chain.get_joint_parameter_names()), dtype=dtype, device=d)

# order of magnitudes faster when doing FK in parallel
# elapsed 0.008678913116455078s for N=1000 when parallel
# (N,4,4) transform matrix; only the one for the end effector is returned since end_only=True by default
tg_batch = chain.forward_kinematics(th_batch)

# elapsed 8.44686508178711s for N=1000 when serial
for i in range(N):
    tg = chain.forward_kinematics(th_batch[i])

We can compute gradients through the FK

import torch
import math
import pytorch_kinematics as pk

chain = pk.build_serial_chain_from_urdf(open("kuka_iiwa.urdf").read(), "lbr_iiwa_link_7")

# require gradient through the input joint values
th = torch.tensor([0.0, -math.pi / 4.0, 0.0, math.pi / 2.0, 0.0, math.pi / 4.0, 0.0], requires_grad=True)
tg = chain.forward_kinematics(th)
m = tg.get_matrix()
pos = m[:, :3, 3]
pos.norm().backward()
# now th.grad is populated

We can load SDF and MJCF descriptions too, and pass in joint values via a dictionary (unspecified joints get th=0) for non-serial chains

import math
import torch
import pytorch_kinematics as pk

chain = pk.build_chain_from_sdf(open("simple_arm.sdf").read())
ret = chain.forward_kinematics({'arm_elbow_pan_joint': math.pi / 2.0, 'arm_wrist_lift_joint': -0.5})
# recall that we specify joint values and get link transforms
tg = ret['arm_wrist_roll']

# can also do this in parallel
N = 100
ret = chain.forward_kinematics({'arm_elbow_pan_joint': torch.rand(N, 1), 'arm_wrist_lift_joint': torch.rand(N, 1)})
# (N, 4, 4) transform object
tg = ret['arm_wrist_roll']

# building the robot from a MJCF file
chain = pk.build_chain_from_mjcf(open("ant.xml").read())
print(chain)
print(chain.get_joint_parameter_names())
th = {'hip_1': 1.0, 'ankle_1': 1}
ret = chain.forward_kinematics(th)

chain = pk.build_chain_from_mjcf(open("humanoid.xml").read())
print(chain)
print(chain.get_joint_parameter_names())
th = {'left_knee': 0.0, 'right_knee': 0.0}
ret = chain.forward_kinematics(th)

torch.compile Support

The FK computation can be compiled with torch.compile(fullgraph=True) for significant speedups, especially for applications that call FK thousands of times (e.g. inverse kinematics, trajectory optimization).

Use forward_kinematics_tensor, a compile-friendly variant that accepts and returns raw tensors instead of dicts and Transform3d objects:

import torch
import pytorch_kinematics as pk

chain = pk.build_serial_chain_from_urdf(open("kuka_iiwa.urdf").read(), "lbr_iiwa_link_7")

# compile the FK kernel (one-time cost)
compiled_fk = torch.compile(chain.forward_kinematics_tensor, fullgraph=True, dynamic=True)

# input: (B, n_joints) tensor
th = torch.randn(1000, 7)

# output: (num_frames, B, 4, 4) tensor of homogeneous transforms for all frames
all_transforms = compiled_fk(th)

# index by frame: use chain.frame_to_idx to look up the index for a frame name
ee_idx = chain.frame_to_idx['lbr_iiwa_link_7']
ee_transform = all_transforms[ee_idx]  # (B, 4, 4)

Typical speedups on CPU (Kuka IIWA 7-DOF):

Batch size	`forward_kinematics`	Compiled `forward_kinematics_tensor`	Speedup
1	0.21 ms	0.04 ms	4.7x
64	0.26 ms	0.08 ms	3.5x
1024	1.13 ms	0.51 ms	2.2x

Speedups are larger for complex robots with more frames (e.g. 6x+ for 49-frame robots at batch=1). GPU speedups from CUDA graph capture provide additional gains at large batch sizes.

The standard forward_kinematics method (returning Dict[str, Transform3d]) also benefits from the refactored internals without any code changes. It is 2-6x faster at small batch sizes compared to v0.7.

The rotation conversion functions (matrix_to_quaternion, quaternion_to_axis_angle, axis_angle_to_quaternion, etc.) are also compatible with torch.compile(fullgraph=True) and can be compiled standalone or as part of a larger compiled graph:

import torch
import pytorch_kinematics as pk

# compile a rotation conversion function
compiled_m2q = torch.compile(pk.matrix_to_quaternion, fullgraph=True)
R = torch.randn(100, 3, 3)
q = compiled_m2q(R)

Jacobian calculation

The Jacobian (in the kinematics context) is a matrix describing how the end effector changes with respect to joint value changes (where is the twist, or stacked velocity and angular velocity):

For SerialChain we provide a differentiable and parallelizable method for computing the Jacobian with respect to the base frame.

import math
import torch
import pytorch_kinematics as pk

# can convert Chain to SerialChain by choosing end effector frame
chain = pk.build_chain_from_sdf(open("simple_arm.sdf").read())
# print(chain) to see the available links for use as end effector
# note that any link can be chosen; it doesn't have to be a link with no children
chain = pk.SerialChain(chain, "arm_wrist_roll_frame")

chain = pk.build_serial_chain_from_urdf(open("kuka_iiwa.urdf").read(), "lbr_iiwa_link_7")
th = torch.tensor([0.0, -math.pi / 4.0, 0.0, math.pi / 2.0, 0.0, math.pi / 4.0, 0.0])
# (1,6,7) tensor, with 7 corresponding to the DOF of the robot
J = chain.jacobian(th)

# get Jacobian in parallel and use CUDA if available
N = 1000
d = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float64

chain = chain.to(dtype=dtype, device=d)
# Jacobian calculation is differentiable
th = torch.rand(N, 7, dtype=dtype, device=d, requires_grad=True)
# (N,6,7)
J = chain.jacobian(th)

# can get Jacobian at a point offset from the end effector (location is specified in EE link frame)
# by default location is at the origin of the EE frame
loc = torch.rand(N, 3, dtype=dtype, device=d)
J = chain.jacobian(th, locations=loc)

Like FK, the Jacobian has a torch.compile-compatible variant jacobian_tensor:

chain = pk.build_serial_chain_from_urdf(open("kuka_iiwa.urdf").read(), "lbr_iiwa_link_7")

# compile the Jacobian kernel (one-time cost)
compiled_jac = torch.compile(chain.jacobian_tensor, fullgraph=True)

# input: (B, n_joints) tensor; output: (B, 6, n_joints) Jacobian
th = torch.randn(100, 7)
J = compiled_jac(th)

For maximum performance in tight loops (e.g. IK), pass mode='max-autotune' to torch.compile for additional kernel tuning at the cost of longer compilation.

Typical speedups on CPU (Kuka IIWA 7-DOF, vs the old v0.7 calc_jacobian):

Batch size	Old `calc_jacobian`	New `jacobian` (eager)	Compiled `jacobian_tensor`	Speedup (compiled vs old)
1	0.63 ms	0.12 ms	0.03 ms	21x
10	0.79 ms	0.15 ms	0.05 ms	15x
100	0.75 ms	0.30 ms	0.09 ms	8x
1,000	1.67 ms	1.23 ms	0.53 ms	3x
10,000	7.93 ms	7.01 ms	4.91 ms	1.6x
100,000	59.42 ms	64.58 ms	46.46 ms	1.3x

At small-to-medium batch sizes (the typical IK regime), the new eager implementation is 2-5x faster due to vectorized computation replacing the per-frame Python loop. torch.compile provides an additional 2-4x on top of that, and is faster than the old implementation at every batch size.

The Jacobian can be used to do inverse kinematics. See IK survey for a survey of ways to do so. Note that IK may be better performed through other means (but doing it through the Jacobian can give an end-to-end differentiable method).

Inverse Kinematics (IK)

Inverse kinematics is available via damped least squares (iterative steps with Jacobian pseudo-inverse damped to avoid oscillation near singularlities). Compared to other IK libraries, these are the typical advantages over them:

not ROS dependent (many IK libraries need the robot description on the ROS parameter server)
batched in both goal specification and retries from different starting configurations
goal orientation in addition to goal position

See tests/test_inverse_kinematics.py for usage, but generally what you need is below:

full_urdf = os.path.join(search_path, urdf)
chain = pk.build_serial_chain_from_urdf(open(full_urdf).read(), "lbr_iiwa_link_7")

# goals are specified as Transform3d poses in the **robot frame**
# so if you have the goals specified in the world frame, you also need the robot frame in the world frame
pos = torch.tensor([0.0, 0.0, 0.0], device=device)
rot = torch.tensor([0.0, 0.0, 0.0], device=device)
rob_tf = pk.Transform3d(pos=pos, rot=rot, device=device)

# specify goals as Transform3d poses in world frame
goal_in_world_frame_tf = ...
# convert to robot frame (skip if you have it specified in robot frame already, or if world = robot frame)
goal_in_rob_frame_tf = rob_tf.inverse().compose(goal_tf)

# get robot joint limits
lim = torch.tensor(chain.get_joint_limits(), device=device)

# create the IK object
# see the constructor for more options and their explanations, such as convergence tolerances
ik = pk.PseudoInverseIK(chain, max_iterations=30, num_retries=10,
                        joint_limits=lim.T,
                        early_stopping_any_converged=True,
                        early_stopping_no_improvement="all",
                        debug=False,
                        lr=0.2)
# solve IK
sol = ik.solve(goal_in_rob_frame_tf)
# num goals x num retries x DOF tensor of joint angles; if not converged, best solution found so far
print(sol.solutions)
# num goals x num retries can check for the convergence of each run
print(sol.converged)
# num goals x num retries can look at errors directly
print(sol.err_pos)
print(sol.err_rot)

Compiled IK

For workloads that solve IK repeatedly (e.g. in a planning loop), pass use_compile=True to compile the FK, Jacobian, and damped least squares kernels inside the IK loop with torch.compile. The outer IK loop with its data-dependent convergence checks remains in eager mode.

ik = pk.PseudoInverseIK(chain, max_iterations=30, num_retries=10,
                         joint_limits=lim.T,
                         early_stopping_any_converged=True,
                         early_stopping_no_improvement="all",
                         lr=0.2,
                         use_compile=True)  # compile inner kernels

# first solve() incurs a one-time compilation cost; subsequent calls are faster
sol = ik.solve(goal_in_rob_frame_tf)

Typical speedups on CPU (Kuka IIWA 7-DOF, 30 iterations):

Retries	Eager (v0.7)	Eager (new)	Compiled	Speedup (compiled vs v0.7)
10	25.6 ms	23.3 ms	11.8 ms	2.2x
50	31.5 ms	29.0 ms	15.8 ms	2.0x

Note: use_compile=True requires PyTorch 2.0+. You cannot wrap ik.solve() directly with torch.compile due to data-dependent control flow in the outer loop — use this flag instead.

SDF Queries

See pytorch-volumetric for the latest details, some instructions are pasted here:

For many applications such as collision checking, it is useful to have the SDF of a multi-link robot in certain configurations. First, we create the robot model (loaded from URDF, SDF, MJCF, ...) with pytorch kinematics. For example, we will be using the KUKA 7 DOF arm model from pybullet data

import os
import torch
import pybullet_data
import pytorch_kinematics as pk
import pytorch_volumetric as pv

urdf = "kuka_iiwa/model.urdf"
search_path = pybullet_data.getDataPath()
full_urdf = os.path.join(search_path, urdf)
chain = pk.build_serial_chain_from_urdf(open(full_urdf).read(), "lbr_iiwa_link_7")
d = "cuda" if torch.cuda.is_available() else "cpu"

chain = chain.to(device=d)
# paths to the link meshes are specified with their relative path inside the URDF
# we need to give them the path prefix as we need their absolute path to load
s = pv.RobotSDF(chain, path_prefix=os.path.join(search_path, "kuka_iiwa"))

By default, each link will have a MeshSDF. To instead use CachedSDF for faster queries

s = pv.RobotSDF(chain, path_prefix=os.path.join(search_path, "kuka_iiwa"),
                link_sdf_cls=pv.cache_link_sdf_factory(resolution=0.02, padding=1.0, device=d))

Which when the y=0.02 SDF slice is visualized:

With surface points corresponding to:

Queries on this SDF is dependent on the joint configurations (by default all zero). Queries are batched across configurations and query points. For example, we have a batch of joint configurations to query

th = torch.tensor([0.0, -math.pi / 4.0, 0.0, math.pi / 2.0, 0.0, math.pi / 4.0, 0.0], device=d)
N = 200
th_perturbation = torch.randn(N - 1, 7, device=d) * 0.1
# N x 7 joint values
th = torch.cat((th.view(1, -1), th_perturbation + th))

And also a batch of points to query (same points for each configuration):

y = 0.02
query_range = np.array([
    [-1, 0.5],
    [y, y],
    [-0.2, 0.8],
])
# M x 3 points
coords, pts = pv.get_coordinates_and_points_in_grid(0.01, query_range, device=s.device)

We set the batch of joint configurations and query:

s.set_joint_configuration(th)
# N x M SDF value
# N x M x 3 SDF gradient
sdf_val, sdf_grad = s(pts)

Credits

pytorch_kinematics/transforms is extracted from pytorch3d with minor extensions. This was done instead of including pytorch3d as a dependency because it is hard to install and most of its code is unrelated. An important difference is that we use left hand multiplied transforms as is convention in robotics (T * pt) instead of their right hand multiplied transforms.
pytorch_kinematics/urdf_parser_py, and pytorch_kinematics/mjcf_parser is extracted from kinpy, as well as the FK logic. This repository ports the logic to pytorch, parallelizes it, and provides some extensions.

Name		Name	Last commit message	Last commit date
Latest commit History 267 Commits
.github/workflows		.github/workflows
src/pytorch_kinematics		src/pytorch_kinematics
tests		tests
.clang-format		.clang-format
.gitignore		.gitignore
CITATION.cff		CITATION.cff
LICENSE.txt		LICENSE.txt
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PyTorch Robot Kinematics

Installation

Reference

Usage

Loading Robots

Forward Kinematics (FK)

torch.compile Support

Jacobian calculation

Inverse Kinematics (IK)

Compiled IK

SDF Queries

Credits

About

Uh oh!

Releases 8

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PyTorch Robot Kinematics

Installation

Reference

Usage

Loading Robots

Forward Kinematics (FK)

torch.compile Support

Jacobian calculation

Inverse Kinematics (IK)

Compiled IK

SDF Queries

Credits

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 8

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages