We present the development and real-world demonstration of an in-flight attitude control law for a small, low-cost quadruped with a five-bar-linkage leg design that uses only its legs as reaction masses. The control law is trained with deep reinforcement learning (DRL), specifically Proximal Policy Optimization (PPO), in the NVIDIA Omniverse Isaac Sim simulator with a GPU-accelerated RL pipeline. To demonstrate the policy, a small quadruped is designed, constructed, and evaluated both on a rotating pole test setup and in free fall. During a free fall of 0.7 seconds, a commanded orientation of 45 degrees in all principal axes is demonstrated, along with an average base angular velocity of 110 degrees per second during large attitude reference steps.
The control problem is described as follows: Given a quadruped with a body-fixed frame \( \mathcal{B} \) and an inertial frame \(\mathcal{W}\), we want to adjust \(\mathcal{B}\) to align with a target frame \(\mathcal{T}\). The target frame \(\mathcal{T}\) has the same origin as \(\mathcal{B}\) but may have a different orientation, as shown in the figure below. We assume the quadruped is "floating" in a zero-gravity environment and that it must reach its target orientation using solely its legs as reaction masses.
The figure below illustrates the control loop, including the implemented control law or policy. The policy (in blue and green) takes an observation vector as input. This vector consists of the error quaternion \( q^t_b \), which represents the orientation of the body frame relative to the target frame, the angular velocity of the body in the target frame \( \omega^t_b \), and the estimated motor positions and velocities (\(\theta_m, \dot{\theta}_m\)). The multi-layer perceptron has three layers with sizes [128, 64, 64] and uses ELU activation functions. The weights are trained in simulation using PPO. The policy outputs motor position references \(\theta_r\), which are either directly sent to the servo motors or passed through a reference model first. The quadruped's orientation relative to the inertial frame is measured using a Qualisys motion capture system, and the target orientation is set by the user.
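The observation vector described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation. It assumes scalar-first unit quaternions `(w, x, y, z)`, that `q_wb` and `q_wt` give the body and target orientations relative to the inertial frame, and that the angular velocity is supplied in the inertial frame. The function names and the sign normalisation of the error quaternion are hypothetical choices made for this example.

```python
def quat_conj(q):
    """Conjugate (inverse for unit quaternions), scalar-first (w, x, y, z)."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def quat_mul(a, b):
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q (q * v * q^-1)."""
    _, x, y, z = quat_mul(quat_mul(q, (0.0, *v)), quat_conj(q))
    return (x, y, z)


def error_quaternion(q_wb, q_wt):
    """Orientation of the body frame relative to the target frame, q^t_b."""
    q = quat_mul(quat_conj(q_wt), q_wb)
    # Assumption: keep the scalar part non-negative so that q and -q,
    # which encode the same rotation, give the same observation.
    if q[0] < 0.0:
        q = tuple(-c for c in q)
    return q


def build_observation(q_wb, q_wt, omega_w, theta_m, theta_dot_m):
    """Concatenate [q^t_b, omega^t_b, theta_m, theta_dot_m] into one list."""
    q_tb = error_quaternion(q_wb, q_wt)
    # Express the inertial-frame angular velocity in the target frame.
    omega_t = rotate(quat_conj(q_wt), omega_w)
    return [*q_tb, *omega_t, *theta_m, *theta_dot_m]
```

When the body is aligned with the target, `error_quaternion` returns the identity quaternion `(1, 0, 0, 0)`, which is the state the policy is trained to reach.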
To test attitude control in 3D, free fall tests are conducted by raising the quadruped into a mount designed to lock it into a known initial configuration. A magnetic switch is installed to remotely release the quadruped into a foam pit. Policy inference is initiated after an observed vertical displacement of 3 centimetres, and the configuration allows for a free fall duration of 0.7 seconds.
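As a rough consistency check on these numbers, constant-acceleration kinematics can be applied. The values below assume release from rest, \(g \approx 9.81\,\mathrm{m/s^2}\), and negligible air drag; none of these assumptions come from the experiments themselves. It is also assumed here that the 0.7 seconds is counted from the moment of release.

\[
t_{\text{trigger}} = \sqrt{\frac{2 \cdot 0.03\,\mathrm{m}}{9.81\,\mathrm{m/s^2}}} \approx 0.078\,\mathrm{s},
\qquad
h = \tfrac{1}{2} \cdot 9.81\,\mathrm{m/s^2} \cdot (0.7\,\mathrm{s})^2 \approx 2.4\,\mathrm{m}.
\]

Under these assumptions, policy inference would start roughly 0.08 seconds after release, and a 0.7 second fall corresponds to a drop height of about 2.4 metres.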
We present videos of selected free fall experiments. The quadruped is initialised to a neutral orientation (\(\mathcal{B}_0\)) and is then commanded to different orientation targets (\(\mathcal{T}\)). The videos show a wide shot of the drop on the left, a close-up from a high-speed camera in the middle, and the simulation response for the same initial orientation and target on the right. The graph shows the error quaternion \( q^t_b \) decomposed into Euler angles (ZYX), which we aim to regulate to zero.
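For readers who want to reproduce the plotted quantities from logged quaternions, the following sketch converts an error quaternion into ZYX Euler angles and into the magnitude of its axis-angle rotation. It assumes scalar-first unit quaternions `(w, x, y, z)` and the common yaw-pitch-roll convention for ZYX. The function names are hypothetical and the plotting code used for the videos may differ.

```python
import math


def quat_to_euler_zyx(q):
    """Return (yaw, pitch, roll) in radians for a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Clamp to guard against rounding slightly outside [-1, 1] near +/-90 deg pitch.
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return yaw, pitch, roll


def rotation_angle(q):
    """Magnitude of the axis-angle rotation encoded by q, in radians (0 to pi)."""
    w, x, y, z = q
    return 2.0 * math.atan2(math.sqrt(x * x + y * y + z * z), abs(w))
```

Both quantities go to zero when the body frame coincides with the target frame. The single rotation angle is useful when the Euler decomposition is hard to read, for example near 90 degrees of pitch.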
*Note that the target in this case is not reached. The absolute value of the axis-angle representation of the quaternion is plotted.
To test the system at lower joint velocities and over extended periods, a stiff forked aluminum pole is mounted to an open ball bearing, such that the quadruped can be slid onto it along each of its principal axes.
We present a range of videos from the experiments on the rotating pole. First, policies trained with a pole are compared with the simulator response. Then the "pole-policies" are compared with a generic policy trained in 3D without a pole. A video showcasing the ability to follow a moving reference is included at the end.
Note that the synchronisation between the videos and the plotting is not perfect due to limited editing time.
Note that the pole policy is constrained to transversal motors only.
The reference following was done with a roll policy trained with a maximum motor velocity of 300 degrees/s.
Before construction of the quadruped and test setup could begin, it was instructive to analyse the effects of such a pole on attitude control capabilities in a case study involving a single leg. Motions are hard-coded through motion primitives.