# Underactuated Robotics

Algorithms for Walking, Running, Swimming, Flying, and Manipulation

Russ Tedrake

How to cite these notes

Note: These are working notes used for a course being taught at MIT. They will be updated throughout the Spring 2019 semester.

# Preface

This book is about building robots that move with speed, efficiency, and grace. I believe that this can only be achieve through a tight coupling between mechanical design, passive dynamics, and nonlinear control synthesis. Therefore, these notes contain selected material from dynamical systems theory, as well as linear and nonlinear control.

These notes also reflect a deep belief in computational algorithms playing an essential role in finding and optimizing solutions to complex dynamics and control problems. Algorithms play an increasingly central role in modern control theory; these days even rigorous mathematicians consider finding convexity in a problem (therefore making it amenable to an efficient computational solution) almost tantamount to an analytical result. Therefore, the notes necessarily also cover selected material from optimization theory, motion planning, and machine learning.

Although the material in the book comes from many sources, the presentation is targeted very specifically at a handful of robotics problems. Concepts are introduced only when and if they can help progress the capabilities we are trying to develop. Many of the disciplines that I am drawing from are traditionally very rigorous, to the point where the basic ideas can be hard to penetrate for someone that is new to the field. I've made a conscious effort in these notes to keep a very informal, conversational tone even when introducing these rigorous topics, and to reference the most powerful theorems but only to prove them when that proof would add particular insights without distracting from the mainstream presentation. I hope that the result is a broad but reasonably self-contained and readable manuscript that will be of use to any enthusiastic roboticist.

# Organization

The material in these notes is organized into a few main parts. "Model Systems" introduces a series of increasingly complex dynamical systems and overviews some of the relevant results from the literature for each system. "Nonlinear Planning and Control" introduces quite general computational algorithms for reasoning about those dynamical systems, with optimization theory playing a central role. Many of these algorithms treat the dynamical system as known and deterministic until the last chapters in this part which introduce stochasticity and robustness. "Estimation and Learning" follows this up with techniques from statistics and machine learning which capitalize on this viewpoint to introduce additional algorithms which can operate with less assumptions on knowing the model or having perfect sensors. The book closes with an "Appendix" that provides slightly more introduction (and references) for the main topics used in the course.

The order of the chapters was chosen to make the book valuable as a reference. When teaching the course, however, I take a spiral trajectory through the material, introducing robot dynamics and control problems one at a time, and introducing only the techniques that are required to solve that particular problem.

insert figure showing progression of problems here. pendulum -> cp/acro -> walking ... with chapter numbers associated.

# Software

All of the examples and algorithms in this book, plus many more, are now available as a part of our open-source software project: . is a C++ project, but in this text we will use Drake's Python bindings. I encourage super-users or readers who want to dig deeper to explore the C++ code as well (and to contribute back).

Please see the appendix for specific instructions for using along with these notes.

First chapter

# Fully-actuated vs Underactuated Systems

Robots today move far too conservatively, and accomplish only a fraction of the tasks and achieve a fraction of the performance that they are mechanically capable of. In many cases, we are still fundamentally limited by control technology which matured on rigid robotic arms in structured factory environments. The study of underactuated robotics focuses on building control systems which use the natural dynamics of the machines in an attempt to achieve extraordinary performance in terms of speed, efficiency, or robustness.

# Honda's ASIMO vs. passive dynamic walkers

The world of robotics changed when, in late 1996, Honda Motor Co. announced that they had been working for nearly 15 years (behind closed doors) on walking robot technology. Their designs have continued to evolve, resulting in a humanoid robot they call ASIMO (Advanced Step in Innovative MObility). For nearly 20 years, Honda's robots were widely considered to represent the state of the art in walking robots, although there are now many robots with designs and performance very similar to ASIMO's. We will dedicate effort to understanding a few of the details of ASIMO when we discuss algorithms for walking... for now I just want you to become familiar with the look and feel of ASIMO's movements [watch the asimo video below now].

I hope that your first reaction is to be incredibly impressed with the quality and versatility of ASIMO's movements. Now take a second look. Although the motions are very smooth, there is something a little unnatural about ASIMO's gait. It feels a little like an astronaut encumbered by a heavy space suit. In fact this is a reasonable analogy... ASIMO is walking a bit like somebody that is unfamiliar with his/her dynamics. Its control system is using high-gain feedback, and therefore considerable joint torque, to cancel out the natural dynamics of the machine and strictly follow a desired trajectory. This control approach comes with a stiff penalty. ASIMO uses roughly 20 times the energy (scaled) that a human uses to walk on the flat (measured by cost of transport)Collins05. Also, control stabilization in this approach only works in a relatively small portion of the state space (when the stance foot is flat on the ground), so ASIMO can't move nearly as quickly as a human, and cannot walk on unmodelled or uneven terrain.

For contrast, let's now consider a very different type of walking robot, called a passive dynamic walker (PDW). This "robot" has no motors, no controllers, no computer, but is still capable of walking stably down a small ramp, powered only by gravity [watch videos above now]. Most people will agree that the passive gait of this machine is more natural than ASIMO's; it is certainly more efficient. Passive walking machines have a long history - there are patents for passively walking toys dating back to the mid 1800's. We will discuss, in detail, what people know about the dynamics of these machines and what has been accomplished experimentally. This most impressive passive dynamic walker to date was built by Steve Collins in Andy Ruina's lab at CornellCollins01.

Passive walkers demonstrate that the high-gain, dynamics-cancelling feedback approach taken on ASIMO is not a necessary one. In fact, the dynamics of walking is beautiful, and should be exploited - not cancelled out.

The world is just starting to see what this vision could look like. This video from Boston Dynamics is one of my favorites of all time:

This result is a marvel of engineering (the mechanical design alone is amazing...). In this class, we'll teach you the computational tools to required to make robots perform this way. We'll also try to reason about how robust these types of maneuvers are and can be. Don't worry, if you do not have a super lightweight, super capable, and super durable humanoid, then a simulation will be provided for you.

# Birds vs. modern aircraft

The story is surprisingly similar in a very different type of machine. Modern airplanes are extremely effective for steady-level flight in still air. Propellers produce thrust very efficiently, and today's cambered airfoils are highly optimized for speed and/or efficiency. It would be easy to convince yourself that we have nothing left to learn from birds. But, like ASIMO, these machines are mostly confined to a very conservative, low angle-of-attack flight regime where the aerodynamics on the wing are well understood. Birds routinely execute maneuvers outside of this flight envelope (for instance, when they are landing on a perch), and are considerably more effective than our best aircraft at exploiting energy (eg, wind) in the air.

As a consequence, birds are extremely efficient flying machines; some are capable of migrating thousands of kilometers with incredibly small fuel supplies. The wandering albatross can fly for hours, or even days, without flapping its wings - these birds exploit the shear layer formed by the wind over the ocean surface in a technique called dynamic soaring. Remarkably, the metabolic cost of flying for these birds is indistinguishable from the baseline metabolic costArnould96, suggesting that they can travel incredible distances (upwind or downwind) powered almost completely by gradients in the wind. Other birds achieve efficiency through similarly rich interactions with the air - including formation flying, thermal soaring, and ridge soaring. Small birds and large insects, such as butterflies and locusts, use "gust soaring" to migrate hundreds or even thousands of kilometers carried primarily by the wind.

Birds are also incredibly maneuverable. The roll rate of a highly acrobatic aircraft (e.g, the A-4 Skyhawk) is approximately 720 deg/secShyy08; a barn swallow has a roll rate in excess of 5000 deg/secShyy08. Bats can be flying at full-speed in one direction, and completely reverse direction while maintaining forward speed, all in just over 2 wing-beats and in a distance less than half the wingspanTian06. Although quantitative flow visualization data from maneuvering flight is scarce, a dominant theory is that the ability of these animals to produce sudden, large forces for maneuverability can be attributed to unsteady aerodynamics, e.g., the animal creates a large suction vortex to rapidly change directionTriantafyllou95. These astonishing capabilities are called upon routinely in maneuvers like flared perching, prey-catching, and high speed flying through forests and caves. Even at high speeds and high turn rates, these animals are capable of incredible agility - bats sometimes capture prey on their wings, Peregrine falcons can pull 25 G's out of a 240 mph dive to catch a sparrow in mid-flightTucker98, and even the small birds outside our building can be seen diving through a chain-link fence to grab a bite of food.

Although many impressive statistics about avian flight have been recorded, our understanding is partially limited by experimental accessibility - it is quite difficult to carefully measure birds (and the surrounding airflow) during their most impressive maneuvers without disturbing them. The dynamics of a swimming fish are closely related, and can be more convenient to study. Dolphins have been known to swim gracefully through the waves alongside ships moving at 20 knotsTriantafyllou95. Smaller fish, such as the bluegill sunfish, are known to possess an escape response in which they propel themselves to full speed from rest in less than a body length; flow visualizations indeed confirm that this is accomplished by creating a large suction vortex along the side of the bodyTytell08 - similar to how bats change direction in less than a body length. There are even observations of a dead fish swimming upstream by pulling energy out of the wake of a cylinder; this passive propulsion is presumably part of the technique used by rainbow trout to swim upstream at mating seasonBeal06.

# Manipulation

Despite a long history of success in industrial applications, and the huge potential for consumer applications, we still don't have robot arms that can perform any meaningful tasks in the home. Admittedly, the perception problem (using sensors to detect/localize objects and understand the scene) for home robotics is incredibly difficult. But even if we were given a perfect perception system, our robots are still a long way from performing basic object manipulation tasks with the dexterity and versatility of a human.

Most robots that perform object manipulation today use a stereotypical pipeline. First, we enumerate a handful of contact locations on the hand (these points, and only these points, are allowed to contact the world). Then, given a localized object in the environment, we plan a collision-free trajectory for the arm that will move the hand into a "pre-grasp" location. At this point the robot closes it's eyes (figuratively) and closes the hand, hoping that the pre-grasp location was good enough that the object will be successfully grasped using e.g. only current feedback in the fingers to know when to stop closing. "Underactuated hands" make this approach more successful, but the entire approach really only works well for enveloping grasps.

The enveloping grasps approach may actually be sufficient for a number of simple pick-and-place tasks, but it is a very poor representation of how humans do manipulation. When humans manipulate objects, the contact interactions with the object and the world are very rich -- we often use pieces of the environment as fixtures to reduce uncertainty, we commonly exploit slipping behaviors (e.g. for picking things up, or reorienting it in the hand), and our brains don't throw NaNs if we use the entire surface of our arms to e.g. manipulate a large object.

Add videos. Salami pick and place? Rick's box turning? Alex's disney bubbles?

By the way, in most cases, if the robots fail to make contact at the anticipated contact times/locations, bad things can happen. The results are hilarious and depressing at the same time. (Let's fix that!)

# The common theme

Classical control techniques for robotics are based on the idea that feedback can be used to override the dynamics of our machines. These examples suggest that to achieve outstanding dynamic performance (efficiency, agility, and robustness) from our robots, we need to understand how to design control systems which take advantage of the dynamics, not cancel them out. That is the topic of this course.

Surprisingly, many formal control ideas do not support the idea of "exploiting" the dynamics. Optimal control formulations (which we will study in depth) allow it in principle, but optimal control of nonlinear systems is still a relatively ad hoc discipline. Sometimes I joke that in order to convince a control theorist to consider the dynamics, you have to do something drastic, like taking away her control authority - remove a motor, or enforce a torque-limit. These issues have created a formal class of systems, the underactuated systems, for which people have begun to more carefully consider the dynamics of their machines in the context of control.

# Definitions

According to Newton, the dynamics of mechanical systems are second order ($F = ma$). Their state is given by a vector of positions, $\bq$ (also known as the configuration vector), and a vector of velocities, $\dot{\bq}$, and (possibly) time. The general form for a second-order controllable dynamical system is: $$\ddot{\bq} = {\bf f}(\bq,\dot{\bq},\bu,t),$$ where $\bu$ is the control vector. As we will see, the dynamics for many of the robots that we care about turn out to be affine in commanded torque, so let's consider a slightly constrained form: $$\ddot{\bq} = {\bf f}_1(\bq,\dot{\bq},t) + {\bf f}_2(\bq,\dot{\bq},t)\bu \label{eq:f1_plus_f2}.$$

# Fully-Actuated

A control system described by equation \ref{eq:f1_plus_f2} is fully-actuated in state $(\bq,\dot{\bq})$ at time $t$ if it is able to command any instantaneous acceleration in $\bq$: $$\textrm{rank}\left[{\bf f}_2 (\bq,\dot{\bq},t)\right] = \dim\left[\bq\right].$$

# Underactuated

A control system described by equation \ref{eq:f1_plus_f2} is underactuated in state $(\bq,\dot{\bq})$ at time $t$ if it is not able to command an arbitrary instantaneous acceleration in $\bq$: $$\textrm{rank}\left[{\bf f}_2(\bq,\dot{\bq},t)\right] < \dim\left[\bq\right]. \label{eq:underactuated_def}$$

Notice that whether or not a control system is underactuated may depend on the state of the system or even on time, although for most systems (including all of the systems in this book) underactuation is a global property of the system. We will refer to a system as underactuated if it is underactuated in all states and times. In practice, we often refer informally to systems as fully actuated as long as they are fully actuated in most states (e.g., a "fully-actuated" system might still have joint limits or lose rank at a kinematic singularity). Admittedly, this permits the existence of a gray area, where it might feel awkward to describe the system as either fully- or underactuated (we should instead only describe its states); we'll see examples, for instance, of hybrid systems like walking robots that are fully-actuated in some modes but underactuated in others. I will still informally refer to these systems as being underactuated whenever reasoning about the underactuation is useful/necessary for developing a control strategy.

# Robot Manipulators

Consider the simple robot manipulator illustrated above. As described in the Appendix, the equations of motion for this system are quite simple to derive, and take the form of the standard "manipulator equations": $${\bf M}(\bq)\ddot\bq + \bC(\bq,\dot\bq)\dot\bq = \btau_g(\bq) + {\bf B}\bu.$$ It is well known that the inertia matrix, ${\bf M}(\bq)$ is (always) uniformly symmetric and positive definite, and is therefore invertible. Putting the system into the form of equation \ref{eq:f1_plus_f2} yields: \begin{align*}\ddot{\bq} =& {\bf M}^{-1}(\bq)\left[ \btau_g(\bq) + \bB\bu - \bC(\bq,\dot\bq)\dot\bq \right].\end{align*} Because ${\bf M}^{-1}(\bq)$ is always full rank, we find that a system described by the manipulator equations is fully-actuated if and only if $\bB$ is full row rank. For this particular example, $\bq = [\theta_1,\theta_2]^T$ and $\bu = [\tau_1,\tau_2]^T$ (motor torques at the joints), and $\bB = {\bf I}_{2 \times 2}$. The system is fully actuated.

make drake elements expandable/collapsible. nice example here: http://quhno.internetstrahlen.de/myopera/csstests/collapsible-paragraph.html

# Python Example

I personally learn best when I can experiment and get some physical intuition. The companion software for the course should make it easy for you to see this system in action. To try it, make sure you've followed the installation instructions in the Appendix, then open Python and try the following lines in your Python console:

double_pendulum/simulator.py

It's worth taking a peek at the file that describes the robot. URDF and SDF are two of the standard formats, and they can be used to describe even very complicated robots (like the Boston Dynamics humanoid).

We can also use to evaluate the manipulator equations . actually has a symbolic dynamics engine, but for now we will just evaluate the manipulator equations for a particular robot (with numerical values assigned for mass, link lengths, etc) and for a particular state of the robot:

double_pendulum/manipulator.py Implement the symbolic version

While the basic double pendulum is fully actuated, imagine the somewhat bizarre case that we have a motor to provide torque at the elbow, but no motor at the shoulder. In this case, we have $\bu = \tau_2$, and $\bB(\bq) = [0,1]^T$. This system is clearly underactuated. While it may sound like a contrived example, it turns out that it is almost exactly the dynamics we will use to study as our simplest model of walking later in the class.

The matrix ${\bf f}_2$ in equation \ref{eq:f1_plus_f2} always has dim$[\bq]$ rows, and dim$[\bu]$ columns. Therefore, as in the example, one of the most common cases for underactuation, which trivially implies that ${\bf f}_2$ is not full row rank, is dim$[\bu] <$ dim$[\bq]$. This is the case when a robot has joints with no motors. But this is not the only case. The human body, for instance, has an incredible number of actuators (muscles), and in many cases has multiple muscles per joint; despite having more actuators than position variables, when I jump through the air, there is no combination of muscle inputs that can change the ballistic trajectory of my center of mass (barring aerodynamic effects). My control system is underactuated.

For completeness, let's generalize the definition of underactuation to systems beyond the second-order control affine systems.

# Underactuated Control Differential Equations

An $n$th-order control differential equation control described by the equations $$\frac{d^n\bq}{dt^n} = f(\bq, ..., \frac{d^{n-1} \bq}{dt^{n-1}}, t, {\bf u})$$ is fully actuated in state ${\bf x} = (\bq, ..., \frac{d^{n-1} \bq}{dt^{n-1}})$ and time $t$ if the resulting map ${\bf f}$ is surjective: for every $\frac{d^n\bq}{dt^n}$ there exists a ${\bf u}$ which produces the desired response. Otherwise it is underactuated.

It is easy to see that equation \ref{eq:underactuated_def} is a sufficient condition for underactuation. This definition can also be extended to discrete-time systems and/or differential inclusions.

A quick note about notation. When describing the dynamics of rigid-body systems in this class, I will use $\bq$ for configurations (positions), $\dot{\bq}$ for velocities, and use $\bx$ for the full state ($\bx = [\bq^T,\dot{\bq}^T]^T$). There is an important limitation to this convention (3D angular velocity should not be represented as the derivative of 3D pose) described in the Appendix, but it will keep the notes cleaner. Unless otherwise noted, vectors are always treated as column vectors. Vectors and matrices are bold (scalars are not).

# Feedback Linearization

Fully-actuated systems are dramatically easier to control than underactuated systems. The key observation is that, for fully-actuated systems with known dynamics (e.g., ${\bf f}_1$ and ${\bf f}_2$ are known for a second-order control-affine system), it is possible to use feedback to effectively change a nonlinear control problem into a linear control problem. The field of linear control is incredibly advanced, and there are many well-known solutions for controlling linear systems.

The trick is called feedback linearization. When ${\bf f}_2$ is full row rank, it is invertibleIf ${\bf f}_2$ is not square, for instance you have multiple actuators per joint, then this inverse may not be unique.. Consider the nonlinear feedback control: $$\bu = {\bf \pi}(\bq,\dot\bq,t) = {\bf f}_2^{-1}(\bq,\dot\bq,t) \left[ \bu' - {\bf f}_1(\bq,\dot\bq,t) \right],$$ where $\bu'$ is the new control input (an input to your controller). Applying this feedback controller to equation \ref{eq:f1_plus_f2} results in the linear, decoupled, second-order system: $$\ddot{\bq} = \bu'.$$ In other words, if ${\bf f}_1$ and ${\bf f}_2$ are known and ${\bf f}_2$ is invertible, then we say that the system is "feedback equivalent" to $\ddot{\bq} = \bu'$. There are a number of strong results which generalize this idea to the case where ${\bf f}_1$ and ${\bf f}_2$ are estimated, rather than known (e.g, Slotine90).

# Feedback-Cancellation Double Pendulum

Let's say that we would like our simple double pendulum to act like a simple single pendulum (with damping), whose dynamics are given by: \begin{align*} \ddot \theta_1 &= -\frac{g}{l}\sin\theta_1 -b\dot\theta_1 \\ \ddot\theta_2 &= 0. \end{align*} This is easily achieved using Note that our chosen dynamics do not actually stabilize $\theta_2$ - this detail was left out for clarity, but would be necessary for any real implementation. $$\bu = \bB^{-1}\left[ \bC\dot{\bq} - \btau_g + {\bf M}\begin{bmatrix} -\frac{g}{l}s_1 - b\dot{q}_1 \\ 0 \end{bmatrix} \right].$$

Since we are embedding a nonlinear dynamics (not a linear one), we refer to this as "feedback cancellation", or "dynamic inversion". This idea can, and does, make control look easy - for the special case of a fully-actuated deterministic system with known dynamics. For example, it would have been just as easy for me to invert gravity. Observe that the control derivations here would not have been any more difficult if the robot had 100 joints.

# Python Example

You can run these examples using: double_pendulum/as_single_pendulum.py double_pendulum/as_single_pendulum.py

As always, make soure you take a moment to read through the source code (linked above).

The underactuated systems are not feedback linearizable. Therefore, unlike fully-actuated systems, the control designer has no choice but to reason about the nonlinear dynamics of the plant in the control design. This dramatically complicates feedback controller design.

# Input and State Constraints

Although the dynamic constraints due to missing actuators certainly embody the spirit of this course, many of the systems we care about could be subject to other dynamic constraints as well. For example, the actuators on our machines may only be mechanically capable of producing some limited amount of torque, or there may be a physical obstacle in the free space with which we cannot permit our robot to come into contact with.

# Input and State Constraints

A dynamical system described by $\dot{\bx} = {\bf f}(\bx,\bu,t)$ may be subject to one or more constraints described by $\bphi(\bx,\bu,t)\ge0$.

In practice it can be useful to separate out constraints which depend only on the input, e.g. $\phi(\bu)\ge0$, such as actuator limits, as they can often be easier to handle than state constraints. An obstacle in the environment might manifest itself as one or more constraints that depend only on position, e.g. $\phi(\bq)\ge0$.

By our generalized definition of underactuation, we can see that input constraints can certainly cause a system to be underactuated. State (only) constraints are more subtle -- in general these actually reduce the dimensionality of the state space, therefore requiring less dimensions of actuation to achieve "full" control, but we only reap the benefits if we are able to perform the control design in the "minimal coordinates" (which is often difficult).

# Input limits

Consider the constrained second-order linear system $\ddot{x} = u, \quad |u| \le 1.$ By our definition, this system is underactuated. For example, there is no $u$ which can produce the acceleration $\ddot{x} = 2$.

Input and state constraints can complicate control design in similar ways to having an insufficient number of actuators, (i.e., further limiting the set of the feasible trajectories), and often require similar tools to find a control solution.

# Nonholonomic constraints

You might have heard of the term "nonholonomic system" (see e.g. Bloch03), and be thinking about how nonholonomy relates to underactuation. Briefly, a nonholonomic constraint is a constraint of the form $\phi(\bq, {\bf \dot{q}}, t) = 0$, which cannot be integrated into a constraint of the form $\phi(\bq, t) = 0$ (a holonomic constraint). A nonholonomic constraint does not restrain the possible configurations of the system, but rather the manner in which those configurations can be reached. While a holonomic constraint reduces the number of degrees of freedom of a system by one, a nonholonomic constraint does not. An automobile or traditional wheeled robot provides a canonical example:

# Wheeled robot

Consider a simple model of a wheeled robot whose configuration is described by its Cartesian position $x,y$ and its orientation, $\theta$, so $\bq = \begin{bmatrix} x, y, \theta \end{bmatrix}^T$. The system is subject to a differential constraint that prevents side-slip, \begin{gather*} \dot{x} = v \cos\theta \\ \dot{y} = v \sin\theta \\ v = \sqrt{\dot{x}^2 + \dot{y}^2} \end{gather*} or equivalently, $\dot{y} \cos \theta - \dot x \sin \theta = 0.$ This constraint cannot be integrated into a constraint on configuration—the car can get to any configuration $(x,y,\theta)$, it just can't move directly sideways—so this is a nonholonomic constraint.

Contrast the wheeled robot example with a robot on train tracks. The train tracks correspond to a holonomic constraint: the track constraint can be written directly in terms of the configuration $\bq$ of the system, without using the velocity ${\bf \dot{q}}$. Even though the track constraint could also be written as a differential constraint on the velocity, it would be possible to integrate this constraint to obtain a constraint on configuration. The track restrains the possible configurations of the system.

A nonholonomic constraint like the no-side-slip constraint on the wheeled vehicle certainly results in an underactuated system. The converse is not necessarily true—the double pendulum system which is missing an actuator is underactuated but would not typically be called a nonholonomic system. Note that the Lagrangian equations of motion are a constraint of the form $\bphi(\bq,{\bf \dot{q}},{\bf \ddot{q}},t) = 0,$ so do not qualify as a nonholonomic constraint.

# Underactuated robotics

The control of underactuated systems is an open and interesting problem in controls. Although there are a number of special cases where underactuated systems have been controlled, there are relatively few general principles. Now here's the rub... most of the interesting problems in robotics are underactuated:

• Legged robots are underactuated. Consider a legged machine with $N$ internal joints and $N$ actuators. If the robot is not bolted to the ground, then the degrees of freedom of the system include both the internal joints and the six degrees of freedom which define the position and orientation of the robot in space. Since $\bu \in \Re^N$ and $\bq \in \Re^{N+6}$, equation \ref{eq:underactuated_def} is satisfied.
• (Most) Swimming and flying robots are underactuated. The story is the same here as for legged machines. Each control surface adds one actuator and one DOF. And this is already a simplification, as the true state of the system should really include the (infinite-dimensional) state of the flow.
• Robot manipulation is (often) underactuated. Consider a fully-actuated robotic arm. When this arm is manipulating an object with degrees of freedom (even a brick has six), it can become underactuated. If force closure is achieved, and maintained, then we can think of the system as fully-actuated, because the degrees of freedom of the object are constrained to match the degrees of freedom of the hand. That is, of course, unless the manipulated object has extra DOFs (for example, any object that is deformable).

Even control systems for fully-actuated and otherwise unconstrained systems can be improved using the lessons from underactuated systems, particularly if there is a need to increase the efficiency of their motions or reduce the complexity of their designs.

# Goals for the course

This course is based on the observation that there are new computational tools from optimization theory, control theory, motion planning, and even machine learning which can be used to design feedback control for underactuated systems. The goal of this class is to develop these tools in order to design robots that are more dynamic and more agile than the current state-of-the-art.

The target audience for the class includes both computer science and mechanical/aero students pursuing research in robotics. Although I assume a comfort with linear algebra, ODEs, and Python, the course notes aim to provide most of the material and references required for the course.

I have a confession: I actually think that the material we'll cover in these notes is valuable far beyond robotics. I think that systems theory provides a powerful language for organizing computation in exceedingly complex systems -- especially when one is trying to program and/or analyze systems with continuous variables in a feedback loop (which happens throughout computer science and engineering, by the way). I hope you find these tools to be broadly useful, even if you don't have a humanoid robot capable of performing a backflip at your immediate disposal.

# Introduction

Our goals for this chapter are modest: we'd like to understand the dynamics of a pendulum.

Why a pendulum? In part, because the dynamics of a majority of our multi-link robotics manipulators are simply the dynamics of a large number of coupled pendula. Also, the dynamics of a single pendulum are rich enough to introduce most of the concepts from nonlinear dynamics that we will use in this text, but tractable enough for us to (mostly) understand in the next few pages.

The Lagrangian derivation of the equations of motion (as described in the appendix) of the simple pendulum yields: \begin{equation*} m l^2 \ddot\theta(t) + mgl\sin{\theta(t)} = Q. \end{equation*} We'll consider the case where the generalized force, $Q$, models a damping torque (from friction) plus a control torque input, $u(t)$: $$Q = -b\dot\theta(t) + u(t).$$

# Nonlinear dynamics with a constant torque

Let us first consider the dynamics of the pendulum if it is driven in a particular simple way: a torque which does not vary with time: $$ml^2 \ddot\theta + b\dot\theta + mgl \sin\theta = u_0.$$

.

# Simple Pendulum in Python

You can experiment with this system in using

pendulum/torque_slider_demo.py

(use the slider at the top to adjust the torque input).

These are relatively simple differential equations, so if I give you $\theta(0)$ and $\dot\theta(0)$, then you should be able to integrate them to obtain $\theta(t)$... right? Although it is possible, integrating even the simplest case ($b = u = 0$) involves elliptic integrals of the first kind; there is relatively little intuition to be gained here.

This is in stark contrast to the case of linear systems, where much of our understanding comes from being able to explicitly integrate the equations. For instance, for a simple linear system we have $$\dot{q} = a q \quad \rightarrow \quad q(t) = q(0) e^{at},$$ and we can immediately understand that the long-term behavior of the system is a (stable) decaying exponential if $a<0$, an (unstable) growing exponential if $a>0$, and that the system does nothing if $a=0$. Here we are with certainly one of the simplest nonlinear systems we can imagine, and we can't even solve this system?

All is not lost. If what we care about is the long-term behavior of the system, then there are a number of techniques we can apply. In this chapter, we will start by investigating graphical solution methods. These methods are described beautifully in a book by Steve StrogatzStrogatz94.

# The overdamped pendulum

Let's start by studying a special case -- intuitively when $b\dot\theta \gg ml^2\ddot\theta$ -- which via dimensional analysis (using the natural frequency $\sqrt{\frac{g}{l}}$ to match units) occurs when $b \sqrt\frac{l}{g} \gg ml^2$. This is the case of heavy damping, for instance if the pendulum was moving in molasses. In this case, the damping term dominates the acceleration term, and we have: $$ml^2 \ddot\theta + b\dot\theta \approx b\dot\theta = u_0 - mgl\sin\theta.$$ In other words, in the case of heavy damping, the system looks approximately first order. This is a general property of heavily damped systems, such as fluids at very low Reynolds number.

I'd like to ignore one detail for a moment: the fact that $\theta$ wraps around on itself every $2\pi$. To be clear, let's write the system without the wrap-around as: $$b\dot{x} = u_0 - mgl\sin{x}.\label{eq:overdamped_pend_ct}$$ Our goal is to understand the long-term behavior of this system: to find $x(\infty)$ given $x(0)$. Let's start by plotting $\dot{x}$ vs $x$ for the case when $u_0=0$:

The first thing to notice is that the system has a number of fixed points or steady states, which occur whenever $\dot{x} = 0$. In this simple example, the zero-crossings are $x^* = \{..., -\pi, 0, \pi, 2\pi, ...\}$. When the system is in one of these states, it will never leave that state. If the initial conditions are at a fixed point, we know that $x(\infty)$ will be at the same fixed point.

Next let's investigate the behavior of the system in the local vicinity of the fixed points. Examining the fixed point at $x^* = \pi$, if the system starts just to the right of the fixed point, then $\dot{x}$ is positive, so the system will move away from the fixed point. If it starts to the left, then $\dot{x}$ is negative, and the system will move away in the opposite direction. We'll call fixed-points which have this property unstable. If we look at the fixed point at $x^* = 0$, then the story is different: trajectories starting to the right or to the left will move back towards the fixed point. We will call this fixed point locally stable. More specifically, we'll distinguish between three types of local stability:

• Locally stable in the sense of Lyapunov (i.s.L.). A fixed point, $x^*$ is locally stable i.s.L. if for every small $\epsilon > 0$, I can produce a $\delta > 0$ such that if $\| x(0) - x^* \| < \delta$ then $\forall t$ $\| x(t) - x^*\| < \epsilon$. In words, this means that for any ball of size $\epsilon$ around the fixed point, I can create a ball of size $\delta$ which guarantees that if the system is started inside the $\delta$ ball then it will remain inside the $\epsilon$ ball for all of time.
• Locally asymptotically stable. A fixed point is locally asymptotically stable if $x(0) = x^* + \epsilon$ implies that $\lim_{t\rightarrow \infty} x(t) = x^*$.
• Locally exponentially stable. A fixed point is locally exponentially stable if $x(0) = x^* + \epsilon$ implies that $\| x(t) - x^* \| < Ce^{-\alpha t}$, for some positive constants $C$ and $\alpha$.

An initial condition near a fixed point that is stable in the sense of Lyapunov may never reach the fixed point (but it won't diverge), near an asymptotically stable fixed point will reach the fixed point as $t \rightarrow \infty$, and near an exponentially stable fixed point will reach the fixed point with a bounded rate. An exponentially stable fixed point is also an asymptotically stable fixed point, but the converse is not true. Asymptotic stability and Lyapunov stability, however, are distinct notions -- it is actually possible to have a system that is asymptotically stable but not stable i.s.L.†† we can't see that in one dimension so will have to hold that example for a moment. Systems which are stable i.s.L. but not asymptotically stable are easy to construct (e.g. $\dot{x} = 0$). Interestingly, it is also possible to have nonlinear systems that converge (or diverge) in finite-time; a so-called finite-time stability; we will see examples of this later in the book, but it is a difficult topic to penetrate with graphical analysis. Rigorous nonlinear system analysis is rich with subtleties and surprises. Moreover, these differences actually matter -- the code that we will write to stabilize the systems will be subtley different depending on what type of stability we want, and it can make or break the success of our methods.

Our graph of $\dot{x}$ vs. $x$ can be used to convince ourselves of i.s.L. and asymptotic stability by visually inspecting $\dot{x}$ in the vicinity of a fixed point. Even exponential stability can be inferred if the function can be bounded away from the origin by a negatively-sloped line through the fixed point, since it implies that the nonlinear system will converge at least as fast as the linear system represented by the straight line. I will graphically illustrate unstable fixed points with open circles and stable fixed points (i.s.L.) with filled circles.

Next, we need to consider what happens to initial conditions which begin farther from the fixed points. If we think of the dynamics of the system as a flow on the $x$-axis, then we know that anytime $\dot{x} > 0$, the flow is moving to the right, and $\dot{x} < 0$, the flow is moving to the left. If we further annotate our graph with arrows indicating the direction of the flow, then the entire (long-term) system behavior becomes clear:

For instance, we can see that any initial condition $x(0) \in (-\pi,\pi)$ will result in $\lim_{t\rightarrow \infty} x(t) = 0$. This region is called the basin of attraction of the fixed point at $x^* = 0$. Basins of attraction of two fixed points cannot overlap, and the manifold separating two basins of attraction is called the separatrix. Here the unstable fixed points, at $x^* = \{.., -\pi, \pi, 3\pi, ...\}$ form the separatrix between the basins of attraction of the stable fixed points.

As these plots demonstrate, the behavior of a first-order one dimensional system on a line is relatively constrained. The system will either monotonically approach a fixed-point or monotonically move toward $\pm \infty$. There are no other possibilities. Oscillations, for example, are impossible. Graphical analysis is a fantastic analysis tool for many first-order nonlinear systems (not just pendula); as illustrated by the following example:

# Nonlinear autapse

Consider the following system: $$\dot{x} + x = \tanh(w x),$$ which is plotted below for two values of $w$. It's convenient to note that $\tanh(z) \approx z$ for small $z$. For $w \le 1$ the system has only a single fixed point. For $w > 1$ the system has three fixed points : two stable and one unstable. These equations are not arbitrary - they are actually a model for one of the simplest neural networks, and one of the simplest model of persistent memorySeung00. In the equation $x$ models the firing rate of a single neuron, which has a feedback connection to itself. $\tanh$ is the activation (sigmoidal) function of the neuron, and $w$ is the weight of the synaptic feedback.

One last piece of terminology. In the neuron example, and in many dynamical systems, the dynamics were parameterized; in this case by a single parameter, $w$. As we varied $w$, the fixed points of the system moved around. In fact, if we increase $w$ through $w=1$, something dramatic happens - the system goes from having one fixed point to having three fixed points. This is called a bifurcation. This particular bifurcation is called a pitchfork bifurcation. We often draw bifurcation diagrams which plot the fixed points of the system as a function of the parameters, with solid lines indicating stable fixed points and dashed lines indicating unstable fixed points, as seen in the figure:

Our pendulum equations also have a (saddle-node) bifurcation when we change the constant torque input, $u_0$. Finally, let's return to the original equations in $\theta$, instead of in $x$. Only one point to make: because of the wrap-around, this system will appear have oscillations. In fact, the graphical analysis reveals that the pendulum will turn forever whenever $|u_0| > \frac{mgl}{b}$, but now you understand that this is not an oscillation, but an instability with $\theta \rightarrow \pm \infty$.

# The undamped pendulum with zero torque

Consider again the system $$ml^2 \ddot\theta = u_0 - mgl \sin\theta - b\dot\theta,$$ this time with $b = 0$. This time the system dynamics are truly second-order. We can always think of any second-order system as (coupled) first-order system with twice as many variables. Consider a general, autonomous (not dependent on time), second-order system, $$\ddot{q} = f(q,\dot q,u).$$ This system is equivalent to the two-dimensional first-order system \begin{align*} \dot x_1 =& x_2 \\ \dot x_2 =& f(x_1,x_2,u), \end{align*} where $x_1 = q$ and $x_2 = \dot q$. Therefore, the graphical depiction of this system is not a line, but a vector field where the vectors $[\dot x_1, \dot x_2]^T$ are plotted over the domain $(x_1,x_2)$. This vector field is known as the phase portrait of the system.

In this section we restrict ourselves to the simplest case when $u_0 = 0$. Let's sketch the phase portrait. First sketch along the $\theta$-axis. The $x$-component of the vector field here is zero, the $y$-component is $-mgl\sin\theta.$ As expected, we have fixed points at $\pm \pi, ...$ Now sketch the rest of the vector field. Can you tell me which fixed points are stable? Some of them are stable i.s.L., none are asymptotically stable.

# Orbit calculations

You might wonder how we drew the black contour lines in the figure above. We could have obtained them by simulating the system numerically, but those lines can be easily obtained in closed-form. Directly integrating the equations of motion is difficult, but at least for the case when $u_0 = 0$, we have some additional physical insight for this problem that we can take advantage of. The kinetic energy, $T$, and potential energy, $U$, of the pendulum are given by $$T = \frac{1}{2}I\dot\theta^2, \quad U = -mgl\cos(\theta),$$ where $I=ml^2$ and the total energy is $E(\theta,\dot\theta) = T(\dot\theta)+U(\theta)$. The undamped pendulum is a conservative system: total energy is a constant over system trajectories. Using conservation of energy, we have: \begin{gather*} E(\theta(t),\dot\theta(t)) = E(\theta(0),\dot\theta(0)) = E_0 \\ \frac{1}{2} I \dot\theta^2(t) - mgl\cos(\theta(t)) = E_0 \\ \dot\theta(t) = \pm \sqrt{\frac{2}{I}\left[E_0 + mgl\cos\left(\theta(t)\right)\right]} \end{gather*} Using this, if you tell me $\theta$ I can determine one of two possible values for $\dot\theta$, and the solution has all of the richness of the black countour lines from the plot. This equation has a real solution when $\cos(\theta) > \cos(\theta_{max})$, where $$\theta_{max} = \begin{cases} \cos^{-1}\left( \frac{E}{mgl} \right), & E < mgl \\ \pi, & \text{otherwise}. \end{cases}$$ Of course this is just the intuitive notion that the pendulum will not swing above the height where the total energy equals the potential energy. As an exercise, you can verify that differentiating this equation with respect to time indeed results in the equations of motion.

# The undamped pendulum with a constant torque

Now what happens if we add a constant torque? If you visualize the bifurcation diagram, you'll see that the fixed points come together, towards $q = \frac{\pi}{2}, \frac{5\pi}{2}, ...$, until they disappear. One fixed-point is unstable, and one is stable.

Before we continue, let me now give you the promised example of a system that is asymptotically stable, but not stable i.s.L.. We can accomplish this with a very pendulum-like example (written here in polar coordinates):

# Asymptotically stable, but not Lyapunov stable

The system $\dot{r} = r(1-r) \\ \dot\theta = \sin^2 (\frac{\theta}{2})$ is asymptotically stable to $x^* = [1,0]^T$, but is not stable i.s.L.

Take a minute to draw the vector field of this (you can draw each coordinate independently, if it helps) to make sure you understand. Note that to wrap-around rotation is convenient but not essential -- we could have written the same dynamical system in cartesian coordinates without wrapping. And if this feel too arbitrary, we will see it happen in practice when we introduce the energy-shaping swing-up controller for the pendulum in the next chapter.

# The damped pendulum

Now let's add damping back. You can still add torque to move the fixed points (in the same way).

With damping, the downright fixed point of the pendulum now becomes asymptotically stable (in addition to stable i.s.L.). Is it also exponentially stable? How can we tell? One technique is to linearize the system at the fixed point. A smooth, time-invariant, nonlinear system that is locally exponentially stable must have a stable linearization; we'll discuss linearization more in the next chapter.

Here's a thought exercise. If $u$ is no longer a constant, but a function $\pi(\theta,\dot{\theta})$, then how would you choose $\pi$ to stabilize the vertical position. Feedback linearization is the trivial solution, for example: $$u = \pi(\theta,\dot{\theta}) = 2mgl\sin\theta.$$ But these plots we've been making tell a different story. How would you shape the natural dynamics - at each point pick a $u$ from the stack of phase plots - to stabilize the vertical fixed point with minimal torque effort? This is exactly the way that I would like you to think about control system design. And we'll give you your first solution techniques -- using dynamic programming -- in the next lecture.

# The torque-limited simple pendulum

The simple pendulum is fully actuated. Given enough torque, we can produce any number of control solutions to stabilize the originally unstable fixed point at the top (such as designing a feedback controller to effectively invert gravity).

The problem begins to get interesting (a.k.a. becomes underactuated) if we impose a torque-limit constraint, $|u|\le u_{max}$. Looking at the phase portraits again, you can now visualize the control problem. Via feedback, you are allowed to change the direction of the vector field at each point, but only by a fixed amount. Clearly, if the maximum torque is small (smaller than $mgl$), then there are some states which cannot be driven directly to the goal, but must pump up energy to reach the goal. Furthermore, if the torque-limit is too severe and the system has damping, then it may be impossible to swing up to the top. The existence of a solution, and number of pumps required to reach the top, is a non-trivial function of the initial conditions and the torque-limits.

Although this system is very simple, its solution requires much of the same reasoning necessary for controlling much more complex underactuated systems; this problem will be a work-horse for us as we introduce new algorithms throughout this book.

A great deal of work in the control of underactuated systems has been done in the context of low-dimensional model systems. These model systems capture the essence of the problem without introducing all of the complexity that is often involved in more real-world examples. In this chapter we will focus on two of the most well-known and well-studied model systems--the Acrobot and the Cart-Pole. After we have developed some tools, we will see that they can be applied directly to other model systems; we will give a number of examples using Quadrotors. All of these systems are trivially underactuated, having less actuators than degrees of freedom.

# The Acrobot

The Acrobot is a planar two-link robotic arm in the vertical plane (working against gravity), with an actuator at the elbow, but no actuator at the shoulder. It was first described in detail in Murray91. The companion system, with an actuator at the shoulder but not at the elbow, is known as the PendubotSpong97. The Acrobot is so named because of its resemblance to a gymnast (or acrobat) on a parallel bar, who controls his/her motion predominantly by effort at the waist (and not effort at the wrist). The most common control task studied for the Acrobot is the swing-up task, in which the system must use the elbow (or waist) torque to move the system into a vertical configuration then balance.

The Acrobot is representative of the primary challenge in underactuated robots. In order to swing up and balance the entire system, the controller must reason about and exploit the state-dependent coupling between the actuated degree of freedom and the unactuated degree of freedom. It is also an important system because, as we will see, it closely resembles one of the simplest models of a walking robot.

# Equations of motion

Figure 3.1 illustrates the model parameters used in our analysis. $\theta_1$ is the shoulder joint angle, $\theta_2$ is the elbow (relative) joint angle, and we will use $\bq = [\theta_1,\theta_2]^T$, $\bx = [\bq,\dot\bq]^T$. The zero configuration is with both links pointed directly down. The moments of inertia, $I_1,I_2$ are taken about the pivots. The task is to stabilize the unstable fixed point $\bx = [\pi,0,0,0]^T$.

We will derive the equations of motion for the Acrobot using the method of Lagrange. The kinematics are given by: $$\bx_1 = \begin{bmatrix} l_1 s_1 \\ -l_1 c_1 \end{bmatrix}, \quad \bx_2 = \bx_1 + \begin{bmatrix} l_2 s_{1+2} \\ - l_2 c_{1+2} \end{bmatrix} .$$ The energy is given by: \begin{gather} T = T_1 + T_2, \quad T_1 = \frac{1}{2} I_1 \dot{q}_1^2 \\ T_2 = \frac{1}{2} ( m_2 l_1^2 + I_2 + 2 m_2 l_1 l_{c2} c_2 ) \dot{q}_1^2 + \frac{1}{2} I_2 \dot{q}_2^2 + (I_2 + m_2 l_1 l_{c2} c_2 ) \dot{q}_1 \dot{q}_2 \\ % from expanding sum of point masses U = -m_1 g l_{c1} c_1 - m_2 g (l_1 c_1 + l_{c2} c_{1+2}) \end{gather} Entering these quantities into the Lagrangian yields the equations of motion: \begin{gather} (I_1 + I_2 + m_2 l_1^2 + 2m_2 l_1 l_{c2} c_2) \ddot{q}_1 + (I_2 + m_2 l_1 l_{c2} c_2)\ddot{q}_2 - 2m_2 l_1 l_{c2} s_2 \dot{q}_1 \dot{q}_2 \\ \quad -m_2 l_1 l_{c2} s_2 \dot{q}_2^2 + m_1 g l_{c1}s_1 + m_2 g (l_1 s_1 + l_{c2} s_{1+2}) = 0 \\ (I_2 + m_2 l_1 l_{c2} c_2) \ddot{q}_1 + I_2 \ddot{q}_2 + m_2 l_1 l_{c2} s_2 \dot{q}_1^2 + m_2 g l_{c2} s_{1+2} = \tau \end{gather} In standard, manipulator equation form: $$\bM(\bq)\ddot\bq + \bC(\bq,\dot\bq)\dot\bq = \btau_g(\bq) + \bB\bu,$$ using $\bq = [\theta_1,\theta_2]^T$, $\bu = \tau$ we have: \begin{gather} \bM(\bq) = \begin{bmatrix} I_1 + I_2 + m_2 l_1^2 + 2m_2 l_1 l_{c2} c_2 & I_2 + m_2 l_1 l_{c2} c_2 \\ I_2 + m_2 l_1 l_{c2} c_2 & I_2 \end{bmatrix},\label{eq:Hacrobot}\\ \bC(\bq,\dot{\bq}) = \begin{bmatrix} -2 m_2 l_1 l_{c2} s_2 \dot{q}_2 & -m_2 l_1 l_{c2} s_2 \dot{q}_2 \\ m_2 l_1 l_{c2} s_2 \dot{q}_1 & 0 \end{bmatrix}, \\ \btau_g(\bq) = \begin{bmatrix} -m_1 g l_{c1}s_1 - m_2 g (l_1 s_1 + l_{c2}s_{1+2}) \\ -m_2 g l_{c2} s_{1+2} \end{bmatrix}, \quad \bB = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \end{gather}

# The Acrobot in Python

You can experiment with the Acrobot dynamics in using, e.g.

acrobot/torque_slider_demo.py

# The Cart-Pole system

The other model system that we will investigate here is the cart-pole system, in which the task is to balance a simple pendulum around its unstable equilibrium, using only horizontal forces on the cart. Balancing the cart-pole system is used in many introductory courses in control, including 6.003 at MIT, because it can be accomplished with simple linear control (e.g. pole placement) techniques. In this chapter we will consider the full swing-up and balance control problem, which requires a full nonlinear control treatment.

The figure shows our parameterization of the system. $x$ is the horizontal position of the cart, $\theta$ is the counter-clockwise angle of the pendulum (zero is hanging straight down). We will use $\bq = [x,\theta]^T$, and $\bx = [\bq,\dot\bq]^T$. The task is to stabilize the unstable fixed point at $\bx = [0,\pi,0,0]^T.$

# Equations of motion

The kinematics of the system are given by $$\bx_1 = \begin{bmatrix} x \\ 0 \end{bmatrix}, \quad \bx_2 = \begin{bmatrix} x + l\sin\theta \\ -l\cos\theta \end{bmatrix}.$$ The energy is given by \begin{align} T=& \frac{1}{2} (m_c + m_p)\dot{x}^2 + m_p \dot{x}\dot\theta l \cos{\theta} + \frac{1}{2}m_p l^2 \dot\theta^2 \\ U =& -m_p g l \cos\theta. \end{align} The Lagrangian yields the equations of motion: \begin{gather} (m_c + m_p)\ddot{x} + m_p l \ddot\theta \cos\theta - m_p l \dot\theta^2 \sin\theta = f \\ m_p l \ddot{x} \cos\theta + m_p l^2 \ddot\theta + m_p g l \sin\theta = 0 \end{gather} In standard, manipulator equation form: $$\bM(\bq)\ddot\bq + \bC(\bq,\dot\bq)\dot\bq = \btau_g(\bq) + \bB\bu,$$ using $\bq = [x,\theta]^T$, $\bu = f$, we have: \begin{gather*} \bM(\bq) = \begin{bmatrix} m_c + m_p & m_p l \cos\theta \\ m_p l \cos\theta & m_p l^2 \end{bmatrix}, \quad \bC(\bq,\dot{\bq}) = \begin{bmatrix} 0 & -m_p l \dot\theta \sin\theta \\ 0 & 0 \end{bmatrix}, \\ \btau_g(\bq) = \begin{bmatrix} 0 \\ - m_p gl \sin\theta \end{bmatrix}, \quad \bB = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \end{gather*} In this case, it is particularly easy to solve directly for the accelerations: \begin{align} \ddot{x} =& \frac{1}{m_c + m_p \sin^2\theta}\left[ f+m_p \sin\theta (l \dot\theta^2 + g\cos\theta)\right] \label{eq:ddot_x}\\ \ddot{\theta} =& \frac{1}{l(m_c + m_p \sin^2\theta)} \left[ -f \cos\theta - m_p l \dot\theta^2 \cos\theta \sin\theta - (m_c + m_p) g \sin\theta \right] \label{eq:ddot_theta} \end{align} In some of the analysis that follows, we will study the form of the equations of motion, ignoring the details, by arbitrarily setting all constants to 1: \begin{gather} 2\ddot{x} + \ddot\theta \cos\theta - \dot\theta^2 \sin\theta = f \label{eq:simple}\\ \ddot{x}\cos\theta + \ddot\theta + \sin\theta = 0. \label{eq:simple2} \end{gather}

# The Cart-Pole System in Python

You can experiment with the Cart-Pole dynamics in using, e.g.

cartpole/force_slider_demo.py

# Balancing

For both the Acrobot and the Cart-Pole systems, we will begin by designing a linear controller which can balance the system when it begins in the vicinity of the unstable fixed point. To accomplish this, we will linearize the nonlinear equations about the fixed point, examine the controllability of this linear system, then using linear quadratic regulator (LQR) theory to design our feedback controller.

# Linearizing the manipulator equations

Although the equations of motion of both of these model systems are relatively tractable, the forward dynamics still involve quite a few nonlinear terms that must be considered in any linearization. Let's consider the general problem of linearizing a system described by the manipulator equations.

We can perform linearization around a fixed point, $(\bx^*, \bu^*)$, using a Taylor expansion: $$\dot\bx = {\bf f}(\bx,\bu) \approx {\bf f}(\bx^*,\bu^*) + \left[ \pd{\bf f}{\bx}\right]_{\bx=\bx^*,\bu=\bu^*} (\bx - \bx^*) + \left[ \pd{\bf f}{\bu}\right]_{\bx=\bx^*,\bu=\bu^*} (\bu - \bu^*)$$ Let us consider the specific problem of linearizing the manipulator equations around a (stable or unstable) fixed point. In this case, ${\bf f}(\bx^*,\bu^*)$ is zero, and we are left with the standard linear state-space form: \begin{align} \dot\bx =& \begin{bmatrix} \dot\bq \\ \bM^{-1}(\bq) \left[ \btau_g(\bq) + {\bf B}(\bq)\bu - \bC(\bq,\dot\bq)\dot\bq \right] \end{bmatrix},\\ \approx& {\bf A}_{lin} (\bx-\bx^*) + \bB_{lin} (\bu - \bu^*), \end{align} where ${\bf A}_{lin}$, and $\bB_{lin}$ are constant matrices. Let us define $\bar\bx = \bx - \bx^*, \bar\bu = \bu - \bu^*$, and write $$\dot{\bar\bx} = {\bf A}_{lin}\bar\bx + \bB_{lin}\bar\bu.$$ Evaluation of the Taylor expansion around a fixed point yields the following, very simple equations, given in block form by: \begin{align} {\bf A}_{lin} =& \begin{bmatrix} {\bf 0} & {\bf I} \\ \bM^{-1} \pd{\btau_g}{\bq} + \sum_{j} \bM^{-1}\pd{\bB_j}{\bq} u_j & {\bf 0} \end{bmatrix}_{\bx=\bx^*,\bu=\bu^*} \\ \bB_{lin} =& \begin{bmatrix} {\bf 0} \\ \bM^{-1} \bB \end{bmatrix}_{\bx=\bx^*, \bu=\bu^*} \end{align} where $\bB_j$ is the $j$th column of $\bB$. Note that the term involving $\pd{\bM^{-1}}{q_i}$ disappears because $\btau_g + \bB\bu - \bC\dot{\bq}$ must be zero at the fixed point. All of the $\bC\dot\bq$ derivatives drop out, too, because $\dot{\bq}^* = 0$, and any terms with $\bC$ drop out as well, since centripetal and centrifugal forces are zero at when velocity is zero. In many cases, including both the Acrobot and Cart-Pole systems (but not the Quadrotors), the matrix $\bB(\bq)$ is a constant, so the $\pd\bB{\bq}$ terms also drop out.

# Linearization of the Acrobot

Linearizing around the (unstable) upright point, we have: \begin{gather} \left[\pd{\bf \tau_g}{\bq}\right]_{\bx=\bx^*} = \begin{bmatrix} g (m_1 l_{c1} + m_2 l_1 + m_2 l_{c2}) & m_2 g l_{c2} \\ m_2 g l_{c2} & m_2 g l_{c2} \end{bmatrix} \end{gather} The linear dynamics follow directly from these equations and the manipulator form of the Acrobot equations.

# Linearization of the Cart-Pole System

Linearizing around the unstable fixed point in this system, we have: \begin{gather} \left[\pd{\btau_g}{\bq}\right]_{\bx=\bx^*} = \begin{bmatrix} 0 & 0 \\ 0 & m_p g l \end{bmatrix} \end{gather} Again, the linear dynamics follow simply.

Studying the properties of the linearized system can tell us some things about the (local) properties of the nonlinear system. For instance, having a stable linearization implies local exponential stability of the nonlinear systemSlotine90+Hauser94a. It's worth noting that having an unstable linearization also implies that the system is locally unstable, but if the linearization is marginally stable then one cannot conclude whether the nonlinear system is asymptotically stable, stable i.s.L., or unstable (only that it is not exponentially stable)Slotine90.

# Controllability

A control system of the form $$\dot{\bx} = {\bf f}(\bx,\bu)$$ is called controllable if it is possible to construct an unconstrained input signal, $\bu(t)$, $t \in [0,t_f],$ which will move the system from any initial state to any final state in a finite interval of time, $0 < t < t_f$ Ogata96.

For the linear system $$\dot{\bx} = {\bf A}\bx + \bB\bu,$$ we can integrate this linear system in closed form, so it is possible to derive the exact conditions of controllability. In particular, for linear systems it is sufficient to demonstrate that there exists a control input which drives any initial condition to the origin.

# The special case of non-repeated eigenvalues

Let us first examine a special case, which falls short as a general tool but may be more useful for understanding the intuition of controllability. Let's perform an eigenvalue analysis of the system matrix ${\bf A}$, so that: $${\bf A}{\bf v}_i = \lambda_i {\bf v}_i,$$ where $\lambda_i$ is the $i$th eigenvalue, and ${\bf v}_i$ is the corresponding (right) eigenvector. There will be $n$ eigenvalues for the $n \times n$ matrix ${\bf A}$. Collecting the (column) eigenvectors into the matrix ${\bf V}$ and the eigenvalues into a diagonal matrix ${\bf \Lambda}$, we have $${\bf A}{\bf V} = {\bf V}{\bf \Lambda}.$$ Here comes our primary assumption: let us assume that each of these $n$ eigenvalues takes on a distinct value (no repeats). With this assumption, it can be shown that the eigenvectors ${\bf v}_i$ form a linearly independent basis set, and therefore ${\bf V}^{-1}$ is well-defined.

We can continue our eigenmodal analysis of the linear system by defining the modal coordinates, ${\bf r}$, with: $$\bx = {\bf V}{\bf r},\quad \text{or}\quad {\bf r} = {\bf V}^{-1}\bx.$$ In modal coordinates, the dynamics of the linear system are given by $$\dot{\bf r} = {\bf V}^{-1} {\bf A} {\bf V} {\bf r} + {\bf V}^{-1} \bB \bu = {\bf \Lambda} {\bf r} + {\bf V}^{-1}\bB \bu.$$ This illustrates the power of modal analysis; in modal coordinates, the dynamics diagonalize yielding independent linear equations: $$\dot{r}_i = \lambda_i r_i + \sum_j \beta_{ij} u_j,\quad {\bf \beta} = {\bf V}^{-1} \bB.$$

Now the concept of controllability becomes clear. Input $j$ can influence the dynamics in modal coordinate $i$ if and only if ${\bf \beta}_{ij} \neq 0$. In the special case of non-repeated eigenvalues, having control over each individual eigenmode is sufficient to (in finite time) regulate all of the eigenmodesOgata96. Therefore, we say that the system is controllable if and only if $$\forall i, \exists j \text{ such that }\beta_{ij} \neq 0.$$

# A general solution

Included only for completeness. Click to expand the details.

A more general solution to the controllability issue, which removes our assumption about the eigenvalues, can be obtained by examining the time-domain solution of the linear equations. The solution of this system is $$\bx(t) = e^{{\bf A}t} \bx(0) + \int_0^{t} e^{{\bf A}(t - \tau)} \bB \bu(\tau) d\tau.$$ Without loss of generality, lets consider the that the final state of the system is zero. Then we have: $$\bx(0) = - \int_0^{t_f} e^{-{\bf A}\tau}\bB \bu(\tau) d\tau.$$ You might be wondering what we mean by $e^{{\bf A}t}$; a scalar raised to the power of a matrix..? Recall that $e^{z}$ is actually defined by a convergent infinite sum: $$e^{z} =1 + z + \frac{1}{2} z^2 + \frac{1}{6} z^3 + ... .$$ The notation $e^{{\bf A}t}$ uses the same definition: $$e^{{\bf A}t} = {\bf I} + {\bf A}t + \frac{1}{2}({\bf A}t)^2 + \frac{1}{6}({\bf A}t)^3 + ... .$$ Not surprisingly, this has many special forms. For instance, if ${\bf A}$ is diagonalizable, $e^{{\bf A}t} = {\bf V}e^{{\bf\Lambda}t}{\bf V}^{-1},$ where ${\bf A} = {\bf V \Lambda V}^{-1}$ is the eigenvalue decomposition of ${\bf A}$ Strang98. The particular form we will use here is $$e^{-{\bf A}\tau} = \sum_{k=0}^{n-1} \alpha_k(\tau) {\bf A}^k.$$ This is a particularly surprising form, because the infinite sum above is represented by this finite sum; the derivation uses Sylvester's Theorem Ogata96, Chen98a. Then we have, \begin{align*} \bx(0) =& - \sum_{k=0}^{n-1} {\bf A}^k \bB \int_0^{t_f} \alpha_k(\tau) \bu(\tau) d\tau \\ =& -\sum_{k=0}^{n-1} {\bf A}^k \bB w_k \text{, where } \bw_k = \int_0^{t_f} \alpha_k(\tau) \bu(\tau) d\tau \\ =& - \begin{bmatrix} \bB & {\bf AB} & {\bf A}^2\bB & \cdots & {\bf A}^{n-1}\bB \end{bmatrix}_{n \times (nm)} \begin{bmatrix} \bw_0 \\ \bw_1 \\ \bw_2 \\ \vdots \\ \bw_{n-1} \end{bmatrix} \end{align*} The matrix containing the vectors $\bB$, ${\bf AB}$, ... ${\bf A}^{n-1}\bB$ is called the controllability matrix. In order for the system to be controllable, for every initial condition $\bx(0)$ we must be able to find the corresponding vector ${\bf w}$. This is only possible when the rows of the controllability matrix are linearly independent. Therefore, the condition of controllability is that this controllability matrix is full rank.

Although we only treated the case of a scalar $u$, it is possible to extend the analysis to a vector $\bu$ of size $m$, yielding the condition $$\text{rank} \begin{bmatrix} \bB & {\bf AB} & {\bf A}^2\bB & \cdots & {\bf A}^{n-1}\bB \end{bmatrix}_{n \times (nm)} = n.$$ In Matlab (using the Control Systems Toolbox), you can obtain the controllability matrix using Cm = ctrb(A,B), and evaluate its rank with rank(Cm).

Note that a linear feedback to change the eigenvalues of the eigenmodes is not sufficient to accomplish our goal of getting to the goal in finite time. In fact, the open-loop control to reach the goal is easily obtained with a final-value LQR problem, and (for ${\bf R}={\bf I}$) is actually a simple function of the controllability GrammianChen98a.

# Controllability vs. underactuated

Analysis of the controllability of both the Acrobot and Cart-Pole systems reveals that the linearized dynamics about the upright are, in fact, controllable. This implies that the linearized system, if started away from the zero state, can be returned to the zero state in finite time. This is potentially surprising - after all the systems are underactuated. For example, it is interesting and surprising that the Acrobot can balance itself in the upright position without having a shoulder motor.

The controllability of these model systems demonstrates an extremely important, point: An underactuated system is not necessarily an uncontrollable system. Underactuated systems cannot follow arbitrary trajectories, but that does not imply that they cannot arrive at arbitrary points in state space. However, the trajectory required to place the system into a particular state may be arbitrarily complex.

The controllability analysis presented here is for linear time-invariant (LTI) systems. A comparable analysis exists for linear time-varying (LTV) systems. We will even see extensions to nonlinear systems; although it will often be referred to by the synonym of "reachability" analysis.

# LQR feedback

Controllability tells us that a trajectory to the fixed point exists, but does not tell us which one we should take or what control inputs cause it to occur. Why not? There are potentially infinitely many solutions. We have to pick one.

The tools for controller design in linear systems are very advanced. In particular, as we describe in the linear optimal control chapter, one can easily design an optimal feedback controller for a regulation task like balancing, so long as we are willing to linearize the system around the operating point and define optimality in terms of a quadratic cost function: $$J(\bx_0) = \int_0^\infty \left[ \bx^T(t) \bQ \bx(t) + \bu^T(t) \bR \bu(t) \right]dt, \quad \bx(0)=\bx_0, \bQ=\bQ^T>0, \bR=\bR^T>0.$$ The linear feedback matrix $\bK$ used as $$\bu(t) = - \bK\bx(t),$$ is the so-called optimal linear quadratic regulator (LQR). Even without understanding the detailed derivation, we can quickly become practitioners of LQR. provides a function, K = LinearQuadraticRegulator(system,context,Q,R) for linear systems. It also provides a version controller = LinearQuadraticRegulator(system,context,Q,R) that will linearize the system for you around an equilibrium and return the controller (in the original coordinates). Therefore, to use LQR, one simply needs to define the symmetric positive-definite cost matrices, ${\bf Q}$ and ${\bf R}$. In their most common form, ${\bf Q}$ and ${\bf R}$ are positive diagonal matrices, where the entries $Q_{ii}$ penalize the relative errors in state variable $x_i$ compared to the other state variables, and the entries $R_{ii}$ penalize actions in $u_i$.

# LQR for the Acrobot and Cart-Pole

Take a minute to play around with the LQR controller for the Acrobot

acrobot/balancing_lqr.py

and Cart-Pole

cartpole/balancing_lqr.py

Make sure that you take a minute to look at the code which runs during these examples. Can you set the ${\bf Q}$ and ${\bf R}$ matrices differently, to improve the performance?

Simulation of the closed-loop response with LQR feedback shows that the task is indeed completed - and in an impressive manner. Oftentimes the state of the system has to move violently away from the origin in order to ultimately reach the origin. Further inspection reveals the (linearized) closed-loop dynamics are in fact non-minimum phase (acrobot has 3 right-half zeros, cart-pole has 1).

LQR works essentially out of the box for Quadrotors, if linearized around a nominal fixed point (where the non-zero thrust from the propellers is balancing gravity). We have code for the standard Quadrotor model in , and for stabilizing it with LQR. I'll include them again here as soon as I figure out how to distribute a 3D visualizer with the binary releases of .

Note that LQR, although it is optimal for the linearized system, is not necessarily the best linear control solution for maximizing basin of attraction of the fixed-point. The theory of robust control(e.g., Zhou97), which explicitly takes into account the differences between the linearized model and the nonlinear model, will produce controllers which outperform our LQR solution in this regard.

# Partial feedback linearization

In the introductory chapters, we made the point that the underactuated systems are not feedback linearizable. At least not completely. Although we cannot linearize the full dynamics of the system, it is still possible to linearize a portion of the system dynamics. The technique is called partial feedback linearization.

Consider the cart-pole example. The dynamics of the cart are affected by the motions of the pendulum. If we know the model, then it seems quite reasonable to think that we could create a feedback controller which would push the cart in exactly the way necessary to counter-act the dynamic contributions from the pendulum - thereby linearizing the cart dynamics. What we will see, which is potentially more surprising, is that we can also use a feedback controller for the cart to feedback linearize the dynamics of the passive pendulum joint.

We'll use the term collocated partial feedback linearization to describe a controller which linearizes the dynamics of the actuated joints. What's more surprising is that it is often possible to achieve non-collocated partial feedback linearization - a controller which linearizes the dynamics of the unactuated joints. The treatment presented here follows from Spong94a.

# Collocated

Starting from the equations \ref{eq:simple} and \ref{eq:simple2}, we have \begin{gather*} \ddot\theta = -\ddot{x}c - s \\ % \ddot\theta = -\frac{1}{l} (\ddot{x} c + g s) \ddot{x}(2-c^2) - sc - \dot\theta^2 s = f \end{gather*} Therefore, applying the feedback control $$f = (2 - c^2) \ddot{x}^d - sc - \dot\theta^2 s % f = (m_c + m_p) u + m_p (u c + g s) c - m_p l \dot\theta^2 s$$ results in \begin{align*} \ddot{x} =& \ddot{x}^d \\ \ddot{\theta} =& -\ddot{x}^dc - s, \end{align*} which are valid globally.

# Non-collocated

Starting again from equations \ref{eq:simple} and \ref{eq:simple2}, we have \begin{gather*} \ddot{x} = -\frac{\ddot\theta + s}{c} \\ \ddot\theta(c - \frac{2}{c}) - 2 \tan\theta - \dot\theta^2 s = f \end{gather*} Applying the feedback control $$f = (c - \frac{2}{c}) \ddot\theta^d - 2 \tan\theta - \dot\theta^2 s$$ results in \begin{align*} \ddot\theta =& \ddot\theta^d \\ \ddot{x} =& -\frac{1}{c} \ddot\theta^d - \tan\theta. \end{align*} Note that this expression is only valid when $\cos\theta \neq 0$. This is not surprising, as we know that the force cannot create a torque when the beam is perfectly horizontal.

# General form

For systems that are trivially underactuated (torques on some joints, no torques on other joints), we can, without loss of generality, reorganize the joint coordinates in any underactuated system described by the manipulator equations into the form: \begin{align} \bM_{11} \ddot{\bq}_1 + \bM_{12} \ddot{\bq}_2 &= \btau_1, \label{eq:passive_dyn}\\ \bM_{21} \ddot{\bq}_1 + \bM_{22} \ddot{\bq}_2 &= \btau_2 + \bu, \label{eq:active_dyn} \end{align} with $\bq \in \Re^n$, $\bq_1 \in \Re^l$, $\bq_2 \in \Re^m$, $l=n-m$. $\bq_1$ represents all of the passive joints, and $\bq_2$ represents all of the actuated joints, and the $\btau = \btau_g - \bC\dot\bq$ terms capture all of the Coriolis and gravitational terms, and $$\bM(\bq) = \begin{bmatrix} \bM_{11} & \bM_{12} \\ \bM_{21} & \bM_{22} \end{bmatrix}.$$ Fortunately, because $\bM$ is uniformly (e.g. for all $\bq$) positive definite, $\bM_{11}$ and $\bM_{22}$ are also positive definite.

# Collocated linearization

Performing the same substitutions into the full manipulator equations, we get: \begin{gather} \ddot\bq_1 = \bM_{11}^{-1} \left[ \btau_1 - \bM_{12} \ddot\bq_2 \right] \\ (\bM_{22} - \bM_{21} \bM_{11}^{-1} \bM_{12}) \ddot\bq_2 - \btau_2 + \bM_{21} \bM_{11}^{-1} \btau_1 = \bu \end{gather} It can be easily shown that the matrix $(\bM_{22} - \bM_{21} \bM_{11}^{-1} \bM_{12})$ is invertibleSpong94a; we can see from inspection that it is symmetric. PFL follows naturally, and is valid globally.

# Non-collocated linearization

\begin{gather} \ddot\bq_2 = \bM_{12}^+ \left[ \btau_1 - \bM_{11} \ddot\bq_1 \right] \\ (\bM_{21} - \bM_{22} \bM_{12}^+ \bM_{11}) \ddot\bq_1 - \btau_2 + \bM_{22} \bM_{12}^+ \btau_1 = \bu \end{gather} where $\bM_{12}^+$ is a Moore-Penrose pseudo-inverse. This inverse provides a unique solution when the rank of $\bM_{12}$ equals $l$, the number of passive degrees of freedom in the system (it cannot be more, since the matrix only has $l$ rows). This rank condition is sometimes called the property of "Strong Inertial Coupling". It is state dependent. A system has Global Strong Inertial Coupling if it exhibits Strong Inertial Coupling in every state.

In general, we can define some combination of active and passive joints that we would like to control. This combination is sometimes called a "task space". cite some task space refs here? Consider an output function of the form, $$\by = \bh(\bq),$$ with ${\bf y} \in \Re^p$, which defines the task space. Define $\bH_1 = \frac{\partial \bh}{\partial \bq_1}$, $\bH_2 = \frac{\partial \bh}{\partial \bq_2}$, $\bH = [\bH_1,\bH_2]$.

If the actuated joints are commanded so that $$\ddot\bq_2 = \bar\bH^+ \left [\ddot\by^d - \dot\bH\dot\bq - \bH_1 \bM_{11}^{-1}\btau_1 \right], \label{eq:q2cmd}$$ where $\bar{\bH} = \bH_2 - \bH_1 \bM_{11}^{-1} \bM_{12}.$ and $\bar\bH^+$ is the right Moore-Penrose pseudo-inverse, $$\bar\bH^+ = \bar\bH^T (\bar\bH \bar\bH^T)^{-1},$$ then we have $$\ddot\by = \ddot\by^d.$$ subject to $$\text{rank}\left(\bar{\bH} \right) = p, \label{eq:rank_condition}$$

Proof Sketch. Differentiating the output function we have \begin{gather*} \dot\by = \bH \dot\bq \\ \ddot\by = \dot\bH \dot\bq + \bH_1 \ddot\bq_1 + \bH_2 \ddot\bq_2. \end{gather*} Solving \ref{eq:passive_dyn} for the dynamics of the unactuated joints we have: $$\ddot\bq_1 = \bM_{11}^{-1} (\btau_1 - \bM_{12} \ddot\bq_2) \label{eq:q1cmd}$$ Substituting, we have \begin{align} \ddot\by =& \dot\bH \dot\bq + \bH_1 \bM_{11}^{-1}(\btau_1 - \bM_{12}\ddot\bq_2) + \bH_2 \ddot\bq_2 \\ =& \dot\bH \dot\bq + \bar{\bH} \ddot\bq_2 + \bH_1 \bM_{11}^{-1}\btau_1 \\ =& \ddot\by^d \end{align} Note that the last line required the rank condition ($\ref{eq:rank_condition}$) on $\bar\bH$ to ensure that the rows of $\bar{\bH}$ are linearly independent, allowing $\bar\bH \bar\bH^+ = \bI$.

In order to execute a task space trajectory one could command $$\ddot\by^d = \ddot{\bar{\by}}^d + \bK_d (\dot{\bar{\by}}^d - \dot\by) + \bK_p (\bar{\by}^d -\by).$$ Assuming the internal dynamics are stable, this yields converging error dynamics, $(\bar{\by}^d - \by)$, when $\bK_p,\bK_d > 0$Slotine90. For a position control robot, the acceleration command of ($\ref{eq:q2cmd}$) suffices. Alternatively, a torque command follows by substituting ($\ref{eq:q2cmd}$) and ($\ref{eq:q1cmd}$) into ($\ref{eq:active_dyn}$).

# End-point trajectory following with the Cart-Pole system

Consider the task of trying to track a desired kinematic trajectory with the endpoint of pendulum in the cart-pole system. With one actuator and kinematic constraints, we might be hard-pressed to track a trajectory in both the horizontal and vertical coordinates. But we can at least try to track a trajectory in the vertical position of the end-effector.

Using the task-space PFL derivation, we have: \begin{gather*} y = h(\bq) = -l \cos\theta \\ \dot{y} = l \dot\theta \sin\theta \end{gather*} If we define a desired trajectory: $$\bar{y}^d(t) = \frac{l}{2} + \frac{l}{4} \sin(t),$$ then the task-space controller is easily implemented using the derivation above.

The task space derivation above provides a convenient generalization of the partial feedback linearization (PFL) Spong94a, which encompasses both the collocated and non-collocated results. If we choose $\by = \bq_2$ (collocated), then we have $$\bH_1 = 0, \bH_2 = \bI, \dot\bH = 0, \bar\bH = \bI, \bar\bH^+ = \bI.$$ From this, the command in ($\ref{eq:q2cmd}$) reduces to $\ddot\bq_2 = \ddot\bq_2^d$. The actuator command is then $$\bu = \bM_{21} \bM_{11}^{-1} (\btau_1 - \bM_{12} \ddot\bq_2^d) + \bM_{22} \ddot\bq_2^d + \btau_2,$$ and the rank condition ($\ref{eq:rank_condition}$) is always met.

If we choose ${\bf y} = \bq_1$ (non-collocated), we have $$\bH_1 = \bI, \bH_2 = 0, \dot\bH = 0, \bar{\bH}=-\bM_{11}^{-1}\bM_{12}.$$ The rank condition ($\ref{eq:rank_condition}$) requires that $\text{rank}(\bM_{12}) = l$, in which case we can write $\bar{\bH}^+=-\bM_{12}^+\bM_{11}$, reducing the rank condition to precisely the "Strong Inertial Coupling" condition described in Spong94a. Now the command in ($\ref{eq:q2cmd}$) reduces to $$\ddot\bq_2 = \bM_{12}^+ \left [ \btau_1 - \bM_{11}\ddot\bq_1^d \right] \label{eq:q2ddnonco}$$ The actuator command is found by substituting ($\ref{eq:q2ddnonco}$) into ($\ref{eq:active_dyn}$), yielding the same results as in Spong94a.

# Energy shaping

Recall the phase portraits that we used to understand the dynamics of the undamped, unactuated, simple pendulum ($u=b=0$). The orbits of this phase plot were defined by contours of constant energy. One very special orbit, known as a homoclinic orbit, is the orbit which passes through the unstable fixed point. In fact, visual inspection will reveal that any state that lies on this homoclinic orbit must pass into the unstable fixed point. Therefore, if we seek to design a nonlinear feedback control policy which drives the simple pendulum from any initial condition to the unstable fixed point, a very reasonable strategy would be to use actuation to regulate the energy of the pendulum to place it on this homoclinic orbit, then allow the system dynamics to carry us to the unstable fixed point.

This idea turns out to be a bit more general than just for the simple pendulum. As we will see, we can use similar concepts of energy shaping' to produce swing-up controllers for the acrobot and cart-pole systems. It's important to note that it only takes one actuator to change the total energy of a system.

Although a large variety of swing-up controllers have been proposed for these model systems (c.f. Fantoni02, Araki05, Xin04, Spong94, Mahindrakar05, Berkemeier99, Murray91, Lai06), the energy shaping controllers tend to be the most natural to derive and perhaps the most well-known.

# Simple pendulum

Recall the equations of motion for the undamped simple pendulum were given by $$ml^2 \ddot\theta + mgl\sin\theta = u.$$ The total energy of the simple pendulum is given by $$E = \frac{1}{2} m l^2 \dot\theta^2 - mgl\cos\theta.$$ To understand how to control the energy, observe that \begin{align*} \dot{E} =& ml^2\dot\theta \ddot\theta + \dot\theta mgl\sin\theta \\ =& \dot\theta \left[ u - mgl\sin\theta \right] + \dot\theta mgl\sin\theta \\ =& u\dot\theta. \end{align*} In words, adding energy to the system is simple - simply apply torque in the same direction as $\dot\theta$. To remove energy, simply apply torque in the opposite direction (e.g., damping).

To drive the system to the homoclinic orbit, we must regulate the energy of the system to a particular desired energy, $$E^d = mgl.$$ If we define $\tilde{E} = E - E^d$, then we have $$\dot{\tilde{E}} = \dot{E} = u\dot\theta.$$ If we apply a feedback controller of the form $$u = -k \dot\theta \tilde{E},\quad k>0,$$ then the resulting error dynamics are $$\dot{\tilde{E}} = - k \dot\theta^2 \tilde{E}.$$ These error dynamics imply an exponential convergence: $$\tilde{E} \rightarrow 0,$$ except for states where $\dot\theta=0$. The essential property is that when $E > E^d$, we should remove energy from the system (damping) and when $E < E^d$, we should add energy (negative damping). Even if the control actions are bounded, the convergence is easily preserved.

This is a nonlinear controller that will push all system trajectories to the unstable equilibrium. But does it make the unstable equilibrium locally stable? It is asymptotically stable, but not stable i.s.L. Small perturbations may cause the system to drive all of the way around the circle in order to once again return to the unstable equilibrium. For this reason, once trajectories come into the vicinity of our swing-up controller, we prefer to switch to our LQR balancing controller to performance to complete the task.

# Energy Shaping for the Pendulum

Take a minute to play around with the energy shaping + LQR controller for swinging up the pendulum

pendulum/energy_swingup_and_balance.py

Make sure that you take a minute to look at the code which runs during these examples. Note the somewhat arbitrary threshold for switching to the LQR controller. We'll give a much more satisfying answer for that in the chapter on Lyapunov methods.

# Cart-Pole

Having thought about the swing-up problem for the simple pendulum, let's try to apply the same ideas to the cart-pole system. The basic idea, from Chung95, is to use collocated PFL to simplify the dynamics, use energy shaping to regulate the pendulum to its homoclinic orbit, then to add a few terms to make sure that the cart stays near the origin. The collocated PFL (when all parameters are set to 1) left us with: \begin{gather} \ddot{x} = u \\ \ddot\theta = - u c - s %\ddot\theta = -\frac{1}{l}( u c + g s ) \end{gather} The energy of the pendulum (a unit mass, unit length, simple pendulum in unit gravity) is given by: $$E(\bx) = \frac{1}{2} \dot\theta^2 - \cos\theta.$$ The desired energy, equivalent to the energy at the desired fixed-point, is $$E^d = 1.$$ Again defining $\tilde{E}(\bx) = E(\bx) - E^d$, we now observe that \begin{align*} \dot{\tilde{E}}(\bx) =& \dot{E}(\bx) = \dot\theta \ddot\theta + \dot\theta s \\ % \dot\tilde{E} = ml^2 \dot\theta \ddot\theta + mgl s =& \dot\theta [ -uc - s] + \dot\theta s \\ % - ml \dot\theta [ u c + g s ] + mgl s =& - u\dot\theta \cos\theta. % - ml u \dot\theta c \end{align*} Therefore, if we design a controller of the form $$u = k \dot\theta\cos\theta \tilde{E},\quad k>0$$ the result is $$\dot{\tilde{E}} = - k \dot\theta^2 \cos^2\theta \tilde{E}.$$ This guarantees that $| \tilde{E} |$ is non-increasing, but isn't quite enough to guarantee that it will go to zero. For example, if $\theta = \dot\theta = 0$, the system will never move. However, if we have that $$\int_0^t \dot\theta^2(t') \cos^2 \theta(t') dt' \rightarrow \infty,\quad \text{as } t \rightarrow \infty,$$ then we have $\tilde{E}(t) \rightarrow 0$. This condition is satisfied for all but the trivial constant trajectories at fixed points.

Now we return to the full system dynamics (which includes the cart). Chung95 and Spong96 use the simple pendulum energy controller with an addition PD controller designed to regulate the cart: $$\ddot{x}^d = k_E \dot\theta \cos\theta \tilde{E} - k_p x - k_d \dot{x}.$$ Chung95 provides a proof of convergence for this controller with some nominal parameters. Fantoni02 provides a slightly different controller derived directly from a Lyapunov argument.

# Acrobot

Swing-up control for the acrobot can be accomplished in much the same way. Spong94 - pump energy. Clean and simple. No proof. Slightly modified version (uses arctan instead of sat) in Spong95. Clearest presentation in Spong96.

Use collocated PFL. ($\ddot{q}_2 = \ddot{q}_2^d$). $$E(\bx) = \frac{1}{2}\dot\bq^T\bM\dot{\bq} + U(\bx).$$ $$E_d = U(\bx^*).$$ $$\bar{u} = \dot{q}_1 \tilde{E}.$$ $$\ddot{q}_2^d = - k_1 q_2 - k_2 \dot{q}_2 + k_3 \bar{u},$$

Extra PD terms prevent proof of asymptotic convergence to homoclinic orbit. Proof of another energy-based controller in Xin04.

# Discussion

The energy shaping controller for swing-up presented here is a pretty faithful representative from the field of nonlinear underactuated control. Typically these control derivations require some clever tricks for simplifying or canceling out terms in the nonlinear equations, then some clever Lyapunov function to prove stability. In these cases, PFL was used to simplify the equations, and therefore the controller design.

These controllers are important, representative, and relevant. But clever tricks with nonlinear equations seem to be fundamentally limited. Most of the rest of the material presented in this book will emphasize more general computational approaches to formulating and solving these and other control problems.

# Other model systems

The acrobot and cart-pole systems are just two of the model systems used heavily in underactuated control research. Other examples include:

• Pendubot
• Inertia wheel pendulum
• Furuta pendulum (horizontal rotation and vertical pendulum)
• Hovercraft

# Walking and Running

Practical legged locomotion is one of the fundamental unsolved problems in robotics. Many challenges are in mechanical design - a walking robot must carry all of its actuators and power, making it difficult to carry ideal force/torque - controlled actuators. But many of the unsolved problems are because walking robots are underactuated control systems.

In this chapter we'll introduce some of the simple models of walking and robots, the control problems that result, and a very brief summary of some of the control solutions described in the literature. Compared to the robots that we have studied so far, our investigations of legged locomotion will require additional tools for thinking about limit cycle dynamics and dealing with impacts.

# The Ballistic Walker

One of the earliest models of walking was proposed by McMahonMcMahon80, who argued that humans use a mostly ballistic (passive) gait. COM trajectory looks like a pendulum (roughly walking by vaulting). EMG activity in stance legs is high, but EMG in swing leg is very low, except for very beginning and end of swing. Proposed a three-link "ballistic walker" model, which models a single swing phase (but not transitions to the next swing nor stability). Interestingly, in recent years the field has developed a considerably deeper appreciation for the role of compliance during walking; simple walking-by-vaulting models are starting to fall out of favor.

McGeerMcGeer90 followed up with a series of walking machines, called "passive dynamic walkers". The walking machine by Collins and RuinaCollins01 is the most impressive passive walker to date.

# The Rimless Wheel

The most elementary model of passive dynamic walking, first used in the context of walking by McGeer90, is the rimless wheel. This simplified system has rigid legs and only a point mass at the hip as illustrated in the figure above. To further simplify the analysis, we make the following modeling assumptions:

• Collisions with ground are inelastic and impulsive (only angular momentum is conserved around the point of collision).
• The stance foot acts as a pin joint and does not slip.
• The transfer of support at the time of contact is instantaneous (no double support phase).
• $0 \le \gamma < \frac{\pi}{2}$, $0 < \alpha < \frac{\pi}{2}$, $l > 0$.

Note that the coordinate system used here is slightly different than for the simple pendulum ($\theta=0$ is at the top, and the sign of $\theta$ has changed).

update the coordinates (here and in drake)

The most comprehensive analysis of the rimless wheel was done by Coleman98a.

# Stance Dynamics

The dynamics of the system when one leg is on the ground are given by $$\ddot\theta = \frac{g}{l}\sin(\theta).$$ If we assume that the system is started in a configuration directly after a transfer of support ($\theta(0^+) = \gamma-\alpha$), then forward walking occurs when the system has an initial velocity, $\dot\theta(0^+) > \omega_1$, where $$\omega_1 = \sqrt{ 2 \frac{g}{l} \left[ 1 - \cos \left (\gamma-\alpha \right) \right]}.$$ $\omega_1$ is the threshold at which the system has enough kinetic energy to vault the mass over the stance leg and take a step. This threshold is zero for $\gamma = \alpha$ and does not exist for $\gamma > \alpha$. The next foot touches down when $\theta(t) = \gamma+\alpha$, at which point the conversion of potential energy into kinetic energy yields the velocity $$\dot\theta(t^-) = \sqrt{ \dot\theta^2(0^+) + 4\frac{g}{l} \sin\alpha \sin\gamma }.$$ $t^-$ denotes the time immediately before the collision.

# Foot Collision

The angular momentum around the point of collision at time $t$ just before the next foot collides with the ground is $$L(t^-) = -ml^2\dot\theta(t^-) \cos(2\alpha).$$ The angular momentum at the same point immediately after the collision is $$L(t^+) = -ml^2\dot\theta(t^+).$$ Assuming angular momentum is conserved, this collision causes an instantaneous loss of velocity: $$\dot\theta(t^+) = \dot\theta(t^-) \cos(2\alpha).$$

# Forward simulation

Given the stance dynamics, the collision detection logic ($\theta = \gamma \pm \alpha$), and the collision update, we can numerically simulate this simple model. Doing so reveals something remarkable... the wheel starts rolling, but then one of two things happens: it either runs out of energy and stands still, or it quickly falls into a stable periodic solution. Convergence to this stable periodic solution appears to happen from a huge range of initial conditions. Try it yourself.

# Numerical simulation of the rimless wheel

rimless_wheel/simulate.py

Optionally you can specify the initial conditions. Here are a few good values to try:

rimless_wheel/simulate.py rimless_wheel/simulate.py rimless_wheel/simulate.py rimless_wheel/simulate.py rimless_wheel/simulate.py

One of the fantastic things about the rimless wheel is that the dynamics are exactly the undamped simple pendulum that we've thought so much about. So you will recognize the phase portraits of the system below - they are centered around the unstable fixed point of the simple pendulum. Phase plots of the trajectories from the various initial conditions recommended above are plotted below.

Take a minute to make sure you can follow these trajectories through state space for each of the initial conditions. It's non-trivial, but worth the effort.

# Limit Cycles

A limit cycle is an asymptotically stable or unstable periodic orbit†† Marginally-stable orbits, such as the closed-orbits of the undamped simple pendulum, are typically not called limit cycles.. One of the simplest models of limit cycle behavior is the Van der Pol oscillator. Let's examine that first...

# Van der Pol Oscillator

$$\ddot{q} + \mu (q^2 - 1)\dot{q} + q = 0, \quad \mu>0$$ One can think of this system as almost a simple spring-mass-damper system, except that it has nonlinear damping. In particular, the velocity term dissipates energy when $|q|>1$, and adds energy when $|q|<1$. Therefore, it is not terribly surprising to see that the system settles into a stable oscillation from almost any initial conditions (the exception is the state $q=0,\dot{q}=0$). This can be seen nicely in the phase portrait below.

However, as you can see from the plot of system trajectories in the time domain, then a slightly different picture emerges. Although the phase portrait clearly reveals that all trajectories converge to the same orbit, the time domain plot reveals that these trajectories do not necessarily synchronize in time.

The Van der Pol oscillator clearly demonstrates what we would think of as a stable limit cycle, but also exposes the subtlety in defining this limit cycle stability. Neighboring trajectories do not necessarily converge on a stable limit cycle. In contrast, defining the stability of a particular trajectory (parameterized by time) is relatively easy.

Let's make a few quick points about the existence of closed-orbits. If we can define a closed region of phase space which does not contain any fixed points, then it must contain a closed-orbitStrogatz94. By closed, I mean that any trajectory which enters the region will stay in the region (this is the Poincaré-Bendixson Theorem). It's also interesting to note that gradient potential fields (e.g. Lyapunov functions) cannot have a closed-orbitStrogatz94, and consequently Lyapunov analysis cannot be applied to limit cycle stability without some modification.

# Poincaré Maps

One definition for the stability of a limit cycle uses the method of Poincaré. Let's consider an $n$ dimensional dynamical system, $\dot{\bx} = {\bf f}(\bx).$ Define an $n-1$ dimensional surface of section, $S$. We will also require that $S$ is transverse to the flow (i.e., all trajectories starting on $S$ flow through $S$, not parallel to it). The Poincaré map (or return map) is a mapping from $S$ to itself: $$\bx_p[n+1] = {\bf P}(\bx_p[n]),$$ where $\bx_p[n]$ is the state of the system at the $n$th crossing of the surface of section. Note that we will use the notation $\bx_p$ to distinguish the state of the discrete-time system from the continuous time system; they are related by $\bx_p[n] = \bx(t_c[n])$, where $t_c[n]$ is the time of the $n$th crossing of $S$.

# Return map for the Van der Pol Oscillator

Since the full system is two dimensional, the return map dynamics are one dimensional. One dimensional maps, like one dimensional flows, are amenable to graphical analysis. To define a Poincaré section for the Van der Pol oscillator, let $S$ be the line segment where $q = 0$, $\dot{q} > 0$.

If ${\bf P}(\bx_p)$ exists for all $\bx_p \in S$, then this method turns the stability analysis for a limit cycle into the stability analysis of a fixed point on a discrete map. In practice it is often difficult or impossible to find ${\bf P}$ analytically, but it can be obtained quite reasonably numerically. Once ${\bf P}$ is obtained, we can infer local limit cycle stability with an eigenvalue analysis. There will always be a single eigenvalue of 0 - corresponding to perturbations along the limit cycle which do not change the state of first return. The limit cycle is considered locally exponentially stable if all remaining eigenvalues, $\lambda_i$, have magnitude less than one, $|\lambda_i|<1$.

In fact, it is often possible to infer more global stability properties of the return map by examining ${\bf P}$. Koditschek91 describes some of the stability properties known for unimodal maps.

A particularly graphical method for understanding the dynamics of a one-dimensional iterated map is with the staircase method. Sketch the Poincaré map and also the line of slope one. Fixed points are the crossings with the unity line. Asymptotically stable if $|\lambda| < 1$. Unlike one dimensional flows, one dimensional maps can have oscillations (happens whenever $\lambda < 0$).

[insert staircase diagram of van der Pol oscillator return map here]

# Poincaré Map

We can now derive the angular velocity at the beginning of each stance phase as a function of the angular velocity of the previous stance phase. First, we will handle the case where $\gamma \le \alpha$ and $\dot\theta_n^+ > \omega_1$. The "step-to-step return map", factoring losses from a single collision, the resulting map is: $$\dot\theta^{+}_{n+1} = \cos(2\alpha) \sqrt{ ({\dot\theta_n}^{+})^{2} + 4\frac{g}{l}\sin\alpha \sin\gamma}.$$ where the $\dot{\theta}^{+}$ indicates the velocity just after the energy loss at impact has occurred.

Using the same analysis for the remaining cases, we can complete the return map. The threshold for taking a step in the opposite direction is $$\omega_2 = - \sqrt{2 \frac{g}{l} [1 - \cos(\alpha + \gamma)]}.$$ For $\omega_2 < \dot\theta_n^{+} < \omega_1,$ we have $$\dot\theta_{n+1}^{+} = -\dot\theta_n^{+} \cos(2\alpha).$$ Finally, for $\dot\theta_n^{+} < \omega_2$, we have $$\dot\theta_{n+1}^{+} = - \cos(2\alpha)\sqrt{(\dot\theta_n^{+})^2 - 4 \frac{g}{l} \sin\alpha \sin\gamma}.$$ Altogether, we have (for $\gamma \le \alpha$) $$\dot\theta_{n+1}^{+} = \begin{cases} \cos(2\alpha) \sqrt{(\dot\theta_n^{+})^2 + 4 \frac{g}{l} \sin\alpha \sin\gamma}, & \text{ } \omega_1 < \dot\theta_n^{+} \\ -\dot\theta_n^{+} \cos(2\alpha), & \text{ } \omega_2 < \dot\theta_n^{+} < \omega_1 \\ -\cos(2\alpha) \sqrt{(\dot\theta_n^{+})^2 - 4\frac{g}{l} \sin\alpha \sin\gamma}, & \dot\theta_n^{+} < w_2 \end{cases}.$$

Notice that the return map is undefined for $\dot\theta_n = \{ \omega_1, \omega_2 \}$, because from these configurations, the wheel will end up in the (unstable) equilibrium point where $\theta = 0$ and $\dot\theta = 0$, and will therefore never return to the map.

This return map blends smoothly into the case where $\gamma > \alpha$. In this regime, $$\dot\theta_{n+1}^{+} = \begin{cases} \cos(2\alpha) \sqrt{(\dot\theta_n^{+})^2 + 4 \frac{g}{l} \sin\alpha \sin\gamma}, & \text{ } 0 \le \dot\theta_n^{+} \\ -\dot\theta_n^{+} \cos(2\alpha), & \text{ } \omega_2 < \dot\theta_n^{+} < 0 \\ -\cos(2\alpha) \sqrt{(\dot\theta_n^{+})^2 - 4\frac{g}{l} \sin\alpha \sin\gamma}, & \dot\theta_n^{+} \le w_2 \end{cases}.$$ Notice that the formerly undefined points at $\{\omega_1,\omega_2\}$ are now well-defined transitions with $\omega_1 = 0$, because it is kinematically impossible to have the wheel statically balancing on a single leg.

# Fixed Points and Stability

For a fixed point, we require that $\dot\theta_{n+1}^{+} = \dot\theta_n^{+} = \omega^*$. Our system has two possible fixed points, depending on the parameters: $$\omega_{stand}^* = 0,~~~ \omega_{roll}^* = \cot(2 \alpha) \sqrt{4 \frac{g}{l} \sin\alpha\sin\gamma}.$$ The limit cycle plotted above illustrates a state-space trajectory in the vicinity of the rolling fixed point. $\omega_{stand}^*$ is a fixed point whenever $\gamma < \alpha$. $\omega_{roll}^*$ is a fixed point whenever $\omega_{roll}^* > \omega_1$. It is interesting to view these bifurcations in terms of $\gamma$. For small $\gamma$, $\omega_{stand}$ is the only fixed point, because energy lost from collisions with the ground is not compensated for by gravity. As we increase $\gamma$, we obtain a stable rolling solution, where the collisions with the ground exactly balance the conversion of gravitational potential to kinetic energy. As we increase $\gamma$ further to $\gamma > \alpha$, it becomes impossible for the center of mass of the wheel to be inside the support polygon, making standing an unstable configuration.

# The Compass Gait

The rimless wheel models only the dynamics of the stance leg, and simply assumes that there will always be a swing leg in position at the time of collision. To remove this assumption, we take away all but two of the spokes, and place a pin joint at the hip. To model the dynamics of swing, we add point masses to each of the legs. For actuation, we first consider the case where there is a torque source at the hip - resulting in swing dynamics equivalent to an Acrobot (although in a different coordinate frame).

In addition to the modeling assumptions used for the rimless wheel, we also assume that the swing leg retracts in order to clear the ground without disturbing the position of the mass of that leg. This model, known as the compass gait, is well studied in the literature using numerical methods Goswami96a+Goswami99+Spong03, but relatively little is known about it analytically.

The state of this robot can be described by 4 variables: $\theta_{st},\theta_{sw},\dot\theta_{st}$, and $\dot\theta_{sw}$. The abbreviation $st$ is shorthand for the stance leg and $sw$ for the swing leg. Using $\bq = [ \theta_{st}, \theta_{sw} ]^T$ and $\bu = \tau$, we can write the dynamics as $$\bM(\bq)\ddot\bq + \bC(\bq,\dot\bq)\dot\bq = \btau_g(q) + \bB\bu,$$ with \begin{gather*} \bM = \begin{bmatrix} (m_h+m)l^2 + ma^2 & -mlb\cos(\theta_{st}-\theta_{sw}) \\ -mlb\cos(\theta_{st}-\theta_{sw}) & mb^2 \end{bmatrix}\\ \bC = \begin{bmatrix} 0 & -mlb\sin(\theta_{st}-\theta_{sw})\dot\theta_{sw} \\ mlb\sin(\theta_{st}-\theta_{sw})\dot\theta_{st} & 0 \end{bmatrix} \\ \btau_g(q) = \begin{bmatrix} (m_hl + ma + ml)g\sin(\theta_{st}) \\ -mbg\sin(\theta_{sw}) \end{bmatrix},\\ \bB = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \end{gather*} and $l=a+b$.

The foot collision is an instantaneous change of velocity caused by a impulsive force at the foot that brings the foot to rest. The update to the velocities can be calculated using the derivation for impulsive collisions derived in the appendix. To use it, we proceed with the following steps:

• Add a "floating base" to the system by adding a free (x,y) joint at the pre-impact stance toe, $\bq_{fb} = [x,y,\theta_{st},\theta_{sw}]^T.$
• Calculate the mass matrix for this new system, call it $\bM_{fb}$.
• Write the position of the (pre-impact) swing toe as a function $\bphi(\bq_{fb})$. Define the Jacobian of this function: $\bJ = \frac{\partial \bphi}{\partial \bq_{fb}}.$
• The post-impact velocity that ensures that $\dot\bphi = 0$ after the collision is given by $$\dot\bq^+ = \left[ \bI - \bM_{fb}^{-1}\bJ^T[\bJ\bM_{fb}^{-1}\bJ^T]^{-1}\bJ\right] \dot\bq^-,$$ noting that $\dot{x}^- = \dot{y}^- = 0$, and that you need only read out the last two elements of $\dot{\bq}^+$. The velocity of the post-impact stance foot will be zero by definition, and the new velocity of the pre-impact stance foot can be completely determined from the minimal coordinates.
• Switch the stance and swing leg positions and velocities.

# Numerical simulation of the compass gait

You can simulate the passive compass gait using: compass_gait/simulate.py

Numerical integration of these equations reveals a stable limit cycle, plotted below. Notice that when the left leg is in stance (top part of the plot), the trajectory quite resembles the familiar pendulum dynamics of the rimless wheel. But this time, when the leg impacts, it takes a long arc around as the swing leg before returning to stance. The impacts are also clearly visible as discontinuous changes (decreases) in velocity. The dependence of this limit cycle on the system parameters has been studied extensively in Goswami96a.

The basin of attraction of the stable limit cycle is a narrow band of states surrounding the steady state trajectory. Although the simplicity of this model makes it analytically attractive, this lack of stability makes it difficult to implement on a physical device.

# The Kneed Walker

To achieve a more anthropomorphic gait, as well as to acquire better foot clearance and ability to walk on rough terrain, we want to model a walker that includes kneeHsu07. For this, we model each leg as two links with a point mass each.

At the beginning of each step, the system is modeled as a three-link pendulum, like the ballistic walkerMcMahon80+Mochon80+Spong03. The stance leg is the one in front, and it is the first link of a pendulum, with two point masses. The swing leg has two links, with the joint between them unconstrained until knee-strike. Given appropriate mass distributions and initial conditions, the swing leg bends the knee and swings forward. When the swing leg straightens out (the lower and upper length are aligned), knee-strike occurs. The knee-strike is modeled as a discrete inelastic collision, conserving angular momentum and changing velocities instantaneously.

After this collision, the knee is locked and we switch to the compass gait model with a different mass distribution. In other words, the system becomes a two-link pendulum. Again, the heel-strike is modeled as an inelastic collision. After the collision, the legs switch instantaneously. After heel-strike then, we switch back to the ballistic walker's three-link pendulum dynamics. This describes a full step cycle of the kneed walker, which is shown above.

By switching between the dynamics of the continuous three-link and two-link pendulums with the two discrete collision events, we characterize a full point-feed kneed walker walking cycle. After finding appropriate parameters for masses and link lengths, a stable gait is found. As with the compass gait's limit cycle, there is a swing phase (top) and a stance phase (bottom). In addition to the two heel-strikes, there are two more instantaneous velocity changes from the knee-strikes as marked in the figure. This limit cycle is traversed clockwise.

# Highly-articulated Legged Robots

The passive dynamic walkers and hopping robots described in the last chapter capture the fundamental dynamics of legged locomotion -- dynamics which are fundamentally nonlinear and punctuated by impulsive events due to making and breaking contact with the environment. But if you start reading the literature on humanoid robots, or many-legged robots like quadrupeds, then you will find a quite different set of ideas taking center stage: ideas like the "zero-moment point" and footstep planning. The goal of this chapter is to penetrate that world.

I'd like to start the discussion with a model that might seem quite far from the world of legged robots, but I think it's a very useful way to think about the problem.

# A hovercraft model

flip the directions of the force vector in the image

Imagine you have a flying vehicle modeled as a single rigid body in a gravity field with some number of force "thrusters" attached. We'll describe the configuration of the vehicle by its orientation, $\theta$ and the location of its center of mass $(x,z)$. The vector from the center of mass to thruster $i$ is given by $r_i$, yielding the equations of motion: \begin{align*} m\ddot{x} =& \sum_i F_{i,x},\\ m\ddot{z} =& \sum_i F_{i,z} - mg,\\ I\ddot\theta =& \sum_i \left[ r_i \times F_i \right]_y, \end{align*} where I've used $F_{i,x}$ to represent the component of the force in the $x$ direction and have taken just the $y$-axis component of the cross product on the last line.

Our goal is to move the hovercraft to a desired position/orientation and to keep it there. If we take the inputs to be $F_i$, then the dynamics are affine (linear plus a constant term). As such, we can stabilize a stabilizable fixed point using a change of coordinates plus LQR or even plan/track a desired trajectory using time-varying LQR. If I were to add additional linear constraints, for instance constraining $F_{min} \le F_i \le F_{max}$, then I can still use linear model-predictive control (MPC) to plan and stabilize a desired motion. By most accounts, this is a relatively easy control problem. (Note that it would be considerably harder if my control input or input constraints depend on the orientation, $\theta$, but we don't need that here yet).

Now imagine that you just a small number of thrusters (let's say two) each with severe input constraints. To make things more interesting, let us say that you're allowed to move the thrusters, so $\dot{r}_i$ becomes an additional control input, but with some important limitations: you have to turn the thrusters off when they are moving (e.g. $|F_i||\dot{r}_i| = 0$) and there is a limit to how quickly you can move the thrusters $|\dot{r}|_i \le v_{max}$. This problem is now a lot more difficult, due especially to the fact that constraints of the form $u_1 u_2 = 0$ are non-convex.

I find this a very useful thought exercise; somehow our controller needs to make an effectively discrete decision to turn off a thruster and move it. The framework of optimal control should support this - you are sacrificing a short-term loss of control authority for the long-term benefit of having the thruster positioned where you would like, but we haven't developed tools yet that deal well with this discrete plus continuous decision making. We'll need to address that here.

Unfortunately, although the problem is already difficult, we need to continue to add constraints. Now let's say additionally, that the thrusters can only be turned on in certain locations, as cartooned here:

The union of these regions need not form a convex set. Furthermore, these locations could be specified in the robot frame, but may also be specified in the world frame, which I'll call ${\bf p}_i$: $${\bf p}_i = {\bf r}_i - \begin{bmatrix} x \\ 0 \\ z \end{bmatrix}.$$ This problem still feels like it should be tractable, but it's definitely getting hard.

# Robots with (massless) legs

In my view, the hovercraft problem above is a central component of the walking problem. If we consider a walking robot with massless legs, then the feet are exactly like movable thrusters. As above, they are highly constrained - they can only produce force when they are touching the ground, and (typically) they can only produce forces in certain directions, for instance as described by a "friction cone" (you can push on the ground but not pull on the ground, and with Coulomb friction the forces tangential to the ground must be smaller than the forces normal to the ground, as described by a coefficient of friction, e.g. $|F_{\parallel}| < \mu |F_{\perp}|$).

The constraints on where you can put your feet / thrusters will depend on the kinematics of your leg, and the speed at which you can move them will depend on the full dynamics of the leg -- these are difficult constraints to deal with. But the actual dynamics of the rigid body are actually still affine, and still simple!

# Capturing the full robot dynamics

We don't actually need to have massless legs for this discussion to make sense. If we use the coordinates $x,z$ to describe the location of the center of mass (CoM) of the entire robot, and $m$ to represent the entire mass of the robot, then the first two equations remain unchanged. The center of mass is a configuration dependent point, and does not have an orientation, but one important generalization of the orientation dynamics is given by the centroidal momentum matrix, $A(\bq)$, where $A(\bq)\dot{\bq}$ captures the linear and angular momentum of the robot around the center of mass Orin13. Note that the center of mass dynamics are still affine -- even for a big complicated humanoid -- but the centroidal momentum dynamics are nonlinear.

# Impact dynamics

In the previous chapter we devoted relatively a lot of attention to dynamics of impact, characterized for instance by a guard that resets dynamics in a hybrid dynamical system. In those models we used impulsive ground-reaction forces (these forces are instantaneously infinite, but doing finite work) in order to explain the discontinuous change in velocity required to avoid penetration with the ground. This story can be extended naturally to the dynamics of the center of mass.

For an articulated robot, though, there are a number of possible strategies for the legs which can effect the dynamics of the center of mass. For example, if the robot hits the ground with a stiff leg like the rimless wheel, then the angular momentum about the point of collision will be conserved, but any momentum going into the ground will be lost. However, if the robot has a spring leg and a massless toe like the SLIP model, then no energy need be lost.

One strategy that many highly-articulated legged robots use is to keep their center of mass at a constant height, $$z = c \quad \Rightarrow \quad \dot{z} = \ddot{z} = 0,$$ and minimize their angular momentum about the center of mass (here $\ddot\theta=0$). Using this strategy, if the swinging foot lands roughly below the center of mass, then even with a stiff leg there is no energy dissipated in the collision - all of the momentum is conserved. This often (but does not always) justify ignoring the impacts in the center of mass dynamics of the system.

# The special case of flat terrain

While not the only important case, it is extremely common for our robots to be walking over flat, or nearly flat terrain. In this situation, even if the robot is touching the ground in a number of places (e.g., two heels and two toes), the constraints on the center of mass dynamics can be summarized very efficiently.

First, on flat terrain $F_{i,z}$ represents the force that is normal to the surface at contact point $i$. If we assume that the robot can only push on the ground (and not pull), then this implies $$\forall i, F_{i,z} \ge 0 \Rightarrow \sum_i F_{i,z} \ge 0 \Rightarrow \ddot{z} \ge -g.$$ In other words, if I cannot pull on the ground, then my center of mass cannot accelerate towards the ground faster than gravity.

Furthermore, if we use a Coulomb model of friction on the ground, with friction coefficient $\mu$, then $$\forall i, |F_{i,x}| \le \mu F_{i,z} \Rightarrow \sum_i |F_{i,x}| \le \mu \sum_i F_z \Rightarrow |\ddot{x}| \le \mu (\ddot{z} + g).$$ For instance, if I keep my center of mass at a constant height, then $\ddot{z}=0$ and $|\ddot{x}| \le \mu g$; this is a nice reminder of just how important friction is if you want to be able to move around in the world.

Even better, let us define the "center of pressure" (CoP) as the point on the ground where $$x_{cop} = \frac{\sum_i p_{i,x} F_{i,z}}{\sum_i F_{i,z}},$$ and since all $p_{i,z}$ are equal on flat terrain, $z_{cop}$ is just the height of the terrain. It turns out that the center of pressure is a "zero-moment point" (ZMP) -- a property that we will demonstrate below -- and moment-balance equation gives us a very important relationship between the location of the CoP and the dynamics of the CoM: $(m\ddot{z} + mg) (x_{cop} - x) = (z_{cop} - z) m\ddot{x} - I\ddot\theta.$ If we use the strategy proposed above for ignoring collision dynamics, $\ddot{z} = \ddot{\theta} = 0$, then we have $z_{cop} - z$ is a constant height $h$, and the result is the famous "ZMP equations": $\ddot{x} = -\frac{g}{h} (x_{cop}-x).$ So the location of the center of pressure completely determines the acceleration of the center of mass, and vice versa! What's more, this relationship is affine -- a property that we can exploit in a number of ways.

As an example, we can easily relate constraints on the CoP to constraints on $\ddot{x}$. It is easy to verify from the definition that the CoP must live inside the convex hull of the ground contact points. Therefore, if we use the $\ddot{z}=\ddot\theta=0$ strategy, then this directly implies strict bounds on the possible accelerations of the center of mass given the locations of the ground contacts.

# An aside: the zero-moment point derivation

The zero-moment point (ZMP) is discussed very frequently in the current literature on legged robots. It also has an unfortunate tendency to be surrounded by some confusion; a number of people have defined the ZMP is slightly different ways (see e.g. Goswami99 for a summary). Therefore, it makes sense to provide a simple derivation here.

First let us recall that for rigid body systems I can always summarize the contributions from many external forces as a single wrench (force and torque) on the object -- this is simply because the $F_i$ terms enter our equations of motion linearly. Specifically, given any point in space, $r$, in coordinates relative to $(x,z)$:

update diagram to have r instead of p

I can re-write the equations of motion as \begin{align*} m\ddot{x} =& \sum_i F_{i,x} = F_{net,x},\\ m\ddot{z} =& \sum_i F_{i,z} - mg = F_{net,z} - mg,\\ I\ddot\theta =& \sum_i \left[ r_i \times F_i \right]_y = ({\bf r} \times {\bf F}_{net})_y + \tau_{net}, \end{align*} where ${\bf F}_{net} = \sum_i {\bf F}_i$ and the value of $\tau_{net}$ depends on the location ${\bf r}$. For some choices of ${\bf r}$, we can make $\tau_{net}=0$. Examining $({\bf r} \times {\bf F}_{net})_y = r_z F_{net,x} - r_x F_{net,z} = \left[ r_i \times F_i \right]_y,$ we can see that when $F_{net,z} > 0$ there is an entire line of solutions, $r_x = a r_z + b$, including one that will intercept the terrain. For walking robots, it is this point on the ground from which the external wrench can be described by a single force vector (and no moment) that is the famous "zero-moment point" (ZMP). Substituting the back in to replace $F_{net}$, we can see that $r_x = \frac{r_z m \ddot{x} - I \ddot\theta}{m\ddot{z} + mg}.$ If we assume that $\ddot{z}=\ddot{\theta}=0$ and replace the relative coordinates with the global coordinates ${\bf r} = {\bf p} - [x,0,z]^T$, then we arrive at precisely the equation presented above.

Furthermore, since $\left[ r_i \times F_i \right]_y = \sum_i \left( r_{i,z} F_{i,x} - r_{i,x} F_{i,z} \right),$ and for flat terrain we have $r_z F_{net,x} = \sum_i r_{i,z} F_{i,x},$ then we can see that this ZMP is exactly the CoP: $r_x = \frac{\sum_i r_{i,x} F_{i,z}}{ F_{net,z} }.$

In three dimensions, we solve for the point on the ground where $\tau_{net,y} = \tau_{net,x} = 0$, but allow $\tau_{net,z} \ne 0$ to extract the analogous equations in the $y$-axis: $r_y = \frac{r_z m \ddot{y} - I \ddot\theta}{m\ddot{z} + mg}.$

# Whole-Body Control

Coming soon. For a description of our approach with Atlas, see Kuindersma13+Kuindersma14.

# Footstep planning and push recovery

Coming soon. For a description of our approach with Atlas, see Deits14a+Kuindersma14. For nice geometric insights on push recovery, see Koolen12.

# Beyond ZMP planning

Coming soon. For a description of our approach with Atlas, see Dai14+Kuindersma14.

# Manipulation

Sense, plan, act. Grasp planning (grasp metrics). Underactuated hands. Feedback. Not: perception (only mention its importance).

# Model Systems with Stochasticity

My goals for this chapter are to build intuition for the beautiful and rich behavior of nonlinear dynamical system that are subjected to random (noise/disturbance) inputs. So far we have focused primarily on systems described by $\dot{\bx}(t) = f(\bx(t),\bu(t)) \quad \text{or} \quad \bx[n+1] = f(\bx[n],\bu[n]).$ In this chapter, I would like to broaden the scope to think about $\dot{\bx}(t) = f(\bx(t),\bu(t),\bw(t)) \quad \text{or} \quad \bx[n+1] = f(\bx[n],\bu[n],\bw[n]),$ where this additional input $\bw$ is the (vector) output of some random process. In other words, we can begin thinking about stochastic systems by simply understanding the dynamics of our existing ODEs subjected to an additional random input.

This form is extremely general as written. $\bw(t)$ can represent time-varying random disturbances (e.g. gusts of wind), or even constant model errors/uncertainty. One thing that we are not adding, yet, is measurement uncertainty. There is a great deal of work on observability and state estimation that study the question of how you can infer the true state of the system given noise sensor readings. For this chapter we are assuming perfect measurements of the full state, and are focused instead on the way that "process noise" shapes the long-term dynamics of the system.

I will also stick primarily to discrete time dynamics for this chapter, simply because it is easier to think about the output of a discrete-time random process, $\bw[n]$, than a $\bw(t)$. But you should know that all of the ideas work in continuous time, too. Also, most of our examples will take the form of additive noise: $\bx[n+1] = f(\bx[n],\bu[n]) + \bw[n],$ which is a particular useful and common specialization of our general form. And this form doesn't give up much -- the disturbance on step $k$ can pass through the nonlinear function $f$ on step $k+1$ giving rich results -- but is often much easier to work with.

# The Master Equation

Let's start by looking at some simple examples.

# A Bistable System + Noise

Let's consider (a time-reversed version of) one of my favorite one-dimensional systems: $\dot{x} = x - x^3.$ add plot This deterministic system has stable fixed points at $x^* = \{-1,1\}$ and an unstable fixed point at $x^* = 0$.

A reasonable discrete-time approximation of these dynamics, now with additive noise, is $x[n+1] = x[n] + h (x[n] - x[n]^3 + w[n]).$ When $w[n]=0$, this system has the same fixed points and stability properties as the continuous time system. But let's examine the system when $w[n]$ is instead the result of a zero-mean Gaussian white noise process, defined by: \begin{gather*} \forall n, E\left[ w[n] \right] = 0,\\ E\left[ w[i]w[j] \right] = \begin{cases} \sigma^2, & \text{ if } i=j,\\ 0, & \text{ otherwise.} \end{cases} \end{gather*} Here $\sigma$ is the standard deviation of the Gaussian.

When you simulate this system for small values of $\sigma$, you will see trajectories move roughly towards one of the two fixed points (for the deterministic system), but every step is modified by the noise. In fact, even if the trajectory were to arrive exactly at what was once a fixed point, it is almost surely going to move again on the very next step. In fact, if we plot many runs of the simulation from different initial conditions all on the same plot, we will see something like the figure below.

During any individual simulation, the state jumps around randomly for all time, and could even transition from the vicinity of one fixed point to the other fixed point. Another way to visualize this output is to animate a histogram of the particles over time.

You can run this demo for yourself:

simple/stochastic_system_particle_demo.py

Let's take a moment to appreciate the implications of what we've just observed. Every time that we've analyzed a system to date, we've asked questions like "given x[0], what is the long-term behavior of the system, $\lim_{n\rightarrow\infty} x[n]$?", but now $x[n]$ is a random variable. The trajectories of this system do not converge, and the system does not exhibit any form of stability that we've introduced so far.

All is not lost. If you watch the animation closely, you might notice the distribution of this random variable is actually very well behaved. This is the key idea for this chapter.

Let us use $p_n(x)$ to denote the probability density function over the random variable $x$ at time $n$. Note that this density function must satisfy $\int_{-\infty}^\infty p_n(x) dx = 1.$ It is actually possible to write the dynamics of the probability density with the simple relation $p_{n+1}(x) = \int_{-\infty}^\infty p(x|x') p_n(x') dx',$ where $p(x|x')$ encodes the stochastic dynamics as a conditional distribution of the next state (here $x$) as a function of the current state (here $x'$). Dynamical systems that can be encoded in this way are known as continuous-state Markov Processes, and the governing equation above is often referred to as the "master equation" for the stochastic process. In fact this update is even linear(!) ; a fact that can enable closed-form solutions to some impressive long-term statistics, like mean time to failure or first passage timesByl08a. Unfortunately, it is often difficult to perform the integral in closed form, so we must often resort to discretation or numerical analysis.

The slightly more general form of the master equation, which works for multivariable distributions with state-domain ${\cal X}$, and systems with control inputs $\bu$, is $p_{n+1}(\bx) = \int_{\cal X} p(\bx|\bx',\bu) p_n(\bx') d\bx'.$ This is a (continuous-state) Markov Decision Process.

# Stationary Distributions

In the example above, the histogram is our numerical approximation of the probability density. If you simulate it a few times, you will probably believe that, although the individual trajectories of the system do not converge, the probability distribution actually does converge to what's known as a stationary distribution -- a fixed point of the master equation. Instead of thinking about the dynamics of the trajectories, we need to start thinking about the dynamics of the distribution.

# The Linear Gaussian System

Let's consider the one-dimensional linear system with additive noise: $x[n+1] = a x[n] + w[n].$ When $w[n]=0$, the system is stable (to the origin) for $-1 < a < 1$. Let's make sure that we can understand the dynamics of the master equation for the case when $w[n]$ is Gaussian white noise with standard deviation $\sigma$.

First, recall that the probability density function of a Gaussian with mean $\mu$ is given by $p(w) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(w-\mu)^2}{2\sigma^2}}.$ When conditioned on $x[n]$, the distribution given by the dynamics subjected to mean-zero Gaussian white noise is simply another Gaussian, with the mean given by $ax[n]$: $p(x[n+1]|x[n]) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x[n+1]-ax[n])^2}{2\sigma^2}},$ yielding the master equation $p_{n+1}(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \int_{-\infty}^\infty e^{-\frac{(x-ax')^2}{2\sigma^2}} p_n(x') dx'.$

Now here's the magic. Let's push a distribution, $p_n(x)$, which is zero-mean, with standard deviation $\sigma_n$, through the master equation: \begin{align*} p_{n+1}(x) =& \frac{1}{\sqrt{2 \pi \sigma^2}}\frac{1}{\sqrt{2 \pi \sigma_n^2}} \int_{-\infty}^\infty e^{-\frac{(x-ax')^2}{2\sigma^2}} e^{-\frac{(x')^2}{2\sigma_n^2}} dx',\\ =& \frac{1}{\sqrt{2 \pi (\sigma^2+a^2 \sigma_n^2)}} e^{-\frac{x^2}{2(\sigma^2 + a^2 \sigma_n^2)}}. \end{align*} The result is another mean-zero Gaussian with $\sigma_{n+1}^2 = \sigma^2 + a^2 \sigma_n^2$. This result generalizes to the multi-variate case, too, and might be familiar to you e.g. from the process update of the Kalman filter.

Taking it a step further, we can see that a stationary distribution for this system is given by a mean-zero Gaussian with $\sigma_*^2 = \frac{\sigma^2}{1-a^2}.$ Note that this distribution is well defined when $-1 < a < 1$ (only when the system is stable).

The stationary distribution of the linear Gaussian system reveals the fundamental and general balance between two terms in the governing equations of any stochastic dynamical system: the stability of the deterministic system is bringing trajectories together (smaller $a$ means faster convergence of the deterministic system and results in a more narrow distribution), but the noise in the system is forcing trajectories apart (larger $\sigma$ means larger noise and results in a wider distribution).

Given how rich the dynamics can be for deterministic nonlinear systems, you can probably imagine that the possible long-term dynamics of the probability are also extremely rich. If we simply flip the signs in the dynamics we examined above, we'll get our next example:

# The Cubic Example + Noise

Now let's consider the discrete-time approximation of $\dot{x} = -x + x^3,$ again with additive noise: $x[n+1] = x[n] + h (-x[n] + x[n]^3 + w[n]).$ With $w[n]=0$, the system has only a single stable fixed point (at the origin), whose deterministic region of attraction is given by $x \in (-1,1)$. If we again simulate the system from a set of random initial conditions and plot the histogram, we will see something like the figure below.

Be sure to watch the animation. Or better yet, run the simulation for yourself by changing the sign of the derivative in the example and re-running:

simple/stochastic_system_particle_demo.py

Now try increasing the noise (e.g. pre-multiply the noise input, $w$, in the dynamics by a scalar like 2).

What is the stationary distribution for this system? In fact, there isn't one. Although we initially collect probability density around the stable fixed point, you should notice a slow leak -- on every step there is some probability of transitioning past unstable fixed points and getting driven by the unstable dynamics off towards infinity. If we run the simulation long enough, there won't be any probability density left at $x=0$.

# The Stochastic Van der Pol Oscillator

One more example; this is a fun one. Let's think about one of the simplest examples that we had for a system that demonstrates limit cycle stability -- the Van der Pol oscillator -- but now we'll add Gaussian white noise, $$\ddot{q} + (q^2 - 1)\dot{q} + q = w(t),$$ Here's the question: if we start with a small set of initial conditions centered around one point on the limit cycle, then what is the long-term behavior of this distribution?

Since the long-term behavior of the deterministic system is periodic, it would be very logical to think that the state distribution for this stochastic system would fall into a stable periodic solution, too. But think about it a little more, and then watch the animation (or run the simulation yourself).

van_der_pol/particles.py

The explanation is simple: the periodic solution of the system is only orbitally stable; there is no stability along the limit cycle. So any disturbances that push the particles along the limit cycle will go unchecked. Eventually, the distribution will "mix" along the entire cycle. Perhaps surprisingly, this system that has a limit cycle stability when $w=0$ eventually reaches a stationary distribution (fixed point) in the master equation.

# Extended Example: The Rimless Wheel on Rough Terrain

Coming soon. Read the paper Byl08f and/or watch the video.

# Noise models for real robots/systems.

Sensor models. Beam model from probabilistic robotics. RGB-D dropouts.

Perception subsystem. Output of a perception system is not Gaussian noise, it's missed detections/drop-outs...

# Analysis (a preview)

Coming soon. Uncertainty quantification (e.g., linear guassian approximation; discretize then closed form solutions using the transition matrix....). Monte-Carlo. Particle sim/filter. Rare event simulation

L2-gain with dissipation inequalities. Finite-time verification with sums of squares.

# Dynamic Programming

In chapter 2, we spent some time thinking about the phase portrait of the simple pendulum, and concluded with a challenge: can we design a nonlinear controller to reshape the phase portrait, with a very modest amount of actuation, so that the upright fixed point becomes globally stable? With unbounded torque, feedback linearization solutions (e.g., invert gravity) can work well, but can also require an unnecessarily large amount of control effort. The energy-based swing-up control solutions presented for the acrobot and cart-pole systems are considerably more appealing, but required some cleverness and might not scale to more complicated systems. Here we investigate another approach to the problem, using computational optimal control to synthesize a feedback controller directly.

# Formulating control design as an optimization

In this chapter, we will introduce optimal control - a control design process using optimization. This approach is powerful for a number of reasons. First and foremost, it is very general - allowing us to specify the goal of control equally well for fully- or under-actuated, linear or nonlinear, deterministic or stochastic, and continuous or discrete systems. Second, it permits concise descriptions of potentially very complex desired behaviours, specifying the goal of control as an scalar objective (plus a list of constraints). Finally, and most importantly, optimal control is very amenable to numerical solutions. Bertsekas00a is a fantastic reference on this material.

The fundamental idea in optimal control is to formulate the goal of control as the long-term optimization of a scalar cost function. Let's introduce the basic concepts by considering a system that is even simpler than the simple pendulum.

# Optimal Control Formulations for the Double Integrator

Consider the double integrator system $$\ddot{q} = u, \quad |u| \le 1.$$ If you would like a mechanical analog of the system (I always do), then you can think about this as a unit mass brick moving along the x-axis on a frictionless surface, with a control input which provides a horizontal force, $u$. The task is to design a control system, $u = \pi(\bx,t)$, $\bx=[q,\dot{q}]^T$ to regulate this brick to $\bx = [0,0]^T$.

In order to formulate this control design problem using optimal control, we must define a scalar objective which scores the long-term performance of running each candidate control policy, $\pi(\bx,t)$, from each initial condition, $(\bx_0,t_0)$, and a list of constraints that must be satisfied. For the task of driving the double integrator to the origin, one could imagine a number of optimal control formulations which would accomplish the task, e.g.:

• Minimum time: $\min_\pi t_f,$ subject to $\bx(t_0) = \bx_0,$ $\bx(t_f) = {\bf 0}.$
• Quadratic cost: $\min_\pi \int_{t_0}^{\infty} \left[ \bx^T(t) {\bf Q} \bx(t) \right] dt,$ ${\bf Q}\succ0$.
where the first is a constrained optimization formulation which optimizes time, and the second accrues a penalty at every instance according to the distance that the state is away from the origin (in a metric space defined by the matrix ${\bf Q}$), and therefore encourages trajectories that go more directly towards the goal, possibly at the expense of requiring a longer time to reach the goal (in fact it will result in an exponential approach to the goal, where as the minimum-time formulation will arrive at the goal in finite time). Note that both optimization problems only have well-defined solutions if it is possible for the system to actually reach the origin, otherwise the minimum-time problem cannot satisfy the terminal constraint, and the integral in the quadratic cost would not converge to a finite value as time approaches infinity (fortunately the double integrator system is controllable, and therefore can be driven to the goal in finite time).

Note that the input limits, $|u|\le1$ are also required to make this problem well-posed; otherwise both optimizations would result in the optimal policy using infinite control input to approach the goal infinitely fast. Besides input limits, another common approach to limiting the control effort is to add an additional quadratic cost on the input (or "effort"), e.g. $\int \left[ \bu^T(t) {\bf R} \bu(t) \right] dt,$ ${\bf R}\succ0$. This could be added to either formulation above. We will examine many of these formulations in some details in the examples worked out at the end of this chapter.

Optimal control has a long history in robotics. For instance, there has been a great deal of work on the minimum-time problem for pick-and-place robotic manipulators, and the linear quadratic regulator (LQR) and linear quadratic regulator with Gaussian noise (LQG) have become essential tools for any practicing controls engineer. With increasingly powerful computers and algorithms, the popularity of numerical optimal control has grown at an incredible pace over the last few years.

# The minimum time problem for the double integrator

For more intuition, let's do an informal derivation of the solution to the minimum time problem for the double integrator with input constraints: \begin{align*} \minimize_{\pi} \quad & t_f\\ \subjto \quad & \bx(t_0) = \bx_0, \\ & \bx(t_f) = {\bf 0}, \\ & \ddot{q}(t) = u(t), \\ & |u| \le 1. \end{align*} What behavior would you expect an optimal controller exhibit?

Your intuition might tell you that the best thing that the brick can do, to reach the goal in minimum time with limited control input, is to accelerate maximally towards the goal until reaching a critical point, then hitting the brakes in order to come to a stop exactly at the goal. This would be called a bang-bang control policy; these are often optimal for systems with bounded input, and it is in fact optimal for the double integrator, although we will not prove it until we have developed more tools.

Let's work out the details of this bang-bang policy. First, we can figure out the states from which, when the brakes are fully applied, the system comes to rest precisely at the origin. Let's start with the case where $q(0) < 0$, and $\dot{q}(0)>0$, and "hitting the brakes" implies that $u=-1$ . Integrating the equations, we have \begin{gather*} \ddot{q}(t) = u = -1 \\\dot{q}(t) = \dot{q}(0) - t \\ q(t) = q(0) + \dot{q}(0) t - \frac{1}{2} t^2. \end{gather*} Substituting $t = \dot{q}(0) - \dot{q}$ into the solution give $\dot{q}$ reveals that the system orbits are parabolic arcs: $q = -\frac{1}{2} \dot{q}^2 + c_{-},$ with $c_{-} = q(0) + \frac{1}{2}\dot{q}^2(0)$.

Similarly, the solutions for $u=1$ are $q = \frac{1}{2} \dot{q}^2 + c_{+}$, with $c_{+}=q(0)-\frac{1}{2}\dot{q}^2(0)$.

Perhaps the most important of these orbits are the ones that pass directly through the origin (e.g., $c_{-}=0$). Following our initial logic, if the system is going slower than this $\dot{q}$ for any $q$, then the optimal thing to do is to slam on the accelerator ($u=-\text{sgn}(q)$). If it's going faster than the $\dot{q}$ that we've solved for, then still the best thing to do is to brake; but inevitably the system will overshoot the origin and have to come back. We can summarize this policy with: $u = \begin{cases} +1 & \text{if } (\dot{q} < 0 \text{ and } q \le \frac{1}{2} \dot{q}^2) \text{ or } (\dot{q}\ge 0 \text{ and } q < -\frac{1}{2} \dot{q}^2) \\ 0 & \text{if } q=0 \text{ and } \dot{q}=0 \\ -1 & \text{otherwise} \end{cases}$

and illustrate some of the optimal solution trajectories:

And for completeness, we can compute the optimal time to the goal by solving for the amount of time required to reach the switching surface plus the amount of time spent moving along the switching surface to the goal. With a little algebra, you will find that the time to goal, $T(\bx)$, is given by $T(\bx) = \begin{cases} 2\sqrt{\frac{1}{2}\dot{q}^2-q} - \dot{q} & \text{for } u=+1 \text{ regime}, \\ 0 & \text{for } u=0, \\ \dot{q} + 2\sqrt{\frac{1}{2}\dot{q}^2+q} & \text{for } u=-1, \end{cases}$ plotted here:

Notice that the function is continuous (though not smooth), even though the policy is discontinuous.

As we begin to develop theoretical and algorithmic tools for optimal control, we will see that some formulations are much easier to deal with than others. One important example is the dramatic simplification that can come from formulating objective functions using additive cost, because they often yield recursive solutions. In the additive cost formulation, the long-term "score" for a trajectory can be written as $$\int_0^T g(x(t),u(t)) dt,$$ where $g()$ is the instantaneous cost, and $T$ can be either a finite real number or $\infty$. We will call a problem specification with a finite $T$ a "finite-horizon" problem, and $T=\infty$ an "infinite-horizon" problem. Problems and solutions for infinite-horizon problems tend to be more elegant, but care is required to make sure that the integral converges for the optimal controller (typically by having an achievable goal that allows the robot to accrue zero-cost).

The quadratic cost function suggested in the double integrator example above is clearly written as an additive cost. At first glance, our minimum-time problem formulation doesn't appear to be of this form, but we actually can write it as an additive cost problem using an infinite horizon and the instantaneous cost $$g(x,u) = \begin{cases} 0 & \text{if } x=0, \\ 1 & \text{otherwise.} \end{cases}$$

We will examine a number of approaches to solving optimal control problems throughout the next few chapters. For the remainder of this chapter, we will focus on additive-cost problems and their solution via dynamic programming.

# Optimal control as graph search

For systems with continuous states and continuous actions, dynamic programming is a set of theoretical ideas surrounding additive cost optimal control problems. For systems with a finite, discrete set of states and a finite, discrete set of actions, dynamic programming also represents a set of very efficient numerical algorithm which can compute optimal feedback controllers. Many of you will have learned it before as a tool for graph search.

Imagine you have a directed graph with states (or nodes) $\{s_1,s_2,...\} \in S$ and "actions" associated with edges labeled as $\{a_1,a_2,...\} \in A$, as in the following trivial example:

Let us also assume that each edge has an associate weight or cost, using $g(s,a)$ to denote the cost of being in state $s$ and taking action $a$. Furthermore we will denote the transition "dynamics" using $s[n+1] = f(s[n],a[n]).$ For instance, in the graph above, $f(s_1,a_1) = s_2$.

There are many algorithms for finding (or approximating) the optimal path from a start to a goal on directed graphs. In dynamic programming, the key insight is that we can find the shortest path from every node by solving recursively for the optimal cost-to-go from every node to the goal. One such algorithm starts by initializing an estimate $\hat{J}^*=0$ for all $s_i$, then proceeds with an iterative algorithm which sets $$\hat{J}^*(s_i) \Leftarrow \min_{a \in A} \left[ g(s_i,a) + \hat{J}^*\left({f(s_i,a)}\right) \right]. \label{eq:value_update}$$ In software, $\hat{J}^*$ can be represented as a vector with dimension equal to the number of discrete states. This algorithm, appropriately known as value iteration, is guaranteed to converge to the optimal cost-to-go up to a constant factor, $\hat{J}^* \rightarrow J^* + c$ Bertsekas96, and in practice does so rapidly. Typically the update is done in batch -- e.g. the estimate is updated for all $i$ at once -- but the asynchronous version where states are updated one at a time is also known to converge, so long as every state is eventually updated infinitely often. Assuming the graph has a goal state with a zero-cost self-transition, then this cost-to-go function represents the weighted shortest distance to the goal.

Value iteration is an amazingly simple algorithm, but it accomplishes something quite amazing: it efficiently computes the long-term cost of an optimal policy from every state by iteratively evaluating the one-step cost. If we know the optimal cost-to-go, then it's easy to extract the optimal policy, $a = \pi^*(s)$: $$\pi^*(s_i) = \argmin_a \left[ g(s_i,a) + J^*\left( f(s_i,a) \right) \right]. \label{eq:policy_update}$$ It's a simple algorithm, but playing with an example can help our intuition.

# Grid World

Imagine a robot living in a grid (finite state) world. Wants to get to the goal location. Possibly has to negotiate cells with obstacles. Actions are to move up, down, left, right, or do nothing. Sutton98

Port matlab example to pydrake
figure/text for graph approximation of a continuous state space.

# Dynamic Programming for the Double Integrator

You can run value iteration for the double integrator (using barycentric interpolation to interpolate between nodes) in using:

double_integrator/value_iteration.py

Please do take some time to try different cost functions by editing the code yourself.

Let's take a minute to appreciate how amazing this is. Our solution to finding the optimal controller for the double integrator wasn't all that hard, but it required some mechanical intuition and solutions to differential equations. The resulting policy was non-trivial -- bang-bang control with a parabolic switching surface. The value iteration algorithm doesn't use any of this directly -- it's a simple algorithm for graph search. But remarkably, it can generate effectively the same policy with just a few moments of computation.

It's important to note that there are some differences between the computed policy and the optimal policy that we derived, due to discretization errors. We will ask you to explore these in the problems.

The real value of this numerical solution, however, is unlike our analytical solution for the double integrator, we can apply this same algorithm to any number of dynamical systems virtually without modification. Let's apply it now to the simple pendulum, which was intractable analytically.

# Dynamic Programming for the Simple Pendulum

You can run value iteration for the simple pendulum (using barycentric interpolation to interpolate between nodes) in using:

pendulum/value_iteration.py

Again, you can easily try different cost functions by editing the code yourself.

# Continuous dynamic programming

I find the graph search algorithm extremely satisfying as a first step, but also become quickly frustrated by the limitations of the discretization required to use it. In many cases, we can do better; coming up with algorithms which work more natively on continuous dynamical systems. We'll explore those extensions in this section.

# The Hamilton-Jacobi-Bellman Equation

It's important to understand that the value iteration equations, equations (\ref{eq:value_update}) and (\ref{eq:policy_update}), are more than just an algorithm. They are also sufficient conditions for optimality: if we can produce a $J^*$ and $\pi^*$ which satisfy these equations, then $\pi^*$ must be an optimal controller. There are an analogous set of conditions for the continuous systems. For a system $\dot{\bx} = f(\bx,\bu)$ and an infinite-horizon additive cost $\int_0^\infty g(\bx,\bu)dt$, we have: \begin{gather} 0 = \min_\bu \left[ g(\bx,\bu) + \pd{J^*}{\bx}f(\bx,\bu) \right], \label{eq:HJB} \\ \pi^*(\bx) = \argmin_\bu \left[ g(\bx,\bu) + \pd{J^*}{\bx}f(\bx,\bu) \right]. \end{gather} Equation \ref{eq:HJB} is known as the Hamilton-Jacobi-Bellman (HJB) equation. Bertsekas05 gives an informal derivation of these equations as the limit of a discrete-time approximation of the dynamics, and also gives the following sufficiency theorem:

# HJB Sufficiency Theorem

The following statement is adapted from Proposition 3.2.1 of Bertsekas05. Consider a system $\dot{\bx}=f(\bx,\bu)$ and an infinite-horizon additive cost $\int_0^\infty g(\bx,\bu)dt$. Suppose $J(\bx)$ is a solution to the HJB equation; that is $J$ is continuously differentiable in $\bx$ and is such that $0 = \min_{u\in U} \left[ g(\bx,\bu) + \pd{J}{\bx}f(\bx,\bu) \right],\quad \text{for all } \bx.$ Suppose also that $\pi^*(\bx)$ attains the minimum in the equation for all $\bx$. Further assume that the differential equation described by $f$ has a unique solution starting from any state $\bx$, that the control input trajectory, $\bu^*(t)$, obtained by evaluating $\pi^*$ along any solution is piecewise continuous as a function of $t$, and that there exists at least one fixed point, $\bx_{fp}$, with $f(\bx_{fp}, \pi^*(\bx_{fp})) = 0$ and $J^*(\bx_{fp}) = 0$. Then $J$ is equal to the optimal cost-to-go function, $J(\bx)=J^*(\bx)$ for all $\bx$, and the control trajectories $\bu^*(t)$ are optimal.

As a tool for verifying optimality, the HJB equations are actually surprisingly easy to work with: we can verify optimality for an infinite-horizon objective without doing any integration; we simply have to check a derivative condition on the optimal cost-to-go function $J^*$. Let's see this play out on the double integrator example.

# HJB for the Double Integrator

Consider the problem of regulating the double integrator (this time without input limits) to the origin using a quadratic cost: $$g(\bx,\bu) = q^2 + \dot{q}^2 + u^2.$$ I claim (without derivation) that the optimal controller for this objective is $$\pi(\bx) = -q - \sqrt{3}\dot{q}.$$ To convince you that this is indeed optimal, I have produced the following cost-to-go function: $$J(\bx) = \sqrt{3} q^2 + 2 q \dot{q} + \sqrt{3} \dot{q}^2.$$

Taking \begin{gather*} \pd{J}{q} = 2\sqrt{3} q + 2\dot{q}, \qquad \pd{J}{\dot{q}} = 2q + 2\sqrt{3}\dot{q}, \end{gather*} we can write \begin{align*} g(\bx,\bu) + \pd{J}{\bx}f(\bx,\bu) &= q^2 + \dot{q}^2 + u^2 + (2\sqrt{3} q + 2\dot{q}) \dot{q} + (2q + 2\sqrt{3}\dot{q}) u \end{align*} This is a convex quadratic function in $u$, so we can find the minimum with respect to $u$ by finding where the gradient with respect to $u$ evaluates to zero. $\pd{}{u} \left[ g(\bx,\bu) + \pd{J}{\bx} f(\bx,\bu) \right] = 2u + 2q + 2\sqrt{3}\dot{q}.$ Setting this equal to $0$ and solving for $u$ yields: $$u^* = -q - \sqrt{3} \dot{q},$$ thereby confirming that our policy $\pi$ is in fact the minimizer. Substituting $u^*$ back into the HJB reveals that the right side does in fact simplify to zero. I hope you are convinced!

Note that evaluating the HJB for the time-to-go of the minimum-time problem for the double integrator will also reveal that the HJB is satisfied wherever that gradient is well-defined. This is certainly mounting evidence in support of our bang-bang policy being optimal, but since $\pd{J}{\bx}$ is not defined everywhere, it does not actually satisfy the requirements of the sufficiency theorem as stated above.

# Solving for the minimizing control

We still face a few barriers to actually using the HJB in an algorithm. The first barrier is the minimization over $u$. When the action set was discrete, as in the graph search version, we could evaluate the one-step cost plus cost-to-go for every possible action, and then simply take the best. For continuous action spaces, in general we cannot rely on the strategy of evaluating a finite number of possible $\bu$'s to find the minimizer.

All is not lost. In the quadratic cost double integrator example above, we were able to solve explicitly for the minimizing $\bu$ in terms of the cost-to-go. It turns out that this strategy will actually work for a number of the problems we're interested in, even when the system (which we are given) or cost function (which we are free to pick, but which should be expressive) gets more complicated.

Recall that I've already tried to convinced you that a majority of the systems of interest are control affine, e.g. I can write $f(\bx,\bu) = f_1(\bx) + f_2(\bx)\bu.$ We can make another dramatic simplification by restricting ourselves to instantaneous cost functions of the form $g(\bx,\bu) = g_1(\bx) + \bu^T {\bf R} \bu, \qquad {\bf R}={\bf R}^T \succ 0.$ In my view, this is not very restrictive - many of the cost functions that I find myself choosing to write down can be expressed in this form. Given these assumptions, we can write the HJB as $0 = \min_{\bu} \left[ g_1(\bx) + \bu^T {\bf R} \bu + \pd{J}{\bx} \left[ f_1(\bx) + f_2(\bx)\bu \right]\right].$ Since this is a positive quadratic function in $\bu$, if the system does not have any constraints on $\bu$, then we can solve in closed-form for the minimizing $\bu$ by taking the gradient of the right-hand side: $\pd{}{\bu} = 2\bu^T {\bf R} + \pd{J}{\bx} f_2(\bx) = 0,$ and setting it equal to zero to obtain $\bu^* = -\frac{1}{2}{\bf R}^{-1}f_2^T(\bx) \pd{J}{\bx}^T.$ If there are linear constraints on the input, such as torque limits, then more generally this could be solved (at any particular $\bx$) as a quadratic program.

What happens in the case where our system is not control affine or if we really do need to specify an instantaneous cost function on $\bu$ that is not simply quadratic? If the goal is to produce an iterative algorithm, like value iteration, then one common approach is to make a (positive-definite) quadratic approximation in $\bu$ of the HJB, and updating that approximation on every iteration of the algorithm. This broad approach is often referred to as differential dynamic programming (c.f. Jacobson70).

# Numerical solutions for $J^*$

The other major barrier to using the HJB in a value iteration algorithm is that the estimated optimal cost-to-go function, $\hat{J}^*$, must somehow be represented with a finite set of numbers, but we don't yet know anything about the potential form it must take. In fact, knowing the time-to-goal solution for minimum-time problem with the double integrator, we see that this function might need to be non-smooth for even very simple dynamics and objectives.

One natural way to parameterize $\hat{J}^*$ -- a scalar valued-function defined over the state space -- is to define the values on a mesh. This approach then admits algorithms with close ties to the relatively very advanced numerical methods used to solve other partial differential equations (PDEs), such as the ones that appear in finite element modeling or fluid dynamics. One important difference, however, is that our PDE lives in the dimension of the state space, while many of the mesh representations from the other disciplines are optimized for two or three dimensional space. Also, our PDE may have discontinuities (or at least discontinuous gradients) at locations in the state space which are not known apriori.

A slightly more general view of the problem would describe the mesh (and the associated interpolation functions) as just one form of representations for function approximation. Using a deep convolutional neural network to represent the cost-to-go would also fall under the domain of function approximation, perhaps representing the other extreme in terms of complexity; using deep networks in approximate dynamic programming is now called deep reinforcement learning. We will discuss it more later in the book.

(see Appendix C for a brief background on function approximation)

# Representing the cost-to-go with function approximation

If we approximate $J^*$ with a finitely-parameterized function $\hat{J}_\balpha^*$, with parameter vector $\balpha$, then this immediately raises many important questions:

• What if the true cost-to-go function does not live in the prescribed function class; e.g., there does not exist an $\balpha$ which satisfies the sufficiency conditions for all $\bx$? (This seems very likely to be the case.)
• What update should we apply in the iterative algorithm?
• What can we say about it's convergence?
add quote from neuro-dynamic programming preface

In general, many of these questions will have to go unanswered. But there are some answers available for the special case of linear function approximators. A linear function approximator takes the form: $\hat{J}^*_\balpha(\bx) = \sum_i \alpha_i \psi_i(\bx) = \bpsi^T(\bx) \balpha,$ where $\bpsi(\bx)$ is a vector of potentially nonlinear features. Common examples of features include polynomials, radial basis functions, or most interpolation schemes used on a mesh. The distinguishing feature of a linear function approximator is the ability to exactly solve for $\balpha$ in order to represent a desired function optimally, in a least-squares sense. For instance, if we have a desired set of $K$ input-output points, $(\bx_k,J^d_k)$, then we can write down the closed-form solution to $\minimize_\balpha \sum_k \left(\hat{J}_\balpha(\bx_k) - J^d_k \right)^2.$ Using the least squares solution in a value iteration update is sometimes referred to as fitted value iteration, where $\bx_k$ are some number of samples taken from the continuous space and the iteration becomes \begin{gather*} J^d_k = \min_\bu \left[ g(\bx_k,\bu) + \hat{J}_\alpha\left({f(\bx_k,\bu)}\right) \right], \\ \balpha \Leftarrow \argmin_\balpha \sum_k \left(\hat{J}_\balpha(\bx_k) - J^d_k \right)^2. \end{gather*} For linear function approximators, this is simply: \begin{gather*} \balpha \Leftarrow \begin{bmatrix} \bpsi^T(\bx_1) \\ \vdots \\ \bpsi^T(\bx_K)\end{bmatrix}^+ \begin{bmatrix} J^d_1 \\ \vdots \\ J^d_K \end{bmatrix}, \end{gather*} where the $^+$ notation refers to a Moore-Penrose pseudoinverse. Remarkably, for linear function approximators, this update is still known to converge to the globally optimal $\balpha^*$.

# Representing the cost-to-go on a mesh

Imagine that we use a mesh to approximate the cost-to-go function over that state space with $K$ mesh points $\bx_k$. We would like to perform the value iteration update: $$\forall k, \hat{J}^*(\bx_k) \Leftarrow \min_\bu \left[ g(\bx_k,\bu) + \hat{J}^*\left({f(\bx_k,\bu)}\right) \right], \label{eq:mesh_value_iteration}$$ but must deal with the fact that $f(\bx_k,\bu)$ might not result in a next state that is directly at a mesh point. Most interpolation schemes for a mesh can be written as some weighted combination of the values at nearby mesh points, e.g. $\hat{J}^*(\bx) = \sum_i \beta_i(\bx) \hat{J}^*(\bx_i), \quad \sum_i \beta_i = 1$ with $\beta_i$ the relative weight of the $i$th mesh point. In we have implemented barycentric interpolationMunos98. Taking $\alpha_i = \hat{J}^*(\bx_i)$, the cost-to-go estimate at mesh point $i$, we can see that this is precisely an important special case of fitted value iteration with linear function approximation. Furthermore, assuming $\beta_i(\bx_i) = 1,$ (e.g., the only point contributing to the interpolation at a mesh point is the value at the mesh point itself), the update in Eq. (\ref{eq:mesh_value_iteration}) is precisely the least-squares update (and it achieves zero error).

try just running snopt on quartic objective

# Stochastic Optimal Control

One of the most amazing features of the dynamic programming, additive cost approach to optimal control is the relative ease with which it extends to optimizing stochastic systems.

# Graph search -- stochastic shortest path problems

For discrete systems, we can generalize our dynamics on a graph by adding action-dependent transition probabilities to the edges. This new dynamical system is known as a Markov Decision Process (MDP), and we write the dynamics completely in terms of the transition probabilities $\Pr(s[n+1] = s' | s[n] = s, a[n] = a).$ For discrete systems, this is simply a big lookup table. The cost that we incur for any execution of system is now a random variable, and so we formulate the goal of control as optimizing the expected cost, e.g. $$J^*(s[0]) = \min_{a[\cdot]} E \left[ \sum_{n=0}^\infty g(s[n],a[n]) \right]. \label{eq:stochastic_dp_optimality_cond}$$ Note that there are many other potential objectives, such as minimizing the worst-case error, but the expected cost is special because it preserves the dynamic programming recursion: $J^*(s) = \min_a E \left[ g(s,a) + J^*(s')\right] = \min_a \left[ g(s,a) + \sum_{s'} \Pr(s'|s,a) J^*(s') \right].$ Remarkably, if we use these optimality conditions to construct our value iteration algorithm $\hat{J}(s) \Leftarrow \min_a \left[ g(s,a) + \sum_{s'} \Pr(s'|s,a) \hat{J}(s') \right],$ then this algorithm has the same strong convergence guarantees of its counterpart for deterministic systems. And it is essentially no more expensive to compute!

# Stochastic interpretation of deterministic, continuous-state value iteration

There is a particularly cute observation to be made here. Let's assume that we have discrete control inputs and discrete-time dynamics, but a continuous state space. Recall the fitted value iteration on a mesh algorithm described above. In fact, the resulting update is exactly the same as if we treated the system as a discrete state MDP with $\beta_i$ representing the probability of transitioning to state $x_i$! This sheds some light on the impact of discretization on the solutions -- discretization error here will cause a sort of diffusion corresponding to the probability of spreading across neighboring nodes.

# Extensions

add section on extensions. discounting. finite-horizon / time-varying dynamics or cost.

# Linear Programming Approach

For discrete MDPs, value iteration is a magical algorithm because it is simple, but known to converge to the global optimal, $J^*$. However, other important algorithms are known; one of the most important is a solution approach using linear programming. This formulation provides an alternative view, but may also be more generalizable and even more efficient for some instances of the problem.

Recall the optimality conditions from Eq. (\ref{eq:stochastic_dp_optimality_cond}). If we describe the cost-to-go function as a vector, $J_i = J(s_i)$, then these optimality conditions can be rewritten in vector form as $$\bJ = \min_a \left[ \bg(a) + \bT(a) \bJ \right], \label{eq:vector_stochastic_dp}$$ where $g_i(a) = g(s_i,a)$ is the cost vector, and $T_{i,j}(a) = \Pr(s_j|s_i,a)$ is the transition probability matrix. Let us take $\bJ$ as the vector of decision variables, and replace Eq. (\ref{eq:vector_stochastic_dp}) with the constraints: $$\forall a, \bJ \le \bg(a) + \bT(a) \bJ.$$ Clearly, for finite $a$, this is finite list of linear constraints, and for any vector $\bJ$ satisfying these constraints we have $\bJ \le \bJ^*$ (elementwise). Now write the linear program: \begin{gather*} \maximize_\bJ \quad \bc^T \bJ, \\ \subjto \quad \forall a, \bJ \le \bg(a) + \bT(a) \bJ, \end{gather*} where $c$ is any positive vector. The solution to this problem is $\bJ = \bJ^*$.

Perhaps even more interesting is that this approach can be generalized to linear function approximators. Taking a vector form of my notation above, now we have a matrix of features with $\bPsi_{i,j} = \psi^T_j(\bx_i)$ and we can write the LP \begin{gather} \maximize_\balpha \quad \bc^T \bPsi \balpha, \\ \subjto \quad \forall a, \bPsi \balpha \le \bg(a) + \bT(a) \bPsi \balpha. \end{gather} This time the solution is not necessarily optimal, because $\bPsi \balpha^*$ only approximates $\bJ^*$, and the relative values of the elements of $\bc$ (called the "state-relevance weights") can determine the relative tightness of the approximation for different features Farias02.

While solving the dynamic programming problem for continuous systems is very hard in general, there are a few very important special cases where the solutions are very accessible. Most of these involve variants on the case of linear dynamics and quadratic cost. The simplest case, called the linear quadratic regulator (LQR), is formulated as stabilizing a time-invariant linear system to the origin.

The linear quadratic regulator is likely the most important and influential result in optimal control theory to date. In this chapter we will derive the basic algorithm and a variety of useful extensions.

# Basic Derivation

Consider a linear time-invariant system in state-space form, $$\dot{\bx} = {\bf A}\bx + \bB\bu,$$ with the infinite-horizon cost function given by $$J = \int_0^\infty \left[ \bx^T {\bf Q} \bx + \bu^T {\bf R} \bu \right] dt, \quad {\bf Q} = {\bf Q}^T \ge {\bf 0}, {\bf R} = {\bf R}^T > 0.$$ Our goal is to find the optimal cost-to-go function $J^*(\bx)$ which satisfies the HJB: $$\forall \bx, \quad 0 = \min_\bu \left[ \bx^T {\bf Q} \bx + \bu^T {\bf R} \bu + \pd{J^*}{\bx} \left( {\bf A}\bx + \bB\bu \right) \right].$$

There is one important step here -- it is well known that for this problem the optimal cost-to-go function is quadratic. This is easy to verify. Let us choose the form: $$J^*(\bx) = \bx^T {\bf S} \bx, \quad {\bf S} = {\bf S}^T \succeq 0.$$ The gradient of this function is $$\pd{J^*}{\bx} = 2 \bx^T {\bf S}.$$

Since we have guaranteed, by construction, that the terms inside the $\min$ are quadratic and convex (because ${\bf R} \succ 0$), we can take the minimum explicitly by finding the solution where the gradient of those terms vanishes: $$\pd{}{\bu} = 2\bu^T {\bf R} + 2 \bx^T {\bf S} \bB = 0.$$ This yields the optimal policy $$\bu^* = \pi^*(\bx) = - {\bf R}^{-1} \bB^T {\bf S} \bx = - \bK \bx.$$

Inserting this back into the HJB and simplifying yields $$0 = \bx^T \left[ {\bf Q} - {\bf S B R}^{-1}\bB^T{\bf S} + 2{\bf SA} \right]\bx.$$ All of the terms here are symmetric except for the $2{\bf SA}$, but since $\bx^T{\bf SA}\bx = \bx^T{\bf A}^T{\bf S}\bx$, we can write $$0 = \bx^T \left[ {\bf Q} - {\bf S B R}^{-1}\bB^T{\bf S} + {\bf SA} + {\bf A}^T{\bf S} \right]\bx.$$ and since this condition must hold for all $\bx$, it is sufficient to consider the matrix equation $$0 = {\bf S} {\bf A} + {\bf A}^T {\bf S} - {\bf S} \bB {\bf R}^{-1} \bB^T {\bf S} + {\bf Q}.$$ This extremely important equation is a version of the algebraic Riccati equation. Note that it is quadratic in ${\bf S}$, making its solution non-trivial, but it is well known that the equation has a single positive-definite solution if and only if the system is controllable and there are good numerical methods for finding that solution, even in high-dimensional problems. Both the optimal policy and optimal cost-to-go function are available from by calling  (K,S) = LinearQuadraticRegulator(A,B,Q,R).

If the appearance of the quadratic form of the cost-to-go seemed mysterious, consider that the solution to the linear system $\dot\bx = (\bA - \bB\bK)\bx$ takes the form $\bx(t) = e^{(\bA-\bB\bK)t}\bx(0)$, and try inserting this back into the integral cost function.

It is worth examining the form of the optimal policy more closely. Since the value function represents cost-to-go, it would be sensible to move down this landscape as quickly as possible. Indeed, $-{\bf S}\bx$ is in the direction of steepest descent of the value function. However, not all directions are possible to achieve in state-space. $-\bB^T {\bf S} \bx$ represents precisely the projection of the steepest descent onto the control space, and is the steepest descent achievable with the control inputs $\bu$. Finally, the pre-scaling by the matrix ${\bf R}^{-1}$ biases the direction of descent to account for relative weightings that we have placed on the different control inputs. Note that although this interpretation is straight-forward, the slope that we are descending (in the value function, ${\bf S}$) is a complicated function of the dynamics and cost.

# Local stabilization of nonlinear systems

LQR is extremely relevant to us even though our primary interest is in nonlinear dynamics, because it can provide a local approximation of the optimal control solution for the nonlinear system. Given the nonlinear system $\dot{\bx} = f(\bx,\bu)$, and a stabilizable operating point, $(\bx_0,\bu_0)$, with $f(\bx_0,\bu_0) = 0.$ We can define a relative coordinate system $$\bar\bx = \bx - \bx_0, \quad \bar\bu = \bu - \bu_0,$$ and observe that $$\dot{\bar\bx} = \dot{\bx} = f(\bx,\bu),$$ which we can approximate with a first-order Taylor expansion to $$\dot{\bar\bx} \approx f(\bx_0,\bu_0) + \pd{f(\bx_0,\bu_0)}{\bx} (\bx - \bx_0) + \pd{f(\bx_0,\bu_0)}{\bu} (\bu - \bu_0) = {\bf A}\bar{\bx} + \bB\bar\bu.$$

Similarly, we can define a quadratic cost function in the error coordinates, or take a (positive-definite) second-order approximation of a nonlinear cost function about the operating point (linear and constant terms in the cost function can be easily incorporated into the derivation by parameterizing a full quadratic form for $J^*$, as seen in the Linear Quadratic Tracking derivation below).

The resulting controller takes the form $\bar\bu^* = -{\bf K}\bar\bx$ or $$\bu^* = \bu_0 - {\bf K} (\bx - \bx_0).$$ For convenience, allows you to call controller = LinearQuadraticRegulator(system, context, Q, R) on most dynamical systems (including block diagrams built up of many subsystems); it will perform the linearization for you.

# Finite-horizon formulations

Recall that the cost-to-go for finite-horizon problems is time-dependent, and therefore the HJB sufficiency condition requires an additional term for $\pd{J^*}{t}$. $$\forall \bx, \forall t\in[t_0,t_f],\quad 0 = \min_\bu \left[ g(\bx,\bu) + \pd{J^*}{\bx}f(\bx,\bu) + \pd{J^*}{t} \right].$$

# Finite-horizon LQR

Consider systems governed by an LTI state-space equation of the form $$\dot{\bx} = {\bf A}\bx + \bB\bu,$$ and a finite-horizon cost function with \begin{gather*} h(\bx) = \bx^T {\bf Q}_f \bx,\quad {\bf Q}_f = {\bf Q}_f^T \ge {\bf 0} \\ g(\bx,\bu,t) = \bx^T {\bf Q} \bx + \bu^T {\bf R} \bu, \quad {\bf Q} = {\bf Q}^T \ge 0, {\bf R}={\bf R}^T>0 \end{gather*} Writing the HJB, we have $$0 = \min_\bu \left[\bx^T {\bf Q} \bx + \bu^T {\bf R}\bu + \pd{J^*}{\bx} \left({\bf A} \bx + \bB \bu \right) + \pd{J^*}{t} \right].$$ Due to the positive definite quadratic form on $\bu$, we can find the minimum by setting the gradient to zero: \begin{gather*} \pd{}{\bu} = 2 \bu^T {\bf R} + \pd{J^*}{\bx} \bB = 0 \\ \bu^* = \pi^*(\bx,t) = - \frac{1}{2}{\bf R}^{-1} \bB^T \pd{J^*}{\bx}^T \end{gather*} In order to proceed, we need to investigate a particular form for the cost-to-go function, $J^*(\bx,t)$. Let's try a solution of the form: $$J^*(\bx,t) = \bx^T {\bf S}(t) \bx, \quad {\bf S}(t) = {\bf S}^T(t) > {\bf 0}.$$ In this case we have $$\pd{J^*}{\bx} = 2 \bx^T {\bf S}(t), \quad \pd{J*}{t} = \bx^T \dot{\bf S}(t) \bx,$$ and therefore \begin{gather*} \bu^* = \pi^*(\bx,t) = - {\bf R}^{-1} \bB^T {\bf S}(t) \bx \\ 0 = \bx^T \left[ {\bf Q} - {\bf S}(t) \bB {\bf R}^{-1} \bB^T {\bf S}(t) + {\bf S}(t) {\bf A} + {\bf A}^T {\bf S}(t) + \dot{\bf S}(t) \right] \bx.\end{gather*} Therefore, ${\bf S}(t)$ must satisfy the condition (known as the continuous-time differential Riccati equation): $$-\dot{\bf S}(t) = {\bf S}(t) {\bf A} + {\bf A}^T {\bf S}(t) - {\bf S}(t) \bB {\bf R}^{-1} \bB^T {\bf S}(t) + {\bf Q},$$ and the terminal condition $${\bf S}(T) = {\bf Q}_f.$$ Since we were able to satisfy the HJB with the minimizing policy, we have met the sufficiency condition, and have found the optimal policy and optimal cost-to-go function.

Note that the infinite-horizon LQR solution described in the prequel is exactly the steady-state solution of this equation, defined by $\dot{\bf S}(t) = 0$. Indeed, for controllable systems this equation is stable (backwards in time), and as expected the finite-horizon solution converges on the infinite-horizon solution as the horizon time limits to infinity.

# Time-varying LQR

The derivation above holds even if the dynamics are given by $$\dot{\bx} = {\bf A}(t)\bx + {\bf B(t)}\bu.$$ Similarly, the cost functions ${\bf Q}$ and ${\bf R}$ can also be time-varying. This is quite surprising, as the class of time-varying linear systems is a quite general class of systems. It requires essentially no assumptions on how the time-dependence enters, except perhaps that if ${\bf A}$ or $\bB$ is discontinuous in time then one would have to use the proper techniques to accurately integrate the differential equation. As we will see in the chapter on trajectory optimization, one of the most powerful applications involves linearizing around a nominal trajectory of a nonlinear system and using LQR to provide a trajectory controller.

For completeness, we consider a slightly more general form of the linear quadratic regulator. The standard LQR derivation attempts to drive the system to zero. Consider now the problem: \begin{gather*} \dot{\bx} = {\bf A}\bx + \bB\bu \\ h(\bx) = (\bx - \bx^d(T))^T {\bf Q}_f (\bx - \bx^d(T)), \quad {\bf Q}_f = {\bf Q}_f^T \ge 0 \\ g(\bx,\bu,t) = (\bx - \bx^d(t))^T {\bf Q} (\bx - \bx^d(t)) + (\bu - \bu^d(t))^T {\bf R} (\bu - \bu^d(t)),\\ {\bf Q} = {\bf Q}^T \ge 0, {\bf R}={\bf R}^T>0 \end{gather*} Now, guess a solution \begin{gather*} J^*(\bx,t) = \bx^T {\bf S_2}(t) \bx + \bx^T {\bf s_1}(t) + s_0(t), \quad {\bf S_2}(t) = {\bf S_2}^T(t) > {\bf 0}. \end{gather*} In this case, we have $$\pd{J^*}{\bx} = 2 \bx^T {\bf S_2}(t) + {\bf s_1}^T(t),\quad \pd{J^*}{t} = \bx^T \dot{\bf S_2}(t) \bx + \bx^T \dot{\bf s_1}(t) + \dot{s_0}(t).$$ Using the HJB, $$0 = \min_\bu \left[(\bx - \bx^d(t))^T {\bf Q} (\bx - \bx^d(t)) + (\bu - \bu^d(t))^T {\bf R} (\bu - \bu^d(t)) + \pd{J^*}{\bx} \left({\bf A} \bx + \bB \bu \right) + \pd{J^*}{t} \right],$$ we have \begin{gather*} \pd{}{\bu} = 2 (\bu - \bu^d(t))^T{\bf R} + (2\bx^T{\bf S_2}(t) + {\bf s_1}^T(t))\bB = 0,\\ \bu^*(t) = \bu^d(t) - {\bf R}^{-1} \bB^T\left[{\bf S_2}(t)\bx + \frac{1}{2}{\bf s_1}(t)\right] \end{gather*} The HJB can be satisfied by integrating backwards \begin{align*} -\dot{\bf S_2}(t) =& {\bf Q} - {\bf S_2}(t) \bB {\bf R}^{-1} {\bf B}^T {\bf S_2}(t) + {\bf S_2}(t) {\bf A} + {\bf A}^T {\bf S_2}(t)\\ -\dot{\bf s_1}(t) =& -2 {\bf Q} \bx^d(t) + \left[{\bf A}^T - {\bf S}_2 \bB {\bf R}^{-1} \bB^T \right]{\bf s_1}(t) + 2{\bf S_2}(t) \bB \bu^d(t)\\ -\dot{s_0}(t) =& \bx^d(t)^T {\bf Q} \bx^d(t) - \frac{1}{4} {\bf s_1}^T(t) \bB {\bf R}^{-1} \bB^T {\bf s_1}(t) + {\bf s_1}(t)^T \bB \bu^d(t), \end{align*} from the final conditions \begin{align*} {\bf S_2}(T) =& {\bf Q}_f\\ {\bf s_1}(T) =& -2 {\bf Q}_f \bx^d(T) \\ s_0(T) =& \left[\bx^d(T)\right]^T {\bf Q}_f \left[ \bx^d(T) \right]. \end{align*} Notice that the solution for ${\bf S_2}$ is the same as the simpler LQR derivation, and is symmetric (as we assumed). Note also that $s_0(t)$ has no effect on control (even indirectly), and so can often be ignored.

A quick observation about the quadratic form, which might be helpful in debugging. We know that $J(x,t)$ must be uniformly positive. This is true iff ${\bf S}_2>0$ and $s_0 > \frac{1}{4} {\bf s}_1^T {\bf S}_2^{-1} {\bf s}_1$, which comes from evaluating the function at $x_{min}$ defined by $\pd{}{x} = 0$.

test this on an example, get my notation consistent (s(t)^T vs s^T(t), etc.

# Linear Final Boundary Value Problems

The finite-horizon LQR formulation can be used to impose a strict final boundary value condition by setting an infinite ${\bf Q}_f$. However, integrating the Riccati equation backwards from an infinite initial condition isn't very practical. To get around this, let us consider solving for ${\bf P}(t) = {\bf S}(t)^{-1}$. Using the matrix relation $\frac{d {\bf S}^{-1}}{dt} = - {\bf S}^{-1} \frac{d {\bf S}}{dt} {\bf S}^{-1}$, we have: $$-\dot{\bf P}(t) = - {\bf P}(t){\bf Q P}(t) + {\bf B R}^{-1} \bB - {\bf A P}(t) - {\bf P}(t){\bf A}^T,$$ with the final conditions $${\bf P}(T) = 0.$$ This Riccati equation can be integrated backwards in time for a solution.

It is very interesting, and powerful, to note that, if one chooses ${\bf Q} = 0$, therefore imposing no position cost on the trajectory before time $T$, then this inverse Riccati equation becomes a linear ODE which can be solved explicitly. % add explicit solution here These relationships are used in the derivation of the controllability Grammian, but here we use them to design a feedback controller.

# Lyapunov Analysis

Optimal control provides a powerful framework for formulating control problems using the language of optimization. But solving optimal control problems for nonlinear systems is hard! In many cases, we don't really care about finding the optimal controller, but would be satisfied with any controller that is guaranteed to accomplish the specified task. In many cases, we still formulate these problems using computational tools from optimization, and in this chapter we'll learn about tools that can provide guaranteed control solutions for systems that are beyond the complexity for which we can find the optimal feedback.

There are many excellent books on Lyapunov analysis; for instance Slotine90 is an excellent and very readable reference and Khalil01 can provide a rigorous treatment. In this chapter I will summarize (without proof) some of the key theorems from Lyapunov analysis, but then will also introduce a number of numerical algorithms... many of which are new enough that they have not yet appeared in any mainstream textbooks.

# Stability of the Damped Pendulum

Recall that the equations of motion of the damped simple pendulum are given by $ml^2 \ddot{\theta} + mgl\sin\theta = -b\dot{\theta},$ which I've written with the damping on the right-hand side to remind us that it is an external torque that we've modeled.

These equations represent a simple second-order differential equation; in chapter 2 we discussed at some length what was known about the solutions to this differential equation--even without damping the equation for $\theta(t)$ as a function of $\theta(0)$, $\dot{\theta}(0)$, involves elliptic integrals of the first kind, and with damping we don't even have that. Since we couldn't provide a solution analytically, in chapter 2 we resorted to a graphical analysis, and confirmed the intuition that there are fixed points in the system (at $\theta = k\pi$ for every integer $k$) and that the fixed points at $\theta = 2\pi k$ are asymptotically stable with a large basin of attraction. The graphical analysis gave us this intuition, but can we actually prove this stability property? In a way that might also work for much more complicated systems?

One route forward was from looking at the total system energy (kinetic + potential), which we can write down: $E(\theta,\dot{\theta}) = \frac{1}{2} ml^2\dot{\theta}^2 - mgl \cos\theta.$ Recall that the contours of this energy function are the orbits of the undamped pendulum.

A natural route to proving the stability of the downward fixed points is by arguing that energy decreases for the damped pendulum (with $b>0$) and so the system will eventually come to rest at the minimum energy, $E = -mgl$, which happens at $\theta=2\pi k$. Let's make that argument slightly more precise.

Evaluating the time derivative of the energy reveals $\frac{d}{dt} E = - b\dot\theta^2 \le 0.$ This is sufficient to demonstrate that the energy will never increase, but it doesn't actually prove that the energy will converge to the minimum. To take that extra step, we must observe that for most of the states with $\dot{E} = 0$, which for $b>0$ is the manifold where $\dot\theta=0$, the system can't actually remain in that state, but will rather pass through and enter another region where it continues to lose energy. In fact, the fixed points are the only subset of this $\dot{E}=0$ manifold where the system can stay on the manifold. Therefore, unless the system is at a fixed point, it will continue to dissipate energy. And therefore we can conclude that as $t\rightarrow \infty$, the system will indeed come to rest at a fixed point (though it could be any fixed point with an energy less than or equal to the initial energy in the system, $E(0)$).

This is an important example. It demonstrated that we could use a relatively simple function, the system energy, to describe something about the long-term dynamics of the pendulum even though the actual trajectories of the system are (analytically) very complex. It also demonstrated one of the subtleties of using an energy-like function that is non-increasing (instead of strictly decreasing) to prove asymptotic stability.

Lyapunov functions generalize this notion of an energy function to more general systems, which might not be stable in the sense of some mechanical energy. If I can find any positive function, call it $V(\bx)$, that gets smaller over time as the system evolves, then I can potentially use $V$ to make a statement about the long-term behavior of the system. $V$ is called a Lyapunov function.

Recall that we defined three separate notions for stability of a fixed-point of a nonlinear system: stability i.s.L., asymptotic stability, and exponential stability. We can use Lyapunov functions to demonstrate each of these, in turn.

# Lyapunov's Direct Method

Given a system $\dot{\bx} = f(\bx)$, with $f$ continuous, and for some region ${\cal B}$ around the origin (specifically an open subset of $\mathbf{R}^n$ containing the origin), if I can produce a scalar, continuously-differentiable function $V(\bx)$, such that \begin{gather*} V(\bx) > 0, \forall \bx \in {\cal B} \ne 0 \quad V(0) = 0, \text{ and} \\ \dot{V}(\bx) = \pd{V}{\bx} f(\bx) \le 0, \forall \bx \in {\cal B} \ne 0 \quad \dot{V}(0) = 0, \end{gather*} then the origin $(\bx = 0)$ is stable in the sense of Lyapunov (i.s.L.).

If, additionally, we have $$\dot{V}(\bx) = \pd{V}{\bx} f(\bx) < 0, \forall \bx \in {\cal B} \ne 0,$$ then the origin is (locally) asymptotically stable. And if we have $$\dot{V}(\bx) = \pd{V}{\bx} f(\bx) \le -\alpha V(x), \forall \bx \in {\cal B} \ne 0,$$ for some $\alpha>0$, then the origin is (locally) exponentially stable.

Note that for the sequel we will use the notation $V \succ 0$ to denote a positive-definite function, meaning that $V(0)=0$ and $V(\bx)>0$ for all $\bx\ne0$ (and also $V \succeq 0$ for positive semi-definite, $V \prec 0$ for negative-definite functions).

The intuition here is exactly the same as for the energy argument we made in the pendulum example: since $\dot{V}(x)$ is always zero or negative, the value of $V(x)$ will only get smaller (or stay the same) as time progresses. Inside the subset ${\cal B}$, for every $\epsilon$-ball, I can choose a $\delta$ such that $|x(0)|^2 < \delta \Rightarrow |x(t)|^2 < \epsilon, \forall t$ by choosing $\delta$ sufficiently small so that the sublevel-set of $V(x)$ for the largest value that $V(x)$ takes in the $\delta$ ball is completely contained in the $\epsilon$ ball. Since the value of $V$ can only get smaller (or stay constant) in time, this gives stability i.s.L.. If $\dot{V}$ is strictly negative away from the origin, then it must eventually get to the origin (asymptotic stability). The exponential condition is implied by the fact that $\forall t>0, V(\bx(t)) \le V(\bx(0)) e^{-\alpha t}$.

Notice that the system analyzed above, $\dot{\bx}=f(\bx)$, did not have any control inputs. Therefore, Lyapunov analysis is used to study either the passive dynamics of a system or the dynamics of a closed-loop system (system + control in feedback). We will see generalizations of the Lyapunov functions to input-output systems later in the text.

# Global Stability

The notion of a fixed point being stable i.s.L. is inherently a local notion of stability (defined with $\epsilon$- and $\delta$- balls around the origin), but the notions of asymptotic and exponential stability can be applied globally. The Lyapunov theorems work for this case, too, with only minor modification.

# Lyapunov analysis for global stability

Given a system $\dot{\bx} = f(\bx)$, with $f$ continuous, if I can produce a scalar, continuously-differentiable function $V(\bx)$, such that \begin{gather*} V(\bx) \succ 0, \\ \dot{V}(\bx) = \pd{V}{\bx} f(\bx) \prec 0, \text{ and} \\ V(\bx) \rightarrow \infty \text{ whenever } ||x||\rightarrow \infty,\end{gather*} then the origin $(\bx = 0)$ is globally asymptotically stable (G.A.S.).

If additionally we have that $$\dot{V}(\bx) \preceq -\alpha V(\bx),$$ for some $\alpha>0$, then the origin is globally exponentially stable.

The new condition, on the behavior as $||\bx|| \rightarrow \infty$ is known as "radially unbounded", and is required to make sure that trajectories cannot diverge to infinity even as $V$ decreases; it is only required for global stability analysis.

# LaSalle's Invariance Principle

Perhaps you noticed the disconnect between the statement above and the argument that we made for the stability of the pendulum. In the pendulum example, using the mechanical energy resulted in a Lyapunov function that was only negative semi-definite, but we eventually argued that the fixed points were asymptotically stable. That took a little extra work, involving an argument about the fact that the fixed points were the only place that the system could stay with $\dot{E}=0$; every other state with $\dot{E}=0$ was only transient. We can formalize this idea for the more general Lyapunov function statements--it is known as LaSalle's Theorem.

# LaSalle's Theorem

Given a system $\dot{\bx} = f(\bx)$ with $f$ continuous. If we can produce a scalar function $V(\bx)$ with continuous derivatives for which over an open-subset ${\cal B}\in \mathbf{R}^n$ we have $$V(\bx) \succ 0,\quad \dot{V}(\bx) \preceq 0,$$ and $V(\bx)\rightarrow \infty$ as $||\bx||\rightarrow \infty$, then $\bx$ will converge to the largest invariant set where $\dot{V}(\bx) = 0$.

To be clear, an invariant set, ${\cal G}$, of the dynamical system is a set for which $\bx(0)\in{\cal G} \Rightarrow \forall t>0, \bx(t) \in {\cal G}$. In other words, once you enter the set you never leave. The "largest invariant set" need not be connected; in fact for the pendulum example each fixed point is an invariant set, so the largest invariant set is the union of all the fixed points of the system.

Finding a Lyapunov function which $\dot{V} \prec 0$ is more difficult than finding one that has $\dot{V} \preceq 0$. LaSalle's theorem gives us the ability to make a statement about asymptotic stability even in this case. In the pendulum example, every state with $\dot\theta=0$ had $\dot{E}=0$, but only the fixed points are in the largest invariant set.

# Relationship to the Hamilton-Jacobi-Bellman equations

At this point, you might be wondering if there is any relationship between Lyapunov functions and the cost-to-go functions that we discussed in the context of dynamic programming. After all, the cost-to-go functions also captured a great deal about the long-term dynamics of the system in a scalar function. We can see the connection if we re-examine the HJB equation $0 = \min_\bu \left[ g(\bx,\bu) + \pd{J^*}{\bx}f(\bx,\bu). \right]$Let's imagine that we can solve for the optimizing $\bu^*(\bx)$, then we are left with $0 = g(\bx,\bu^*) + \pd{J^*}{\bx}f(\bx,\bu^*)$ or simply $\dot{J}^*(\bx) = -g(\bx,\bu^*) \qquad \text{vs} \qquad \dot{V}(\bx) \preceq 0.$ In other words, in optimal control we must find a cost-to-go function which matches this gradient for every $\bx$; that's very difficult and involves solving a potentially high-dimensional partial differential equation. By contrast, Lyapunov analysis is asking for much less - any function which is going downhill (at any rate) for all states. This can be much easier, for theoretical work, but also for our numerical algorithms. Also note that if we do manage to find the optimal cost-to-go, $J^*(\bx)$, then it can also serve as a Lyapunov function so long as $g(\bx,\bu^*(\bx)) \succeq 0$.

# Lyapunov analysis with convex optimization

One of the primary limitations in Lyapunov analysis as I have presented it so far is that it is potentially very difficult to come up with suitable Lyapunov function candidates for interesting systems, especially for underactuated systems. ("Underactuated" is almost synonymous with "interesting" in my vocabulary.) Even if somebody were to give me a Lyapunov candidate for a general nonlinear system, the Lyapunov conditions can be difficult to check -- for instance, how would I check that $\dot{V}$ is strictly negative for all $\bx$ except the origin if $\dot{V}$ is some arbitrarily complicated nonlinear function over a vector $\bx$?

In this section, we'll look at some computational approaches to verifying the Lyapunov conditions, and even to searching for the description of the Lyapunov functions themselves.

If you're imagining numerical algorithms to check the Lyapunov conditions for complicated Lyapunov functions and complicated dynamics, the first thought is probably that we can evaluate $V$ and $\dot{V}$ at a large number of sample points and check whether $V$ is positive and $\dot{V}$ is negative. This does work, and could potentially be combined with some smoothness or regularity assumptions to generalize beyond the sample points. Add python bindings for the pendulum energy lp lyapunov example. But in many cases we will be able to do better -- providing optimization algorithms that will rigorously check these conditions for all $\bx$ without dense sampling; these will also give us additional leverage in formulating the search for Lyapunov functions.

# Lyapunov analysis for linear systems

Let's take a moment to see how things play out for linear systems.

# Lyapunov analysis for stable linear systems

Imagine you have a linear system, $\dot\bx = {\bf A}\bx$, and can find a Lyapunov function $$V(\bx) = \bx^T {\bf P} \bx, \quad {\bf P} = {\bf P^T} \succ 0,$$ which also satisfies $$\dot{V}(\bx) = \bx^T {\bf PA} \bx + \bx^T {\bf A}^T {\bf P}\bx \prec 0.$$ Then the origin is globally asymptotically stable.

Note that the radially-unbounded condition is satisfied by ${\bf P} \succ 0$, and that the derivative condition is equivalent to the matrix condition $${\bf PA} + {\bf A}^T {\bf P} \prec 0.$$

For stable linear systems the existence of a quadratic Lyapunov function is actually a necessary (as well as sufficient) condition. Furthermore, a Lyapunov function can always be found by finding the positive-definite solution to the matrix Lyapunov equation $${\bf PA} + {\bf A}^T{\bf P} = - {\bf Q},\label{eq:algebraic_lyapunov}$$ for any ${\bf Q}={\bf Q}^T\succ 0$.

add an example here. double integrator? re-analyze the LQR output?

This is a very powerful result - for nonlinear systems it will be potentially difficult to find a Lyapunov function, but for linear systems it is straight-forward. In fact, this result is often used to propose candidates for non-linear systems, e.g., by linearizing the equations and solving a local linear Lyapunov function which should be valid in the vicinity of a fixed point.

# Lyapunov analysis as a semi-definite program (SDP)

Lyapunov analysis for linear systems has an extremely important connection to convex optimization. In particular, we could have also formulated the Lyapunov conditions for linear systems above using semi-definite programming (SDP). Semidefinite programming is a subset of convex optimization -- an extremely important class of problems for which we can produce efficient algorithms that are guaranteed find the global optima solution (up to a numerical tolerance and barring any numerical difficulties).

If you don't know much about convex optimization or want a quick refresher, please take a few minutes to read the optimization preliminaries in the appendix. The main requirement for this section is to appreciate that it is possible to formulate efficient optimization problems where the constraints include specifying that one or more matrices are positive semi-definite (PSD). These matrices must be formed from a linear combination of the decision variables. For a trivial example, the optimization $$\min_a a,\quad \subjto \begin{bmatrix} a & 0 \\ 0 & 1 \end{bmatrix} \succeq 0,$$ returns $a = 0$ (up to numerical tolerances).

The value in this is immediate for linear systems. For example, we can formulate the search for a Lyapunov function for the linear system $\dot\bx = {\bf A} \bx$ by using the parameters ${\bf p}$ to populate a symmetric matrix ${\bf P}$ and then write the SDP: $\find_{\bf p} \quad \subjto \quad {\bf P} \succeq 0, \quad {\bf PA} + {\bf A}^T {\bf P} \preceq 0.$ Note that you would probably never use that particular formulation, since there specialized algorithms for solving the simple Lyapunov equation which are more efficient and more numerically stable. But the SDP formulation does provide something new -- we can now easily formulate the search for a "common Lyapunov function" for uncertain linear systems.

# Common Lyapunov analysis for linear systems

Suppose you have a system governed by the equations $\dot\bx = {\bf A}\bx$, where the matrix ${\bf A}$ is unknown, but its uncertain elements can be bounded. There are a number of ways to write down this uncertainty set; let us choose to write this by describing ${\bf A}$ as the convex combination of a number of known matrices, $${\bf A} = \sum_{i} \beta_i {\bf A}_i, \quad \sum_i \beta_i = 1, \quad \forall i, \beta_i > 0.$$ This is just one way to specify the uncertainty; geometrically it is describing a polygon of uncertain parameters (in the space of elements of ${\bf A}$ with each ${\bf A}_i$ as one of the vertices in the polygon.

Now we can formulate the search for a common Lyapunov function using $\find_{\bf p} \quad \subjto \quad {\bf P} \succeq 0, \quad \forall_i, {\bf PA}_i + {\bf A}_i^T {\bf P} \preceq 0.$ The solver will then return a matrix ${\bf P}$ which satisfies all of the constraints, or return saying "problem is infeasible". It can easily be verified that if ${\bf P}$ satisfies the Lyapunov condition at all of the vertices, then it satisfies the condition for every ${\bf A}$ in the set: ${\bf P}(\sum_i \beta_i {\bf A}_i) + (\sum_i \beta_i {\bf A}_i)^T {\bf P} = \sum_i \beta_i ({\bf P A}_i + {\bf A}_i^T {\bf P}) \preceq 0,$ since $\forall i$, $\beta_i > 0$. Note that, unlike the simple Lyapunov equation for a known linear system, this condition being satisfied is a sufficient but not a necessary condition -- it is possible that the set of uncertain matrices ${\bf A}$ is robustly stable, but that this stability cannot be demonstrated with a common quadratic Lyapunov function.

You can try this example for yourself in .

lyapunov/linear_systems_common_lyapunov.py

As always, make sure that you open up the code and take a look.

This example is very important because it establishes a connection between Lyapunov functions and (convex) optimization. But so far we've only demonstrated this connection for linear systems where the PSD matrices provide a magical recipe for establishing the positivity of the (quadratic) functions for all $\bx$. Is there any hope of extending this type of analysis to more general nonlinear systems? Surprisingly, it turns out that there is.

# Sums-of-squares optimization

It turns out that in the same way that we can use SDP to search over the positive quadratic equations, we can generalize this to search over the positive polynomial equations. To be clear, for quadratic equations we have ${\bf P}\succeq 0 \quad \Rightarrow \quad \bx^T {\bf P} \bx \ge 0, \quad \forall \bx.$ It turns out that we can generalize this to ${\bf P}\succeq 0 \quad \Rightarrow \quad {\bf m}^T(\bx) {\bf P} {\bf m}(\bx) \ge 0, \quad \forall \bx,$ where ${\bf m}(\bx)$ is a vector of polynomial equations, typically chosen as a vector of monomials (polynomials with only one term). The set of positive polynomials parameterized in this way is exactly the set of polynomials that can be written as a sum of squaresParrilo00. While it is known that not all positive polynomials can be written in this form, much is known about the gap (and papers have been written about trying to find polynomials which are uniformly positive but not sums of squares). For our purposes this gap is very small; we should remember that it exists but not worry about it too much for now.

Even better, there is quite a bit that is known about how to choose the terms in ${\bf m}(\bx)$. For example, if you give me the polynomial $p(x) = 2 - 4x + 5x^2$ and ask if it is positive for all real $x$, I can convince you that it is by producing the sums-of-squares factorization $p(x) = 1 + (1-2x)^2 + x^2,$ and I know that I can formulate the problem without needing any monomials of degree greater than 1 (the square-root of the degree of $p$) in my monomial vector. In practice, what this means for us is that people have authored optimization front-ends which take a high-level specification with constraints on the positivity of polynomials and they automatically generate the SDP problem for you, without having to think about what terms should appear in ${\bf m}(\bx)$ (c.f. Prajna04). This class of optimization problems is called Sums-of-Squares (SOS) optimization.

We will write optimizations using sums-of-squares constraints as $p(\bx) \sos$ as shorthand for the constraint that $p(\bx) \ge 0$ for all $\bx$, as demonstrated by finding a sums-of-squares decomposition.

# Lyapunov analysis for polynomial systems

Now we can see that it may be possible to generalize the optimization approach using SDP to search for Lyapunov functions for linear systems to searching for Lyapunov functions for at least the polynomial systems: $\dot\bx = f(\bx),$ where $f$ is a vector-valued polynomial function. If we parameterize a fixed-degree Lyapunov candidate as a polynomial with unknown coefficients, e.g., $V_\alpha(\bx) = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_1x_2 + \alpha_4 x_1^2 + ...,$ then the time-derivative of $V$ is also a polynomial, and I can formulate the optimization: \begin{align*} \find_\alpha, \quad \subjto \quad& V_\alpha(\bx) \sos \\ & -\dot{V}_\alpha(\bx) = -\pd{V_\alpha}{\bx} f(\bx) \sos. \end{align*} Because this is a convex optimization, the solver will return a solution if one exists.

# Global stability of the simple pendulum via SOS

We opened this chapter using our intuition about energy to discuss stability on the simple pendulum. Now we'll replace that intuition with convex optimization (because it will also work for more difficult systems where our intuition fails).

But wait! The equations of motion of the simple pendulum are not polynomial, are they (they have a $\sin(\theta)$ in them)? Actually nearly all rigid-body systems can be written as polynomials given a change of coordinates. Indeed, the rigid-body-assumption is simply an assumption that points on the same body stay a constant distance apart, and Euclidean distance is certainly a polynomial. For a more thorough discussion see, for instance, Wampler11 and Sommese05.

For the purposes of this example, let's change coordinates from $[\theta,\dot\theta]^T$ to $\bx = [s,c,\dot\theta]^t$, where $s \equiv \sin\theta$ and $c \equiv \cos\theta$. Then we can write the pendulum dynamics as $$\dot\bx = \begin{bmatrix} c \dot\theta \\ -s \dot\theta \\ -\frac{1}{m l^2} \left( b \dot\theta + mgls \right) \end{bmatrix}.$$

Now let's parameterize a Lyapunov candidate $V(s,c,\dot\theta)$ as the polynomial with unknown coefficients which contains all monomials up to degree 2: $$V = \alpha_0 + \alpha_1 s + \alpha_2 c + ... \alpha_{9} s^2 + \alpha_{10} sc + \alpha_{11} s\dot\theta.$$ Now we'll formulate the feasibility problem: $\find_{\bf \alpha} \quad \subjto \quad V \sos, \quad -\dot{V} \sos.$ In fact, this is asking too much -- really $\dot{V}$ only needs to be negative when $s^2+c^2=1$. We can accomplish this with the so-called S-procedure, which we will discuss more below, and instead write $\find_{{\bf \alpha},\lambda} \quad \subjto \quad V \sos, \quad -\dot{V} -\lambda(\bx)(s^2+c^2-1) \sos.$ (Here $\lambda(\bx)$ is another polynomial with free coefficients which the optimization can use to make terms arbitrarily more positive when $s^2+c^2 \neq 1$.) Finally, for style points, in the code example in we ask for exponential stability:

Python version coming soon (try src/lypunov/cubic_polyonomial.py if your eager).

As always, make sure that you open up the code and take a look. The result is a Lyapunov function that looks familiar (visualized as a countour plot here):

Aha! Not only does the optimization finds us coefficients for the Lyapunov function which satisfy our Lyapunov conditions, but the result looks a lot like mechanical energy. In fact, the result is a little better than energy... there are some small extra terms added which prove exponential stability without having to invoke LaSalle's Theorem.

It is important to remember that there are a handful of gaps which make the existence of this solution a sufficient condition (for proving that every sub-level set of $V$ is an invariant set of $f$) instead of a necessary one. First, there is no guarantee that a stable polynomial system can be verified using a polynomial Lyapunov function (of any degree, and in fact there are known counter-examples add ref) and here we are only searching over a fixed-degree polynomial. Second, even if a polynomial Lyapunov function does exist, there is a gap between the SOS polynomials and the positive polynomials.

Despite these caveats, I have found this formulation to be surprisingly effective in practice. Intuitively, I think that this is because there is relatively a lot of flexibility in the Lyapunov conditions -- if you can find one function which is a Lyapunov function for the system, then there are also many "nearby" functions which will satisfy the same constraints.

# Lyapunov functions for estimating regions of attraction

There is another very important connection between Lyapunov functions and the concept of an invariant set: any sub-level set of a Lyapunov function is also an invariant set. This gives us the ability to use sub-level sets of a Lyapunov function as approximations of the region of attraction for nonlinear systems.

# Lyapunov invariant set and region of attraction theorem

Given a system $\dot{\bx} = f(\bx)$ with $f$ continuous, if we can find a scalar function $V(\bx) \succ 0$ and a sub-level set $${\cal G}: \{ \bx | V(\bx) < \rho \}$$ on which $$\forall \bx \in {\cal G}, \dot{V}(\bx) \preceq 0,$$ then ${\cal G}$ is an invariant set. By LaSalle, $\bx$ will converge to the largest invariant subset of ${\cal G}$ on which $\dot{V}=0$.

Furthermore, if $\dot{V}(\bx) \prec 0$ in ${\cal G}$, then the origin is locally asymptotically stable and the set ${\cal G}$ is inside the region of attraction of this fixed point. Alternatively, if $\dot{V}(\bx) \preceq 0$ in ${\cal G}$ and $\bx = 0$ is the only invariant subset of ${\cal G}$ where $\dot{V}=0$, then the origin is asymptotically stable and the set ${\cal G}$ is inside the region of attraction of this fixed point.

# Region of attraction for a one-dimensional system

Consider the first-order, one-dimensional system $\dot{x} = -x + x^3.$ We can quickly understand this system using our tools for graphical analysis.

In the vicinity of the origin, $\dot{x}$ looks like $-x$, and as we move away it looks increasingly like $x^3$. There is a stable fixed point at the origin and unstable fixed points at $\pm 1$. In fact, we can deduce visually that the region of attraction to the stable fixed point at the origin is $\bx \in (-1,1)$. Let's see if we can demonstrate this with a Lyapunov argument.

First, let us linearize the dynamics about the origin and use the Lyapunov equation for linear systems to find a candidate $V(\bx)$. Since the linearization is $\dot{x} = -x$, if we take ${\bf Q}=1$, then we find ${\bf P}=\frac{1}{2}$ is the positive definite solution to the algebraic Lyapunov equation (\ref{eq:algebraic_lyapunov}). Proceeding with $$V(\bx) = \frac{1}{2} x^2,$$ we have $$\dot{V} = x (-x + x^3) = -x^2 + x^4.$$ This function is zero at the origin, negative for $|x| < 1$, and positive for $|x| > 1$. Therefore we can conclude that the sub-level set $V < \frac{1}{2}$ is invariant and the set $x \in (-1,1)$ is inside the region of attraction of the nonlinear system. In fact, this estimate is tight.

# Robustness analysis using "common Lyapunov functions"

While we will defer most discussions on robustness analysis until later in the notes, there is a simple and powerful idea that can be presented now: the idea of a common Lyapunov function. Imagine that you have a model of a dynamical system but that you are uncertain about some of the parameters. For example, you have a model of a quadrotor, and are fairly confident about the mass and lengths (both of which are easy to measure), but are not confident about the moment of inertia. One approach to robustness analysis is to define a bounded uncertainty, which could take the form of $$\dot{\bx} = f_\alpha(\bx), \quad \alpha_{min} \le \alpha \le \alpha_{max},$$ with $\alpha$ a vector of uncertain parameters in your model. Richer specifications of the uncertainty bounds are also possible, but this will serve for our examples.

In standard Lyapunov analysis, we are searching for a function that goes downhill for all $\bx$ to make statements about the long-term dynamics of the system. In common Lyapunov analysis, we can make many similar statements about the long-term dynamics of an uncertain system if we can find a single Lyapunov function that goes downhill for all possible values of $\alpha$. If we can find such a function, then we can use it to make statements with all of the variations we've discussed (local, regional, or global; in the sense of Lyapunov, asymptotic, or exponential).

# A one-dimensional system with gain uncertainty

Let's consider the same one-dimensional example used above, but add an uncertain parameter into the dynamics. In particular, consider the system: $$\dot{x} = -x + \alpha x^3, \quad \frac{3}{4} < \alpha < \frac{3}{2}.$$ Plotting the graph of the one-dimensional dynamics for a few values of $\alpha$, we can see that the fixed point at the origin is still stable, but the robust region of attraction to this fixed point (shaded in blue below) is smaller than the region of attraction for the system with $\alpha=1$.

Taking the same Lyapunov candidate as above, $V = \frac{1}{2} x^2$, we have $$\dot{V} = -x^2 + \alpha x^4.$$ This function is zero at the origin, and negative for all $\alpha$ whenever $x^2 > \alpha x^4$, or $$|x| < \frac{1}{\sqrt{\alpha_{max}}} = \sqrt{\frac{2}{3}}.$$ Therefore, we can conclude that $|x| < \sqrt{\frac{2}{3}}$ is inside the robust region of attraction of the uncertain system.

Not all forms of uncertainty are as simple to deal with as the gain uncertainty in that example. For many forms of uncertainty, we might not even know the location of the fixed points of the uncertain system. In this case, we can often still use common Lyapunov functions to give some guarantees about the system, such as guarantees of robust set invariance. For instance, if you have uncertain parameters on a quadrotor model, you might be ok with the quadrotor stabilizing to a pitch of $0.01$ radians, but you would like to guarantee that it definitely does not flip over and crash into the ground.

# A one-dimensional system with additive uncertainty

Now consider the system: $$\dot{x} = -x + x^3 + \alpha, \quad -\frac{1}{4} < \alpha < \frac{1}{4}.$$ Plotting the graph of the one-dimensional dynamics for a few values of $\alpha$, this time we can see that the fixed point is not necessarily at the origin; the location of the fixed point moves depending on the value of $\alpha$. But we should be able to guarantee that the uncertain system will stay near the origin if it starts near the origin, using an invariant set argument.

Taking the same Lyapunov candidate as above, $V = \frac{1}{2} x^2$, we have $$\dot{V} = -x^2 + x^4 + \alpha x.$$ This Lyapunov function allows us to easily verify, for instance, that $V \le \frac{1}{3}$ is a robust invariant set, because whenever $V = \frac{1}{3}$, we have $$\forall \alpha \in [\alpha_{min},\alpha_{max}],\quad \dot{V}(x,\alpha) < 0.$$ Therefore $V$ can never start at less than one-third and cross over to greater than one-third. To see this, see that $$V=\frac{1}{3} \quad \Rightarrow \quad x = \pm \sqrt{\frac{2}{3}} \quad \Rightarrow \quad \dot{V} = -\frac{2}{9} \pm \alpha \sqrt{\frac{2}{3}} < 0, \forall \alpha \in \left[-\frac{1}{4},\frac{1}{4} \right].$$ Note that not all sub-level sets of this invariant set are invariant. For instance $V < \frac{1}{32}$ does not satisfy this condition, and by visual inspection we can see that it is in fact not robustly invariant.

# Region of attraction estimation for polynomial systems

Now we have arrived at the tool that I believe can be a work-horse for many serious robotics applications. Most of our robots are not actually globally stable (that's not because they are robots -- if you push me hard enough, I will fall down, too), which means that understanding the regions where a particular controller can be guaranteed to work can be of critical importance.

Sums-of-squares optimization effectively gives us an oracle which we can ask: is this polynomial positive for all $\bx$? To use this for regional analysis, we have to figure out how to modify our questions to the oracle so that the oracle will say "yes" or "no" when we ask if a function is positive over a certain region which is a subset of $\Re^n$. That trick is called the S-procedure. It is closely related to the Lagrange multipliers from constrained optimization, and has deep connections to "Positivstellensatz" from algebraic geometry.

# The S-procedure

Consider a scalar polynomial, $p(\bx)$, and a semi-algebraic set $g(\bx) \preceq 0$, where $g$ is a vector of polynomials. If I can find a polynomial "multiplier", $\lambda(\bx)$, such that $p(\bx) + \lambda^T(\bx) g(\bx) \sos, \quad \text{and} \quad \lambda(\bx) \sos,$ then this is sufficient to demonstrate that $$p(\bx)\ge 0 \quad \forall \bx \in \{ g(\bx) \le 0 \}.$$ To convince yourself, observe that when $g(\bx) \le 0$, it is only harder to be positive, but when $g(\bx) > 0$, it is possible for the combined function to be SOS even if $p(\bx)$ is negative.

# Region of attraction for the one-dimensional cubic system

Let's return to our example from above: $\dot{x} = -x + x^3$ and try to use SOS optimization to demonstrate that the region of attraction of the fixed point at the origin is $x \in (-1,1)$, using the Lyapunov candidate $V = x^2.$

First, define the multiplier polynomial, $\lambda(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4.$ Then define the optimization \begin{align*} \find_{\bf c} \quad & \\ \subjto \quad& - \dot{V}(x) - \lambda(x) (1-V(x)) \sos \\ & \lambda(x) \sos \end{align*}

You can try this example for yourself in .

lyapunov/cubic_polynomial.py

While the example above only verifies that the one-sub-level set of the pre-specified Lyapunov candidate is negative (certifying the ROA that we already understood), we can generalize the optimization to allow us to search for the largest sub-level set (with the objective using a convex approximation for volume). We can even search for the coefficients of the Lyapunov function using an iteration of convex optimizations. There are a number of variations and nuances in the various formulations, and some basic rescaling tricks that can help make the numerics of the problem better for the solvers.

make this true for pydrake: In , we have packaged all of these up into a regionOfAttraction method that is defined for objects of type PolynomialSystem. This makes it as simple as, for instance:

# Searching for the region of attraction


cd(fullfile(getDrakePath,'examples'));

% create a new CubicPolynomialExample object
p = CubicPolynomialExample();

% compute region of attraction
% the levelset V<1 is the region of attraction
V=regionOfAttraction(p,Point(p.getStateFrame,0));

% display the polynomial representation of V that results
display(V.getPoly);

which yields the console output:

[  (1)*x1^2  ]


The algorithm successfully found that $x \in (-1,1)$ is the region of attraction.

Remember that although we have tried to make it convenient to call these functions, they are not a black box. I highly recommend opening up the regionOfAttraction method and understanding how it works.

# Region of attraction for the time-reversed Van der Pol oscillator


cd(fullfile(getDrakePath,'examples'));
VanDerPol.run();


# Seeding the optimization with linear analysis

LQR gives the cost-to-go which can be used as the Lyapunov candidate. Otherwise, use a Lyapunov equation. You may not even need to search for a better Lyapunov function, but rather just ask the question: what is the largest region of attraction that can be demonstrated for the nonlinear system using the Lyapunov function from linear analysis?

# Rigid-body dynamics are polynomial

Coming soon... For a nice explanation of how rigid-body kinematics are polynomial, see Wampler11. Looking at the Lagrangian, you can see that if the kinematics are polynomial, then the dynamics will be, too.

In practice, you can also Taylor approximate any smooth nonlinear function using polynomials. This can be an effective strategy in practice, because you can limit the degrees of the polynomial, and because in many cases it is possible to provide conservative bounds on the errors due to the approximation.

implicit dynamics formulations. convex ROA w/ outer approx. control design.

# Trajectory Optimization

I've argued that optimal control is a powerful framework for specifying complex behaviors with simple objective functions, letting the dynamics and constraints on the system shape the resulting feedback controller (and vice versa!). But the computational tools that we've provided so far have been limited in some important ways. The numerical approaches to dynamic programming which involve putting a mesh over the state space do not scale well to systems with state dimension more than four or five. Linearization around a nominal operating point (or trajectory) allowed us to solve for locally optimal control policies (e.g. using LQR) for even very high-dimensional systems, but the effectiveness of the resulting controllers is limited to the region of state space where the linearization is a good approximation of the nonlinear dynamics. The computational tools for Lyapunov analysis from the last chapter can provide, among other things, an effective way to compute estimates of those regions. But we have not yet provided any real computational tools for approximate optimal control that work for high-dimensional systems beyond the linearization around a goal. That is precisely the goal for this chapter.

The big change that is going to allow us to scale to high-dimensional systems is that we are going to give up the goal of solving for the optimal feedback controller for the entire state space, and instead attempt to find an optimal control solution that is valid from only a single initial condition. Instead of representing this as a feedback control function, we can represent this solution as a trajectory, $\bx(\cdot), \bu(\cdot)$, typically defined over a finite interval. In our graph-search dynamic programming algorithms, we discretized the dynamics of the system on a mesh spread across the state space. This does not scale to high-dimensional systems, and it was difficult to bound the errors due to the discretization. If we instead restrict ourselves to optimizing only a single initial condition, then a different discretization scheme emerges: we can discretize the state and input trajectories over time.

# Problem Formulation

Given an initial condition, $\bx_0$, and an input trajectory $u(t)$ defined over a finite interval, $t\in[t_0,t_f]$, we can compute the long-term (finite-horizon) cost of executing that trajectory using the standard additive-cost optimal control objective, $J_{\bu(\cdot)}(\bx_0) = \int_{t_0}^{t_f} g(\bx(t),\bu(t)) dt.$ We will write the trajectory optimization problem as \begin{align*} \minimize_{u(\cdot)} \quad & \int_{t_0}^{t_f} g(\bx(t),\bu(t)) dt \\ \subjto \quad & \forall t, \dot{\bx}(t) = f(\bx(t),\bu(t)), \\ & \bx(t_0) = \bx_0. \\ \end{align*} Some trajectory optimization problems may also include additional constraints, such as collision avoidance ($\bx$ can not cause the robot to be inside an obstacle) or input limits (e.g. $\bu_{min} \le \bu \le \bu_{max}$ ), which can be defined for all time or some subset of the trajectory.

As written, the optimization above is an optimization over continuous trajectories. In order to formulate this as a numerical optimization, we must parameterize it with a finite set of numbers. Perhaps not surprisingly, there are many different ways to write down this parameterization, with a variety of different properties in terms of speed, robustness, and accuracy of the results. We will outline just a few of the most popular below. I would recommend Betts98+Betts01 for additional details.

# Computational Tools for Nonlinear Optimization

Before we dive in, we need to take a moment to understand the optimization tools that we will be using. In the graph-search dynamic programming algorithm, we were magically able to provide an iterative algorithm that was known to converge to the optimal cost-to-go function. With LQR we were able to reduce the problem to a matrix Riccati equation, for which we have scalable numerical methods to solve. In the Lyapunov analysis chapter, we were able to formulate a very specific kind of optimization problem--a semi-definite program (or SDP)--which is a subset of convex optimization, and relied on custom solvers like Gurobi or Mosek to solve the problems for us. Convex optimization is a hugely important subset of nonlinear optimization, in which we can guarantee that the optimization has no "local minima". In this chapter we won't be so lucky; most of the optimizations that we formulate may have local minima and the solution techniques will at best only guarantee that they give a locally optimal solution.

The generic formulation of a nonlinear optimization problem is $\minimize_z c(z) \quad \subjto \quad \bphi(z) \le 0,$ where $z$ is a vector of decision variables, $c$ is a scalar objective function and $\phi$ is a vector of constraints. Note that, although we write $\phi \le 0$, this formulation captures positivity constraints on the decision variables (simply multiply the constraint by $-1$) and equality constraints (simply list both $\phi\le0$ and $-\phi\le0$) as well.

The picture that you should have in your head is a nonlinear, potentially non-convex objective function defined over (multi-dimensional) $z$, with a subset of possible $z$ values satisfying the constraints.

Note that minima can be the result of the objective function having zero derivative or due to a sloped objective up against a constraint.

Numerical methods for solving these optimization problems require an initial guess, $\hat{z}$, and proceed by trying to move down the objective function to a minima. Common approaches include gradient descent, in which the gradient of the objective function is computed or estimated, or second-order methods such as sequential quadratic programming (SQP) which attempts to make a local quadratic approximation of the objective function and local linear approximations of the constraints and solves a quadratic program on each iteration to jump directly to the minimum of the local approximation.

While not strictly required, these algorithms can often benefit a great deal from having the gradients of the objective and constraints computed explicitly; the alternative is to obtain them from numerical differentiation. Beyond pure speed considerations, I strongly prefer to compute the gradients explicitly because it can help avoid numerical accuracy issues that can creep in with finite difference methods. The desire to calculate these gradients will be a major theme in the discussion below, and we have gone to great lengths to provide explicit gradients of our provided functions and automatic differentiation of user-provided functions in .

When I started out, I was of the opinion that there is nothing difficult about implementing gradient descent or even a second-order method, and I wrote all of the solvers myself. I now realize that I was wrong. The commercial solvers available for nonlinear programming are substantially higher performance than anything I wrote myself, with a number of tricks, subtleties, and parameter choices that can make a huge difference in practice. Some of these solvers can exploit sparsity in the problem (e.g., if the constraints operate in a sparse way on the decision variables). Nowadays, we make heaviest use of SNOPT Gill06, which now comes bundled with the binary distributions of , but also support a large suite of numerical solvers. Note that while I do advocate using these tools, you do not need to use them as a black box. In many cases you can improve the optimization performance by understanding and selecting non-default configuration parameters.

# Direct Transcription

Let us start by representing the finite-time trajectories, $\bx(t)$ and $\bu(t)$ $\forall t\in[t_0,t_f]$, by their values at a series of break points, $t_0,t_1,t_2,t_3$, and denoting the values at those points (aka the knot points) $\bx_0,...,\bx_N$ and $\bu_0,...,\bu_N$, respectively.

Then perhaps the simplest mapping of the trajectory optimization problem onto a nonlinear program is to fix the break points at even intervals, $dt$, and use Euler integration to write \begin{align*} \minimize_{\bx_1,...,\bx_N,\bu_0,...,\bu_{N-1}} & \sum_{n=0}^{N-1} g(\bx_n,\bu_n)dt \\ & \bx_{n+1} = \bx_n + f(\bx_n,\bu_n)dt, \quad \forall n\in [0,N-1]. \end{align*} Note that the decision variables here are $(\bx_1,...,\bx_N,\bu_0,...,\bu_{N-1})$, because $\bx_0$ is given, and $\bu_{N}$ does not appear in the cost nor any of the constraints. It is easy to generalize this approach to add additional costs or constraints on $\bx$ and/or $\bu$. (Note that this formulation does not actually benefit from the additive cost structure, so more general cost formulations are also possible.) Computing the gradients of the objective and constraints is essentially as simple as computing the gradients of $g()$ and $f()$.

If you take a moment to think about what the direct transcription algorithm is doing, you will see that by satisfying the dynamic constraints, the optimization is effectively solving the (Euler approximation of the) differential equation. But instead of marching forward through time, it is minimizing the inconsistency at each of the knot points simultaneously. While it's easy enough to generalize the constraints to use higher-order integration schemes, paying attention to the trade-off between the number of times the constraint must be evaluated vs the density of the knot points in time, if the goal is to obtain smooth $\bx$ trajectory solutions then another approach quickly emerges: the approach taken by the so-called collocation methods.

# Direct Collocation

In direct collocation (c.f., Hargraves87), both the input trajectory and the state trajectory are represented explicitly as piecewise polynomial functions. In particular, the sweet spot for this algorithm is taking $\bu(t)$ to be a first-order polynomial -- allowing it to be completely defined by the values at the knot points -- and $\bx(t)$ to be a Hermite cubic polynomial -- completely defined by the values and derivatives at the knot points. The state derivatives at the knot points are given as a function of $\bx$ and $\bu$ by the plant dynamics, so the entire spline is completely defined by the values of $\bx$ and $\bu$ at the knot points. To add the additional constraint that $\bx_{n+1}$ is dynamically consistent with $\bx_{n}$ and $\bu_{n}$, we add an additional set of constraints that the derivative at a set of collocation points also matches the plant dynamics. For the special case of cubic polynomial state trajectories and piecewise linear input trajectories, the derivative at the midpoint of each segment is particularly easy to compute, making it the natural choice.

Once again, direct collocation effectively integrates the equations of motion by satisfying the constraints of the optimization -- this time producing a third-order approximation of the dynamics with effectively two evaluations of the plant dynamics per segment. Hargraves87 claims, without proof, that as the break points are brought closer together, the trajectory will converge to a true solution of the differential equation. Once again it is very natural to add additional terms to the cost function or additional input/state constraints, and very easy to calculate the gradients of the objective and constraints. I personally find it very nice to explicitly account for the parametric encoding of the trajectory in the solution technique.

# Direct Collocation for the Pendulum, Acrobot, and Cart-Pole

Direct collocation also easily solves the minimum-time problem for the double integrator. The performance is similar to direct transcription, but the convergence is visibly different. Try it for yourself:

pendulum/dircol_swingup.py acrobot/dircol_swingup.py cartpole/dircol_swingup.py As always, make sure you take a look at the code!

# Shooting Methods

In the methods described above, by asking the optimization package to perform the numerical integration of the equations of motion, we are effectively over-parameterizing the problem. In fact, the optimization is perfectly well defined if we restrict the decision variables to $\bu_0,...,\bu_{N-1}$ only--we can compute $\bx_1,...,\bx_{N}$ ourselves by knowing $\bx_0$, and $\bu(\cdot)$. This is exactly the approach taken in the shooting methods.

Again, providing gradients of the objectives and constraints to the solver is not strictly required -- most solvers will obtain them from finite differences if they are not provided -- but I feel strongly that the solvers are faster and more robust when exact gradients are provided. Now that we have removed the $\bx$ decision variables from the program, we have to take a little extra care to compute the gradients. This is still easily accomplished using the chain rule. To be concise (and slightly more general), let us define $\bx[n+1]=f_d(\bx[n],\bu[n])$ as the discrete-time approximation of the continuous dynamics; for example, the forward Euler integration scheme used above would give $f_d(\bx[n],\bu[n]) = \bx[n]+f(\bx[n],\bu[n])dt.$ Then we have $\pd{J}{\bu_k} = \sum_{n=0}^{N-1} \left( \pd{g(\bx[n],\bu[n])}{\bx[n]} \pd{\bx[n]}{\bu_k} + \pd{g(\bx[n],\bu[n])}{\bu_k} \right),$ where the gradient of the state with respect to the inputs can be computed during the "forward simulation", $\pd{\bx[n+1]}{\bu_k} = \pd{f_d(\bx[n],\bu[n])}{\bx[n]} \pd{\bx[n]}{\bu_k} + \pd{f_d(\bx[n],\bu[n])}{\bu_k}.$ These simulation gradients can also be used in the chain rule to provide the gradients of any constraints. Note that there are a lot of terms to keep around here, on the order of (state dim) $\times$ (control dim) $\times$ (number of timesteps). Ouch. Note also that many of these terms are zero; for instance with the Euler integration scheme above $\pd{\bu[n]}{\bu_k} = 0$ if $k\ne n$. (If this looks like I'm mixing two notations here, recall that I'm using $\bu_k$ to represent the decision variable and $\bu[n]$ to represent the input used in the $n$th step of the simulation.)

# The special case of optimization without state constraints

By solving for $\bx(\cdot)$ ourselves, we've removed a large number of constraints from the optimization. If no additional state constraints are present, and the only gradients we need to compute are the gradients of the objective, then a surprisingly efficient algorithm emerges. I'll give the steps here without derivation, but will derive it in the Pontryagin section below:

1. Simulate forward: $$\bx[n+1] = f_d(\bx[n],\bu_n),$$ from $\bx[0] = \bx_0$
2. Calculate backwards: $$\lambda[n-1] = \pd{g(\bx[n],\bu[n])}{\bx[n]}^T + \pd{f(\bx[n],\bu[n])}{\bx[n]}^T \lambda[n],$$ from $\lambda[N-1]=0$
3. Extract the gradients: $$\pd{J}{\bu[n]} = \pd{g(\bx[n],\bu[n])}{\bu[n]} + \lambda[n]^T \pd{f(\bx[n],\bu[n])}{\bu[n]},$$ with $\pd{J}{\bu_k} = \sum_n \pd{J}{\bu[n]}\pd{\bu[n]}{\bu_k}$.

Here $\lambda[n]$ is a vector the same size as $\bx[n]$ which has an interpretation as $\lambda[n]=\pd{J}{\bx[n+1]}^T$. The equation governing $\lambda$ is known as the adjoint equation, and it represents a dramatic efficient improvement over calculating the huge number of simulation gradients described above. In case you are interested, the adjoint equation is known as the backpropagation algorithm in the neural networks literature and it is one of the primary reasons that training neural networks became so popular in the 1980's; super fast GPU implementations of this algorithm are also one of the reasons that deep learning is incredibly popular right now (the availability of massive training databases is perhaps the other main reason).

To take advantage of this efficiency, advocates of the shooting methods often use penalty methods instead of enforcing hard state constraints; instead of telling the solver about the constraint explicitly, you simply add an additional term to the cost function which gives a large penalty commensurate with the amount by which the constraint is violated. These are not quite as accurate and can be harder to tune (you'd like the cost to be high compared to other costs, but making it too high can lead to numerical conditioning issues), but they can work.

# Discussion

Although the decision about which algorithm is best may depend on the situation, in our work we have come to favor the direct collocation method (and occasionally direct transcription) for most of our work. There are a number of arguments for and against each approach; I will try to discuss a few of them here.

# Solver performance

Numerical conditioning. Tail wagging the dog.

Sparse constraints. Potential for parallel evaluation of the constraints (computing the dynamics and their derivatives are often the most expensive part).

# Providing an initial guess

to avoid local minima. direct transcription and collocation can take an initial guess in $\bx$, too.

# Implicit dynamics

Another potential advantage of the direct transcription and collocation methods is that the dynamics constraints can be written in implicit form.

# Variations in the problem formulation

There are number of useful variations to the problem formulations I've presented above. By far the most common is the addition of a terminal cost, e.g.: $$\min \quad h(\bx[N]) + \sum_{n=0}^{N-1} g(\bx[n],\bu[n]).$$ These terms are easily added to the cost function in the any of methods, and the adjoint equations of the shooting method simply require the a modified terminal condition $$\lambda[N-1] = \pd{h(\bx[N])}{\bx[N]}$$ when computing the gradients.

Another common modification is including the spacing of the breakpoints as additional decision variables. This is particularly easy in the direct transcription and collocation methods, and can also be worked into the shooting methods. Typically we add a lower-bound on the time-step so that they don't all vanish to zero.

# Accuracy of numerical integration

One potential complaint about the direct transcription and collocation algorithms is that we tend to use simplistic numerical integration methods and often fix the integration timestep (e.g. by choosing Euler integration and selecting a $dt$); it is difficult to bound the resulting integration errors in the solution. One tantalizing possibility in the shooting methods is that the forward integration could be accomplished by more sophisticated methods, like variable-step integration. But I must say that I have not had much success with this approach, because while the numerical accuracy of any one function evaluation might be improved, these integration schemes do not necessarily give smooth outputs as you make incremental changes to the initial conditions and control (changing $\bu_2$ by $\epsilon$ could result in taking a different number of steps in the integration scheme). This lack of smoothness can wreak havoc on the convergence of the optimization. If numerical accuracy is a premium, then I think you will have more success by imposing consistency constraints (e.g. as in the Runge-Kutta 4th order simulation with 5th order error checking method) as additional constraints on the time-steps; shooting methods do not have any particular advantage here.

reference DMOC

# Pontryagin's Minimum Principle

The tools that we've been developing for numerical trajectory optimization are closely tied to theorems from (analytical) optimal control. Let us take one section to appreciate those connections.

What precisely does it mean for a trajectory, $\bx(\cdot),\bu(\cdot)$, to be locally optimal? It means that if I were to perturb that trajectory in any way (e.g. change $\bu_3$ by $\epsilon$), then I would either incur higher cost in my objective function or violate a constraint. For an unconstrained optimization, a necessary condition for local optimality is that the gradient of the objective at the solution be exactly zero. Of course the gradient can also vanish at local maxima or saddle points, but it certainly must vanish at local minima. We can generalize this argument to constrained optimization using Lagrange multipliers.

# Constrained optimization with Lagrange multipliers

Given the equality-constrained optimization problem $$\minimize_\bz g(\bz) \quad \subjto \quad \bphi(\bz) = 0,$$ where $\bphi$ is a vector. Define a vector $\lambda$ of Lagrange multipliers, the same size as $\phi$, and the scalar Lagrangian function, $$L(\bz,{\bf \lambda}) = g(\bz) + \lambda^T\phi(\bz).$$ A necessary condition for $\bz^*$ to be an optimal value of the constrained optimization is that the gradients of $L$ vanish with respect to both $\bz$ and $\lambda$: $$\pd{L}{\bz} = 0, \quad \pd{L}{\lambda} = 0.$$ Note that $\pd{L}{\lambda} = \phi(\bz)$, so requiring this to be zero is equivalent to requiring the constraints to be satisfied.

# Optimization on the unit circle

Consider the following optimization: $$\min_{x,y} x+y, \quad \subjto \quad x^2 + y^2 = 1.$$ The level sets of $x+y$ are straight lines with slope $-1$, and the constraint requires that the solution lives on the unit circle.

Simply by inspection, we can determine that the optimal solution should be $x=y=-\frac{\sqrt{2}}{2}.$ Let's make sure we can obtain the same result using Lagrange multipliers.

Formulating $$L = x+y+\lambda(x^2+y^2-1),$$ we can take the derivatives and solve \begin{align*} \pd{L}{x} = 1 + 2\lambda x = 0 \quad & \Rightarrow & \lambda = -\frac{1}{2x}, \\ \pd{L}{y} = 1 + 2\lambda y = 1 - \frac{y}{x} = 0 \quad & \Rightarrow & y = x,\\ \pd{L}{\lambda} = x^2+y^2-1 = 2x^2 -1 = 0 \quad & \Rightarrow & x = \pm \frac{1}{\sqrt{2}}. \end{align*} Given the two solutions which satisfy the necessary conditions, the negative solution is clearly the minimizer of the objective.

# Lagrange multiplier derivation of the adjoint equations

Let us use Lagrange multipliers to derive the necessary conditions for our constrained trajectory optimization problem in discrete time \begin{align*} \minimize_{\bx_1,...,\bx_N,\bu_0,...,\bu_{N-1}} & \sum_{n=0}^{N-1} g(\bx[n],\bu[n]),\\ \subjto \quad & \bx[n+1] = f_d(\bx[n],\bu[n]). \end{align*} Formulate the Lagrangian, $L(\bx[\cdot],\bu[\cdot],\lambda[\cdot]) = \sum_{n=0}^{N-1} g(\bx[n],\bu[n]) + \sum_{n=0}^{N-1} \lambda^T[n] \left(f_d(\bx[n],\bu[n]) - \bx[n+1]\right),$ and set the derivatives to zero to obtain the adjoint equation method described for the shooting algorithm above: \begin{gather*} \forall n\in[0,N-1], \pd{L}{\lambda[n]} = f_d(\bx[n],\bu[n]) - \bx[n+1] = 0 \Rightarrow \bx[n+1] = f(\bx[n],\bu[n]) \\ \forall n\in[0,N-1], \pd{L}{\bx[n]} = \pd{g(\bx[n],\bu[n])}{\bx} + \lambda^T[n] \pd{f_d(\bx[n],\bu[n])}{\bx} - \lambda^T[n-1] = 0 \\ \quad \Rightarrow \lambda[n-1] = \pd{g(\bx[n],\bu[n])}{\bx}^T + \pd{f_d(\bx[n],\bu[n])}{\bx}^T \lambda[n]. \\ \pd{L}{\bx[N]} = -\lambda[N-1] = 0 \Rightarrow \lambda[N-1] = 0 \\ \forall n\in[0,N-1], \pd{L}{\bu[n]} = \pd{g(\bx[n],\bu[n])}{\bu} + \lambda^T[n] \pd{f_d(\bx[n],\bu[n])}{\bu} = 0. \end{gather*} Therefore, if we are given an initial condition $\bx_0$ and an input trajectory $\bu[\cdot]$, we can verify that it satisfies the necessary conditions for optimality by simulating the system forward in time to solve for $\bx[\cdot]$, solving the adjoint equation backwards in time to solve for $\lambda[\cdot]$, and verifying that $\pd{L}{\bu[n]} = 0$ for all $n$. The fact that $\pd{J}{\bu} = \pd{L}{\bu}$ when $\pd{L}{\bx} = 0$ and $\pd{L}{\lambda} = 0$ follows from some basic results in the calculus of variations.

# Necessary conditions for optimality in continuous time

You won't be surprised to hear that these necessary conditions have an analogue in continuous time. I'll state it here without derivation. Given the initial conditions, $\bx_0$, a continuous dynamics, $\dot\bx = f(\bx,\bu)$, and the instantaneous cost $g(\bx,\bu)$, for a trajectory $\bx(\cdot),\bu(\cdot)$ defined over $t\in[t_0,t_f]$ to be optimal it must satisfy the conditions that \begin{align*} \forall t\in[t_0,t_f],\quad & \dot{\bx} = f(\bx,\bu), \quad &\bx(0)=\bx_0\\ \forall t\in[t_0,t_f],\quad & -\dot\lambda = \pd{g}{\bx}^T + \pd{f}{\bx}^T \lambda, \quad &\lambda(T) = 0\\ \forall t\in[t_0,t_f],\quad & \pd{g}{\bu} + \lambda^T\pd{f}{\bu} = 0.& \end{align*}

In fact the statement can be generalized even beyond this to the case where $\bu$ has constraints. The result is known as Pontryagin's minimum principle -- giving necessary conditions for a trajectory to be optimal.

P

# Pontryagin's Minimum Principle

Adapted from Bertsekas00a. Given the initial conditions, $\bx_0$, a continuous dynamics, $\dot\bx = f(\bx,\bu)$, and the instantaneous cost $g(\bx,\bu)$, for a trajectory $\bx^*(\cdot),\bu^*(\cdot)$ defined over $t\in[t_0,t_f]$ to be optimal is must satisfy the conditions that \begin{align*} \forall t\in[t_0,t_f],\quad & \dot{\bx}^* = f(\bx^*,\bu^*), \quad &\bx^*(0)=\bx_0\\ \forall t\in[t_0,t_f],\quad & -\dot\lambda = \pd{g}{\bx}^T + \pd{f}{\bx}^T \lambda, \quad &\lambda(T) = 0\\ \forall t\in[t_0,t_f],\quad & u^* = \argmin_{\bu\in{\cal U}} \left[ g(\bx^*,\bu) + \lambda^T f(\bx^*,\bu) \right].& \end{align*}

Note that the terms which are minimized in the final line of the theorem are commonly referred to as the Hamiltonian of the optimal control problem, $$H(\bx,\bu,\lambda,t) = g(\bx,\bu) + \lambda^T f(\bx,\bu).$$ It is distinct from, but inspired by, the Hamiltonian of classical mechanics. Remembering that $\lambda$ has an interpretation as $\pd{J}{\bx}^T$, you should also recognize it from the HJB.

# Linear systems with convex linear constraints

An important special case. Linear/Quadratic objectives results in an LP/QP $\Rightarrow$ Convex optimization.

# Local Trajectory Feedback Design

Once we have obtained a locally optimal trajectory from trajectory optimization, there is still work to do...

# Time-varying LQR

Take $\bar\bx(t) = \bx(t) - \bx_0(t)$, and $\bar\bu(t) = \bu(t)-\bu_0(t)$, then apply finite-horizon LQR (see the LQR chapter).

# Trajectory optimization as policy search.

Can also parameterize a feedback controller and search over those parameters directly (instead of parameterizing the input tape).

# Algorithms for Limit Cycles

The discussion of walking and running robots in Chapter 4 motivated the notion of limit cycle stability. Linear systems are not capable of producing stable limit cycle behavior, so this rich topic is unique to nonlinear systems design and analysis. Furthermore, the tools that are required to design, stabilize, and verify limit cycles will have applicability beyond simple periodic motions.

The first natural question we must ask is, given a system $\dot{\bx} = f(\bx)$, or a control system $\dot{x} = f(\bx,\bu)$, how do we go about finding periodic solutions which may be passively stable, open-loop stable, or stabilized with closed-loop feedback? It turns out that the trajectory optimization tools that we developed in the last chapter are very well suited to this task.

# Trajectory optimization

I introduced the trajectory optimization tools as means for optimizing a single trajectory of the system from a particular known initial condition. But they can be used in a slightly different way, too -- in some formulations we might choose to have the initial conditions as one of the decision variables. In fact, the simplest formulation for finding a periodic solution is to leave the entire trajectory as a decision variable, but to add periodicity constraints enforcing that the initial conditions and the final conditions match. Some care is needed here; if you're not careful then these constraints will all be met if the entire trajectory remains at a fixed point in the system. In the example below, we constrain that the initial and final conditions lie on the surface of section that we used before, and additionally constrain that the velocity at the initial/final points is non-zero (to avoid the fixed point solution).

Note that although we introduced trajectory optimization as searching over control inputs $\bu$, this formulation is now rich enough that it solves an important problem even for passive systems -- with no control input at all.

# Finding the limit cycle of the Van der Pol oscillator

Recall the dynamics of the Van der Pol oscillator given by $$\ddot{q} + \mu (q^2 - 1) \dot{q} + q = 0, \quad \mu>0,$$ which exhibited a stable limit cycle.

Formulate the direct collocation optimization: \begin{align*} \find_{\bx_0,...,\bx_N,dt} \quad \\ \subjto \quad & q_0 = 0, \quad \dot{q}_0 > 0, \\ & \bx_N = \bx_0, \text{(periodicity constraint)}\\ & \text{collocation dynamic constraints} \end{align*}

Try it yourself:

van_der_pol/dircol.py

As always, make sure that you take a look at the code. Poke around. Try changing some things.

One of the things that you should notice in the code is that I provide an initial guess for the solver. In most of the examples so far I've been able to avoid doing that--the solver takes small random numbers as a default initial guess and solves from there. But for this problem, I found that it was getting stuck in a local minima. Adding the initial guess that the solution moves around a circle in state space was enough.

example: simplest flapping robot

# Lyapunov analysis

Recall the important distinction between stability of a trajectory in time and stability of a limit cycle was that the limit cycle does not converge in phase -- trajectories near the cycle converge to the cycle, but trajectories on the cycle may not converge with each other. This is type of stability, also known as orbital stability can be written as stability to the manifold described by cycle $\bx^*(t)$, $\min_\tau || x(t) - x^*(\tau) || \rightarrow 0.$ In the case of limit cycles, this manifold is a periodic solution with $\bx^*(t+t_{period}) = \bx^*(t)$. Depending on exactly how that convergence happens, we can define orbital stability in the sense of Lyapunov, asymptotic orbital stability, exponential orbital stability, or even finite-time orbital stability.

In order to prove that a system is orbitally stable (locally, over a region, or globally), or to analyze the region of attraction of a limit cycle, we can use a Lyapunov function. In particular, we would like to consider Lyapunov functions which have the form cartooned below; they vanish to zero everywhere along the cycle, and are strictly positive everywhere away from the cycle.

# Transverse coordinates

How can we parameterize this class of functions? For arbitrary cycles this could be very difficult in the original coordinates. For simple cycles like in the cartoon, one could imagine using polar coordinates. More generally, we will define a new coordinate system relative to the orbit, with coordinates

• $\tau$ - the phase along the orbit
• $\bx_\perp(\tau)$ - the remaining coordinates, linearly independent from $\tau$.

Given a state $\bx$ in the original coordinates, we must define a smooth mapping $\bx \rightarrow (\tau, \bx_\perp)$ to this new coordinate system. For example, for a simple ring oscillator we might have:

In general, for an $n$-dimensional state space, $\tau$ will always be a scalar, and $\bx_\perp$ will be an $(n-1)$-dimensional vector defining the remaining coordinates relative to $\bx^*(\tau)$. In fact, although we use the notation $\bx_\perp$ the coordinate system need not be strictly orthogonal to the orbit, but must simply be transversal (not parallel). Having defined the smooth mapping $\bx \rightarrow (\tau,\bx_\perp)$, we can always rewrite the dynamics in this new coordinate system: \begin{gather*} \dot{\tau} = f_1(\bx_\perp,\tau) \\ \dot\bx_\perp = f_2(\bx_\perp,\tau). \end{gather*}

The value of this construction for Lyapunov analysis was proposed in Hauser94a and has been extended nicely to control design in Shiriaev08 and for region of attraction estimation in Manchester10b. A particular numerical strategy for defining the transversal coordinates is given in Manchester10a.

# A Lyapunov theorem for orbital stability

For a dynamical system $\dot\bx = f(\bx)$ with $\bx \in \Re^n$, $f$ continuous, a continuous periodic solution $\bx^*(\tau)$, and a smooth mapping $\bx \rightarrow (\tau,\bx_\perp)$ where $\bx_\perp$ vanishes on $\bx^*$, then for some $n-1$ dimensional ball ${\cal B}$ around the origin, if I can produce a $V(\bx_\perp,\tau)$ such that \begin{gather*} \forall \tau, V(0,\tau) = 0, \\ \forall \tau, \forall \bx_\perp \in {\cal B}, \bx_\perp \ne 0, V(\bx_\perp,\tau) > 0, \end{gather*} with \begin{gather*} \forall \tau, \dot{V}(0,\tau) = 0, \\ \forall \tau, \forall \bx_\perp \in {\cal B}, \bx_\perp \ne 0, \dot{V}(\bx_\perp,\tau) < 0, \end{gather*} then the solution $\bx^*(t)$ is locally orbitally asymptotically stable.

# Simple ring oscillator

Perhaps the simplest oscillator is the first-order system which converges to the unit circle. In cartesian coordinates, the dynamics are \begin{align*} \dot{x}_1 =& x_2 -\alpha x_1 \left( 1 - \frac{1}{\sqrt{x_1^2+x_2^2}}\right) \\ \dot{x}_2 =& -x_1 -\alpha x_2 \left( 1 - \frac{1}{\sqrt{x_1^2+x_2^2}}\right), \end{align*} where $\alpha$ is a positive scalar gain.

If we take the transverse coordinates to be the polar coordinates, shifted so that $x_\perp$ is zero on the unit circle, \begin{align*}\tau =& \text{atan2}(-x_2,x_1) \\ x_\perp =& \sqrt{x_1^2+x_2^2}-1, \end{align*} which is valid when $x_\perp>-1$, then the simple transverse dynamics are revealed: \begin{align*} \dot\tau =& 1 \\ \dot{x}_\perp =& -\alpha x_\perp. \end{align*} Taking a Lyapunov candidate $V(x_\perp,\tau) = x_\perp^2$, we can verify that $$\dot{V} = -2 \alpha x_\perp^2 \prec 0, \quad \forall x_\perp > -1.$$ This demonstrates that the limit cycle is locally asymptotically stable, and furthermore that the invariant open-set $V < 1$ is inside the region of attraction of that limit cycle. In fact, we know that all $x_\perp > -1$ are in the region of attraction that limit cycle, but this is not proven by the Lyapunov argument above.

Let's compare this approach with the approach that we used in the chapter on walking robots, where we used a Poincaré map analysis to investigate limit cycle stability. In transverse coordinates approach, there is an additional burden to construct the coordinate system along the entire trajectory, instead of only at a single surface of section. In fact, the transverse coordinates approach is sometimes referred to as a "moving Poincaré section". But the reward for this extra work is that we can check a condition that only involves the instantaneous dynamics, $f(\bx)$, as opposed to having to integrate the dynamics over an entire cycle to generate the discrete Poincaré map, $\bx_p[n+1] = P(\bx_p[n])$. While computing $P$ in closed-form happened to be possible for the simple rimless wheel model, it is rarely possible for more complex models. As we will see below, this approach will also be more compatible with designing continuous feedback controller that stabilize the limit cycle.

# Transverse linearization

In the case of Lyapunov analysis around a fixed point, there was an important special case: for stable linear systems, we actually have a recipe for constructing a Lyapunov function. As a result, for nonlinear systems, we often found it convenient to begin our search by linearizing around the fixed point and using the Lyapunov candidate for the linear system as an initial guess. That same approach can be extended to limit cycle analysis.

# Region of attraction estimation using sums-of-squares

Coming soon. If you are interested, see Manchester10a+Manchester10b.

# The hybrid system's approach

Transverse stabilization is more natural than full LQR

# Robust and Stochastic Control

Stochastic optimal control. Stochastic HJB. Graph search. LQR + Gaussian noise. Whitening filters. Stochastic trajectory optimization. Stochastic verification. (Having bounds on expected value does yield bounds on system state). Robust control. Common Lyapunov functions. Dissipation inequalities. Guaranteed-cost control for linear systems. H-infinity.

# Planning Under Uncertainty

POMDP, SLAG, Belief-space sample-based planning, ...

# Verification and Validation

Lyap w/ richer specs. Monte carlo... .

# System Identification

equation error vs simulation error. model-reduction for linear systems. optimal experiment design

# Drake

The website is the main source for information and documentation. This goal of this chapter provide a few specific pointers and recommended workflows to get you up and running quickly with the examples provided in these notes.

NOTE: If you are taking the MIT class, we have provided a Docker workflow for you to make sure you are kept up to date week by week. Do NOT use the instructions below. Follow these instructions instead.

Ubuntu Linux | Mac Homebrew Ubuntu Linux | Mac Homebrew

git clone https://github.com/RussTedrake/underactuated.git

and install the prerequisites using the platform-specific installation script provided:

sudo underactuated/scripts/setup/ubuntu/16.04/install_prereqs
export PYTHONPATH=pwd/underactuated/src:${PYTHONPATH} underactuated/scripts/setup/mac/install_prereqs export PYTHONPATH=pwd/underactuated/src:${PYTHONPATH}

# Install Drake

Although I hope that power users will build from source (and contribute back), I recommend using the precompiled binaries to get started. The links below indicate the specific distributions that the examples in this text have been tested against.

curl -o drake.tar.gz https://drake-packages.csail.mit.edu/drake/nightly/drake-latest-xenial.tar.gz
curl -o drake.tar.gz https://drake-packages.csail.mit.edu/drake/nightly/drake-latest-mac.tar.gz
sudo tar -xvzf drake.tar.gz -C /opt
export PYTHONPATH=/opt/drake/lib/python2.7/site-packages:${PYTHONPATH} sudo tar -xvzf drake.tar.gz -C /opt /opt/drake/share/drake/setup/install_prereqs export PYTHONPATH=/opt/drake/lib/python2.7/site-packages:${PYTHONPATH}