Algorithms for Walking, Running, Swimming, Flying, and Manipulation
© Russ Tedrake, 2020
Last modified .
How to cite these notes |
Send me your feedback
Note: These are working notes used for a course being taught at MIT. They will be updated throughout the Spring 2020 semester. Lecture videos are available on YouTube.
Previous Chapter | Table of contents | Next Chapter |
I've argued that optimal control is a powerful framework for specifying complex behaviors with simple objective functions, letting the dynamics and constraints on the system shape the resulting feedback controller (and vice versa!). But the computational tools that we've provided so far have been limited in some important ways. The numerical approaches to dynamic programming which involve putting a mesh over the state space do not scale well to systems with state dimension more than four or five. Linearization around a nominal operating point (or trajectory) allowed us to solve for locally optimal control policies (e.g. using LQR) for even very high-dimensional systems, but the effectiveness of the resulting controllers is limited to the region of state space where the linearization is a good approximation of the nonlinear dynamics. The computational tools for Lyapunov analysis from the last chapter can provide, among other things, an effective way to compute estimates of those regions. But we have not yet provided any real computational tools for approximate optimal control that work for high-dimensional systems beyond the linearization around a goal. That is precisely the goal for this chapter.
The big change that is going to allow us to scale to high-dimensional systems is that we are going to give up the goal of solving for the optimal feedback controller for the entire state space, and instead attempt to find an optimal control solution that is valid from only a single initial condition. Instead of representing this as a feedback control function, we can represent this solution as a trajectory, $\bx(t), \bu(t)$, typically defined over a finite interval.
Given an initial condition, $\bx_0$, and an input trajectory $\bu(t)$ defined over a finite interval, $t\in[t_0,t_f]$, we can compute the long-term (finite-horizon) cost of executing that trajectory using the standard additive-cost optimal control objective, \[ J_{\bu(\cdot)}(\bx_0) = \ell_f (\bx(t_f)) + \int_{t_0}^{t_f} \ell(\bx(t),\bu(t)) dt. \] We will write the trajectory optimization problem as \begin{align*} \min_{\bu(\cdot)} \quad & \ell_f (\bx(t_f)) + \int_{t_0}^{t_f} \ell(\bx(t),\bu(t)) dt \\ \subjto \quad & \dot{\bx}(t) = f(\bx(t),\bu(t)), \quad \forall t\in[t_0, t_f] \\ & \bx(t_0) = \bx_0. \\ \end{align*} Some trajectory optimization problems may also include additional constraints, such as collision avoidance (e.g., where the constraint is that the signed distance between the robot's geometry and the obstacles stays positive) or input limits (e.g. $\bu_{min} \le \bu \le \bu_{max}$ ), which can be defined for all time or some subset of the trajectory.
As written, the optimization above is an optimization over continuous
trajectories. In order to formulate this as a numerical optimization, we
must parameterize it with a finite set of numbers. Perhaps not
surprisingly, there are many different ways to write down this
parameterization, with a variety of different properties in terms of speed,
robustness, and accuracy of the results. We will outline just a few of the
most popular below; I would recommend
Let us first consider the case of linear systems. In fact, if we start in discrete time, we can even defer the question of how to best discretize the continuous-time problem. There are a few different ways that we might "transcribe" this optimization problem into a concrete mathematical program.
For instance, let us start by writing both $\bu[\cdot]$ and $\bx[\cdot]$ as decision variables. Then we can write: \begin{align*} \min_{\bx[\cdot],\bu[\cdot]} \quad & \ell_f(\bx[N]) + \sum_{n_0}^{N-1} \ell(\bx[n],\bu[n]) \\ \subjto \quad & \bx[n+1] = {\bf A}\bx[n] + {\bf B}\bu[n], \quad \forall n\in[0, N-1] \\ & \bx[0] = \bx_0 \\ & + \text{additional constraints}. \end{align*} We call this modeling choice -- adding $\bx[\cdot]$ as decision variables and modeling the discrete dynamics as explicit constraints -- the "direct transcription". Importantly, for linear systems, the dynamics constraints are linear constraints in these decision variables. As a result, if we can restrict our additional constraints to linear inequality constraints and our objective function to being linear/quadratic in $\bx$ and $\bu$, then the resulting trajectory optimization is a convex optimization (specifically a linear program or quadratic program depending on the objective). As a result, we can reliably solve these problems to global optimality at quite large scale, potentially even online -- consider the case where the constraints, like staying in a lane, are not known until runtime.
We've looked at a few optimal control problems for the double integrator using value iteration. For one of them -- the quadratic objective with no constraints on $\bu$ -- we know now that we could have solved the problem "exactly" using LQR. But the minimum-time problem and even the LQR problem with constraints do not yet have more satisyfing solutions than discretizing state space, which causes potentially large numerical artifacts.
In the trajectory formulation, we can solve these problems exactly for the discrete-time double integrator, and with better accuracy for the continuous-time double integrator. Take a moment to appreciate that! The bang-bang policy and cost-to-go functions are fairly nontrivial functions of state; it's quite satisfying that we can evaluate them using convex optimization! The limitation, of course, is that we are only solving them for one initial condition at a time.
If you have not studied linear programming before, you might be surprised by the modeling power of even linear objectives within this framework. Consider, for instance, that we want to drive a system to the origin. It turns out that taking $$\ell(\bx,\bu) = |\bx| + |\bu|,$$ the direct transcription can still be formulated as a linear program. To do it, add additional decision variables ${\bf s}_x[\cdot]$ and ${\bf s}_u[\cdot]$ -- these are commonly referred to as slack variables -- and write $$\min_{\bx,\bu,{\bf s}_x,{\bf s}_u} \sum_n^{N-1} {\bf s}_x[n] + {\bf s}_u[n], \quad \text{s.t.} \quad {\bf s}_x[n] \ge x[n], \quad {\bf s}_x[n] \ge -x[n], \quad ...$$ The field of convex optimization is replete with tricks like this. Knowing and recognizing them are skills of the (optimization) trade. But there are many relevant constraints which cannot be recast into convex constraints with any amount of skill -- going either left or right around an obstacle is one of the most important examples. These represent a fundamentally non-convex constraints in $\bx$; we'll discuss the implications of using non-convex optimization for trajectory optimization below.
The savvy reader might have noticed that adding $\bx[\cdot]$ as decision variables was not strictly necessary. If we know $\bx[0]$ and we know $\bu[\cdot]$, then we should be able to solve for $\bx[n]$ -- that's just forward simulation. Indeed, by satisfying the constraints in direct transcription, the solver is effectively solving the difference equation (acausally). For our discrete-time linear systems, this is particularly easy and nice: \begin{align*}\bx[1] =& {\bf A}\bx[0] + {\bf B}\bu[0] \\ \bx[2] =& {\bf A}({\bf A}\bx[0] + {\bf B}\bu[0]) + {\bf B}\bu[1] \\ \bx[n] =& {\bf A}^n\bx[0] + \sum_{k=0}^{n-1} {\bf A}^{n-1-k}{\bf B}u[k].\end{align*} What's more, the solution is still linear in $\bu[\cdot]$. This is amazing... we can get rid of a bunch of decision variables, and turn a constrained optimization problem into an unconstrained optimization problem (assuming we don't have any other constraints). This approach -- using $\bu[\cdot]$ but not $\bx[\cdot]$ as decision variables and evaluating $\ell(\bx[n],\bu[n])$ by effectively forward simulating to obtain $\bx[n]$ -- is called the direct shooting transcription. For linear systems with linear/quadratic objectives in $\bx$, and $\bu$, it is still a convex optimization, and has less decision variables and constraints than the direct transcription.
So is direct shooting uniformly better than the direct transcription approach? I think it is not. There are a few potential reason that one might prefer the direct transcription:
For linear convex problems, the solvers are mature enough that these differences often don't amount to much. For nonlinear optimization problems, the differences can be substantial. And both approaches have their loyal camps.
It is also worth noting that the problems generated by the direct
transcription have an important and exploitable "banded" sparsity pattern
-- most of the constraints touch only a small number of variables -- it's
actually the same pattern that we exploit in the Riccati equations.
Thanks to the importance of these methods in real applications, numerous
specialized solvers have been written to explicitly exploit this sparsity
(e.g.
If we wish to solve the continuous-time version of the problem, then we can discretize time and use the formulations above. The most important decision is the discretization / numerical integration scheme. For linear systems, if we assume that the control inputs are held constant for each time step (aka zero-order hold), then we can integrate the dynamics perfectly: $$\bx[n+1] = \bx[n] + \int_{t_n}^{t_n + h} \left[ {\bf A} \bx(t) + {\bf B}\bu \right]dt = e^{{\bf A}h}\bx[n] + {\bf A}^{-1}(e^{{\bf A}h} - {\bf I}){\bf B}\bu[n],$$ is the simple case (when ${\bf A}$ is invertible). But in general, we can use any finitely-parameterized representation of $\bu(t)$ and any numerical integration scheme to obtain $\bx[n+1]={\bf f}(\bx[n], \bu[n])$.
I humbly recommend that you study the convex trajectory optimization case for clarity and sense of purpose, in practice trajectory optimization is often used to solve nonconvex problems. Our formulation can become non-convex for a number of reasons -- if the dynamics are nonlinear then the dynamic constraints are non-convex. You may also wish to have a nonconvex objective or nonconvex additional constraint (e.g. non-collision), even if your dynamics are linear. Typically we formulate these problems using tools from nonlinear programming.
The formulations that we wrote for direct transcription and direct shooting above are still valid when the dynamics are nonlinear, it's just that the resulting problem is nonconvex. For instance, the direct transcription for discrete-time systems becomes the more general: \begin{align*} \min_{\bx[\cdot],\bu[\cdot]} \quad & \ell_f(\bx[N]) + \sum_{n_0}^{N-1} \ell(\bx[n],\bu[n]) \\ \subjto \quad & \bx[n+1] = {\bf f}(\bx[n], \bu[n]), \quad \forall n\in[0, N-1] \\ & \bx[0] = \bx_0 \\ & + \text{additional constraints}. \end{align*} Direct shooting still works, too, since on each iteration of the algorithm we can compute $\bx[n]$ given $\bx[0]$ and $\bu[\cdot]$ by forward simulation. But things get a bit more interesting when we consider continuous-time systems.
For nonlinear dynamics, there is no one obvious choice for how realize
the discrete dynamics $$\bx[n+1] = \bx[n] + \int_{t[n]}^{t[n+1]} f(\bx(t),
\bu(t)) dt, \quad \bx(t[n]) = \bx[n].$$ For instance, in
But once we have opened the door to nonlinear optimization approaches, then a new trick becomes available. In the convex formulation, we chose the time discretization apriori, and left it fixed for the duration of the optimization. That is because time enters in a nonlinear way into the constraints. But, as I've hinted with my notation above, we can now imagine adding $t[\cdot]$ to the problem as decision variables, allowing the optimizer to stretch or shrink the time intervals in order to solve the problem. Adding some constraints to these time variables is essential in order to avoid trivial solutions (like collapsing to a trajectory of zero duration); one could potentially even add more advanced constraints to control the integration error.
It is very satisfying to have a suite of numerical integration routines available for our direct transcription. But numerical integrators are designed to solve forward in time, and this represents a design constraint that we don't actually have in our direct transcription formulation. If our goal is to obtain an accurate solution to the differential equation with a small number of function evaluations / decision variables / constraints, then some new formulations are possible that take advantage of the constrained optimization formulation. These include the the so-called collocation methods.
In direct collocation (c.f.,
It turns out that in this sweet spot, the only decision variables we need in our optimization are the values $\bu(t)$ and $\bx(t)$ at the break points of the spline. You might think that you would need the coefficients of the cubic spline parameters, but you do not. For the first order interpolation on $\bu(t)$, given $\bu(t_k)$ and $\bu(t_{k+1})$, we can solve for every value $\bu(t)$ over the interval $t \in [k, k+1]$. But we also have everything that we need for the cubic spline: given $\bx(t_k)$ and $\bu(t_k)$, we can compute $\dot\bx(t_k) = f (\bx(t_k), \bu(t_k))$; and the four values $\bx(t_k), \bx(t_{k+1}), \dot\bx (t_k), \dot\bx(t_{k+1})$ completely define all of the parameters of the cubic spline over the interval $t\in[t_k, t_{k+1}]$. This is very convenient, because it is easy for us to add additional constraints to $\bu$ and $\bx$ at the knot points (and would have been relatively harder to have to convert every constraint into constraints on the spline coefficients).
So far, given $\dot{\bx} = f(\bx,\bu)$, a list of break points $t_0, . t_N$, and arbitrary values $\bx(t_k), \bu(t_k)$, we can extract the spline representation of $\bu(t)$ and $\bx(t)$. So far we haven't added any actual constraints to our mathematical program. Now, just like in direct transcription, we need a constraint that will tell the optimizer how $\bx(t_{k+1})$ is related to $\bx(t_k)$, etc. In direct collocation, the constraint that we add is that the derivative of the spline at the collocation points matches the dynamics. In particular, for our sweet spot, if we choose the collocation points to be the midpoints of the spline, then we have that \begin{gather*} t_{c,k} = \frac{1}{2}\left(t_k + t_{k+1}\right), \qquad h_k = t_{k+1} - t_k, \\ \bu(t_{c,k}) = \frac{1}{2}\left(\bu(t_k) + \bu(t_{k+1})\right), \\ \bx(t_{c,k}) = \frac{1}{2}\left(\bx(t_k) + \bx(t_{k+1})\right) + \frac{h}{8} \left(\dot\bx(t_k) - \dot\bx(t_{k+1})\right), \\ \dot\bx(t_{c,k}) = -\frac{3}{2h}\left(\bx(t_k) - \bx(t_{k+1})\right) - \frac{1}{4} \left(\dot\bx(t_k) + \dot\bx(t_{k+1})\right). \end{gather*} These equations come directly from the equations that fit the cubic spline to the end points/derivatives then interpolate them at the midpoint. They give us precisely what we need to add the dynamics constraint to our optimization at the collocation points:\begin{align*} \min_{\bx[\cdot],\bu[\cdot]} \quad & \ell_f(\bx[N]) + \sum_{n_0}^{N-1} h_n \ell(\bx[n],\bu[n]) \\ \subjto \quad & \dot\bx(t_{c,n}) = f(\bx(t_{c,n}), \bu(t_{c,n})), & \forall n \in [0,N-1] \\ & \bx[0] = \bx_0 \\ & + \text{additional constraints}. \end{align*} I hope this notation is clear -- I'm using $\bx[k] = \bx(t_k)$ as the decision variables, and the collocation constraint at $t_{c,k}$ depends on the decision variables: $\bx[k], \bx[k+1], \bu[k], \bu[k+1]$. The actual equations of motion get evaluated at both the break points, $t_k$, and the collocation points, $t_{c,k}$.
Once again, direct collocation effectively integrates the equations of
motion by satisfying the constraints of the optimization -- this time
producing an integration of the dynamics that is accurate to third-order
with effectively two evaluations of the plant dynamics per segment (since
we use $\dot\bx(t_k)$ for two intervals).
Direct collocation also easily solves the minimum-time problem for the double integrator. The performance is similar to direct transcription, but the convergence is visibly different. Try it for yourself:
The direct collocation method of
The pseudo-spectral methods are also sometimes knowns as "orthogonal collocation" because the $N$ basis polynomials, $\phi_j(t)$, are chosen so that at the $N$th collocation point $t_j$, we have $$\phi_i(t_j) = \begin{cases} 1 & i=j, \\ 0 & \text{otherwise.}\end{cases}$$ This can be accomplished by choosing $$\phi_j(t) = \prod_{i=0, i\ne j}^{N} \frac{t-t_i}{t_j - t_i}.$$ Note that for both numerical reasons and for analysis, time is traditionally rescaled from the interval $[t_0, t_f]$ to $[-1, 1]$. Collocation points are chosen based on small variations of Gaussian quadrature, known as the "Gauss-Lobatto" which includes collocation points at $t=-1$ and $t=1$.
Interestingly, a number of papers have also infinite-horizon
psuedo-spectral optimization by the nonlinear rescaling of the time
interval $t\in[0, \infty)$ to the half-open interval $\tau\in[-1, 1)$ via
$\tau = \frac{t-1}{t+1}$
There is another, seemingly subtle but potentially important, opportunity that can be exploited in a few of these transcriptions, if our main interest is in optimizing systems with significant multibody dynamics. In some cases, we can actually write the dynamics constraints directly in their implicit form. We've introduced this idea already in the context of Lyapunov analysis. In many cases, it is nicer or more efficient to obtain the equations of motion in an implicit form, ${\bf g}(\bx,\bu,\dot\bx) = 0$, and to avoid ever having to solve for the explicit form $\dot\bx = {\bf f}(\bx,\bu).$ This can become even more important when we consider systems for which the explicit form doesn't have a unique solution -- we will see examples of this when we study trajectory optimization through contact because the Coulomb model for friction actually results in a differential inclusion instead of a differential equation.
The collocation methods, which operate on the dynamic constraints at collocation points directly in their continuous form, can use the implicit form directly. It is possible to write a time-stepping (discrete-time approximation) for direct transcription using implicit integrators -- again providing constraints in implicit form. The implicit form is harder to exploit in the shooting methods.
The different transcriptions presented above represent different ways to map the (potentially continuous-time) optimal control problem into a finite set of decision variables, objectives, and constraints. But even once that choice is made, there are numerous approaches to solving this optimization problem. Any general approach to nonlinear programming can be applied here; in the python examples we've included so far, the problems are handed directly to the sequential-quadratic programming (SQP) solver SNOPT, or to the interior-point solver IPOPT.
There is also quite a bit of exploitable problem-specific stucture in these trajectory optimization problems due to the sequential nature of the problem. As a result, there are some ideas that are fairly specific to the trajectory optimization formulation of optimal control, and customized solvers can often (and sometimes dramatically) outperform general purpose solvers.
This trajectory-optimization structure is easiest to discuss, and
implement, in unconstrained formulations, so we will start there. In fact,
in recent years we have seen a surge in popularity in robotics for doing
trajectory optimization using (often special-purpose) solvers for
unconstrained trajectory optimization, where the constrained problems are
transformed into unconstrained problem via penalty methods. I would say penalty
methods based on the augmented Lagrangian are particularly popular for
trajectory optimization these days
Providing gradients of the objectives and constraints to the solver is not strictly required -- most solvers will obtain them from finite differences if they are not provided -- but I feel strongly that the solvers are faster and more robust when exact gradients are provided. Providing the gradients for the direct transcription methods is very straight-forward -- we simply provide the gradients for each constraint individually. But in the direct shooting approach, where we have removed the $\bx$ decision variables from the program but still write objectives and constraints in terms of $\bx$, it would become very inefficient to compute the gradients of each objective/constraint independently. We need to leverage the chain rule.
To be concise (and slightly more general), let us define $\bx[n+1]=f_d(\bx[n],\bu[n])$ as the discrete-time approximation of the continuous dynamics; for example, the forward Euler integration scheme used above would give $f_d(\bx[n],\bu[n]) = \bx[n]+f(\bx[n],\bu[n])dt.$ Then we have \[\pd{J}{\bu_k} = \pd{\ell_f(\bx[N])}{\bu_k} + \sum_{n=0}^{N-1} \left(\pd{\ell(\bx[n],\bu[n])}{\bx[n]} \pd{\bx[n]}{\bu_k} + \pd{\ell(\bx[n],\bu[n])}{\bu_k} \right), \] where the gradient of the state with respect to the inputs can be computed during the "forward simulation", \[ \pd{\bx[n+1]}{\bu_k} = \pd{f_d(\bx[n],\bu[n])}{\bx[n]} \pd{\bx[n]}{\bu_k} + \pd{f_d(\bx[n],\bu[n])}{\bu_k}. \] These simulation gradients can also be used in the chain rule to provide the gradients of any constraints. Note that there are a lot of terms to keep around here, on the order of (state dim) $\times$ (control dim) $\times$ (number of timesteps). Ouch. Note also that many of these terms are zero; for instance with the Euler integration scheme above $\pd{\bu[n]}{\bu_k} = 0$ if $k\ne n$. (If this looks like I'm mixing two notations here, recall that I'm using $\bu_k$ to represent the decision variable and $\bu[n]$ to represent the input used in the $n$th step of the simulation.)
By solving for $\bx(\cdot)$ ourselves, we've removed a large number of constraints from the optimization. If no additional state constraints are present, and the only gradients we need to compute are the gradients of the objective, then a surprisingly efficient algorithm emerges. I'll give the steps here without derivation, but will derive it in the Pontryagin section below:
Here $\lambda[n]$ is a vector the same size as $\bx[n]$ which has an interpretation as $\lambda[n]=\pd{J}{\bx[n+1]}^T$. The equation governing $\lambda$ is known as the adjoint equation, and it represents a dramatic efficiency improvement over calculating the huge number of simulation gradients described above. In case you are interested, yes the adjoint equation is exactly the backpropagation algorithm that is famous in the neural networks literature, or more generally a bespoke version of reverse-mode automatic differentiation.
As you begin to play with these algorithms on your own problems, you might feel like you're on an emotional roller-coaster. You will have moments of incredible happiness -- the solver may find very impressive solutions to highly nontrivial problems. But you will also have moments of frustration, where the solver returns an awful solution, or simply refuses to return a solution (saying "infeasible"). The frustrating thing is, you cannot distinguish between a problem that is actually infeasible, vs. the case where the solver was simply stuck in a local minima.
So the next phase of your journey is to start trying to "help" the solver along. There are two common approaches.
The first is tuning your cost function -- some people spend a lot of time adding new elements to the objective or adjusting the relative weight of the different components of the objective. This is a slippery slope, and I tend to try to avoid it (possibly to a fault; other groups tend to put out more compelling videos!).
The second approach is to give a better initial guess to your solver to put it in the vicinity of the "right" local minimal. I find this approach more satisfying, because for most problems I think there really is a "correct" formulation for the objective and constraints, and we should just aim to find the optimal solution. Once again, we do see a difference here between the direct shooting algorithms and the direct transcription / collocation algorithms. For shooting, we can only provide the solver with an initial guess for $\bu(\cdot)$, whereas the other methods allow us to also specify an initial guess for $\bx(\cdot)$ directly. I find that this can help substantially, even with very simple initializations. In the direct collocation examples for the swing-up problem of the acrobot and cart-pole, simply providing the initial guess for $\bx(\cdot)$ as a straight line trajectory between the start and the goal was enough to help the solver find a good solution; in fact it was necessary.
Once we have obtained a locally optimal trajectory from trajectory optimization, there is still work to do...
Take $\bar\bx(t) = \bx(t) - \bx_0(t)$, and $\bar\bu(t) = \bu(t)-\bu_0(t)$, then apply finite-horizon LQR (see the LQR chapter).
The maturity, robustness, and speed of solving trajectory optimization using convex optimization leads to a beautiful idea: if we can optimize trajectories quickly enough, then we can use our trajectory optimization as a feedback policy. The recipe is simple: (1) measure the current state, (2) optimize a trajectory from the current state, (3) execute the first action from the optimized trajectory, (4) let the dynamics evolve for one step and repeat. This recipe is known as model-predictive control (MPC).
Despite the very computational nature of this controller (there is no
closed-form representation of this policy; it is represented only
implicitly as the solution of the optimization), there is a bounty of
theoretical and algorithmic results on MPC
One core idea is the concept of receding-horizon MPC. Since our trajectory optimization problems are formulated over a finite-horizon, we can think each optimization as reasoning about the next $N$ timesteps. If our true objective is to optimize the performance over a horizon longer than $N$ (e.g. over the infinite horizon), then it is standard to continue solving for an $N$ step horizon on each evaluation of the controller. In this sense, the total horizon under consideration continues to move forward in time (e.g. to recede).
Some care must be taken in receding-horizon formulations because on each new step we are introducing new costs and constraints into the problem (the ones that would have been associated with time $N+1$ on the previous solve) -- it would be very bad to march forward in time solving convex optimization problems only to suddenly encounter a situation where the solver returns "infeasible!". One can design MPC formulations that guarantee recursive feasibility -- e.g. guarantee that if a feasible solution is found at time $n$, then the solver will also find a feasible solution at time $n+1$.
Perhaps the simplest mechanism for guaranteeing recursive feasibility
in an optimization for stabilizing a fixed point, $(\bx^*, \bu^*)$, is to
add a final-value constraint to the receding horizon, $\bx[N] = \bx^*$.
This idea is simple but important. Considering the
trajectories/constraints in absolute time, then on step $k$ of the
algorithm, we are optimizing for $\bx[k], ... , \bx[k+N],$ and $\bu[k],
..., \bu[k+N-1]$; let us say that we have found a feasible solution for
this problem. The danger in receding-horizon control is that when we shift
to the next step ($k+1$) we introduce constraints on the system at
$\bx[k+N+1]$ for the first time. But if our feasible solution in step $k$
had $\bx[k+N] = \bx^*$, then we know that setting $\bx[k+N+1] = \bx^*,
\bu[k+N] = \bu^*$ is guaranteed to provide a feasible solution to the new
optimization problem in step $k+1$. With feasibility guaranteed, the
solver is free to search for a lower-cost solution (which may be available
now because we've shifted the final-value constraint further into the
future). It is also possible to formulate MPC problems that guarantee
recursive feasibility even in the presence of modeling errors and
disturbances (c.f.
The theoretical and practical aspects of Linear MPC are so well understood today that it is considered the de-facto generalization of LQR for controlling a linear system subject to (linear) constraints.
From 2008 til 2014, my group conducted a series of increasingly
sophisticated investigations
At the outset, this was a daunting task. When birds land on a perch, they pitch up and expose their wings to an "angle-of-attack" that far exceeds the typical flight envelope. Airplanes traditionally work hard to avoid this regime because it leads to aerodynamic "stall" -- a sudden loss of lift after the airflow separates from the wing. But this loss of lift is also accompanied by a signficant increase in drag, and birds exploit this when they rapidly decelerate to land on a perch. Post-stall aerodynamics are a challenge for control because (1) the aerodynamics are time-varying (characterized by periodic vortex shedding) and nonlinear, (2) it is much harder to build accurate models of this flight regime, at least in a wind tunnel, and (3) stall implies a loss of attached flow on the wing and therefore on the control surfaces, so a potentially large reduction in control authority.
We picked the project initially thinking that it would be a nice example
for model-free control (like reinforcement learning -- since the models were
unknown). In the end, however, it turned out to be the project that really
taught me about the power of model-based trajectory optimization and linear
optimal control. By conducting dynamic system identification experimetns in
motion captured, we were able to fit both surprisingly simple models (based
on flat-plate theory) to the dynamics
I was wrong. Over and over again, I watched time-varying linear quadratic regulators take highly nontrivial corrective actions -- for instance, dipping down early in the trajectory to gain kinetic energy or tipping up to dump energy out of the system -- in order to land on the perch at the final time. Although the quality of the linear approximation of the dynamics did degrade the farther that we got from the nominal trajectory, the validity of the controller dropped off much less quickly (even as the vector field changed, the direction that the controller needed to push did not). This was also the thinking that got me initially so interested in understanding the regions of attraction of linear control on nonlinear systems.
In the end, the experiments were very successful. We started searching for the "simplest" aircraft that we could build that would capture the essential control dynamics, reduce complexity, and still accomplish the task. We ended up building a series of flat-plate foam gliders (no propellor) with only a single actuator to control the elevator. We added dihedral to the wings to help the aircraft stay in the longitudinal plane. The simplicity of these aircraft, plus the fact that they could be understood through the lens of quite simple models makes them one of my favorite canonical underactuated systems.
In our experiments, we found the the dynamics of our aircraft were
capture very well by the so-called "flat plate model"
The linear controller around a nominal trajectory was surprisingly effective, but it's not enough. We'll return to this example again when we talk about "feedback motion planning", in order to discuss how to find a controller that can work for many more initial conditions -- ideally all of the initial conditions of interest for which the aircraft is capable of getting to the goal.
The tools that we've been developing for numerical trajectory optimization are closely tied to theorems from (analytical) optimal control. Let us take one section to appreciate those connections.
What precisely does it mean for a trajectory, $\bx(\cdot),\bu(\cdot)$, to be locally optimal? It means that if I were to perturb that trajectory in any way (e.g. change $\bu_3$ by $\epsilon$), then I would either incur higher cost in my objective function or violate a constraint. For an unconstrained optimization, a necessary condition for local optimality is that the gradient of the objective at the solution be exactly zero. Of course the gradient can also vanish at local maxima or saddle points, but it certainly must vanish at local minima. We can generalize this argument to constrained optimization using Lagrange multipliers.
Let us use Lagrange multipliers to derive the necessary conditions for our constrained trajectory optimization problem in discrete time \begin{align*} \min_{\bx[\cdot],\bu[\cdot]} & \ell_f(\bx[N]) + \sum_{n=0}^{N-1} \ell(\bx[n],\bu[n]),\\ \subjto \quad & \bx[n+1] = f_d(\bx[n],\bu[n]). \end{align*} Formulate the Lagrangian, \[L(\bx[\cdot],\bu[\cdot],\lambda[\cdot]) = \ell_f(\bx[N]) + \sum_{n=0}^{N-1} \ell(\bx[n],\bu[n]) + \sum_{n=0}^{N-1} \lambda^T[n] \left(f_d(\bx[n],\bu[n]) - \bx[n+1]\right), \] and set the derivatives to zero to obtain the adjoint equation method described for the shooting algorithm above: \begin{gather*} \forall n\in[0,N-1], \pd{L}{\lambda[n]} = f_d(\bx[n],\bu[n]) - \bx[n+1] = 0 \Rightarrow \bx[n+1] = f(\bx[n],\bu[n]) \\ \forall n\in[0,N-1], \pd{L}{\bx[n]} = \pd{\ell(\bx[n],\bu[n])}{\bx} + \lambda^T[n] \pd{f_d(\bx[n],\bu[n])}{\bx} - \lambda^T[n-1] = 0 \\ \quad \Rightarrow \lambda[n-1] = \pd{\ell(\bx[n],\bu[n])}{\bx}^T + \pd{f_d(\bx[n],\bu[n])}{\bx}^T \lambda[n]. \\ \pd{L}{\bx[N]} = \pd{\ell_f}{\bx}^T - \lambda^T[N-1] = 0 \Rightarrow \lambda[N-1] = \pd{\ell_f}{\bx} \\ \forall n\in[0,N-1], \pd{L}{\bu[n]} = \pd{\ell(\bx[n],\bu[n])}{\bu} + \lambda^T[n] \pd{f_d(\bx[n],\bu[n])}{\bu} = 0. \end{gather*} Therefore, if we are given an initial condition $\bx_0$ and an input trajectory $\bu[\cdot]$, we can verify that it satisfies the necessary conditions for optimality by simulating the system forward in time to solve for $\bx[\cdot]$, solving the adjoint equation backwards in time to solve for $\lambda[\cdot]$, and verifying that $\pd{L}{\bu[n]} = 0$ for all $n$. The fact that $\pd{J}{\bu} = \pd{L}{\bu}$ when $\pd{L}{\bx} = 0$ and $\pd{L}{\lambda} = 0$ follows from some basic results in the calculus of variations.
You won't be surprised to hear that these necessary conditions have an analogue in continuous time. I'll state it here without derivation. Given the initial conditions, $\bx_0$, a continuous dynamics, $\dot\bx = f(\bx,\bu)$, and the instantaneous cost $\ell(\bx,\bu)$, for a trajectory $\bx(\cdot),\bu(\cdot)$ defined over $t\in[t_0,t_f]$ to be optimal it must satisfy the conditions that \begin{align*} \forall t\in[t_0,t_f],\quad & \dot{\bx} = f(\bx,\bu), \quad &\bx(0)=\bx_0\\ \forall t\in[t_0,t_f],\quad & -\dot\lambda = \pd{\ell}{\bx}^T + \pd{f}{\bx}^T \lambda, \quad &\lambda(T) = \pd{\ell_f}{\bx}^T \\ \forall t\in[t_0,t_f],\quad & \pd{\ell}{\bu} + \lambda^T\pd{f}{\bu} = 0.& \end{align*}
In fact the statement can be generalized even beyond this to the case where $\bu$ has constraints. The result is known as Pontryagin's minimum principle -- giving necessary conditions for a trajectory to be optimal.
Adapted from
Note that the terms which are minimized in the final line of the theorem are commonly referred to as the Hamiltonian of the optimal control problem, $$H(\bx,\bu,\lambda,t) = \ell(\bx,\bu) + \lambda^T f(\bx,\bu).$$ It is distinct from, but inspired by, the Hamiltonian of classical mechanics. Remembering that $\lambda$ has an interpretation as $\pd{J}{\bx}^T$, you should also recognize it from the HJB.
There is another approach to trajectory optimization (at least for
initial-value problems) that has gotten quite popular lately. Iterative
LQR (iLQR)
The motivation behind iterative LQR is quite appealing -- it
leverages the surprising structure of the Riccati equations to come up
with a second-order update to the trajectory after a single backward
pass of the Riccati equation followed by a forward simulation.
Anecdotally, the convergence is fast and robust. Numerous
applications...
One key limitation of iLQR is that it only supports unconstrained trajectory optimization -- at least via the Riccati solution. The natural extension for constrained optimization would be to replace the Riccati solution with an iterative linear model-predictive control (MPC) optimization; this would result in a quadratic program and is very close to what is happening in SQP. But a more common approach in the current robotics literature is to add any constraints into the objective function using penalty methods, especially using Augmented Lagrangian.
In this coding exercise we explore in detail the computational advantages of direct transcription over direct shooting methods. The exercise can be completed entirely in this python notebook. To simplify the analysis, we will apply these two methods to a finite-horizon LQR problem. You are asked to complete four pieces of code:
For this exercise you will work exclusively in this notebook. You are asked to find, via nonlinear trajectory optimization, a path that efficiently transfers a rocket from the Earth to Mars, while avoiding a cloud of asteroids. The skeleton of the optimization problem is already there, but several important pieces are missing. More details are in the notebook, but you will need to:
The exercise is self-contained in this notebook. In this exercise you will derive and implement the iterative Linear Quadratic Regulator (iLQR). You will evaluate it's functionality by planning trajectories for an autonomous vehicle. You will need to:
[1] | John T. Betts. Survey of numerical methods for trajectory optimization. Journal of Guidance, Control, and Dynamics, 21(2):193--207, 1998. |
[2] | John T. Betts. Practical Methods for Optimal Control Using Nonlinear Programming. SIAM Advances in Design and Control. Society for Industrial and Applied Mathematics, 2001. |
[3] | Yang Wang and Stephen Boyd. Fast model predictive control using online optimization. IEEE Transactions on control systems technology, 18(2):267--278, 2009. |
[4] | C. R. Hargraves and S. W. Paris. Direct trajectory optimization using nonlinear programming and collocation. J Guidance, 10(4):338--342, July-August 1987. |
[5] | Garg, Divya, Patterson, Michael, Francolin, Camila, Darby, Christopher, Huntington, Geoffrey, Hager, William, Rao, and Anil. Direct trajectory optimization and costate estimation of finite-horizon and infinite-horizon optimal control problems using a radau pseudospectral method. Computational Optimization and Applications, 49:335--358, 2011. 10.1007/s10589-009-9291-0. |
[6] | I. Michael Ross and Mark Karpenko. A review of pseudospectral optimal control: From theory to flight. Annual Reviews in Control, 36(2):182--197, dec 2012. |
[7] | TC Lin and JS Arora. Differential dynamic programming technique for constrained optimal control. Computational Mechanics, 9(1):27--40, 1991. |
[8] | Marc Toussaint. A novel augmented lagrangian approach for inequalities and convergent any-time non-central updates. arXiv:1412.4329 [math.OC], 2014. |
[9] | Carlos E Garcia, David M Prett, and Manfred Morari. Model predictive control: theory and practice—a survey. Automatica, 25(3):335--348, 1989. |
[10] | Eduardo F Camacho and Carlos Bordons Alba. Model predictive control. Springer Science & Business Media, 2013. |
[11] | Alberto Bemporad and Manfred Morari. Robust model predictive control: A survey. In A. Garulli and A. Tesi, editors, Robustness in identification and control, volume 245 of Lecture Notes in Control and Information Sciences, pages 207--226. Springer Berlin / Heidelberg, 1999. 10.1007/BFb0109870. |
[12] | Rick Cory and Russ Tedrake. Experiments in fixed-wing UAV perching. In Proceedings of the AIAA Guidance, Navigation, and Control Conference, pages 1--12. AIAA, 2008. [ www: ] |
[13] | John W. Roberts, Rick Cory, and Russ Tedrake. On the controllability of fixed-wing perching. In Proceedings of the American Control Conference (ACC), 2009. [ www: ] |
[14] | Rick Cory. Supermaneuverable Perching. PhD thesis, Massachusetts Institute of Technology, June 2010. [ www: ] |
[15] | Joseph Moore. Powerline perching with a fixed-wing UAV. Master's thesis, Massachusetts Institute of Technology, May 2011. [ www: ] |
[16] | Joseph Moore and Russ Tedrake. Control synthesis and verification for a perching UAV using LQR-trees. In Proceedings of the IEEE Conference on Decision and Control, page 8, Maui, Hawaii, December 2012. [ www: ] |
[17] | Joseph Moore. Robust Post-Stall Perching with a Fixed-Wing UAV. PhD thesis, Massachusetts Institute of Technology, September 2014. [ www: ] |
[18] | Dimitri P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, 2nd edition, 2000. |
[19] | Weiwei Li and Emanuel Todorov. Iterative linear quadratic regulator design for nonlinear biological movement systems. In International Conference on Informatics in Control, Automation and Robotics, pages 222--229, 2004. |
[20] | Athanasios Sideris and James E. Bobrow. A fast sequential linear quadratic algorithm for solving unconstrained nonlinear optimal control problems, February 2005. |
[21] | Farbod Farshidian, Edo Jelavic, Asutosh Satapathy, Markus Giftthaler, and Jonas Buchli. Real-time motion planning of legged robots: A model predictive control approach. In 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids), pages 577--584. IEEE, 2017. |
[22] | Y. Tassa, T. Erez, and E. Todorov. Synthesis and stabilization of complex behaviors through online trajectory optimization. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pages 4906--4913. IEEE, 2012. |
Previous Chapter | Table of contents | Next Chapter |