Underactuated Robotics

Algorithms for Walking, Running, Swimming, Flying, and Manipulation

Russ Tedrake

© Russ Tedrake, 2024
Last modified .
How to cite these notes, use annotations, and give feedback.

Note: These are working notes used for a course being taught at MIT. They will be updated throughout the Spring 2024 semester. Lecture videos are available on YouTube.

Previous Chapter Table of contents Next Chapter

Robust and Stochastic Control

My goal of presenting a relatively consumable survey of a few of the main ideas is perhaps more important in this chapter than any other. It's been said that "robust control is encrypted" (as in you need to know the secret code to get in). The culture in the robust control community has been to leverage high-powered mathematics, sometimes at the cost of offering more simple explanations. This is unfortunate, I think, because robotics and machine learning would benefit from a richer connection to these tools, and are perhaps destined to reinvent many of them.

The classic reference for robust control is Zhou97. Petersen00 has a treatment that does more of its development in the time-domain and via Riccati equations; I will do the same here.

Stochastic models

So far in the notes, we have concerned ourselves primarily with known, deterministic systems. In the stochastic systems chapter, we started our study of nonlinear dynamics of stochastic systems, which can be beautiful! In this chapter we will begin to consider computational tools for analysis and control of those systems. Stochasticity can come in many forms... we may not know the governing equations (e.g. the coefficient of friction in the joints), our robot may be walking on unknown terrain, subject to unknown disturbances, or even be picking up unknown objects. There are a number of mathematical frameworks for considering this uncertainty; for our purposes this chapter will generalizing our thinking to equations of the form: $$\dot\bx = {\bf f}(\bx, \bu, \bw, t) \qquad \text{or} \qquad \bx[n+1] = {\bf f}(\bx[n], \bu[n], \bw[n], n),$$ where $\bw$ is a new random input signal to the equations capturing all of this potential variability. Although it is certainly possible to work in continuous time, and treat $\bw(t)$ as a continuous-time random signal (c.f. Wiener process), the notation and intuition is a bit simpler when we work with $\bw[n]$ as a discrete-time random signal. For this reason, we'll devote our attention in this chapter to the discrete-time systems.

In order to simulate equations of this form, or to design controllers against them, we need to define the random process that generates $\bw[n]$. It is typical to assume the values $\bw[n]$ are independent and identically distributed (i.i.d.), meaning that $\bw[i]$ and $\bw[j]$ are uncorrelated when $i \neq j$. As a result, we typically define our distribution via a probability density $\bw[n] \sim p_\bw(\bw)$The notation $\bw[n] \sim p_\bw(\bw)$ denotes that $\bw[n]$ is "drawn from" the distribution $p_\bw(\bw).$. This is not as limiting as it may sound... if we wish to treat temporally-correlated noise (e.g. "colored noise") the format of our equations is rich enough to support this by adding additional state variables; we'll give an example below of a "whitening filter" for modeling wind gusts. The other source of randomness that we will now consider in the equations is randomness in the initial conditions; we will similarly define a probability density $\bx[0] \sim p_0(\bx).$

This modeling framework is rich enough for us to convey the key ideas; but it is not quite sufficient for all of the systems I am interested in. In we go to additional lengths to support more general cases of stochastic systems. This includes modeling system parameters that are drawn from random each time the model is initialized, but are fixed over the duration of a simulation; it is possible but inefficient to model these as additional state variables that have no dynamics. In other problems, even the dimension of the state vector may change in different realizations of the problem! Consider, for instance, the case of a robot manipulating random numbers of dishes in a sink. I do not know many control formulations that handle this type of randomness well, and I consider this a top priority to think more about! (We'll begin to address it in the output feedback chapter.)

Costs and constraints for stochastic systems

Given a stochastic model, what sort of cost function should we write in order to capture the desired aspect of performance? There are a number of natural choices, which provide nice trade-offs between capturing the desired phenomena and computational tractability.

Airplane flying through the canyon figure?

Remember that $\bx[n]$ is now a random variable, so we want to write our cost and constraints using the distribution described e.g. $p_n(\bx)$. Broadly speaking, one might specify a cost in a handful of ways:

Similarly, we can think about how to generalize our deterministic constraints, e.g. $g(\bx) \le 0,$ into a the stochastic framework:

The term "robust control" is typically associated with the class of techniques that try to guarantee some worst-case performance or a worst-case bound (e.g. the gain bounds). The term "stochastic optimal control" or "stochastic control" is typically used as a catch-all for other methods that reason about stochasticity, without necessarily providing the strict robustness guarantees.

It's important to realize that proponents of robust control are not necessarily pessimistic by nature; there is a philosophical stance about how difficult it can be to model the true distribution of randomness that a control system will face in the world. Worst-case analysis typically requires only a definition of the set of possible values that the random variable can take, and not a detailed distribution over those possible values. It may be more practical to operationalize a concept like "this UAV will not crash so long as the peak wind gusts are below 35 mph" than requiring a detailed distribution over wind gust probabilities. But this simpler specification does come at some cost -- worst-case certificates are often pessimistic about the true performance of a system, and optimizing worst-case performance can lead to conservatism in the control design.

Finite Markov Decision Processes

We already had quick preview into stochastic optimal control in one of the cases where it is particularly easy: finite Markov Decision Processes (MDPs). And we have seen that finite-state approximations can give us insights into even very complex nonlinear dynamics. Let's make sure that we understand how to do control design again the various cost and constraint formulations we've proposed above...

Discounted cost. Infinite-horizon average cost. Perhaps a example of something other than expected-value cost (worst case, e.g. Katie's metastability?)

Linear optimal control

Stochastic LQR

Let's consider a stochastic extension of the (discrete-time) LQR problem, where the system is now subjected to additive Gaussian white noise: \begin{gather*} \bx[n+1] = \bA\bx[n] + \bB\bu[n] + \bw[n],\\ E\left[\bw[i]\right] = 0, \quad E\left[ \bw[i]\bw^T[j] \right] = \delta_{ij}{\bf \Sigma_w},\end{gather*} where $\delta_{ij}$ is one if $i=j$ and zero otherwise, and ${\bf \Sigma_w}$ is the covariance matrix of the disturbance, and we take the average cost: $$\min E\left[ \sum_{n=0}^\infty \bx^T[n]\bQ\bx[n] + \bu^T[n] \bR\bu[n] \right], \qquad \bQ=\bQ^T \succeq 0, \bR = \bR^T \succ 0.$$ Note that we saw one version of this already when we discussed policy search for LQR.

In the standard LQR derivation, we started by assuming a quadratic form for the cost-to-go function. Is that a suitable starting place for this stochastic version? We've already studied the dynamics of a linear system subjected to Gaussian noise, and learned that (at best) we should expect it to have a Gaussian stationary distribution. But that sounds problematic, no? If the system can not drive the system to zero and stay at zero in order to stop accruing cost, then won't the infinite-horizon cost be infinite (regardless of the controller)?

Yes. The cost-to-go for this problem is infinite! But it turns out that it still has enough structure for us to work with. In particular, let's "guess" a cost-to-go function of the form: $$J_n(\bx) = \bx^T {\bf S}_n \bx + c_n,$$ where $c_n$ is a (scalar) constant. Now we can write the Bellman equation and do some algebra: \begin{align*} J_n(\bx) &= \min_\bu E_\bw\left[\bx^T \bQ \bx + \bu^T\bR\bu + [\bA\bx + \bB\bu + \bw]^T{\bf S}_{n+1}[\bA\bx + \bB\bu + \bw] + c_{n+1}\right] \\ & \begin{split}= \min_\bu &[\bx^T \bQ \bx + \bu^T\bR\bu + [\bA\bx + \bB\bu]^T{\bf S}_{n+1}[\bA\bx + \bB\bu] + c_{n+1} \\ &+ E_\bw\left[ [\bA\bx + \bB\bu]^T{\bf S}_{n+1} \bw + \bw^T{\bf S}_{n+1}[\bA\bx + \bB\bu]\right] + E_\bw[\bw^T \bw]]\end{split} \\ &= \min_\bu [\bx^T \bQ \bx + \bu^T\bR\bu + [\bA\bx + \bB\bu]^T{\bf S}_{n+1}[\bA\bx + \bB\bu] + c_{n+1} + \tr({\bf \Sigma_w})]. \end{align*} The second line follows by simply pulling all of the deterministic terms outside of the expected value. The third line follows by observing that $\bx$ and $\bu$ are uncorrelated with $\bw,$ and $E[\bw] = 0$, so those cross terms are zero in expectation. Notice that, apart from the $c_{n+1}$ and ${\bf \Sigma_w}$, the remaining terms are exactly the same as the deterministic (discrete-time) LQR version.

Remarkably, we can therefore achieve our dynamic programming recursion by using ${\bf S}_n$ as the solution of the discrete Riccati equation, and $c_n = c_{n+1} + \tr({\bf \Sigma_w}).$ As we take $n\rightarrow \infty$, ${\bf S}_n$ converges to the steady-state solution to the algebraic Riccati equation, and only $c_n$ grows to infinity. As a result, even though the cost-to-go is infinite, the optimal control is still well defined: it is the same $\bu^* = -{\bf K}\bx$ that we obtain from the deterministic LQR problem!

Take note of the fact that the true cost-to-go blows up to infinity. In reinforcement learning, for instance, it is common practice to avoid this blow-up by considering discounted-cost formulations, $$\sum_{n=0}^\infty \gamma^n \ell(\bx[n], \bu[n]),\quad 0 < \gamma \le 1,$$ or average-cost formulations, $$\lim_{N\rightarrow \infty} \frac{1}{N} \sum_{n=0}^N \ell(\bx[n], \bu[n]).$$ These are satisfactory solutions to the problem, but please make sure to understand why they must be used.

Non-i.i.d. disturbances

The LQR derivation above assumed that the disturbances $\bw[n]$ were independent and identically distributed (i.i.d.). But many of the disturbances we would like to model are not i.i.d.. For instance, consider a UAV flying in the wind. The wind is correlated over time, sometimes building up to gusts but even those gusts are relatively long compared to any the sampling rate of a control system.

In fact, the standard models of wind are typically the output of a Gaussian i.i.d. random signal passed through a linear low-pass filter Moore14b. This suggests a very natural approach to using LQR to stabilize a UAV in the wind: design the controller for the joint wind + UAV system. The state of this system contains both the state of the UAV and the state of the wind, so the resulting controller will have the form $$\bu = -{\bf K} \begin{bmatrix}\bx_{uav} \\ \bx_{wind} \end{bmatrix}.$$ In practice, this means that the controller will need to rely on an online estimator for the wind state in addition to the standard (UAV) state estimator.

definitely need a diagram here.

A quadrotor flying in the wind.

I've adapted our simple 2D quadrotor model to have an (optional) additional input port for Cartesian force disturbances. For the wind, I've added a RandomSource to generate the output of iid Gaussain random process, multiplied that by a scalar Gain to scale the covariance, and then put it through a FirstOrderLowPassFilter to make the colored noise wind model.

In the notebook, I've implemented LQR both ways: first the traditional LQR on the quadrotor system only (ignoring the wind), and second on the diagram that includes the Gain and LowPassFilter systems in addition to the quadrotor.

Stochastic linear MPC

Coming soon...

(conservative) chance constraints. e.g. lars blackmore. Maybe Guy's ellipsoidal slice sampling Scher22.

Worst-case control w/ bounded uncertainty

Common Lyapunov functions

We've already seen one nice example of robustness analysis for linear systems when we wrote a small optimization to find a common Lyapunov function for uncertain linear systems. That example studied the dynamics $\bx[n+1] = \bA \bx[n]$ where the coefficients of $\bA$ were unknown but bounded.

We also saw that essentially the same technique can be used to certify stability against disturbances, e.g.: $$\bx[n+1] = \bA\bx[n] + \bw[n], \qquad \bw[n] \in \mathcal{W},$$ where $\mathcal{W}$ describes some bounded set of possible uncertainties. In order to be compatible with convex optimization, we often choose to describe $\mathcal{W}$ as either an ellipsoid or as a convex polytopeBoyd04a. But let's remember a very important point: in this situation with additive disturbances, we can no longer expect the system to converge stably to the origin. Our example before used the common Lyapunov function only to certify the invariance of a region (if I start inside some region, then I'll never leave).

Polytope dynamics

A polytope is a bounded polyhedron. Let's take a moment to understand how a polytope propagates through a linear dynamical system. There are a number of possible representations for a polytope. The most common the so-called "H-rep", or half-space representations, which is described by the linear constraints ${\bf H}\bx \le {\bf h}$, or the "V-rep", or vertex representation, which is described by the convex hull of a set of vertices. For optimization, we often also study "zonotopes", which are defined as the affine projection of a unit cube, or the more general "AH-polytopes" which are the affine projection of an H-polytope Sadraddini19a+Sadraddini21. In Drake we have a library of these convex sets with many of the most common operations implemented and made compatible with MathematicalProgram.

Let's develop our notation for H-polytope and AH-polytopes here. We'll define a polytope, $\mathbb{P} \subset \Re^n,$ as an H-polytope using $$\mathbb{P} = \mathcal{H}({\bf H}, {\bf h}) := \{\bx \in \Re^n | {\bf H} \bx \le {\bf h} \}.$$ For an AH-polytope, we write $$\mathbb{X} := \mathcal{AH}({\bf T}, {\bf t}, \mathbb{P}) := \{{\bf T}\bx + {\bf t} \in \Re^n | \bx \in \mathbb{P} \subset \Re^m \},$$ where ${\bf T} \in \Re^{n \times m}$ and ${\bf t} \in \Re^n$ define the affine transform. Note that AH-polytope can be the projection of an H-polytope with a different ambient dimension.

AH-polytopes are closed under many of the essential operations we need for linear dynamical systemsSadraddini21. Using set notation, this includes (further) affine transformations: $$\bA \mathbb{X} + {\bf b} = \mathcal{AH}(\bA {\bf T}, \bA{\bf t} + {\bf b}, \mathbb{P}),$$ and the Minkowski sum. Given two AH-polytopes, $\mathbb{X}$ and $\mathbb{Y}$ both in $\Re^n$, their Minkowski sum is given by: $$\mathbb{X} \oplus \mathbb{Y} = \mathcal{AH}\left([{\bf T}_x, {\bf T}_y], {\bf t}_x + {\bf t}_y, \mathcal{H}\left(\begin{bmatrix} {\bf H}_x & 0 \\ 0 & {\bf H}_y \end{bmatrix}, \begin{bmatrix} {\bf h}_x \\ {\bf h}_y \end{bmatrix}\right)\right).$$

We can actually use this set notation now directly to write the dynamics of a polytope (over state) as it propagates through an uncertain linear dynamical system: $$\bx[n+1] = \bA \bx[n] + \bw[n], \quad \bx[0] \in \mathbb{X}_0, \bw[n] \in \mathbb{W}.$$ The result is $$\mathbb{X}_{n+1} = \bA \mathbb{X}_n \oplus \mathbb{W}.$$ Beautiful! But please notice one important detail: the description size of the polytope (via the ambient dimension of the underlying H-polytope) is growing on each time step, due to the Minkowski sum with $\mathbb{W}$. The sets can get complicated for long-horizon rollouts.

van der pol oscillator example, where the nonlinearities are bounded? or where I simply linearize?

Robust MPC

Although I've written out polytopic reachability for a closed-loop linear system (+ noise) so far, one can similarly propagate the polytopic dynamics under a feedback control of the form \begin{gather*}\bx[n+1] = \bA\bx[n] + \bB\bu[n] + \bw[n], \\ \mathbb{X}_{n+1} = (\bA\mathbb{X}_n + \bB\bu_n) \oplus \mathbb{W}.\end{gather*} This leads to a form of robust model-predictive control commonly referred to as "TubeMPC" Mayne05. One can optimize $\bu_n$ and the descriptions of $\mathbb{X}_n$ using convex optimization subject to these linear dynamics. Furthermore, this can be combined with disturbance-based feedback parameterizations to simulataneously search for time-varying feedback gains, $\bK_n$ Sadraddini20.

Tube MPC

Coming soon...

Polytopic containment

One objective we might like to specify in our optimization is that the entire reachable set at the final time is inside a goal set, e.g. $\mathbb{X}_N \subseteq \mathbb{X}_{goal}.$ Checking this condition is a "polytopic containment" problem -- a classic problem in computational geometry. But we want to be able to optimize the form of $\mathbb{X}_N$ (via optimizing the sequence of control decisions), so we want to be able to embed the polytopic containment problem as just an additional set of constraints in our MPC problem.

Sadraddini19a gave an efficient encoding of the polytopic containment problem using only linear constraints that is sufficient to certify the containment, and Sadraddini21 gave an additional condition which is both necessary and sufficient, but requires potentially an exponential number of linear constraints. The specific equations are slightly messy, so I've omitted them here, but in these linear constraints, the terms ${\bf T}_x, {\bf t}_x, {\bf t}_\hat{x},$ and ${\bf h}_\hat{x}$ also appear linearly and so can be optimized simultaneously. Unfortunately, the terms ${\bf T}_\hat{x}, {\bf H}_x, {\bf h}_x, $ and $\hat{\bf H}_\hat{x}$ appear bilinearly with the containment decision variables, so we typically treat them as fixed.

Polytopic containment

Coming soon...

If one is simply performing reachability analysis (not joint with control design), then polytopic containment can also be used on each time step to limit the growth of the description length of the polytopes. We can attempt to replace $\mathbb{X}_n$ with a minimal volume polytope, $\hat{\mathbb{X}}_n$ with a fixed number of faces, subject to $\mathbb{X}_n \subseteq \hat{\mathbb{X}}_n.$ This, of course, comes at the cost of conservatism (larger reachable sets).

TubeMPC with a goal polytope

Coming soon...

Robust constrained LQR

Coming soon...

Disturbance-based feedback parameterizations

Although it would be tempting to parameterize the feedback gains $\bK_n$ as decision variables in our Robust MPC framework, searching for both $\bK_n$ and $\bx_n$ in the traditional parameterization of $\bK_n\bx_n$ would result in terms that are bilinear (and therefor nonconvex) in the decision variables. The disturbance-based feedback parameterizations get around this. This is an old but important idea which was made famous first as the Youla parameterization (alternatively "Q-parameterization"). In the time domain this typically leads to controllers which are "unrolled in time" and depend on a potentially infinite history of disturbances; common practice it to approximate these with a finite-impulse response (FIR) truncation. One could imagine extracting a state-space realization of these FIR responses using the techniques from linear system identification Anderson17.

We can understand the essence of this idea with a simple extension of the LQR with least-squares derivation... (it's a work in progress!)

Given the state space equations: \begin{gather*} \bx[n+1] = \bA\bx[n] + \bB\bu[n] + \bw[n],\end{gather*} Consider parameterizing an output feedback policy of the form $$\bu[n] = \bK_0[n] \bx_0 + \sum_{i=1}^{n-1}\bK_i[n]\bw[n-i],$$ then the closed-loop state is convex in the control parameters, $\bK$: \begin{align*}\bx[n] =& \left( {\bf A}^n + \sum_{i=0}^{n-1}{\bf A}^{n-i-1}{\bf B}{\bf K}_0[i] \right) \bx_0 + \sum_{j=0}^{n-1} \sum_{i=0}^{n-1}{\bf A}^{n-i-1}{\bf B}{\bf K}_{j}[i] \bw[i-j],\end{align*} and therefore objectives that are convex in $\bx$ and $\bu$ (like LQR) are also convex in $\bK$. Moreover, we can calculate $\bw[n]$ by the time that it is needed given our observations of $\bx[n+1], \bx[n],$ and knowledge of $\bu[n].$

SLS

$L_2$ gain

By limiting $\bw$ to be drawn from a bounded (polytopic) set, we were able to perform efficient reachability analysis. But knowing that the dependency on $\bw$ is linear allows us to do something potentially even more natural, which can lead to tighter bounds. In particular, we expect the magnitude of the deviation in $\bx$ compared to the undisturbed case to be proportional to the magnitude of the disturbance, $\bw$. So a perhaps more natural bound for a linear system is a relative bound on the magnitude of the response (from zero initial conditions) relative to the magnitude of the disturbance.

Typically, this is done with the a scalar "$L_2$ gain", $\gamma$, defined as: \begin{align*}\argmin_\gamma \quad \subjto& \quad \sup_{\bw(\cdot) \in \int \|\bw(t)\|^2 dt\le \infty} \frac{\int_0^T \| \bx(t) \|^2 dt}{\int_0^T \| \bw(t) \|^2dt} \le \gamma^2, \qquad \text{or} \\ \argmin_\gamma \quad \subjto& \sup_{\bw[\cdot] \in \sum_n \|\bw[n]\|^2 \le \infty} \frac{\sum_0^N \|\bx[n]\|^2}{\sum_0^N \| \bw[n] \|^2} \le \gamma^2.\end{align*} The name "$L_2$ gain" comes from the use of the $\ell_2$ norm on the signals $\bw(t)$ and $\bx(t)$, which is assumed only to be finite.

More often, these gains are written not in terms of $\bx[n]$ directly, but in terms of some "performance output", $\bz[n]$. For instance, if would would like to bound the cost of a quadratic regulator objective as a function of the magnitude of the disturbance, we can minimize $$ \min_\gamma \quad \subjto \quad \sup_{\bw[n]} \frac{\sum_0^N \|\bz[n]\|^2}{\sum_0^N \| \bw[n] \|^2} \le \gamma^2, \qquad \bz[n] = \begin{bmatrix}\sqrt{\bQ} \bx[n] \\ \sqrt{\bR} \bu[n]\end{bmatrix}.$$ This is a simple but important idea, and understanding it is the key to understanding the language around robust control. In particular the $\mathcal{H}_2$ norm of a system (from input $\bw$ to output $\bz$) is the energy of the impulse response; when $\bz$ is chosen to represent the quadratic regulator cost as above, it corresponds to the expected LQR cost. The $\mathcal{H}_\infty$ norm of a system (from $\bw$ to $\bz$) is the largest singular value of the transfer function; it corresponds to the $L_2$ gain.

Dissipation inequalities

One of the mechanisms for certifying an $L_2$ gain for a system comes from a generalization of Lyapunov analysis to consider the contributions of system inputs via the so-called "dissipation inequalities". Dissipation inequalities are a general tool, and $L_2$-gain analysis is only one application of them; for a broader treatment see Ebenbauer09 or Scherer15.

Informally, the idea is to generalize the Lyapunov conditions, $V(\bx) \succ 0, \dot{V}(\bx) \preceq 0,$ into the more general form $$V(\bx) \succ 0, \quad \dot{V}(\bx) \preceq s(\bx, \bw),$$ where $\bw$ is the input of interest in this setting, and $s()$ is a scalar quantity representing a "supply rate". Once a system has input, the value of $V$ may go up or it may go down, but if we can bound the way that it goes up by a simple function of the input, then we may still be able to provide input-to-state or input-output bounds for the system. Integrating both sides of the derivative condition with respect to time yields: $$\forall \bx(0),\quad V(\bx(T)) \le V(\bx(0)) + \int_0^T s(\bx(t), \bw(t))dt.$$

To obtain a bound on the $L_2$ gain between input $\bw(t)$ and output $\bz(t)$, the supply rate of interest is $$s(\bx,\bw) = \gamma^2 \|\bw\|^2 - \|\bz\|^2,$$ which yields $$\forall \bx(0),\quad V(\bx(T)) \le V(\bx(0)) + \int_0^T \gamma^2 \|\bw(t)\|^2dt - \int_0^T \|\bz(t)\|^2dt .$$ Now, since this must hold for all $\bx(0)$, it holds for $\bx(0) = 0$. Furthermore, we know $V(\bx(T))$ is non-negative, so we also have $$0 \le V(\bx(T)) \le \int_0^T \gamma^2 \|\bw(t)\|^2dt - \int_0^T \|\bz(t)\|^2dt .$$ Therefore, if we can find a $V$ that satisfied the dissipation inequality for this storage function, we have certified the $\gamma$ is an $L_2$ gain for the system: $$ \frac{\int_0^T \| \bz(t) \|^2 dt}{\int_0^T \| \bw(t) \|^2dt} \le \gamma^2.$$

Small-gain theorem

Robust control diagram

Coming soon...

I like the statement of the basic result from Sasha: http://web.mit.edu/6.241/ameg_www_fall2006/www/images/L06smallgain.pdf

Model uncertainty as a special case.

Diagram in my notes (e.g. from 2022). In the special case of w being the output of a Delta block (which is itself a function of the output z), and that uncertainty block has bounded L2 gain, then we can recover deterministic stability.
IQCs Steinhardt11a

Robust LQR as $\mathcal{H}_\infty$

Let's consider a robust variant of the LQR problem: \begin{gather*} \min_{\bu[\cdot]} \max_{\bw[\cdot]} \sum_{n=0}^\infty \bx^T[n]\bQ\bx[n] + \bu^T[n] \bR\bu[n] - \gamma^2 \bw^T[n]\bw[n],\\ \bx[n+1] = \bA\bx[n] + \bB\bu[n] + \bB_\bw \bw[n],\\ \bQ=\bQ^T \succeq 0, \bR = \bR^T \succ 0. \end{gather*} The reason for this choice of cost function will become clear in the derivation, but the intuition is that we want to reward the controller for having a small response to large $\bw[\cdot]$. Note that unlike the stochastic LQR formulation, here we do not specify the distribution over $\bw[n]$, and in fact we don't even restrict it to a bounded set. All we know is that at each time step, an omniscient adversary is allowed to choose the $\bw[n]$ that tries to maximize this objective.

In fact, since we don't need to specify a continuous-time random process and the continuous-time derivation is both cleaner and by now, I think, more familiar, let's do this one in continuous time. \begin{gather*} \min_{\bu[\cdot]} \max_{\bw[\cdot]} \int_{n=0}^\infty dt \left[\bx^T(t)\bQ\bx(t) + \bu^T(t) \bR\bu(t) - \gamma^2 \bw^T(t)\bw(t)\right],\\ \dot\bx(t) = \bA\bx(t) + \bB\bu(t) + \bB_\bw \bw(t),\\ \bQ=\bQ^T \succeq 0, \bR = \bR^T \succ 0. \end{gather*} We will once again guess a quadratic cost-to-go function: $$J(\bx) = \bx^T {\bf S} \bx, \quad {\bf S} = {\bf S}^T \succ 0.$$ The dynamic programming recursion still holds for this problem , resulting in the Bellman equation: $$\forall \bx,\quad 0 = \min_\bu \max_\bw \left[\bx^T\bQ\bx + \bu^T\bR\bu - \gamma^2 \bw^T\bw + \pd{J^*}{\bx}[\bA\bx + \bB\bu + \bB_\bw \bw]\right].$$ Since the inside is a concave quadratic form over $\bw$, we can solve for the adversary by finding the maximum: \begin{gather*} \pd{}{\bw} = -2 \gamma^2 \bw^T + 2\bx^T {\bf S} \bB_\bw,\\ \bw^* = \frac{1}{\gamma^2} \bB_\bw^T {\bf S} \bx.\end{gather*} This is the disturbance input that can cause the largest cost; the optimal play for the adversary. Now we can solve the remainder as we did with the original LQR problem: \begin{gather*} \pd{}{\bu} = 2 \bu^T \bR + 2\bx^T {\bf S} \bB,\\ \bu^* = -\bR^{-1} \bB^T {\bf S} \bx = -\bK \bx.\end{gather*} Substituting this back into the Bellman equation gives: $$0 = \bQ + {\bf S}\left[ \gamma^{-2} \bB_\bw \bB_\bw^T - \bB \bR^{-1} \bB^T \right]{\bf S} + {\bf S}^T \bA + \bA^T {\bf S},$$ which is the original LQR Riccati equation with one additional term involving $\gamma.$ And like the original LQR Riccati equation, we must ask whether it has a positive-definite solution for ${\bf S}$. It turns out that if the system is stabilizable and $\gamma$ large enough, then it does have a PD solution. However, as we reduce $\gamma$ towards zero, there will be some threshold $\gamma$ beneath which this Riccati equation does not have a PD solution. Intuitively, if $\gamma$ is too small, then the adversary is rewarded for injecting disturbances that are so large as to break the convergence.

Here is the fascinating thing: the $\gamma$ in this robust LQR cost function can be interpreted as an $L_2$ gain for the system. Recall that when we were making connections between the Bellman equation and Lyapunov equations, we observed that the Bellman equation can be written as $\forall \bx, \quad \dot{J}^*(\bx) \le -\ell(\bx,\bu)$? Here this yields: $$\forall \bx, \quad \dot{J}^*(\bx) \le \gamma^2 \bw^T(t)\bw(t) - \bx^T(t)\bQ\bx(t) - \bu^T(t) \bR\bu(t).$$ This means that the cost-to-go is a valid dissipation inequality for the supply rate that provides an $L_2$ gain for the performance output $\bz = \begin{bmatrix}\sqrt{\bQ} \bx \\ \sqrt{\bR} \bu\end{bmatrix}.$ Moreover, we can find the minimal $L_2$ gain by finding the minimal $\gamma > 0$ for which the Riccati equation has a positive-definite solution. And given the properties of the Riccati equation, this can be done with a line search in $\gamma$.

Because the $L_2$ gain is the $\mathcal{H}_\infty$-norm of the system, this recipe of searching for the smallest $\gamma$ and taking the Riccati solution is an instance of $\mathcal{H}_\infty$ control design.

Loftberg 2003 for constrained version (w/ disturbance feedback).

Linear Exponential-Quadratic Gaussian (LEQG)

Jacobson73 observed that it is also straight-forward to minimize the objective: $$J = E\left[ \prod_{n=0}^\infty e^{\bx^T[n] {\bf Q} \bx[n]} e^{\bu^T[n] {\bf R} \bu[n]} \right] = E\left[ e^{\sum_{n=0}^\infty \bx^T[n] {\bf Q} \bx[n] + \bu^T[n] {\bf R} \bu[n]} \right],$$ with ${\bf Q} = {\bf Q}^T \ge {\bf 0}, {\bf R} = {\bf R}^T > 0,$ by observing that the cost is monotonically related to $\log{J}$, and therefore has the same minima (this same trick forms the basis for "geometric programming" Boyd07). This is known as the "Linear Exponential-Quadratic Gaussian" (LEG or LEQG), and for the deterministic version of the problem (no process nor measurement noise) the solution is identical to the LQR problem; it adds no new modeling power. But with noise, the LEQG optimal controllers are different from the LQG controllers; they depend explicitly on the covariance of the noise. introduce the coefficient \sigma here, instead of just throwing it in from the start. The coefficient $\sigma$ in the objective is referred to as the "risk-sensitivity parameter" Whittle90.

Insert simplest form of the derivation here, plus an example.

Whittle90 provides an extensive treatment of this framework; nearly all of the analysis from LQR/LQG (including Riccati equations, Hamiltonian formulations, etc) have analogs for their versions with exponential cost, and he argues that LQG and H-infinity control can (should?) be understood as special cases of this approach.

Adaptive control

The standard criticism of $\mathcal{H}_2$ optimal control is that minimizing the expected value does not allow any guarantees on performance. The standard criticism of $\mathcal{H}_\infty$ optimal control is that it concerns itself with the worst case, and may therefore be conservative, especially because distributions over possible disturbances chosen a priori may be unnecessarily conservative. One might hope that we could get some of this performance back if we are able to update our models of uncertainty online, adapting to the statistics of the disturbances we actually receive. This is one of the goals of adaptive control.

One of the fundamental problems in online adaptive control is the trade-off between exploration and exploitation. Some inputs might drive the system to build more accurate models of the dynamics / uncertainty quickly, which could lead to better performance. But how can we formalize this trade-off?

There has been some nice progress on this challenge in machine learning in the setting of (contextual) multi-armed bandit problems. For our purposes, you can think of bandits as a limiting case of an optimal control problem where there are no dynamics (the effects of one control action do not effect the results of the next control action). In this simpler setting, the online optimization community has developed exploration-exploitation strategies based on the notion of minimizing regret -- typically the accumulated difference in the performance achieved by my online algorithm vs the performance that would have been achieved if I had been privy to the data from my experiments before I started. This has led to methods that make use of concepts like upper-confidence bound (UCB) and more recently bounds using a least-squares squares confidence bound Foster20 to provide bounds on the regret.

In the last few years, we've see these results translated into the setting of linear optimal control...

finish this... cite Elad / Max

Structured uncertainty

$\mu$ synthesis. D-K iterations.

Linear parameter-varying (LPV) control

Pendulum example from Briat15 1.4.1 (and the references therein).

Trajectory optimization

Monte-carlo trajectory optimization

Bregman divergence ADMM in guided policy search from Sergey? Abhi recommended this paper for it: https://arxiv.org/abs/1504.00702 Levine16

Iterative $\mathcal{H}_2$/iLQG

cover iLQR and perhaps Scott's UKF version in the output-feedback chapter?
iLEQG/iLEG.

Nonlinear analysis and control

Stochastic verification of nonlinear systems. (Having bounds on expected value does yield bounds on system state). Uncertainty quantification (e.g., linear guassian approximation; discretize then closed form solutions using the transition matrix....). Monte-Carlo. Particle sim/filter. Rare event simulation

L2-gain with dissipation inequalities. Finite-time verification with sums of squares.

Occupation Measures

SOS-based design

Domain randomization

and empirical risk?

Extensions

Alternative risk/robustness metrics

e.g. Ani + Pavone

References

  1. Kemin Zhou and John C. Doyle, "Essentials of Robust Control", Prentice Hall , 1997.

  2. I. R. Petersen and V. A. Ugrinovskii and A. V. Savkin, "Robust Control Design using H-infinity Methods", Springer-Verlag , 2000.

  3. Anirudha Majumdar and Marco Pavone, "How should a robot assess risk? towards an axiomatic theory of risk in robotics", Robotics Research , pp. 75--84, 2020.

  4. Guy Scher and Sadra Sadraddini and Russ Tedrake and Hadas Kress-Gazit, "Elliptical Slice Sampling for Probabilistic Verification of Stochastic Systems with Signal Temporal Logic Specifications", To appear in the Proceedings of Hybrid Systems: Computation and Control (HSCC) , 2022. [ link ]

  5. Jacob Steinhardt and Russ Tedrake, "Finite-time Regional Verification of Stochastic Nonlinear Systems", International Journal of Robotics Research, vol. 31, no. 7, pp. 901-923, June, 2012. [ link ]

  6. Joseph Moore, "Robust Post-Stall Perching with a Fixed-Wing UAV", PhD thesis, Massachusetts Institute of Technology, September, 2014. [ link ]

  7. Stephen Boyd and Lieven Vandenberghe, "Convex Optimization", Cambridge University Press , 2004.

  8. Sadra Sadraddini and Russ Tedrake, "Linear Encodings for Polytope Containment Problems", Proceedings of the 2019 IEEE 58th Conference on Decision and Control (CDC) , pp. 8, 2019. [ link ]

  9. Sadra Sadraddini and Russ Tedrake, "Linear Encodings for Polytope Containment Problems: Theoretical Foundations and Applications", , Feb, 2021.

  10. D.Q. Mayne and M.M. Seron and SV Rakovic, "Robust model predictive control of constrained linear systems with bounded disturbances", Automatica, vol. 41, no. 2, pp. 219--224, 2005.

  11. Sadra Sadraddini and Russ Tedrake, "Robust Output Feedback Control with Guaranteed Constraint Satisfaction", In the Proceedings of 23rd ACM International Conference on Hybrid Systems: Computation and Control , pp. 12, April, 2020. [ link ]

  12. James Anderson and Nikolai Matni, "Structured state space realizations for SLS distributed controllers", 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton) , pp. 982-987, 2017.

  13. Christian Ebenbauer and Tobias Raff and Frank Allgower, "Dissipation inequalities in systems theory: An introduction and recent results", Invited Lectures of the International Congress on Industrial and Applied Mathematics 2007 , pp. 23-42, 2009.

  14. Carsten Scherer and Siep Weiland, "Linear Matrix Inequalities in Control", Online Draft , pp. 293, 2015.

  15. D. Jacobson, "Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games", IEEE Transactions on Automatic Control, vol. 18, no. 2, pp. 124--131, apr, 1973.

  16. S. Boyd and S.-J. Kim and L. Vandenberghe and A. Hassibi, "A Tutorial on Geometric Programming", Optimization and Engineering, vol. 8, no. 1, pp. 67-127, 2007.

  17. Peter Whittle, "Risk-sensitive optimal control", Wiley New York , vol. 20, 1990.

  18. Dylan Foster and Alexander Rakhlin, "Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles", Proceedings of the 37th International Conference on Machine Learning , vol. 119, pp. 3199--3210, 13--18 Jul, 2020.

  19. Sergey Levine and Chelsea Finn and Trevor Darrell and Pieter Abbeel, "End-to-end training of deep visuomotor policies", The Journal of Machine Learning Research, vol. 17, no. 1, pp. 1334--1373, 2016.

Previous Chapter Table of contents Next Chapter