Algorithms for Walking, Running, Swimming, Flying, and Manipulation

© Russ Tedrake, 2020

How to cite these notes |
Send me your feedback

**Note:** These are working notes used for a course being taught
at MIT. They will be updated throughout the Spring 2020 semester. Lecture videos are available on YouTube.

Previous Chapter | Table of contents | Next Chapter |

So far in the notes, we have concerned ourselves with only known,
deterministic systems. In this chapter we will begin to consider uncertainty.
This uncertainy can come in many forms... we may not know the governing
equations (e.g. the coefficient of friction in the joints), our robot may be
walking on unknown terrain,
subject to unknown disturbances, or even be picking up unknown objects.
There are a number of mathematical frameworks for considering this
uncertainty; for our purposes this chapter will generalizing our thinking to
equations of the form: $$\dot\bx = {\bf f}(\bx, \bu, \bw, t) \qquad \text{or}
\qquad \bx[n+1] = {\bf f}(\bx[n], \bu[n], \bw[n], n),$$ where $\bw$ is a new
*random* input signal to the equations capturing all of this potential
variability. Although it is certainly possible to work in continuous time, and
treat $\bw(t)$ as a continuous-time random signal (c.f. Wiener process), it is
notationally simpler to work with $\bw[n]$ as a discrete-time random signal.
For this reason, we'll devote our attention in this chapter to the
discrete-time systems.

In order to simulate equations of this form, or to design controllers against them, we need to define the random process that generates $\bw[n]$. It is typical to assume the values $\bw[n]$ are independent and identically distributed (i.i.d.), meaning that $\bw[i]$ and $\bw[j]$ are uncorrelated when $i \neq j$. As a result, we typically define our distribution via a probability density $p_{\bf w}(\bw[n])$. This is not as limiting as it may sound... if we wish to treat temporally-correlated noise (e.g. "colored noise") the format of our equations is rich enough to support this by adding additional state variables; we'll give an example below of a "whitening filter" for modeling wind gusts. The other source of randomness that we will now consider in the equations is randomness in the initial conditions; we will similarly define a probabilty density $p_\bx(\bx[0]).$

This modeling framework is rich enough for us to convey the key ideas; but
it is not quite sufficient for all of the systems I am interested in. In
*dimension of the state vector* may change in different realizations of
the problem! Consider, for instance, the case of a robot manipulating random
numbers of dishes in a sink. I do not know many control formulations that
handle this type of randomness well, and I consider this a top priority to
think more about! (We'll begin to address it in the output feedback chapter.)

Roughly speaking, I will refer to "stochastic control" as the discipline of
synthesizing controllers that govern the probabilistic evolution of the
equations. "Stochastic optimal control" defines a cost function (now a random
variable), and tries to find controllers that optimize some metric such as the
expected cost. When we use the terms "robust control", we are typically
referring to a class of techniques that try to guarantee a worst-case
performance or a worst-case bound on the effect of randomness on the input on
the randomness on the output. Interestingly, for many robust control
formulations we do not attempt to know the precise probability distribution of
$\bx[0]$ and $\bw[n]$, but instead only define the *sets*
of possible values that they can take. This modeling is powerful, but can
lead to conservative controllers and pessimistic estimates of performance.

One of the most amazing features of the dynamic programming, additive cost approach to optimal control is the relative ease with which it extends to optimizing stochastic systems.

For discrete systems, we can generalize our dynamics on a graph by adding action-dependent transition probabilities to the edges. This new dynamical system is known as a Markov Decision Process (MDP), and we write the dynamics completely in terms of the transition probabilities \[ \Pr(s[n+1] = s' | s[n] = s, a[n] = a). \] For discrete systems, this is simply a big lookup table. The cost that we incur for any execution of system is now a random variable, and so we formulate the goal of control as optimizing the expected cost, e.g. \begin{equation} J^*(s[0]) = \min_{a[\cdot]} E \left[ \sum_{n=0}^\infty \ell(s[n],a[n]) \right]. \label{eq:stochastic_dp_optimality_cond} \end{equation} Note that there are many other potential objectives, such as minimizing the worst-case error, but the expected cost is special because it preserves the dynamic programming recursion: \[ J^*(s) = \min_a E \left[ \ell(s,a) + J^*(s')\right] = \min_a \left[ \ell(s,a) + \sum_{s'} \Pr(s'|s,a) J^*(s') \right].\] Remarkably, if we use these optimality conditions to construct our value iteration algorithm \[ \hat{J}(s) \Leftarrow \min_a \left[ \ell(s,a) + \sum_{s'} \Pr(s'|s,a) \hat{J}(s') \right],\] then this algorithm has the same strong convergence guarantees of its counterpart for deterministic systems. And it is essentially no more expensive to compute!

There is a particularly cute observation to be made here. Let's assume that we have discrete control inputs and discrete-time dynamics, but a continuous state space. Recall the fitted value iteration on a mesh algorithm described above. In fact, the resulting update is exactly the same as if we treated the system as a discrete state MDP with $\beta_i$ representing the probability of transitioning to state $x_i$! This sheds some light on the impact of discretization on the solutions -- discretization error here will cause a sort of diffusion corresponding to the probability of spreading across neighboring nodes.

Previous Chapter | Table of contents | Next Chapter |