Uzawa interpretation of Bregman iteration - Adam Oberman Math

Uzawa interpretation of Bregman iteration

In this note we describe a solution method for the following problem.

The L1 regularized equation for sparse solution of linear equations:

  \min \| x \|_1 s.j. A x= b

The Bregman iteration can be found in, for example


The problem is replaced by a regularized problem

  \min \mu \| x \|_1 + \frac 1 2 \|A x - b\|_2^2

While the method discussed here is not the fastest, it is popular, and it is written in terms of iterating a sequence of basic steps, so the method can be written in a few lines.


The Bregman iteration

The iteration is on u,p, begins with both initialized to zero. Then

uk + 1 = shrink(Atp,1)
pk + 1 = pk − (bAuk + 1)

The shrink operator is defined below. The idea is that the iteration converges to a minimizer of the original problem (or one nearby).

The Augmented Lagrangian

First replace the problem by a regularized version

  \min \| x \|_1 + \epsilon \| x\|_2^2, \quad \text{ subject to } A x= b

Then consider the Lagrangian

  L(x,\lambda) = \min \| x \|_1 + \epsilon \| x\|_2^2 + \lambda (b - A x)

The dual function is obtained by minimizing over x for fixed λ

  g(\lambda) = \min_x L(x,\lambda) = \min_x \| x \|_1 + \epsilon \| x\|_2^2 + \lambda (b - A x)

which upon completing the square gives

  g(\lambda) =  \min_x   \| x \|_1 + \epsilon \| x - A^t \lambda/(2\epsilon) \|_2^2  + \lambda  b - \frac 1 {4 \epsilon} \| A^t \lambda\|^2

Notice that the problem is separable in each coordinate. The minimizer is

x = shrink(Atλ / (2ε),2ε) = shrink(Atλ,1)

Now for a given x, performing gradient descent method on the dual results in

λ = λ − (bAx)

So the Bregman iteration arises as a descent method on the Augmented Lagrangian, for the dual problem.

Definition of Shrinkage

The shrink operator, which is the coordinate operator

shrink(b,\delta) = 
\begin{cases}
0 & |b| < \delta\\
b - sgn(b) \delta & o.w.
\end{cases}

The shrink operator can also be written in terms of the projection operator onto the interval

skrink(b,δ) = bproj[ − δ,δ](b)

The shrink operator is the minimizer of the one dimensional problem

  =  \min_x   \delta \| x \|_1 + \frac 1 2  \| x - b \|_2^2

See Homework from Convex Optimization