I have been using Lagrange multipliers in constrained optimization problems, but I don’t see *how* they actually work to simultaneously satisfy the constraint *and* find the lowest possible value of an objective function.

This elaborates on Mr. Bulatov’s answer (note: Mr. Bulatov’s link is broken, but I infer it is supposed to point to Dan Klein’s notes *Lagrange Multipliers without Permanent Scarring*, which are indeed excellent).

Anyway, for the sake of simplicity let’s work with functions of two variables, i.e., we want to find critical points of $f(x,y)$ subject to the constraint $g(x,y)=c$.

Take any point $(a,b)$ in the $xy$-plane such that $g(a,b)=c$. Suppose there were a unit vector $\vec v$ tangent to the level curve $g(x,y)=c$ at $(a,b)$ whose dot product with the gradient of $f$ at $(a,b)$ is non-zero. We may assume this dot product is positive, since $\nabla f(a,b)\cdot \vec v=-\bigl(\nabla f(a,b)\cdot (-\vec v)\bigr)$ and $-\vec v$ is also a unit tangent vector.

Now what this means is that the directional derivative of $f$ at $(a,b)$ in the direction $\vec v$ is positive, so points along $\vec v$ close enough to $(a,b)$ have a higher $f$-value. I (and Mr. Bulatov) claim that this means there are points on $g(x,y)=c$ close to $(a,b)$ with a higher $f$-value, so $f(a,b)$ is not a maximum. Mr. Bulatov phrases the argument well in terms of infinitesimals, but I feel that a little more rigor is illuminating.

Consider a sequence of points $(a_i,b_i)$ on $g(x,y)=c$ that converges to $(a,b)$. Each point $(a_i,b_i)$ comes with a unit vector $\vec u_i$ in the direction of (i.e., a positive scalar multiple of) $(a_i,b_i)-(a,b)$. Because $\vec v$ is a tangent vector, we can choose the $(a_i,b_i)$ so that the $\vec u_i$ converge to $\vec v$. Since $\vec u_i \to \vec v$, the directional derivatives $\nabla f(a,b)\cdot \vec u_i$ converge to $\nabla f(a,b) \cdot \vec v > 0$, so they are bounded below by a positive constant for all sufficiently large $i$. Because $f$ is differentiable, $f(a_i,b_i)=f(a,b)+\|(a_i,b_i)-(a,b)\|\,\bigl(\nabla f(a,b)\cdot \vec u_i\bigr)+o\bigl(\|(a_i,b_i)-(a,b)\|\bigr)$, and the first-order term eventually dominates the error term, so $f(a_i,b_i)>f(a,b)$ for all sufficiently large $i$.

Therefore, if $f(a,b)$ is a maximum (or a minimum) of $f$ on $g(x,y)=c$, the gradient of $f$ at $(a,b)$ must have dot product zero with every vector in the line tangent to $g(x,y)=c$ at $(a,b)$. But that means the gradient of $f$ is orthogonal to the tangent line of $g$; since the gradient of $g$ is also orthogonal to that tangent line (and assuming $\nabla g(a,b)\neq 0$), the gradient of $f$ must be parallel to (i.e., a scalar multiple of) the gradient of $g$.

From here we obtain the equations $\nabla f(x,y)=\lambda\nabla g(x,y)$ and $g(x,y)=c$, which say precisely that $(x,y,\lambda)$ is a critical point of the Lagrangian $f(x,y)-\lambda (g(x,y)-c)$ (a critical point, not in general a minimum or maximum of it).
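
To see these conditions in action, here is a minimal sketch in Python (assuming SymPy is available; the toy problem $f(x,y)=xy$ with constraint $x+y=4$ is my own illustration, not from the original answer):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)

# Illustrative toy problem: extremize f(x, y) = x*y subject to g(x, y) = x + y = 4.
f = x * y
g = x + y
c = 4

# The Lagrange conditions: grad f = lam * grad g, together with the constraint g = c.
eqs = [
    sp.Eq(sp.diff(f, x), lam * sp.diff(g, x)),
    sp.Eq(sp.diff(f, y), lam * sp.diff(g, y)),
    sp.Eq(g, c),
]

print(sp.solve(eqs, [x, y, lam]))  # [(2, 2, 2)]: critical point x = y = 2 with lam = 2
```

Here the single critical point $(2,2)$ is the constrained maximum of $xy$ on the line $x+y=4$.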

This type of problem is generally referred to as constrained optimization. A general technique for solving many such problems is the method of Lagrange multipliers; here is an example of such a problem worked with Lagrange multipliers, together with a short justification of why the technique works.

Consider the paraboloid given by $f(x,y) = x^2 + y^2$. The global minimum of this surface lies at the origin ($x=0$, $y=0$). If we are given the constraint that $3x+y=6$ (a requirement on the relationship between $x$ and $y$), then the origin can no longer be our solution (since $3\cdot 0 + 1 \cdot 0 \neq 6$). Yet there is still a lowest point on this surface among the points satisfying the constraint.

**What we have so far:**

Objective function: $f(x,y) = x^2 + y^2$,

subject to: $3x+y=6$.

From here we can derive the Lagrange formulation of our constrained minimization problem. This will be a function $L$ of $x$, $y$, and a single Lagrange multiplier $\lambda$ (since we have only a single constraint). *It will be this new function that we minimize.*

$L(x,y,\lambda) = x^2 + y^2 + \lambda(3x+y-6)$

The Lagrange formulation incorporates our original function along with our constraint(s). On the way toward minimizing $L$, we will have to minimize the objective function $x^2 + y^2$ as well as the contribution from the constraint, which is now weighted by a factor of $\lambda$. If the constraint is met, then the expression $3x+y-6$ is necessarily zero and contributes nothing to the value of $L$. This is the *trick* of the technique.

**Minimizing the Lagrange formulation:**

To minimize $L$ we simply find the $x$, $y$, and $\lambda$ values that make its gradient zero. (This is exactly analogous to setting the first derivative to zero in single-variable calculus.)

$\nabla L = 0:$

$\frac{\partial L}{\partial x} = 2x + 3 \lambda = 0$

$\frac{\partial L}{\partial y} = 2y + \lambda = 0$

$\frac{\partial L}{\partial \lambda} = 3x + y - 6 = 0.$

In our example we have arrived at a system of simultaneous linear equations, which can (and should) be solved with matrix algebra. The solution is a vector holding values for $x$, $y$, and $\lambda$. The lowest value of the objective function, subject to the given constraint, sits at $(x,y,f(x,y))$. The Lagrange multiplier has no immediate physical interpretation here, though in certain contexts the multipliers do carry meaning; for example, $\lambda$ measures how sensitive the optimal objective value is to a change in the constraint constant.
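
For instance, here is a minimal NumPy sketch of that matrix-algebra step (my own illustration; the system is exactly the three equations above, with unknowns ordered $(x, y, \lambda)$):

```python
import numpy as np

# One row per equation, columns ordered (x, y, lambda):
#   2x        + 3*lam = 0
#         2y  + 1*lam = 0
#   3x +  1y          = 6
A = np.array([[2.0, 0.0, 3.0],
              [0.0, 2.0, 1.0],
              [3.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 6.0])

x, y, lam = np.linalg.solve(A, b)
print(x, y, lam)    # 1.8 0.6 -1.2
print(x**2 + y**2)  # 3.6, the constrained minimum value of the objective
```

So the lowest point of the paraboloid along the constraint line is $(1.8,\,0.6,\,3.6)$.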

You probably figured it out already, but for posterity: this document gives intuition that I didn’t see in Boyd’s book, especially Figure 3 on page 3.

To solve an equality-constrained problem, you look for points where the gradient of the objective points in the same direction as the gradient of the constraint function. Why those points? Well, suppose we have a feasible point that is a maximum of the objective, and the gradients of the objective and constraint functions are not collinear there. Then you can take an infinitesimal step in the direction orthogonal to the gradient of the constraint function that increases the objective. And because you are moving orthogonally to the gradient of the constraint function, the constraint is still satisfied. Hence you get a feasible point with a larger objective value, a contradiction.
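
As a quick sanity check of this collinearity claim, here is a small sketch (my own, reusing the paraboloid example worked above rather than anything from this answer):

```python
import numpy as np

# Gradients at the constrained minimum (x, y) = (1.8, 0.6) found above.
grad_f = np.array([2 * 1.8, 2 * 0.6])  # gradient of f(x, y) = x^2 + y^2
grad_g = np.array([3.0, 1.0])          # gradient of g(x, y) = 3x + y

# Two plane vectors are collinear iff their 2x2 determinant vanishes.
det = grad_f[0] * grad_g[1] - grad_f[1] * grad_g[0]
print(det)  # 0.0 -> the gradients are parallel at the optimum
```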

*Two variables and one constraint*

If we have an objective function of two real variables $f(x,y)$ subject to the constraint $g(x,y)=0$ and want to find a minimum or maximum (provided one exists), then, by the tangency argument given in the answers above, at any such point the gradient of $f$ must be a scalar multiple of the gradient of $g$, and the constraint itself must hold. That is, the three following conditions must be satisfied:

$f_{x}(x,y)-\lambda g_{x}(x,y)=0$,

$f_{y}(x,y)-\lambda g_{y}(x,y)=0$,

$g(x,y)=0$

for some $(x,y,\lambda )=(x^{\ast },y^{\ast },\lambda ^{\ast })$, where $\lambda $ is the Lagrange multiplier.

Now we set $F(x,y,\lambda )=f(x,y)-\lambda g(x,y)$. *Observe that*

(This is the key point of my explanation)

$F_{x}(x,y,\lambda )=f_{x}(x,y)-\lambda g_{x}(x,y)=0$,

$F_{y}(x,y,\lambda )=f_{y}(x,y)-\lambda g_{y}(x,y)=0$,

$F_{\lambda }(x,y,\lambda )=-g(x,y)=0$.

Hence we must find the solution $(x^{\ast },y^{\ast },\lambda ^{\ast })$ of the following system of $2+1$ equations:

$F_{x}(x,y,\lambda )=0$

$F_{y}(x,y,\lambda )=0$

$F_{\lambda }(x,y,\lambda )=0$

*Maximum or Minimum?*

To establish whether the function $f(x,y)$ has a constrained maximum or minimum at a critical point, one must check additional conditions that take into account the second derivatives of $F$.
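
One standard such test is the bordered Hessian. Form

$$\bar{H}=\begin{pmatrix}0 & g_{x} & g_{y}\\ g_{x} & F_{xx} & F_{xy}\\ g_{y} & F_{yx} & F_{yy}\end{pmatrix}$$

and evaluate its determinant at $(x^{\ast },y^{\ast },\lambda ^{\ast })$: if $\det \bar{H}>0$, the point is a constrained local maximum; if $\det \bar{H}<0$, it is a constrained local minimum.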

$n$ *variables and* $m$ *constraints*

For a function $f$ of $n$ real variables, subject to $m$ constraints $g_{k}$, with $1\leq k\leq m$, we have $m$ Lagrange multipliers and must solve a system of $n+m$ equations.
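
A minimal sketch of that general recipe (assuming SymPy; the particular $f$ and $g_{k}$ below are placeholders chosen for illustration):

```python
import sympy as sp

# n = 3 variables and m = 2 constraints, so n + m = 5 equations in 5 unknowns.
x1, x2, x3 = sp.symbols('x1 x2 x3', real=True)
l1, l2 = sp.symbols('l1 l2', real=True)

f = x1**2 + x2**2 + x3**2         # placeholder objective
g = [x1 + x2 + x3 - 1, x1 - x2]   # placeholder constraints, each g_k = 0

# Lagrangian F = f - sum_k lambda_k * g_k, as in the answer above.
F = f - l1 * g[0] - l2 * g[1]

# Stationarity with respect to all n + m unknowns yields the n + m equations
# (differentiating by each multiplier just recovers the matching constraint).
unknowns = [x1, x2, x3, l1, l2]
eqs = [sp.diff(F, u) for u in unknowns]
print(sp.solve(eqs, unknowns))  # {x1: 1/3, x2: 1/3, x3: 1/3, l1: 2/3, l2: 0}
```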
