# What are some interesting calculus facts your calculus teachers didn't teach you?

I recently learned an interesting fact about what the value of a Lagrange multiplier represents: suppose the maximum of some real-valued function $f(\vec{x})$ subject to a constraint $g(\vec{x})=c$ is $M$ (of course, $M$ depends on $c$), which you obtained via Lagrange multipliers (solving $\nabla f = \lambda \nabla g$). Then, it’s easy to show (using the chain rule) that the multiplier $\lambda$ can be interpreted as the change of the maximum with respect to perturbations of the level set $g=c$. That is,
$$\lambda = \frac{d M}{dc}$$
I think this is a pretty cool result, and I never heard about it during my time as an undergraduate.
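Here is a quick numerical sanity check of $\lambda = dM/dc$ on a toy problem of my own (not from the original post): maximize $f(x,y) = x + y$ subject to $x^2 + y^2 = c$. Solving $\nabla f = \lambda \nabla g$ gives $x = y = \sqrt{c/2}$, so $M(c) = \sqrt{2c}$ and $\lambda = 1/\sqrt{2c}$.

```python
import math

# Maximize f(x, y) = x + y subject to g(x, y) = x^2 + y^2 = c.
# Lagrange: (1, 1) = lambda * (2x, 2y)  =>  x = y = sqrt(c/2),
# so M(c) = sqrt(2c) and lambda = 1/(2x) = 1/sqrt(2c).

def M(c):
    return math.sqrt(2 * c)

c = 3.0
lam = 1 / math.sqrt(2 * c)          # multiplier at the optimum

# Compare with a central-difference estimate of dM/dc.
h = 1e-6
dM_dc = (M(c + h) - M(c - h)) / (2 * h)

print(lam, dM_dc)                   # the two agree to high precision
```

The finite-difference quotient matches the multiplier, exactly as the chain-rule argument predicts.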

#### Answers

When I first learned Taylor’s theorem it didn’t really click with me; I found it clunky and difficult. Then I saw what has to be the simplest, most intuitive proof of Taylor’s theorem there is. If $f \in C^n$ then

$$f(x) = f(x_0) + \int_{x_0}^x f'(t)\,dt$$

Iterate this relation and we get

$$f(x) = f(x_0) + \int_{x_0}^x \left( f'(x_0) + \int_{x_0}^u f''(t)\,dt \right) du$$

$$f(x) = f(x_0) + \int_{x_0}^x f'(x_0)\,dt + \int_{x_0}^x\int_{x_0}^u f''(t)\,dt\,du$$

and then

$$f(x) = f(x_0) + \int_{x_0}^x \left( f'(x_0) + \int_{x_0}^u \left( f''(x_0) + \int_{x_0}^w f'''(t)\,dt \right) dw \right) du$$
$$f(x)= f(x_0) + \int_{x_0}^x f'(x_0)\,dt + \int_{x_0}^x\int_{x_0}^u f''(x_0)\,dt\,du + \int_{x_0}^x\int_{x_0}^u\int_{x_0}^w f'''(t)\,dt\,dw\,du$$

so on and so forth, which reduces to

$$f(x) = f(x_0) + f'(x_0)(x-x_0) + f''(x_0)\frac{(x-x_0)^2}{2} + \cdots + R_n(x)$$

where

$$R_n(x) = \int_{x_0}^x\int_{x_0}^{u_1}\cdots\int_{x_0}^{u_{n-1}} f^{(n)}(t)\,dt\,du_{n-1}\cdots du_1 = \frac{1}{(n-1)!}\int_{x_0}^x f^{(n)}(t)(x-t)^{n-1}\,dt$$

So Taylor’s theorem is really just iterated integration and differentiation, all derived from the seemingly obvious formula

$$f(x) = f(x_0) + \int_{x_0}^x f'(t)\,dt$$
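A numerical sanity check of the integral remainder formula, for $f(t) = e^t$ and $x_0 = 0$ (my own sketch, using a simple composite Simpson quadrature helper rather than any particular library):

```python
import math

# Check the integral form of the Taylor remainder for f(t) = e^t, x0 = 0:
#   R_n(x) = e^x - sum_{k=0}^{n-1} x^k / k!
#          = 1/(n-1)! * integral_0^x e^t (x - t)^(n-1) dt

def simpson(fn, a, b, m=2000):
    """Composite Simpson's rule with 2*m subintervals."""
    h = (b - a) / (2 * m)
    s = fn(a) + fn(b)
    s += 4 * sum(fn(a + (2 * i - 1) * h) for i in range(1, m + 1))
    s += 2 * sum(fn(a + 2 * i * h) for i in range(1, m))
    return s * h / 3

x, n = 1.5, 4
taylor_part = sum(x**k / math.factorial(k) for k in range(n))
remainder_exact = math.exp(x) - taylor_part
remainder_integral = simpson(
    lambda t: math.exp(t) * (x - t)**(n - 1), 0.0, x
) / math.factorial(n - 1)

print(remainder_exact, remainder_integral)   # the two agree closely
```

Both computations of $R_4(1.5)$ coincide, which is exactly what the iterated-integration derivation promises.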

Every basic fact about computing derivatives is easiest to understand if you allow yourself to use infinitesimals. The use of infinitesimals can be rigorously justified, and it doesn’t require nonstandard analysis. Moreover, in practice the fastest way to compute derivatives of complicated functions by hand is to use infinitesimals.

Here’s a random example. Suppose I wanted to compute the derivative of $\frac{e^t \cos t}{1 - \tan \log (1 + t)}$ at $t = 0$. You might think I would have to use the quotient rule, then the product rule in the numerator, then the chain rule twice in the denominator. But actually I can do something much faster, which is to pretend that $t^2 = 0$, or in other words to repeatedly take Taylor series and cut them off after the linear term, and then just use ordinary algebra. This produces

$$\frac{(1 + t)(1)}{1 - \tan t} = \frac{1 + t}{1 - t} = 1 + 2t$$

(because $(1 + t)(1 - t) = 1 - t^2 = 1$ when $t^2 = 0$, so $\frac{1}{1 - t} = 1 + t$). So the derivative at $t = 0$ is $2$. A similar trick where you pretend that $t^3 = 0$ can be used to compute second derivatives, which I’ll demonstrate on the above example because maybe only getting the first derivative looked too easy: this produces

$$\frac{\left( 1 + t + \frac{t^2}{2} \right) \left( 1 - \frac{t^2}{2} \right)}{1 - \tan \left( t - \frac{t^2}{2} \right)} = \frac{1 + t}{1 - t + \frac{t^2}{2}} = (1 + t) \left( 1 + t + \frac{t^2}{2} \right) = 1 + 2t + \frac{3t^2}{2}.$$

So the second derivative is $3$.
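As a cross-check of the truncation trick, here are both derivatives estimated by plain central differences (my own sketch, standard library only):

```python
import math

# The function from the example above; the truncation trick predicted
# h'(0) = 2 and h''(0) = 3.  Check with central-difference quotients.

def h(t):
    return math.exp(t) * math.cos(t) / (1 - math.tan(math.log(1 + t)))

eps = 1e-5
d1 = (h(eps) - h(-eps)) / (2 * eps)            # estimates h'(0)
d2 = (h(eps) - 2 * h(0) + h(-eps)) / eps**2    # estimates h''(0)

print(d1, d2)   # close to 2 and 3
```

The finite differences land on $2$ and $3$ up to discretization error, agreeing with the $t^2 = 0$ and $t^3 = 0$ computations.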

Here’s a harder and less elementary example. Consider the function $X \mapsto \det X$ where $X$ is a square matrix. What is its derivative at the identity $X = I$? Well, a standard fact about determinants tells you that

$$\det (I + t X) = 1 + \text{tr}(X)\, t + O(t^2)$$

(where $O(t^2)$ – Big O notation – is one way of justifying more rigorously what I’m doing when I ignore second-order and higher terms in the Taylor series). The coefficient of the linear term is $\text{tr}(X)$, so that’s the derivative. It is really annoyingly difficult to try to do this computation by writing out the formula for the determinant in full and then differentiating it; I’ve never tried and I don’t plan to.
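Here is a direct numerical check of $\det(I + tX) \approx 1 + t\,\text{tr}(X)$ on a small concrete matrix (my own sketch; the determinant is computed by a hand-rolled Laplace expansion, no linear-algebra library assumed):

```python
# Check det(I + t*X) = 1 + t*tr(X) + O(t^2) for a sample 3x3 matrix.

def det(A):
    """Determinant by Laplace expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    total = 0.0
    for j in range(len(A)):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

X = [[1.0, 2.0, 0.5],
     [0.0, -1.0, 3.0],
     [4.0, 1.0, 2.0]]
trace = sum(X[i][i] for i in range(3))   # tr(X) = 2

t = 1e-6
lhs = det([[(1.0 if i == j else 0.0) + t * X[i][j] for j in range(3)]
           for i in range(3)])
approx = 1 + t * trace

print(lhs, approx)   # differ only by O(t^2)
```

The two values differ by roughly $t^2$ times a second-order symmetric function of $X$, i.e. exactly the terms being ignored.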

This method of computing derivatives is so easy you can teach a computer how to do it.
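Indeed, forward-mode automatic differentiation is precisely this "$t^2 = 0$" bookkeeping, done with dual numbers. A minimal sketch (class and helper names are my own, not a real library’s API):

```python
import math

# Dual numbers a + b*eps with eps^2 = 0: arithmetic on (value, derivative)
# pairs carries out the chain rule automatically.

class Dual:
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.a + o.a, self.b + o.b)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.a * o.a, self.a * o.b + self.b * o.a)
    __rmul__ = __mul__
    def __sub__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.a - o.a, self.b - o.b)
    def __rsub__(self, o):
        return Dual(o) - self
    def __truediv__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.a / o.a, (self.b * o.a - self.a * o.b) / o.a**2)
    def __rtruediv__(self, o):
        return Dual(o) / self

# Elementary functions lifted to dual numbers.
def dexp(x): return Dual(math.exp(x.a), math.exp(x.a) * x.b)
def dcos(x): return Dual(math.cos(x.a), -math.sin(x.a) * x.b)
def dtan(x): return Dual(math.tan(x.a), x.b / math.cos(x.a)**2)
def dlog(x): return Dual(math.log(x.a), x.b / x.a)

# Differentiate the earlier example at t = 0.
t = Dual(0.0, 1.0)
r = dexp(t) * dcos(t) / (1 - dtan(dlog(1 + t)))
print(r.b)   # 2.0
```

No quotient rule, no chain rule by hand: the derivative $2$ falls out of the arithmetic, just as in the pencil-and-paper truncation above.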

Nonelementary antiderivatives. While you may have heard that the antiderivative $$f(x)=\int_{0}^{x} e^{-\frac{t^2}{2}}dt$$
is not expressible as an elementary function, you definitely haven’t seen the proof.

During the first semester of calculus I teach my students about differentials. When I was a student, nobody told me what its geometric interpretation is.

I learned about L’Hospital’s Rule (LHR) in my first calculus course as a high school student. For one of the indeterminate forms, the teacher presented the conditions as

Suppose $f$ and $g$ are differentiable in a deleted neighborhood of $x_0$, and $\lim_{x\to x_0}f(x)=\infty$, $\lim_{x\to x_0}g(x)=\infty$, and $g'(x) \ne 0$ in a deleted neighborhood of $x_0$. If $\lim_{x\to x_0}\frac{f'(x)}{g'(x)}$ exists, then $\lim_{x\to x_0}\frac{f(x)}{g(x)}$ also exists and

$$\lim_{x\to x_0}\frac{f(x)}{g(x)}=\lim_{x\to x_0}\frac{f'(x)}{g'(x)}$$

The misconception is that the limit of $f$ is required to be $\infty$. This is not the case. In fact, the limit $\displaystyle \lim_{x\to x_0}f(x)$ need not even exist for LHR to be valid, provided that $\displaystyle g\to \infty$.

EXAMPLE:

There are many applicable examples that might be considered trivial. Here is one that might be less obvious.

Let $f(x)=\log(1/x)+\int_0^1\frac{1-t^x}{1-t}\,dt$ and $g(x)=x$.

Now, without even knowing whether the limit $\lim_{x\to \infty}f(x)$ exists or not (it actually equals $\gamma$, the Euler–Mascheroni constant), we know that $f$ is differentiable for $x>0$ with $f'(x)=-\frac1x-\int_0^1\frac{\log(t)t^x}{1-t}\,dt$.

Inasmuch as $g$ is differentiable with $\lim_{x\to \infty}g(x)=\infty$, L’Hospital’s Rule asserts that

\begin{align} \lim_{x\to \infty}\frac{f(x)}{g(x)}&=\lim_{x\to \infty}\frac{\log(1/x)+\int_0^1\frac{1-t^x}{1-t}\,dt}{x}\\ &=\lim_{x\to \infty}\frac{-\frac1x-\int_0^1\frac{\log(t)t^x}{1-t}\,dt}{1}\\ &=0 \end{align}
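A numerical sanity check of this example (my own sketch: the Simpson quadrature helper is hand-rolled, and the integrand is given its limiting value $x$ at $t = 1$):

```python
import math

# f(x) = log(1/x) + integral_0^1 (1 - t^x)/(1 - t) dt.
# Check that f(x)/x is small while f(x) itself sits near gamma ~ 0.5772.

def integrand(t, x):
    if t == 1.0:
        return float(x)          # limit of (1 - t^x)/(1 - t) as t -> 1
    return (1 - t**x) / (1 - t)

def simpson(fn, a, b, m=5000):
    """Composite Simpson's rule with 2*m subintervals."""
    h = (b - a) / (2 * m)
    s = fn(a) + fn(b)
    s += 4 * sum(fn(a + (2 * i - 1) * h) for i in range(1, m + 1))
    s += 2 * sum(fn(a + 2 * i * h) for i in range(1, m))
    return s * h / 3

def f(x):
    return math.log(1 / x) + simpson(lambda t: integrand(t, x), 0.0, 1.0)

val = f(50.0)
print(val, val / 50.0)   # val near gamma; ratio near 0
```

Already at $x = 50$ the value of $f$ is within about $0.01$ of $\gamma$, and $f(x)/x$ is visibly tending to $0$, consistent with the limit computed above.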

Integration by parts – visualization

Not that shocking, but my teacher never showed me this. For many people this will be familiar, but for those who have never seen it, it can be an eye-opener.

(Figure: area diagram for integration by parts. Image source: Wikipedia.)

The area of the blue / red region is:

$$A_1=\int_{y_1}^{y_2}x(y)dy$$

$$A_2=\int_{x_1}^{x_2}y(x)dx$$

So we have:

$$\overbrace{\int_{y_1}^{y_2}x(y)\,dy}^{A_1}+\overbrace{\int_{x_1}^{x_2}y(x)\,dx}^{A_2}=\Bigl. x\, y(x)\Bigr|_{x_1}^{x_2} = \Bigl. y\, x(y)\Bigr|_{y_1}^{y_2}$$

Assuming the curve is smooth within a neighborhood, this generalizes
to indefinite integrals:

$$\int x\,dy + \int y\,dx = xy$$

Rearranging yields the well-known formula:

$$\int x\,dy = xy - \int y\,dx$$
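A quick concrete check of the area identity on a curve of my own choosing, $y = x^2$ between $(x_1, y_1) = (1, 1)$ and $(x_2, y_2) = (2, 4)$, using exact antiderivatives:

```python
# On y = x^2 the inverse branch is x = sqrt(y), so
#   A2 = integral_{1}^{2} x^2 dx = 7/3
#   A1 = integral_{1}^{4} sqrt(y) dy = 14/3
# and A1 + A2 should equal x2*y2 - x1*y1 = 8 - 1 = 7.

x1, x2 = 1.0, 2.0
y1, y2 = x1**2, x2**2

A2 = (x2**3 - x1**3) / 3                 # area under the curve vs x
A1 = 2.0 / 3.0 * (y2**1.5 - y1**1.5)     # area beside the curve vs y

print(A1 + A2, x2 * y2 - x1 * y1)   # both are 7 (up to rounding)
```

The two regions tile the rectangle difference exactly, which is all the picture is saying.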