Background
This background is not really necessary to answer my question, but I included it here to provide context.
This question has some programming aspects to it as well, but since my question is mainly about math, I decided to ask it here.
I’m trying to extend the implementation of automatic differentiation found here. This implementation, assuming I read it properly, does not work for functions of the form $F(x)=f(x)^{g(x)}$. I’m trying to modify it so that it does work for such functions.
Question
I’m trying to find a derivative for functions of the form $F(x)=f(x)^{g(x)}$. I specifically only care about the “normal” cases where $g(x)$ is an integral constant, or $f(x)$ is positive. Wikipedia has provided me with a “Generalized Power Rule”:
$$(f^g)^\prime = f^g\left(f^\prime\frac{g}{f}+g^\prime\ln f\right)$$
The generalized rule, however, does not work for $f\leq 0$. In my implementation it is difficult to tell which of the two cases I’m working with, so I would rather not implement the generalized rule for the latter case and the basic power rule for the former.
Is there a rule that works for both cases?
I think the problem here is stemming from confusion of the logarithm rule $$\ln a^r=r\ln a$$ This is only valid if we have $a>0$. Otherwise it does not hold. For example, $\ln((-1)^2)=\ln 1=0$ whereas $2\ln(-1)$ is not defined over the reals.
Instead, we can write $$\ln a^r=r\ln |a|,\;\;\;a^r>0$$ So in your case, let $y=x^2$. Then $\ln y=2\ln|x|$ so $$\frac{y'}{y}={2\over x}\implies y'=\frac{2y}{x}=2x$$ as desired. This holds for all $x\ne 0$.
In general, let $y=f(x)^{g(x)}$ and suppose that $f(x)^{g(x)}$ is defined and positive for all relevant $x$. Then we have $$\ln y=g(x)\ln |f(x)|,\;\;\;\frac{y'}{y}=g'\ln|f(x)|+g\frac{f'}{f}$$ so that
$$(f^g)'=f^g\left(g'\ln|f|+f'\frac{g}{f}\right)$$ We must have $f(x)\ne 0$ as expected.
NOTE: If a function satisfies the requirement that $f^g\ge 0$ for all $x$ in the domain, then at an anomalous point where $f(x)=0$ (such as the origin on the parabola $y=x^2$) the derivative there must be $0$ because it will be a minimum of the function (assuming it is differentiable at all).
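As a quick sanity check, here is a minimal Python sketch that compares the rule with $\ln|f|$ against a central finite difference. The helper names `general_power_derivative` and `central_difference` are made up for this illustration and are not taken from the implementation linked in the question; the sketch also assumes $f \ne 0$ and $f^g > 0$ at the points tested.

```python
import math

def general_power_derivative(f, df, g, dg):
    """Derivative of f**g at one point, given the values f, g and derivatives df, dg.

    Uses (f^g)' = f^g * (g' ln|f| + f' g / f), so it requires f != 0.
    """
    return f**g * (dg * math.log(abs(f)) + df * g / f)

def central_difference(F, x, h=1e-6):
    """Numerical derivative of F at x, used only as a reference value."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Negative base with a constant integer exponent: F(x) = x**2 at x = -3.
x = -3.0
rule_1 = general_power_derivative(f=x, df=1.0, g=2, dg=0.0)
numeric_1 = central_difference(lambda t: t**2, x)

# Positive base with a varying exponent: F(x) = x**x at x = 2.
x = 2.0
rule_2 = general_power_derivative(f=x, df=1.0, g=x, dg=1.0)
numeric_2 = central_difference(lambda t: t**t, x)

print(rule_1, numeric_1)
print(rule_2, numeric_2)
```

In both cases the two printed values should agree up to the finite-difference error, which is only a numerical spot check and not a proof.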
You have two conflicting goals here. If $y$ is arbitrary, then $x^y$ only
makes sense for positive $x$. Imagine, for example, that $y = \frac{1}{2}$.
Then $x^y = \sqrt{x}$ – what does that mean for negative $x$? Note that
switching to complex numbers doesn’t help much – negative numbers do have
square roots then, but those are non-unique, and what’s worse, the number
of solutions is highly dependent on $y$! E.g., $y^n = x$ has $n$ solutions in
$\mathbb{C}$ – which is $x^\frac{1}{n}$ supposed to be?
So you’ll have to distinguish two cases. One is $f(x)^{g(x)}$ for
positive $f$, and the other is $f(x)^k$ for constants $k \in \mathbb{Z}$
(i.e., no fractional exponents). You could generalize the second case to
$f(x)^{g(x)}$ for functions $g$ which take only integral values, but since
such functions are either constant or non-continuous, that case isn’t really
interesting for purposes of differentiation I think.
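To make the two cases concrete, here is a rough Python sketch of a forward-mode dual-number power function. The `Dual` class and `dual_pow` function are hypothetical names for this illustration, not part of the implementation from the question, and the sketch assumes the two cases can be told apart by the exponent's type (a plain `int` versus a `Dual`), which may not match how your code represents constants.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    val: float  # value of the function at the evaluation point
    der: float  # value of its derivative at the same point

def dual_pow(f, g):
    """Compute f**g, where f is a Dual and g is an int constant or a Dual."""
    if isinstance(g, int):
        # Constant integer exponent: basic power rule, base may be <= 0.
        if g == 0:
            return Dual(1.0, 0.0)
        return Dual(f.val**g, g * f.val**(g - 1) * f.der)
    # Varying exponent: generalized rule, only meaningful for a positive base.
    if f.val <= 0:
        raise ValueError("a varying exponent requires a positive base")
    val = f.val**g.val
    return Dual(val, val * (g.der * math.log(f.val) + f.der * g.val / f.val))

x = Dual(-3.0, 1.0)
print(dual_pow(x, 2))            # Dual(val=9.0, der=-6.0)
print(dual_pow(Dual(0.0, 1.0), 2))
print(dual_pow(Dual(2.0, 1.0), Dual(2.0, 1.0)))
```

The integer branch never touches the logarithm, so it also covers a base that is zero or negative. The other branch refuses a non-positive base rather than returning a misleading number.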
BTW, a far more interesting (and maybe solvable!) question is how to deal with
non-negative $f$, which nevertheless may take the value zero. $f(x)^{g(x)}$ is perfectly well-defined for those, but you’ll still run into problems with the logarithm. Now, in some cases these problems are due to the fact that the derivative does, in fact, not exist at these points. But not in all cases! For example, $F(x) = x^2$ (that is, $f(x) = x$ and $g(x) = 2$) has derivative $0$ at $x=0$. The reason is, basically, that since $g$ is constant in this case, $g' \ln f$ doesn’t matter, because $g' = 0$, and the remaining term is harmless too, because $f^g \cdot f'\frac{g}{f} = g f^{g-1} f'$, which is $0$ at $x=0$. But you can’t just cancel things that way in all cases – that will produce wrong answers sometimes, because it actually depends on how fast things go to zero and to infinity, respectively.
You might ask, then, why the non-uniqueness mentioned above doesn’t prevent us from sensibly defining $\sqrt[n]{x}$ – after all, $y^n = x$ has two solutions for positive $x$ (and even $n$) even in $\mathbb{R}$. The reason is twofold:
1. The number of solutions doesn’t explode as badly. We have one solution of $y^n = x$ for odd $n$, and two for even $n$.
2. There’s an order on $\mathbb{R}$, which makes the definition of $\sqrt[n]{x}$ as the (unique!) positive solution of $y^n = x$ quite natural.
The effect of (1) and (2) is, for example, that while it’s not true that $\sqrt[n]{x^n} = x$, we do get at least that $\sqrt[n]{x^n} = |x|$. Trying to do the same over the complex numbers fails horribly. We could attempt to define $\sqrt[n]{x}$ as the solution of $y^n =x$ with the smallest angle (assuming we agree to measure angles counter-clockwise from the real axis). But then an $n$-th root always has an angle smaller than $\frac{2\pi}{n}$, so $\sqrt[n]{x^n}$ and $x$ would have very little in common except that their $n$-th power is $x^n$.
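If you want to see this numerically, here is a small Python sketch of the "smallest angle" root described above. The function `smallest_angle_root` is a hypothetical helper written for this illustration; note that it deliberately does not use Python's built-in principal root, which measures angles in $(-\pi, \pi]$ instead.

```python
import cmath
import math

def smallest_angle_root(z, n):
    """n-th root of z whose angle, measured counter-clockwise, lies in [0, 2*pi/n)."""
    r, theta = cmath.polar(z)
    theta %= 2 * math.pi  # move the angle into [0, 2*pi)
    return cmath.rect(r ** (1.0 / n), theta / n)

# Over the reals, the positive root of x**2 recovers |x|.
x = -3.0
real_root = math.sqrt(x**2)

# Over the complex numbers, the smallest-angle root of z**2 need not resemble z.
z = complex(-1.0, 0.0)
root = smallest_angle_root(z * z, 2)

print(real_root, abs(x))
print(root, z)
```

Here `root` comes out as $1$ while `z` is $-1$: both square to the same number, but the chosen root has lost track of which one we started from.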