
I am trying to wrap my head around back-propagation in a neural network with a Softmax classifier, which uses the Softmax function:

\begin{equation}

p_j = \frac{e^{o_j}}{\sum_k e^{o_k}}

\end{equation}

This is used in a loss function of the form


\begin{equation}L = -\sum_j y_j \log p_j,\end{equation}

where $o$ is a vector. I need the derivative of $L$ with respect to $o$. Now if my derivatives are right,

\begin{equation}

\frac{\partial p_j}{\partial o_i} = p_i(1 - p_i),\quad i = j

\end{equation}

and

\begin{equation}

\frac{\partial p_j}{\partial o_i} = -p_i p_j,\quad i \neq j.

\end{equation}

Using this result we obtain

\begin{eqnarray}

\frac{\partial L}{\partial o_i} &=& - \left (y_i (1 - p_i) + \sum_{k\neq i}-p_k y_k \right )\\

&=&p_i y_i - y_i + \sum_{k\neq i} p_k y_k\\

&=& \left (\sum_k p_k y_k \right ) - y_i

\end{eqnarray}

According to the slides I’m using, however, the result should be

\begin{equation}

\frac{\partial L}{\partial o_i} = p_i - y_i.

\end{equation}

Can someone please tell me where I’m going wrong?


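Your derivatives $\large \frac{\partial p_j}{\partial o_i}$ are indeed correct. The error occurs when you differentiate the loss function $L$ with respect to $o_i$: the chain rule gives each term a factor $\frac{1}{p_k}$, which cancels the $p_k$ in $\frac{\partial p_k}{\partial o_i}$ and leaves $p_i$.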

We have the following (where I have highlighted in $\color{red}{red}$ where you have gone wrong)

$$\begin{aligned}
\frac{\partial L}{\partial o_i} &= -\sum_k y_k\frac{\partial \log p_k}{\partial o_i} = -\sum_k y_k\frac{1}{p_k}\frac{\partial p_k}{\partial o_i}\\
&= -y_i(1-p_i)-\sum_{k\neq i}y_k\frac{1}{p_k}({\color{red}{-p_kp_i}})\\
&= -y_i(1-p_i)+\sum_{k\neq i}y_k({\color{red}{p_i}})\\
&= -y_i+\color{blue}{y_ip_i+\sum_{k\neq i}y_k({p_i})}\\
&= \color{blue}{p_i\left(\sum_ky_k\right)}-y_i = p_i-y_i
\end{aligned}$$

given that $\sum_ky_k=1$ from the slides (as $y$ is a vector with only one non-zero element, which is $1$).
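If you want to convince yourself numerically, a finite-difference check is a quick sanity test. The sketch below is only an illustration: the function names (`softmax`, `loss`, `analytic_grad`, `numeric_grad`), the test vector `o`, and the one-hot `y` are all made up for this example and do not come from the slides. It compares $p_i - y_i$ against a central-difference estimate of $\partial L / \partial o_i$.

```python
import math

def softmax(o):
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    m = max(o)
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]

def loss(o, y):
    # Cross-entropy loss L = -sum_j y_j * log(p_j)
    p = softmax(o)
    return -sum(yj * math.log(pj) for yj, pj in zip(y, p))

def analytic_grad(o, y):
    # The closed form derived above: dL/do_i = p_i - y_i
    p = softmax(o)
    return [pi - yi for pi, yi in zip(p, y)]

def numeric_grad(o, y, h=1e-6):
    # Central-difference approximation of dL/do_i, one coordinate at a time
    grad = []
    for i in range(len(o)):
        up = list(o)
        up[i] += h
        down = list(o)
        down[i] -= h
        grad.append((loss(up, y) - loss(down, y)) / (2 * h))
    return grad

# Arbitrary example inputs (assumed for illustration): y is one-hot.
o = [0.5, -1.2, 2.0]
y = [0.0, 1.0, 0.0]

max_gap = max(abs(a - n) for a, n in zip(analytic_grad(o, y), numeric_grad(o, y)))
print(max_gap)
```

The printed gap should be tiny (limited by floating-point round-off), which is consistent with $\frac{\partial L}{\partial o_i} = p_i - y_i$ whenever $\sum_k y_k = 1$.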
