Friday, December 8, 2017

Logistic Activation Notes

1. Single example and feature

For an example $x$, the label $y = 1$ if true and $y = 0$ if false.
The Linear step, the Activation step (probability of true predicted by the logistic function $\sigma$ for $x$), and the Loss function are:
\begin{align}
z &= wx+b \\
a &= P(1|x) = \sigma(z) = \frac{1}{1+e^{-z}} \\
L &= -\big[  y\log{a}+ (1-y)\log(1-a)  \big]
\end{align}
Derivatives:
\begin{align}
\frac{dL}{da} &= -\frac{y}{a} + \frac{1-y}{1-a} \\
\frac{da}{dz} &= a(1-a) \\
\frac{dL}{dz} &= \frac{dL}{da} \frac{da}{dz} = a-y
\end{align}
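The single-example forward and backward pass above can be sketched directly (a minimal illustration; the function name and argument order are my own choices, not from the notes):

```python
import math

def logistic_single(w, b, x, y):
    """Forward and backward pass for one example with one feature."""
    z = w * x + b                      # linear step: z = wx + b
    a = 1.0 / (1.0 + math.exp(-z))     # activation: a = sigma(z)
    loss = -(y * math.log(a) + (1 - y) * math.log(1 - a))
    dz = a - y                         # dL/dz = a - y (derived above)
    dw = dz * x                        # dL/dw = dL/dz * dz/dw
    db = dz                            # dL/db = dL/dz * 1
    return loss, dw, db
```

With $w = b = 0$, $x = 1$, $y = 1$ this gives $a = 0.5$, $L = \log 2$, and $dw = db = -0.5$, matching the formula $dz = a - y$.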

2. Multiple examples and features. Shapes: $X$: $N\times m$; $W$, $dW$: $1\times N$; $Z$, $A$, $L$, $Y$, $dZ$, $dA$: $1\times m$; $b$, $J$: $1\times1$


$i$: index of the $m$ examples
$n$: index of the $N$ features
$X^n_i$: $n$th feature of the $i$th example
$W_n$: $n$th weight
$b$: bias
$Z_i$: $i$th linear output
$J$: cost function
The Linear step, the Activation step (probability of true predicted by the logistic function $\sigma$ for $X$), the Loss function, and the Cost function are (summation over a repeated index is implied):
\begin{align}
Z_i &= W_n X^n_i + b \\
A_i &= \sigma(Z_i) \\
L_i &= -\big[  Y_i\log{A_i}+ (1-Y_i)\log(1-A_i)  \big] \\
J &= \frac{1}{m} \sum_{i} L_i = -\frac{1}{m} \sum_{i} \big[  Y_i\log{A_i}+ (1-Y_i)\log(1-A_i)  \big]
\end{align}
Derivatives:
\begin{align}
& \frac{\partial Z_i}{\partial W_n} = X^n_i \\
& \frac{\partial Z_i}{\partial b} = 1_i \\
dA_i &=\frac{\partial J}{\partial A^i} = \frac{1}{m} \frac{\partial L_i}{\partial A^i}  = \frac{1}{m}(-\frac{Y_i}{A^i} + \frac{1-Y_i}{1-A^i}) \\
dZ_i &= \frac{\partial J}{\partial Z^i} = \frac{1}{m} \frac{\partial L_i}{\partial Z^i} = \frac{1}{m} (A_i - Y_i) \\
db &= \frac{\partial J}{\partial b} = \frac{\partial J}{\partial Z^i} \frac{\partial Z^i}{\partial b} = dZ_i 1^i = \sum_{i} dZ_i \\
dW_n &= \frac{\partial J}{\partial W^n} =\frac{\partial J}{\partial Z^i} \frac{\partial Z^i}{\partial W^n} = dZ_i X^i_n
\end{align}
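The index form above translates to explicit sums over examples $i$ and features $n$. A sketch with plain loops (the data layout `X[n][i]` mirrors $X^n_i$; the function name is illustrative):

```python
import math

def grads_loops(W, b, X, Y):
    """Gradients dW, db via explicit sums.

    X[n][i] = n-th feature of i-th example; W has N weights; Y has m labels.
    """
    N, m = len(W), len(Y)
    # Z_i = W_n X^n_i + b  (sum over n)
    Z = [sum(W[n] * X[n][i] for n in range(N)) + b for i in range(m)]
    A = [1.0 / (1.0 + math.exp(-z)) for z in Z]
    dZ = [(A[i] - Y[i]) / m for i in range(m)]   # dZ_i = (A_i - Y_i)/m
    db = sum(dZ)                                 # db = sum_i dZ_i
    # dW_n = dZ_i X^i_n  (sum over i)
    dW = [sum(dZ[i] * X[n][i] for i in range(m)) for n in range(N)]
    return dW, db
```

Note that the $\frac{1}{m}$ factor is folded into $dZ_i$, so $db$ and $dW_n$ are plain sums over examples.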

In matrix form:
\begin{align}
Z&= W X + b \\
A &= \sigma(Z) \\
L &= -\big[  Y\cdot\log{A}+ (1-Y)\cdot \log(1-A)  \big]  \text{;  element-wise multiplication} \\
J &= \frac{1}{m} \sum_{i} L_i = -\frac{1}{m} \sum_{i} \big[  Y_i\log{A_i}+ (1-Y_i)\log(1-A_i)  \big]
\end{align}
Derivatives:
\begin{align}
dA &= \frac{1}{m}(-\frac{Y}{A} + \frac{1-Y}{1-A}) \text{;  element-wise divide} \\
dZ &= \frac{1}{m} (A - Y) \\
db &=  \sum_{i} dZ_i \\
dW &= dZ X^T
\end{align}
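The matrix form maps one-to-one onto NumPy operations (a minimal sketch, assuming the shapes listed above: $W$ is $1\times N$, $X$ is $N\times m$, $Y$ is $1\times m$):

```python
import numpy as np

def grads_vectorized(W, b, X, Y):
    """Vectorized gradients. W: (1, N), X: (N, m), Y: (1, m), b: scalar."""
    m = Y.shape[1]
    Z = W @ X + b                  # (1, m): Z = WX + b
    A = 1.0 / (1.0 + np.exp(-Z))   # (1, m): A = sigma(Z)
    dZ = (A - Y) / m               # (1, m): dZ = (A - Y)/m
    db = dZ.sum()                  # scalar: sum over examples
    dW = dZ @ X.T                  # (1, N): dW = dZ X^T
    return dW, db
```

Since $dZ$ is $1\times m$ and $X^T$ is $m\times N$, the product $dZ\,X^T$ performs the sum over examples automatically, giving $dW$ with shape $1\times N$.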
