Friday, December 8, 2017

NNDL HW4-2 Deep Neural Network for Image Classification: Application


Deep Neural Network for Image Classification: Application

Forward and backward propagation for gradient descent

$I$: number of examples
$i$: example index
$R$: number of features in the $0^\text{th}$ layer
$r$: feature index for the $0^\text{th}$ layer
$S$: number of features in the $1^\text{st}$ layer
$s$: feature index for the $1^\text{st}$ layer
$T$: number of features in the $2^\text{nd}$ layer
$t$: feature index for the $2^\text{nd}$ layer
In the element formulas below, a repeated index is summed over (Einstein convention).
Summary of forward propagation:
\begin{array}{cl|lllll}
\text{Layer} & \text{Name} &  \text{Variable} & \text{Size} & \text{Element} & \text{Forward Prop.} & \text{Forward Element} \\
\hline
0 & \text{Feature} &  A^{[0]} & R\times I & A^{r[0]}_i & & \\
\hline
1 & \text{Feature} &  A^{[1]} & S\times I & A^{s[1]}_i &  A^{[1]} = g(Z^{[1]})  &
        A^{s[1]}_i = g(Z^{s[1]}_i) \\
   & Z           &  Z^{[1]} & S\times I & Z^{s[1]}_i & Z^{[1]} = W^{[1]}A^{[0]}+b^{[1]} &
        Z^{s[1]}_i = W^{s[1]}_r A^{r[0]}_i + b^{s[1]} \\
   & \text{Weight}  & W^{[1]} & S\times R & W^{s[1]}_r & & \\
   & \text{Bias}      &   b^{[1]} & S\times 1 & b^{s[1]} & & \\
\hline
2 & \text{Feature} &  A^{[2]} & T\times I & A^{t[2]}_i & A^{[2]} = \sigma(Z^{[2]}) &
       A^{t[2]}_i = \sigma(Z^{t[2]}_i) \\
   & Z           &  Z^{[2]} & T\times I & Z^{t[2]}_i & Z^{[2]} = W^{[2]}A^{[1]}+b^{[2]} &
       Z^{t[2]}_i = W^{t[2]}_s A^{s[1]}_i + b^{t[2]}  \\
   & \text{Weight}  & W^{[2]} & T\times S & W^{t[2]}_s & & \\
   & \text{Bias}      &   b^{[2]} & T\times 1 &  b^{t[2]} & & \\
\end{array}
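A minimal NumPy sketch of this forward pass (assuming ReLU for the hidden activation $g$; the function names and shapes are my own reading of the table, not code from the assignment):

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def relu(Z):
    return np.maximum(0.0, Z)

def forward_propagation(A0, W1, b1, W2, b2):
    """Forward pass for the 2-layer network in the table.

    Assumed shapes: A0 (R, I), W1 (S, R), b1 (S, 1), W2 (T, S), b2 (T, 1).
    """
    Z1 = W1 @ A0 + b1      # Z[1] = W[1] A[0] + b[1]; b1 broadcasts over the I examples
    A1 = relu(Z1)          # A[1] = g(Z[1]), with g assumed to be ReLU here
    Z2 = W2 @ A1 + b2      # Z[2] = W[2] A[1] + b[2]
    A2 = sigmoid(Z2)       # A[2] = sigma(Z[2])
    return Z1, A1, Z2, A2
```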
Summary of backward propagation:
\begin{array}{cl|lllll}
\text{Layer} & \text{Name} &  \text{Variable} & \text{Size} & \text{Element} & \text{Backward Prop.} & \text{Backward Element} \\
\hline
1  & \text{Feature} &  dA^{[1]} & S\times I & dA^{s[1]}_i & dA^{[1]} = W^{[2]T} dZ^{[2]} &
        dA_i^{s[1]} = W^{t[2]}_s dZ_i^{t[2]} \\
    & Z           &  dZ^{[1]} & S\times I & dZ^{s[1]}_i & dZ^{[1]} = dA^{[1]} * g'(Z^{[1]}) &
        dZ_i^{s[1]} = dA_i^{s[1]} \, g'(Z_i^{s[1]}) \\
    & \text{Weight}  & dW^{[1]} & S\times R & dW^{s[1]}_r & dW^{[1]} = dZ^{[1]} A^{[0]T} &
        dW_r^{s[1]} =  dZ_i^{s[1]} A^{r[0]}_i \\
    & \text{Bias}      &   db^{[1]} & S\times 1 & db^{s[1]} & db^{[1]} = \sum_i dZ^{[1]}_i &
        db^{s[1]} = \sum_i dZ_i^{s[1]} \\
\hline
 2  & \text{Feature} &  dA^{[2]} & T\times I & dA^{t[2]}_i & & \\
    & Z           &  dZ^{[2]} & T\times I & dZ^{t[2]}_i & dZ^{[2]} = A^{[2]} - Y &
         dZ_i^{t[2]} = \partial J / \partial Z^{t[2]}_i = A^{t[2]}_i - Y^t_i \\
     & \text{Weight}  & dW^{[2]} & T\times S & dW^{t[2]}_s & dW^{[2]} = dZ^{[2]} A^{[1]T} &
         dW_s^{t[2]} = dZ_i^{t[2]} A^{s[1]}_i \\
     & \text{Bias}      &   db^{[2]} & T\times 1 &  db^{t[2]}    & db^{[2]} = \sum_i dZ^{[2]}_i  &
         db^{t[2]} = \sum_{i}dZ_i^{t[2]} \\
\end{array}
Here $*$ denotes the elementwise product, and $Y$ ($T\times I$) is the matrix of labels; $dZ^{[2]} = A^{[2]} - Y$ follows from $dL/dz = a - y$, derived below.
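And the matching backward pass as a NumPy sketch (same assumed shapes and ReLU $g$ as in the forward sketch; note the table's $dW$ and $db$ are sums over the $I$ examples, so divide by $I$ if the cost $J$ is defined as the mean of the per-example losses):

```python
def backward_propagation(A0, Z1, A1, A2, W2, Y):
    """Backward pass for the 2-layer network in the table.

    Y holds the labels with shape (T, I), matching A2.
    """
    dZ2 = A2 - Y                               # dZ[2] = A[2] - Y
    dW2 = dZ2 @ A1.T                           # dW[2] = dZ[2] A[1]^T
    db2 = np.sum(dZ2, axis=1, keepdims=True)   # db[2] = sum_i dZ[2]_i
    dA1 = W2.T @ dZ2                           # dA[1] = W[2]^T dZ[2]
    dZ1 = dA1 * (Z1 > 0)                       # dZ[1] = dA[1] * g'(Z[1]), g' of ReLU
    dW1 = dZ1 @ A0.T                           # dW[1] = dZ[1] A[0]^T
    db1 = np.sum(dZ1, axis=1, keepdims=True)   # db[1] = sum_i dZ[1]_i
    return dW1, db1, dW2, db2
```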
Derivations:

Logistic Activation Notes

1. Single example and feature

For an example $x$, the label $y = 1$ if the class is true and $y = 0$ if false.
The linear step, the activation step (the probability of the true class predicted by the logistic function $\sigma$ for $x$), and the loss function are:
\begin{align}
z &= wx+b \\
a &= P(1|x) = \sigma(z) = \frac{1}{1+e^{-z}} \\
L &= -\big[  y\log{a}+ (1-y)\log(1-a)  \big]
\end{align}
Derivatives:
\begin{align}
\frac{dL}{da} &= -\frac{y}{a} + \frac{1-y}{1-a} \\
\frac{da}{dz} &= a(1-a) \\
\frac{dL}{dz} &= \frac{dL}{da} \frac{da}{dz}
  = \left( -\frac{y}{a} + \frac{1-y}{1-a} \right) a(1-a)
  = -y(1-a) + (1-y)a = a-y
\end{align}
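A quick numerical sanity check of $dL/dz = a - y$ against a centered finite difference (a throwaway sketch; the values $z = 0.7$, $y = 1$ are arbitrary):

```python
import numpy as np

def loss(z, y):
    a = 1.0 / (1.0 + np.exp(-z))
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

z, y, eps = 0.7, 1.0, 1e-6
a = 1.0 / (1.0 + np.exp(-z))
analytic = a - y                                              # dL/dz = a - y
numeric = (loss(z + eps, y) - loss(z - eps, y)) / (2 * eps)   # centered difference
print(analytic, numeric)  # both ~ -0.3318
```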

2. Multiple examples and features; $W$: $1\times N$; $Z$, $A$, $L$, $dZ$, $dA$: $1\times m$; $b$, $J$: $1\times 1$
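The scalar formulas of part 1 vectorize directly under these shape conventions. A minimal sketch (assuming $X$ is $N \times m$ and that the cost $J$ averages the per-example losses, as in the course; the function name is mine):

```python
import numpy as np

def logistic_forward_backward(W, b, X, Y):
    """Vectorized logistic regression over m examples.

    Assumed shapes: W (1, N), X (N, m), Y (1, m); b and J are scalars.
    """
    m = X.shape[1]
    Z = W @ X + b                                        # (1, m)
    A = 1.0 / (1.0 + np.exp(-Z))                         # sigma(Z), (1, m)
    J = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dZ = A - Y                                           # from dL/dz = a - y
    dW = (dZ @ X.T) / m                                  # (1, N)
    db = np.sum(dZ) / m
    return J, dW, db
```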

Thursday, December 7, 2017

NNDL HW4-1 Building your Deep Neural Network: Step by Step

https://www.coursera.org/learn/neural-networks-deep-learning/notebook/lSYZM/building-your-deep-neural-network-step-by-step

Building your Deep Neural Network: Step by Step