comparison docs/summary_of_gradient_descent.txt @ 53:673a295fd09c
[documentation] cache coursera notes
| author | Jeff Hammel <k0scist@gmail.com> |
|---|---|
| date | Sun, 24 Sep 2017 14:42:56 -0700 |
Revisions compared: 52:0b3daccfc36c and 53:673a295fd09c

# Summary of Gradient Descent

For a two-layer network. `[l]` denotes layer number `l`.
`'` denotes the derivative (prime). `T` denotes transpose. `*` denotes the element-wise product.

## Scalar implementation

Gradients for a single training example `x`:

```
dz[2] = a[2] - y
dW[2] = dz[2]a[1]T
db[2] = dz[2]
dz[1] = W[2]Tdz[2] * g[1]'(z[1])
dW[1] = dz[1]xT
db[1] = dz[1]
```

## Vectorized Implementation

Gradients averaged over `m` training examples stacked as the columns of `X`:

```
dZ[2] = A[2] - Y
dW[2] = (1/m)dZ[2]A[1]T
db[2] = (1/m)*np.sum(dZ[2], axis=1, keepdims=True)
dZ[1] = W[2]TdZ[2] * g[1]'(Z[1])
dW[1] = (1/m)dZ[1]XT
db[1] = (1/m)*np.sum(dZ[1], axis=1, keepdims=True)
```
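
The formulas above can be sketched in NumPy as follows. This is a minimal illustration, not code from the course or from this repository. It assumes a `tanh` hidden layer (so `g[1]'(z) = 1 - tanh(z)^2`) and a sigmoid output trained with cross-entropy loss, which is the setting where `dz[2] = a[2] - y` holds. The function names (`forward`, `backward`, `backward_single`) and the layer sizes are hypothetical. `dW1` in the vectorized version is the batch analogue of the scalar `dW[1] = dz[1]xT`. The script ends by checking that the vectorized gradients equal the average of the per-example gradients.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, W1, b1, W2, b2):
    """Forward pass. tanh hidden layer and sigmoid output are assumptions."""
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)
    return Z1, A1, Z2, A2


def backward_single(x, y, W2, z1, a1, a2):
    """Gradients for one example (column vectors), per the scalar formulas."""
    dz2 = a2 - y
    dW2 = dz2 @ a1.T
    db2 = dz2
    dz1 = (W2.T @ dz2) * (1 - np.tanh(z1) ** 2)  # g[1]'(z[1]) for tanh
    dW1 = dz1 @ x.T
    db1 = dz1
    return dW1, db1, dW2, db2


def backward(X, Y, W2, Z1, A1, A2):
    """Gradients averaged over m examples, per the vectorized formulas."""
    m = X.shape[1]
    dZ2 = A2 - Y
    dW2 = (1 / m) * dZ2 @ A1.T
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * (1 - np.tanh(Z1) ** 2)  # g[1]'(Z[1]) for tanh
    dW1 = (1 / m) * dZ1 @ X.T
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)
    return dW1, db1, dW2, db2


# Small random problem; the sizes are arbitrary illustration values.
rng = np.random.default_rng(0)
n_x, n_h, m = 3, 4, 5
X = rng.standard_normal((n_x, m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)
W1 = rng.standard_normal((n_h, n_x)) * 0.1
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((1, n_h)) * 0.1
b2 = np.zeros((1, 1))

Z1, A1, Z2, A2 = forward(X, W1, b1, W2, b2)
grads = backward(X, Y, W2, Z1, A1, A2)

# Average the single-example gradients column by column.
per_example = [
    backward_single(X[:, [i]], Y[:, [i]], W2, Z1[:, [i]], A1[:, [i]], A2[:, [i]])
    for i in range(m)
]
averaged = [sum(g) / m for g in zip(*per_example)]

print(all(np.allclose(v, a) for v, a in zip(grads, averaged)))
```

Indexing with `X[:, [i]]` rather than `X[:, i]` keeps each example as a column vector, so the single-example code matches the scalar formulas shape for shape.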
