Mind Map: Neural Networks - Data Analysis
Created by Aleksandar Kovacevic, 17/09/2017
Tuning
High Bias: data is modelled too roughly (underfitting)
High Variance: data is modelled too well (overfitting)
Weights/parameters initialization
Zeros? NO! Bad - fails to break symmetry, so the gradient does not decrease
Random Init: Good - breaks symmetry; Bad - large weights -> exploding gradients
He Init - the best! Good - ensures faster learning and works well with ReLU activations
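A minimal numpy sketch of the three initialization options above; the layer sizes are illustrative:

```python
import numpy as np

n_in, n_out = 64, 128  # illustrative layer sizes

# Zeros: every unit computes the same thing -> symmetry is never broken
W_zeros = np.zeros((n_out, n_in))

# Plain random init: breaks symmetry, but large weights risk exploding gradients
W_random = np.random.randn(n_out, n_in) * 10

# He init: scaled for ReLU layers, variance 2 / n_in
W_he = np.random.randn(n_out, n_in) * np.sqrt(2.0 / n_in)
b = np.zeros((n_out, 1))  # biases can safely start at zero
```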
Neural Networks - Data Analysis
Dataset Split
Data > 1M: 98% Train, 1% Dev, 1% Test
Small data: 60% Train, 20% Dev, 20% Test
The train set may come from a different distribution than the dev/test sets
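A small sketch of the split percentages above, assuming the examples are stored row-wise in a numpy array X with labels Y:

```python
import numpy as np

def split(X, Y, dev_frac, test_frac, seed=0):
    """Shuffle and split examples (rows of X) into train/dev/test sets."""
    m = X.shape[0]
    idx = np.random.default_rng(seed).permutation(m)
    n_dev, n_test = int(m * dev_frac), int(m * test_frac)
    dev, test, train = idx[:n_dev], idx[n_dev:n_dev + n_test], idx[n_dev + n_test:]
    return (X[train], Y[train]), (X[dev], Y[dev]), (X[test], Y[test])

# >1M examples: dev_frac=0.01, test_frac=0.01; small data: 0.20 and 0.20
```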
Fixes for high bias: Bigger Network, Train Longer, NN architecture search
Fixes for high variance: More Data, Regularization
L2 regularization: add (lambda / (2m)) * ||W||_F^2 (squared Frobenius norm) to the cost
L1 regularization: add (lambda / (2m)) * ||W||_1 (sum of absolute values) to the cost
The regularization term also enters the gradient: dW gains an extra (lambda / m) * W before the update W = W - alpha * dW
Intuition for lambda: lambda goes high -> weights go low -> the network behaves more linearly
Weight Decay
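A sketch of how the L2 term enters the cost and the gradient (hence "weight decay"); cross_entropy and the list of weight matrices are assumed to come from the rest of the model:

```python
import numpy as np

def l2_cost(cross_entropy, weights, lambd, m):
    # J = cross-entropy + (lambda / (2m)) * sum of squared Frobenius norms
    return cross_entropy + (lambd / (2 * m)) * sum(np.sum(W ** 2) for W in weights)

# In back-prop each dW picks up an extra (lambda / m) * W before the update:
# dW = dW_from_backprop + (lambd / m) * W
# W  = W - alpha * dW        # the weights are "decayed" a little every step
```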
Dropout
Randomly drop (zero out) some neurons from the network on each training pass
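A sketch of inverted dropout on one layer's activations A; keep_prob (the fraction of neurons kept) is an assumed parameter name:

```python
import numpy as np

def dropout_forward(A, keep_prob=0.8):
    """Randomly zero out neurons and rescale so the expected activation is unchanged."""
    mask = (np.random.rand(*A.shape) < keep_prob)
    return (A * mask) / keep_prob, mask  # keep the mask to reuse in back-prop
```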
Data Augmentation
Adding more training data by distorting existing data (e.g. flipping or cropping images)
Early Stopping
Stop training early, around the point where the dev error reaches its minimum
Optimization Problem
Data not normalized -> slower training process
Normalize the data to have mean = 0 and std = 1
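A sketch of normalizing the inputs to zero mean and unit standard deviation; the same mu and sigma computed on the train set must be reused for the dev/test sets:

```python
import numpy as np

def normalize(X_train, X_test, eps=1e-8):
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + eps      # eps avoids division by zero
    return (X_train - mu) / sigma, (X_test - mu) / sigma
```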
Vanishing/exploding gradients
In deep networks the gradients can become too large or too small as they are propagated through the layers
Gradient Checking
Compare the analytic gradient with a numerical estimate obtained by increasing and decreasing each parameter by a small value epsilon
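A sketch of two-sided gradient checking for a single parameter vector theta; cost_fn is assumed to return the scalar cost for a given theta:

```python
import numpy as np

def grad_check(cost_fn, theta, analytic_grad, eps=1e-7):
    """Compare the analytic gradient with a numerical two-sided estimate."""
    numeric = np.zeros_like(theta)
    for i in range(theta.size):
        plus, minus = theta.copy(), theta.copy()
        plus.flat[i] += eps
        minus.flat[i] -= eps
        numeric.flat[i] = (cost_fn(plus) - cost_fn(minus)) / (2 * eps)
    diff = np.linalg.norm(numeric - analytic_grad) / (
        np.linalg.norm(numeric) + np.linalg.norm(analytic_grad))
    return diff  # roughly < 1e-7 suggests the back-prop gradients are correct
```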
Optimization Algorithms
Mini-Batch gradient descent
Split the input and output (X, Y) data into small slices / batches, and compute the cost and gradients on one batch at a time
Choosing Batch Size
Small set (m <= 2000) -> use batch gradient descent
Larger set -> batch size of 64, 128, 256 or 512
Make sure a mini-batch fits in CPU/GPU memory
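A sketch of slicing (X, Y) into shuffled mini-batches, assuming examples are stored as columns (features x m):

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle the columns of X and Y together and cut them into batches."""
    m = X.shape[1]
    perm = np.random.default_rng(seed).permutation(m)
    X, Y = X[:, perm], Y[:, perm]
    return [(X[:, k:k + batch_size], Y[:, k:k + batch_size])
            for k in range(0, m, batch_size)]
```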
Exponentially weighted averages
A running average recomputed with the formula v(t) = beta * v(t-1) + (1 - beta) * theta(t)
Bias Correction
Corrects the starting values of the exponentially weighted average using the formula: v_corrected(t) = v(t) / (1 - beta^t)
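A sketch of the exponentially weighted average with bias correction, for a sequence of scalar observations thetas:

```python
def ewa(thetas, beta=0.9):
    v, out = 0.0, []
    for t, theta in enumerate(thetas, start=1):
        v = beta * v + (1 - beta) * theta      # running weighted average
        out.append(v / (1 - beta ** t))        # bias-corrected value v(t) / (1 - beta^t)
    return out
```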
Gradient Descent with Momentum
Aim: accelerate the horizontal component of gradient descent so it converges faster towards the solution. Based on the same formula as exponentially weighted averages, just applied to the gradients instead of theta
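A sketch of one momentum update for a single weight matrix; dW is the gradient from back-prop and v_dW is kept across iterations (initialized to zeros):

```python
def momentum_step(W, dW, v_dW, alpha=0.01, beta=0.9):
    v_dW = beta * v_dW + (1 - beta) * dW   # exponentially weighted average of gradients
    W = W - alpha * v_dW
    return W, v_dW
```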
RMSprop
Aims to slow down the vertical (oscillating) component of gradient descent and speed up the horizontal component
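A sketch of one RMSprop update; dividing by the running RMS of the gradient damps the direction that oscillates most:

```python
import numpy as np

def rmsprop_step(W, dW, s_dW, alpha=0.01, beta=0.999, eps=1e-8):
    s_dW = beta * s_dW + (1 - beta) * dW ** 2    # running average of squared gradients
    W = W - alpha * dW / (np.sqrt(s_dW) + eps)
    return W, s_dW
```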
Adam
Combination of RMSprop and gradient descent with momentum
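A sketch of one Adam step, combining the momentum and RMSprop terms with bias correction; the defaults match the hyperparameters noted further down (beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8):

```python
import numpy as np

def adam_step(W, dW, v, s, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    v = beta1 * v + (1 - beta1) * dW          # momentum term
    s = beta2 * s + (1 - beta2) * dW ** 2     # RMSprop term
    v_hat = v / (1 - beta1 ** t)              # bias correction (t = iteration count, from 1)
    s_hat = s / (1 - beta2 ** t)
    W = W - alpha * v_hat / (np.sqrt(s_hat) + eps)
    return W, v, s
```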
Learning Rate Decay
A method to lower the learning rate as training gets closer to the minimum
Many formulas exist; the most common is: alpha = alpha_0 / (1 + decay_rate * epoch_num)
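A sketch of the decay formula above:

```python
def decayed_alpha(alpha0, decay_rate, epoch_num):
    # alpha = alpha_0 / (1 + decay_rate * epoch_num)
    return alpha0 / (1 + decay_rate * epoch_num)

# e.g. alpha0 = 0.2, decay_rate = 1.0 -> epochs 0..3 give 0.2, 0.1, 0.0667, 0.05
```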
Hyperparameter choice (Adam): alpha needs to be tuned; beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8
Tuning algorithm's hyperparameters
Priorities (colour-coded in the original map): darkest - most important, lightest - least important, white - kept fixed
Try random values - don't use a grid search
Coarse-to-fine search: zoom in on the best-performing region and sample it more densely
Choose an appropriate scale for random sampling, e.g. for alpha use a logarithmic scale
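A sketch of sampling hyperparameters at random instead of on a grid, with alpha drawn on a logarithmic scale; the ranges and the third parameter are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(25):                              # 25 random trials instead of a grid
    alpha = 10 ** rng.uniform(-4, 0)             # log scale: 1e-4 ... 1
    beta = 1 - 10 ** rng.uniform(-3, -1)         # samples 0.9 ... 0.999 on a log scale
    hidden_units = rng.integers(50, 200)         # a linear scale is fine here
    # train a model with (alpha, beta, hidden_units) and keep the best on the dev set
```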
Batch Normalization
Idea: normalize each layer's input (Z, not A) throughout the network
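A sketch of batch-normalizing a layer's pre-activation Z across a mini-batch; gamma and beta are the learnable scale and shift parameters:

```python
import numpy as np

def batch_norm_forward(Z, gamma, beta, eps=1e-8):
    """Normalize Z (units x batch) per unit, then rescale with learnable gamma, beta."""
    mu = Z.mean(axis=1, keepdims=True)
    var = Z.var(axis=1, keepdims=True)
    Z_norm = (Z - mu) / np.sqrt(var + eps)
    return gamma * Z_norm + beta
```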