Neural Networks - Data Analysis
Mind map by Aleksandar Kovacevic, created 17/09/2017.
Dataset Split
- Data > 1M examples: 98% train, 1% dev, 1% test.
- Small data: 60% train, 20% dev, 20% test.
- Beware when the train set comes from a different distribution than the dev/test sets.

Bias and Variance (Tuning)
- High bias: the data is modelled too roughly (underfitting). Remedies: bigger network, train longer, NN architecture search.
- High variance: the model fits the training data too closely (overfitting). Remedies: more data, regularization.

Weights/parameters initialization
- Zeros? No! Bad: fails to break symmetry, so the gradient does not decrease.
- Random init: good, it breaks symmetry; bad, large weights lead to exploding gradients.
- He init (the best): ensures faster learning and works well with ReLU activations (see the sketch after this outline).

Regularization
- L2 regularization: add (lambda / (2m)) * ||W||_F^2 to the cost (sketched below).
- L1 regularization: add (lambda / (2m)) * ||W||_1 to the cost.
- The regularization term must also be added to the gradients used in the update W = W - alpha * dW.
- Intuition for lambda: a high lambda drives the weights towards zero, which makes the network behave more linearly.
- Weight decay (the effect of L2 regularization on the update).
- Dropout: randomly remove certain neurons from the network during training.
- Data augmentation: add more training data by distorting existing data.
- Early stopping: stop training earlier, at the point where train and dev error are at their minimum.

Optimization problems
- Data not normalized -> slower training. Normalize the inputs to mean = 0 and std = 1.
- Vanishing/exploding gradients: in deep networks the gradients can become too large or too small as they propagate through the layers.
- Gradient checking: compare the backprop gradient with the cost function evaluated at the parameter increased and decreased by a small value epsilon.

Optimization algorithms
- Mini-batch gradient descent: split the input/output data (X, Y) into small slices (batches) and compute the cost on one batch at a time.
  - Choosing the batch size: small sets (m <= 2000) -> batch gradient descent; larger sets -> batch sizes of 64, 128, 256 or 512; make sure the batch fits in CPU/GPU memory.
- Exponentially weighted averages: running values computed as v(t) = beta * v(t-1) + (1 - beta) * theta(t).
  - Bias correction: corrects the starting values of the exponentially weighted average via v(t) = v(t) / (1 - beta^t).
- Gradient descent with momentum: accelerates the horizontal component of gradient descent so it converges faster towards the solution; uses the exponentially weighted average formula with the gradient in place of theta.
- RMSprop: slows down the vertical component of gradient descent and speeds up the horizontal one.
- Adam: a combination of RMSprop and gradient descent with momentum (sketched below).
  - Hyperparameter choice: alpha needs to be tuned; beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8.
- Learning rate decay: lower the learning rate as training approaches the minimum. Many formulas exist; the most common is alpha = alpha_0 / (1 + decay_rate * epoch_num).

Tuning the algorithm's hyperparameters
- Prioritize: some hyperparameters matter much more than others (in the original map: darkest = most important, lightest = least important, white = fixed).
- Try random values; don't use a grid search.
- Search coarse to fine.
- Choose the sampling scale per hyperparameter, e.g. a logarithmic scale for alpha.

Batch normalization
- Idea: normalize each layer's input, i.e. the linear output Z (not the activation A), throughout the network (sketched below).
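Below are a few minimal NumPy sketches of techniques from the map above. They are illustrative only: function names, shapes, and the `layer_dims`/`params` layout are assumptions, not part of the original resource.

He initialization (the "best" option in the map): weights are drawn from a zero-mean Gaussian scaled by sqrt(2 / n_prev), and biases start at zero. A minimal sketch, assuming a fully connected network described by a hypothetical `layer_dims` list:

```python
import numpy as np

def initialize_he(layer_dims, seed=0):
    """He initialization: W ~ N(0, 2/n_prev), b = 0 for each layer."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        params[f"W{l}"] = rng.standard_normal(
            (layer_dims[l], layer_dims[l - 1])
        ) * np.sqrt(2.0 / layer_dims[l - 1])   # breaks symmetry, keeps variance stable
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

params = initialize_he([4, 5, 3, 1])  # hypothetical layer sizes
```

L2 regularization / weight decay: the (lambda / (2m)) * ||W||_F^2 term added to the cost contributes an extra (lambda / m) * W to each gradient, which shrinks the weights at every update. A sketch under the same assumed `params` layout:

```python
import numpy as np

def l2_cost_term(params, lambd, m):
    """Extra cost: (lambda / (2m)) * sum of squared Frobenius norms of all W."""
    return (lambd / (2 * m)) * sum(
        np.sum(np.square(W)) for name, W in params.items() if name.startswith("W")
    )

def update_with_l2(W, dW, alpha, lambd, m):
    """Gradient step including the regularization term (weight decay)."""
    return W - alpha * (dW + (lambd / m) * W)
```

Adam combines momentum (an exponentially weighted average of the gradient) with RMSprop (an exponentially weighted average of the squared gradient), plus bias correction; the sketch below also folds in the 1 / (1 + decay_rate * epoch_num) learning-rate decay from the map. The defaults follow the hyperparameter choices listed above; the function signature itself is an assumption:

```python
import numpy as np

def adam_step(w, dw, v, s, t, alpha0=0.001, beta1=0.9, beta2=0.999,
              epsilon=1e-8, decay_rate=0.0, epoch_num=0):
    """One Adam update for a single parameter array.

    v, s: running first/second moment estimates (initialize to zeros).
    t:    update count starting at 1, used for bias correction.
    """
    alpha = alpha0 / (1.0 + decay_rate * epoch_num)   # learning-rate decay
    v = beta1 * v + (1 - beta1) * dw                  # momentum (1st moment)
    s = beta2 * s + (1 - beta2) * np.square(dw)       # RMSprop (2nd moment)
    v_corr = v / (1 - beta1 ** t)                     # bias correction
    s_corr = s / (1 - beta2 ** t)
    w = w - alpha * v_corr / (np.sqrt(s_corr) + epsilon)
    return w, v, s
```

Batch normalization, as described in the map, normalizes the linear output Z of a layer (not the activation A) and then rescales it with learnable gamma and beta. A forward-pass-only sketch, assuming Z has shape (units, batch):

```python
import numpy as np

def batchnorm_forward(Z, gamma, beta, eps=1e-8):
    """Normalize Z per unit over the batch, then rescale and shift."""
    mu = Z.mean(axis=1, keepdims=True)
    var = Z.var(axis=1, keepdims=True)
    Z_norm = (Z - mu) / np.sqrt(var + eps)
    return gamma * Z_norm + beta
```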