Neural Networks - Data Analysis

Description

Data Science mind map on Neural Networks - Data Analysis, created by Aleksandar Kovacevic on 17/09/2017.
Resource summary

Neural Networks - Data Analysis
  1. Tuning
    1. High Bias
      1. The data is modelled too coarsely (underfitting)
        1. Bigger Network
          1. Train Longer
            1. NN architecture search
            2. High Variance
              1. The data is modelled too closely (overfitting)
                1. More Data
                  1. Regularization
                    1. Weight Decay
                      1. L2 regularization: add (lambda/(2*m)) * ||W||_F^2 (squared Frobenius norm of the weight matrices) to the cost
                        1. L1 regularization: add (lambda/(2*m)) * ||W||_1 (sum of absolute weight values) to the cost
                          1. The regularization term also enters the gradient: dW = (backprop term) + (lambda/m)*W, then W = W - alpha*dW (see the sketch below this branch)
                            1. Intuition for the parameter lambda: as lambda increases, the weights shrink towards zero, making the network behave more linearly
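
A minimal NumPy sketch of how the weight-decay term enters both the cost and the update; the names (W_layers, dW_backprop, lambd, m, alpha) are illustrative, not from the original map:

    import numpy as np

    def l2_penalty(W_layers, lambd, m):
        # (lambda / (2*m)) * sum of squared Frobenius norms over all weight matrices
        return (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in W_layers)

    def weight_decay_update(W, dW_backprop, lambd, m, alpha):
        # the same term appears in the gradient: dW = dW_backprop + (lambda/m) * W
        dW = dW_backprop + (lambd / m) * W
        return W - alpha * dW
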
                            2. Dropout
                              1. Randomly drop (zero out) certain neurons during training so the network cannot rely on any single unit (sketch below)
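
A rough sketch of inverted dropout applied to one layer's activations, assuming NumPy; the function name and keep_prob value are illustrative:

    import numpy as np

    def dropout_forward(A, keep_prob=0.8, rng=np.random.default_rng(0)):
        # zero out each neuron with probability (1 - keep_prob) ...
        mask = (rng.random(A.shape) < keep_prob).astype(A.dtype)
        # ... and rescale so the expected activation is unchanged (inverted dropout)
        return (A * mask) / keep_prob, mask  # reuse the mask in backprop; disable dropout at test time
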
                              2. Data Augmentation
                                1. Add more training data by distorting existing examples (e.g. flips, rotations, crops)
                                2. Early Stopping
                                  1. Stop training early, at the point where the dev-set error is at its minimum (before it starts rising again)
                              3. Optimization Problem
                                1. Data not normalized -> slower training process
                                  1. Normalize the inputs to mean = 0 and std = 1, using the training-set statistics (sketch below)
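
A small sketch of normalizing the inputs to mean 0 and std 1, reusing the training-set statistics on the other splits; variable names are illustrative:

    import numpy as np

    def normalize_inputs(X_train, X_dev):
        # compute mean and std on the training set only
        mu = X_train.mean(axis=0)
        sigma = X_train.std(axis=0) + 1e-8  # avoid division by zero
        # apply the same transformation to every split
        return (X_train - mu) / sigma, (X_dev - mu) / sigma
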
                                    1. Vanishing/exploding gradients
                                      1. In deep networks, gradients can become extremely large or extremely small as they propagate through many layers
                                      2. Gradient Checking
                                        1. Check backprop by comparing the analytic gradient with a numerical estimate obtained by perturbing each parameter by a small value epsilon (sketch below)
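
A sketch of the numerical side of gradient checking (central differences with a small epsilon); cost_fn and theta are placeholders for your cost function and flattened parameters:

    import numpy as np

    def numerical_gradient(cost_fn, theta, eps=1e-7):
        # perturb each parameter by +eps and -eps and compare the costs
        grad = np.zeros_like(theta)
        for i in range(theta.size):
            plus, minus = theta.copy(), theta.copy()
            plus[i] += eps
            minus[i] -= eps
            grad[i] = (cost_fn(plus) - cost_fn(minus)) / (2 * eps)
        return grad

    # compare with the backprop gradient via a relative error, e.g.
    # norm(grad_num - grad_bp) / (norm(grad_num) + norm(grad_bp)), which should be ~1e-7 or smaller
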
                                        2. Optimization Algorithms
                                          1. Mini-Batch gradient descent
                                            1. Split the input and output data (X, Y) into small slices (mini-batches) and compute the cost and gradients on one batch at a time (see the sketch after this branch)
                                              1. Choosing Batch Size
                                                1. small set (m <= 2000) -> use batch gradient descent
                                                  1. larger set -> batch sizes of 64, 128, 256 or 512
                                                    1. Make sure each mini-batch fits in CPU/GPU memory (sketch below)
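
A sketch of slicing (X, Y) into shuffled mini-batches; rows are assumed to be examples, and the default batch size of 64 is just one of the suggested values:

    import numpy as np

    def make_mini_batches(X, Y, batch_size=64, rng=np.random.default_rng(0)):
        # shuffle the examples, then cut them into consecutive slices of batch_size rows
        m = X.shape[0]
        perm = rng.permutation(m)
        X, Y = X[perm], Y[perm]
        return [(X[i:i + batch_size], Y[i:i + batch_size])
                for i in range(0, m, batch_size)]
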
                                                  2. Exponentially weighted averages
                                                    1. A running average is computed as v(t) = beta * v(t-1) + (1 - beta) * theta(t)
                                                      1. Bias Correction
                                                        1. Corrects the early values of the exponentially weighted average using v(t) = v(t) / (1 - beta^t) (sketch below)
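
A sketch of an exponentially weighted average with bias correction, following the two formulas above (beta = 0.9 is a typical choice):

    def ewa_with_bias_correction(values, beta=0.9):
        # v(t) = beta * v(t-1) + (1 - beta) * theta(t), corrected by 1 / (1 - beta^t)
        v, corrected = 0.0, []
        for t, theta in enumerate(values, start=1):
            v = beta * v + (1 - beta) * theta
            corrected.append(v / (1 - beta ** t))  # fixes the low starting values
        return corrected
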
                                                      2. Gradient Descent with Momentum
                                                        1. Aim: accelerate the horizontal component of gradient descent so it converges faster towards the minimum. Uses the exponentially weighted average formula applied to the gradients instead of theta
                                                        2. RMSprop
                                                          1. Aims to damp the vertical oscillations of gradient descent and speed up the horizontal component, by dividing the update by a running average of the squared gradients.
                                                          2. Adam
                                                            1. Combination of RMSprop and Gradient Descent with momentum
                                                              1. Hyperparameter choice: alpha needs to be tuned; typical defaults are beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8 (sketch below)
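
A sketch of one Adam update step with the hyperparameters above, combining a momentum-style first moment and an RMSprop-style second moment; the parameter and state names are illustrative:

    import numpy as np

    def adam_step(W, dW, v, s, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # momentum-style average of the gradients
        v = beta1 * v + (1 - beta1) * dW
        # RMSprop-style average of the squared gradients
        s = beta2 * s + (1 - beta2) * np.square(dW)
        # bias-correct both running averages
        v_hat = v / (1 - beta1 ** t)
        s_hat = s / (1 - beta2 ** t)
        # scale the step by the root of the second moment
        W = W - alpha * v_hat / (np.sqrt(s_hat) + eps)
        return W, v, s
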
                                                              2. Learning Rate Decay
                                                                1. A method to lower the learning rate as training gets closer to the minimum.
                                                                  1. Many formulas exist; the most common is alpha = alpha0 / (1 + decay_rate * epoch_num) (sketch below)
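
The decay formula above as a one-line helper (alpha0 is the initial learning rate):

    def decayed_learning_rate(alpha0, decay_rate, epoch_num):
        # alpha = alpha0 / (1 + decay_rate * epoch_num)
        return alpha0 / (1 + decay_rate * epoch_num)

    # e.g. decayed_learning_rate(0.2, 1.0, 2) -> 0.2 / 3 ≈ 0.0667
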
                                                                  2. Tuning algorithm's hyperparameters
                                                                    1. Priorities (colour-coded in the original map): darkest = most important, lightest = least important, white = usually left at its default
                                                                      1. Try random values; don't use a grid search
                                                                        1. Coarse to fine choice
                                                                          1. Choose an appropriate scale for random sampling, e.g. a logarithmic scale for alpha (sketch below)
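
A sketch of sampling alpha on a logarithmic scale, so that each decade (e.g. 0.0001-0.001 and 0.01-0.1) is equally likely; the range is an illustrative assumption:

    import numpy as np

    def sample_learning_rate(low=1e-4, high=1e-1, rng=np.random.default_rng(0)):
        # sample the exponent uniformly, then map back with 10**r
        r = rng.uniform(np.log10(low), np.log10(high))
        return 10.0 ** r
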
                                                                          2. Batch Normalization
                                                                            1. The idea of normalizing each layer's pre-activation input (Z, not A) inside the network (sketch below)
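
A sketch of the batch-norm forward step for one layer's pre-activations Z, with learnable scale gamma and shift beta (training-time statistics only; the running averages needed at inference are omitted):

    import numpy as np

    def batch_norm_forward(Z, gamma, beta, eps=1e-8):
        # normalize Z across the mini-batch ...
        mu = Z.mean(axis=0)
        var = Z.var(axis=0)
        Z_norm = (Z - mu) / np.sqrt(var + eps)
        # ... then let the network rescale and shift it
        return gamma * Z_norm + beta
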
                                                                      2. Weights/parameters initialization
                                                                        1. Zeros? NO!

                                                                          Annotations:

                                                                          • Zeros will make all neurons of the neural network act the same and behave linearly, which defeats the purpose of having a neural network.
                                                                          1. Bad - fails to break symmetry -> the cost does not decrease
                                                                          2. Random Init

                                                                            Annotations:

                                                                            • Initializing weights to very large random values does not work well; initializing with small random values does better.
                                                                            1. Good - Breaks Symmetry
                                                                              1. Bad - large weights -> exploding gradients
                                                                              2. He Init - the best!

                                                                                Annotations:

                                                                                • sqrt(2./layers_dims[l-1])
                                                                                1. Good - Ensures faster learning speed
                                                                                  1. Works well with ReLU activations (sketch below)
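
A sketch of He initialization using the sqrt(2 / layers_dims[l-1]) factor from the annotation above; layers_dims lists the layer sizes, including the input layer:

    import numpy as np

    def he_init(layers_dims, rng=np.random.default_rng(0)):
        # W[l]: Gaussian noise scaled by sqrt(2 / fan_in); biases start at zero
        params = {}
        for l in range(1, len(layers_dims)):
            fan_in = layers_dims[l - 1]
            params["W" + str(l)] = rng.standard_normal((layers_dims[l], fan_in)) * np.sqrt(2.0 / fan_in)
            params["b" + str(l)] = np.zeros((layers_dims[l], 1))
        return params
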
                                                                                2. Dataset Split
                                                                                  1. Data > 1M examples: 98% Train, 1% Dev, 1% Test
                                                                                    1. Small data: 60% Train, 20% Dev, 20% Test
                                                                                      1. The training set may come from a different distribution than the dev/test sets (keep dev and test from the same distribution)