Summary of the Resource
Machine Learning
- Types
- Classification
- Prediction
- Learning Task
- Supervised
Notes:
- Training data includes the desired solutions (labels)
- Common Tasks
- Classification
- Prediction
- Common Algorithms
- k-Nearest Neighbors
- Linear Regression
- Logistic Regression
- Support Vector Machines (SVMs)
- Decision Tree
- Random Forests
- Neural Networks
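As a sketch of the first supervised algorithm listed above, a minimal k-Nearest Neighbors classifier can be written in plain Python; the toy data and k=3 are illustrative assumptions, not part of the source:

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest labeled points."""
    # train: list of ((x, y), label) pairs
    by_dist = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

# Toy labeled training set: two well-separated clusters
train = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"),
         ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]

print(knn_predict(train, (0.5, 0.5)))  # near cluster A
print(knn_predict(train, (5.5, 5.5)))  # near cluster B
```

Because the labels come with the training data, the algorithm never has to discover structure on its own; it only compares new points against labeled examples.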
- Unsupervised
Notes:
- Tries to learn without labeled data
- Common Tasks
- Clustering
- K-Means
- Hierarchical Cluster Analysis (HCA)
- Expectation Maximization
- Visualization and Dimensionality Reduction
- Principal Component Analysis (PCA)
- Kernel PCA
- Locally-Linear Embedding (LLE)
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Association Rule Learning
- Apriori
- Eclat
- Anomaly Detection
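K-Means, the first clustering algorithm listed above, can be sketched with only the standard library; the 1-D toy data and the two fixed initial centers are assumptions for illustration:

```python
def kmeans(points, centers, iters=10):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

points = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
print(kmeans(points, centers=[0.0, 5.0]))  # centers settle near 1.0 and 9.0
```

Note that no labels appear anywhere: the two groups emerge from the data alone, which is the defining trait of unsupervised learning.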
- Semisupervised
Notes:
- Some labeled data combined with lots of unlabeled data
- Reinforcement
Notes:
- Uses rewards and penalties to train an agent. Often used to teach robots how to walk.
- Learning Process
- Batch
Notes:
- AKA offline learning: the system is trained first, then deployed without further learning.
- Online or Incremental Learning
Notes:
- Learns while it runs. It can be pre-trained and then kept up to date as new data arrives.
It is also called incremental learning because it can likewise be trained offline in an incremental fashion.
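The note above can be sketched as an online learner that updates a linear model one observation at a time via stochastic gradient descent; the learning rate and the simulated data stream are assumptions:

```python
def sgd_update(w, b, x, y, lr=0.05):
    """One online step: nudge weight and bias toward the observed example."""
    error = (w * x + b) - y
    return w - lr * error * x, b - lr * error

# Simulate a stream arriving over time from the target y = 2x + 1
w, b = 0.0, 0.0
for step in range(2000):
    x = (step % 10) / 10.0          # inputs cycle through 0.0 .. 0.9
    w, b = sgd_update(w, b, x, 2 * x + 1)

print(round(w, 2), round(b, 2))  # approaches w=2, b=1
```

Each example is processed once and can then be discarded, which is exactly what lets an online system keep learning in production.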
- Out-of-core
Notes:
- Used to train on datasets too big to fit in memory: the data is chopped into smaller batches, and the system trains on one batch at a time until it has seen all of the data.
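The out-of-core idea above can be sketched with a generator that yields mini-batches, so the full dataset never has to sit in memory at once; the batch size and the synthetic data are assumptions (in practice each chunk would be read from disk):

```python
def batches(n, batch_size):
    """Yield index ranges so only one chunk is materialized at a time."""
    for start in range(0, n, batch_size):
        yield range(start, min(start + batch_size, n))

# Out-of-core style pass: accumulate a running mean chunk by chunk
total, count = 0.0, 0
for chunk in batches(n=1_000_000, batch_size=10_000):
    total += sum(float(i) for i in chunk)   # stand-in for loading a chunk
    count += len(chunk)

print(total / count)  # mean of 0 .. 999_999
```

The same chunked loop drives out-of-core training: feed each batch to an incremental learner such as the SGD update above instead of a running mean.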
- Generalization Model
- Instance Based
Notes:
- Learns the examples by heart, then generalizes to new cases by using a similarity measure to find the most similar stored examples.
- Model Based
Notes:
- Builds a model from the examples and uses that model to make predictions.
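The contrast between the two generalization approaches above can be sketched in a few lines: the instance-based learner answers with the value of the most similar stored example, while the model-based learner fits a line and predicts from it. The 1-D toy data is an assumption:

```python
def instance_based_predict(examples, x):
    """Return the value of the single closest stored instance (1-NN)."""
    return min(examples, key=lambda e: abs(e[0] - x))[1]

def model_based_fit(examples):
    """Fit y = theta0 + theta1 * x by ordinary least squares."""
    n = len(examples)
    mx = sum(x for x, _ in examples) / n
    my = sum(y for _, y in examples) / n
    theta1 = (sum((x - mx) * (y - my) for x, y in examples)
              / sum((x - mx) ** 2 for x, _ in examples))
    theta0 = my - theta1 * mx
    return theta0, theta1

examples = [(1, 3.0), (2, 5.0), (3, 7.0), (4, 9.0)]   # generated by y = 2x + 1
print(instance_based_predict(examples, 2.2))           # value of nearest stored point
theta0, theta1 = model_based_fit(examples)
print(theta0 + theta1 * 2.2)                           # model's prediction
```

The instance-based learner must keep all examples around at prediction time; the model-based learner compresses them into two parameters and can discard the data.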
- Performance Measures
- Utility Function
- Cost Function
- Challenges
- Bad Data
- Insufficient Training Data
- Non Representative Training Data
Notes:
- The data must be representative of the cases you want to generalize to.
- Sampling Noise
Notes:
- The sample is too small (non-representative data as a result of chance).
- Sampling Bias
Notes:
- Even a large amount of data can be non-representative if the sampling method is flawed.
- Nonresponse Bias
- Poor Quality Data
Notes:
- Data full of errors, outliers, and noise (e.g., due to poor-quality measurements).
- Clean up the Outliers
- Treat the Instances that are Missing Features
- Irrelevant Features
Notes:
- Feature engineering is the process used to prevent this problem.
- Feature Engineering
- Feature Selection
- Feature Extraction
Notes:
- Combine existing features to produce a more useful one. See the dimensionality reduction algorithms.
- New Features
- Overfitting the Training Data
- Overgeneralizing
- Noise Attributes
Notes:
- Uninformative attributes introduce noise and can lead the model to detect unwanted patterns.
- Bad Algorithm
- Overfitting
Notes:
- Performs well on the training data but poorly on the test data.
- Possible Solutions
- Simplify the Model
Notes:
- Use a model with fewer parameters, or reduce the number of attributes.
- Regularization
Notes:
- Constrain the model to make it simpler.
- Gather More Training Data
- Reduce noise in Training Data
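The regularization solution above can be sketched as an L2 (ridge) penalty added to the cost function: large weights are taxed, which constrains the model toward simpler fits. The cost function shape and the toy weights here are illustrative assumptions:

```python
def ridge_cost(weights, errors, alpha):
    """Mean squared error plus an L2 penalty on the weights."""
    mse = sum(e ** 2 for e in errors) / len(errors)
    penalty = alpha * sum(w ** 2 for w in weights)  # tax on large weights
    return mse + penalty

errors = [0.1, -0.2, 0.05]
print(ridge_cost([3.0, -4.0], errors, alpha=0.0))  # no regularization
print(ridge_cost([3.0, -4.0], errors, alpha=0.1))  # big weights now cost extra
```

The hyperparameter `alpha` sets the trade-off: alpha=0 disables the constraint, while a large alpha forces the weights toward zero even at the expense of training error.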
- Underfitting
Notes:
- Performs badly on both the training and the test data.
- Reasons
- Model is Too Simple
- Possible Solutions
- Select a More Powerful Model with More Parameters
- Feed Better Features
- Reduce Regularization
- Testing and Validating
- Test in Production
Notes:
- Deploying the model live and monitoring its performance works, but its mistakes reach real users.
- Use Test Set
Notes:
- A common rule is to use 80% of the data for training and 20% for testing.
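The 80/20 rule above can be sketched as a shuffled split; the fixed seed and dataset size are assumptions, and shuffling before splitting avoids ordering bias:

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle, then hold out the last test_ratio fraction as the test set."""
    shuffled = data[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    split = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:split], shuffled[split:]

data = list(range(100))
train, test = train_test_split(data)
print(len(train), len(test))  # → 80 20
```

Fixing the seed keeps the split reproducible across runs, so the test set never leaks into training between experiments.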
- Generalization Error
Notes:
- The error rate on new cases.
If the training error is low but the generalization error is high, you have an overfitting problem.
- Validation Set
Notes:
- Used to tune the hyperparameters; the final evaluation is done once on the test set.
- Cross Validation
Notes:
- Split the training set into complementary subsets; each model is trained on a different combination of these subsets and validated on the remaining parts. Once the model type and hyperparameters have been selected, a final model is trained on the full training set, and the generalization error is measured on the test set.
Used to avoid "wasting" too much training data on validation sets.
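The procedure described above can be sketched as k-fold index generation: the training set is split into k complementary folds, and each round trains on k-1 of them and validates on the one held out. The choice of k=5 and index-based folds are assumptions:

```python
def kfold_indices(n, k):
    """Yield (train_idx, val_idx) pairs covering all n examples across k folds."""
    fold = n // k
    for i in range(k):
        start = i * fold
        stop = (i + 1) * fold if i < k - 1 else n   # last fold takes the remainder
        val = list(range(start, stop))
        train = [j for j in range(n) if j < start or j >= stop]
        yield train, val

for train_idx, val_idx in kfold_indices(n=10, k=5):
    print(len(train_idx), len(val_idx))  # each round trains on 8, validates on 2
```

Every example is used for validation exactly once, which is why no separate chunk of training data has to be permanently sacrificed to a validation set.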
- Algorithm
- Hyperparameter
Notes:
- A parameter whose value is set before the learning process begins.
- Regularization
- Model
- Model Parameter
Notes:
- Parameters derived via training; they belong to the model itself, e.g., the parameters Theta0 and Theta1 of a linear model.
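The Theta0/Theta1 note above can be sketched directly: a linear model is fully specified by these two learned parameters, so prediction needs nothing else. The parameter values here are illustrative stand-ins for values that training would produce:

```python
def linear_model(theta0, theta1):
    """Return a prediction function fixed entirely by the two model parameters."""
    return lambda x: theta0 + theta1 * x

# Pretend training produced these values (illustrative, not computed here)
predict = linear_model(theta0=1.0, theta1=2.0)
print(predict(3.0))  # → 7.0
```

This is the model-based picture in miniature: once Theta0 and Theta1 are set, the training data itself is no longer needed to make predictions.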