https://www.kdnuggets.com/2017/10/top-10-machine-learning-algorithms-beginners.html
Homemade ML (GitHub)
https://github.com/vinhpq/homemade-machine-learning
Supervised Learning
Nota:
Use labeled training data to learn the mapping function from the input variable (X) to the output variable (Y): Y = f(X)
Regression
Nota:
To predict the outcome of a given sample where the output variable is in the form of real values. Examples include real-valued labels denoting the amount of rainfall or the height of a person.
Linear Regression
Nota:
The relationship between the input variables (x) and output variable (y) is expressed as an equation of the form y = a + bx. Thus, the goal of linear regression is to find out the values of coefficients a and b. Here, a is the intercept and b is the slope of the line.
https://www.kdnuggets.com/wp-content/uploads/Linearreg1-300x150.gif
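The equation y = a + bx can be fitted with the ordinary-least-squares closed form. A minimal sketch (the data points are made up for illustration):

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = covariance(x, y) / variance(x); a places the line through the means
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

# Toy data lying exactly on y = 2 + 3x
a, b = fit_linear([1, 2, 3, 4], [5, 8, 11, 14])  # -> a = 2.0, b = 3.0
```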
Classification
Nota:
To predict the outcome of a given sample where the output variable is in the form of categories. Examples include labels such as male and female, or sick and healthy.
Logistic Regression
Nota:
Linear regression predictions are continuous values (e.g. rainfall in cm); logistic regression predictions are discrete values (e.g. whether a student passed or failed), obtained after applying a transformation function.
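The transformation function is the sigmoid, which squashes a linear score into a probability. A small sketch fitting the pass/fail example with batch gradient descent (the hours-studied data is invented for illustration):

```python
import math

def sigmoid(z):
    """Logistic transformation: maps any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical data: hours studied -> passed (1) or failed (0)
hours  = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
passed = [0,   0,   0,   1,   1,   1]

w, b = 0.0, 0.0
lr = 0.5
for _ in range(2000):                     # batch gradient descent on log loss
    grad_w = grad_b = 0.0
    for x, y in zip(hours, passed):
        err = sigmoid(w * x + b) - y      # prediction error for this sample
        grad_w += err * x
        grad_b += err
    w -= lr * grad_w / len(hours)
    b -= lr * grad_b / len(hours)

def predict(x):
    """Threshold the probability at 0.5 to get a discrete class."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0
```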
Decision Trees
Nota:
The model is used as follows to make predictions: walk the splits of the tree to arrive at a leaf node and output the value present at the leaf node.
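The "walk the splits" prediction step can be sketched with a tiny hand-built tree. The features and thresholds below are illustrative (loosely Iris-like), not a fitted model:

```python
# Internal nodes test one feature against a threshold; leaves hold the label.
tree = {
    "feature": "petal_len", "threshold": 2.5,
    "left": {"label": "setosa"},                 # taken when petal_len <= 2.5
    "right": {
        "feature": "petal_width", "threshold": 1.7,
        "left": {"label": "versicolor"},
        "right": {"label": "virginica"},
    },
}

def predict(node, sample):
    """Walk the splits until a leaf is reached, then output its value."""
    while "label" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

label = predict(tree, {"petal_len": 4.5, "petal_width": 1.3})  # -> "versicolor"
```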
Naive Bayes
Nota:
To calculate the probability that an event will occur, given that another event has already occurred, we use Bayes’ Theorem.
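Bayes' Theorem itself is a one-liner. The probabilities below are made-up numbers for a play-tennis-style example:

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical: P(sunny|play) = 0.6, P(play) = 0.5, P(sunny) = 0.4
p_play_given_sunny = bayes_posterior(0.6, 0.5, 0.4)  # -> 0.75
```

Naive Bayes applies this per class, "naively" assuming the features are independent given the class so the likelihoods multiply.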
KNN
Nota:
The k-nearest neighbours algorithm uses the entire dataset as the training set, rather than splitting the dataset into a training set and a test set.
When an outcome is required for a new data instance, the KNN algorithm goes through the entire dataset to find the k instances most similar to the new record, then outputs the mean of their outcomes (for a regression problem) or the mode, i.e. the most frequent class (for a classification problem). The value of k is user-specified.
The similarity between instances is calculated using measures such as Euclidean distance and Hamming distance.
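The classification case can be sketched in a few lines with Euclidean distance and a majority vote; the two point clusters below are invented for illustration:

```python
import math
from collections import Counter

def knn_predict(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest
    training instances, scanning the entire dataset each time."""
    dists = sorted((math.dist(x, new_point), label) for x, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy dataset: two well-separated groups
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B")]
knn_predict(train, (2, 2))  # -> "A"
```

For a regression problem, the mode/vote step would be replaced by the mean of the k neighbours' outcomes.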
Unsupervised Learning
Nota:
Unsupervised learning problems possess only the input variables (X) but no corresponding output variables. It uses unlabeled training data to model the underlying structure of the data.
Association Rule Learning
Nota:
To discover the probability of the co-occurrence of items in a collection. It is extensively used in market-basket analysis. Example: if a customer purchases bread, they are 80% likely to also purchase eggs.
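The "80% likely" figure is the confidence of the rule {bread} → {eggs}: the share of bread-containing baskets that also contain eggs. A sketch over made-up baskets:

```python
def confidence(transactions, antecedent, consequent):
    """Confidence of rule antecedent -> consequent:
    P(consequent | antecedent) estimated over the transaction list."""
    has_ante = [t for t in transactions if antecedent <= t]  # subset test
    if not has_ante:
        return 0.0
    return sum(1 for t in has_ante if consequent <= t) / len(has_ante)

baskets = [  # hypothetical market baskets
    {"bread", "eggs"}, {"bread", "eggs", "milk"}, {"bread", "eggs"},
    {"bread", "butter"}, {"milk"}, {"bread", "eggs", "butter"},
]
confidence(baskets, {"bread"}, {"eggs"})  # -> 0.8 (4 of the 5 bread baskets)
```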
Clustering
Nota:
To group samples such that objects within the same cluster are more similar to each other than to the objects from another cluster.
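A classic clustering method is k-means (Lloyd's algorithm): alternately assign each point to its nearest centroid, then move each centroid to the mean of its points. A 1-D sketch with invented data:

```python
def kmeans(points, centroids, iters=10):
    """Lloyd's k-means: assignment step, then centroid-update step."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# Two obvious 1-D groups; the seed centroids are deliberately rough.
centers = kmeans([1.0, 1.2, 0.8, 9.0, 9.2, 8.8], centroids=[0.0, 5.0])
# -> centroids settle near 1.0 and 9.0
```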
Dimensionality Reduction
Nota:
True to its name, Dimensionality Reduction means reducing the number of variables of a dataset while ensuring that important information is still conveyed. Dimensionality Reduction can be done using Feature Extraction methods and Feature Selection methods. Feature Selection selects a subset of the original variables. Feature Extraction performs data transformation from a high-dimensional space to a low-dimensional space.
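The best-known feature-extraction method is Principal Component Analysis. A minimal 2-D-to-1-D sketch, using power iteration on the covariance matrix to find the top principal component (the data points are made up and lie roughly on the line y = x):

```python
def pca_1d(points):
    """Project centred 2-D points onto their first principal component,
    found by power iteration on the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centred = [(x - mx, y - my) for x, y in points]
    # Covariance matrix entries
    sxx = sum(x * x for x, _ in centred) / n
    sxy = sum(x * y for x, y in centred) / n
    syy = sum(y * y for _, y in centred) / n
    vx, vy = 1.0, 0.0                       # power-iteration start vector
    for _ in range(100):                    # converges to the top eigenvector
        wx = sxx * vx + sxy * vy
        wy = sxy * vx + syy * vy
        norm = (wx * wx + wy * wy) ** 0.5
        vx, vy = wx / norm, wy / norm
    return [x * vx + y * vy for x, y in centred]  # 1-D coordinates

Z = pca_1d([(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)])
```

Each 2-D point is reduced to one number while preserving most of the variance; feature selection would instead just drop one of the two original columns.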
Reinforcement Learning
Nota:
Reinforcement learning is a type of machine learning algorithm that allows an agent to decide the best next action based on its current state, by learning behaviours that will maximize the reward.
Reinforcement algorithms usually learn optimal actions through trial and error. They are typically used in robotics, where a robot can learn to avoid collisions by receiving negative feedback after bumping into obstacles, and in video games, where trial and error reveals specific movements that increase a player's rewards. The agent can then use these rewards to understand the optimal state of game play and choose the next action.
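The trial-and-error idea can be sketched with tabular Q-learning on a toy environment: a 1-D corridor of five states with a reward at the far end. The environment, hyperparameters, and episode count below are all illustrative:

```python
import random

# Toy corridor: states 0..4, reward 1 on reaching state 4.
# Actions: 0 = step left, 1 = step right.
N_STATES, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action] value table
alpha, gamma, epsilon = 0.5, 0.9, 0.2       # learning rate, discount, exploration
rng = random.Random(0)

for _ in range(500):                        # episodes of trial and error
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore
        a = rng.randrange(2) if rng.random() < epsilon else \
            max((0, 1), key=lambda act: Q[s][act])
        s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
        r = 1.0 if s2 == GOAL else 0.0
        # Q-learning update: nudge Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned greedy policy should step right in every non-goal state.
policy = [max((0, 1), key=lambda act: Q[s][act]) for s in range(GOAL)]
```

Negative feedback (as in the collision example) would simply be a negative reward, which pushes the corresponding Q-values down.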