Machine Learning Basics Answered!
In this article we will answer a few basic questions in Machine Learning (ML) to help beginners get started! It is meant to be concise, for quick reference.
1. What is Machine Learning?
A commonly quoted definition of Machine Learning is as follows:
Machine learning is the science of getting computers to act without being explicitly programmed.
This basically means that an algorithm gets better with experience without requiring human intervention to explicitly write code for it.
2. What are the types of Machine Learning Algorithms?
There are 3 main types of ML algorithms:
1. Supervised Learning:
This type of learning works on labelled data. In supervised learning, the input data is provided along with the corresponding output data, so that the model can learn the pattern/relationship between them and predict the output for new and unknown data. Supervised learning is widely used, but labelled data is often scarce in practice because labelling takes effort. Most real-world data is unlabelled and falls into the second category.
Examples of Supervised Learning include Linear Regression, Logistic Regression, Decision Trees etc.
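As a rough illustration of learning from labelled data, here is a minimal sketch of Linear Regression with a single input, fitted by ordinary least squares in plain Python. The data (hours studied vs. exam score) and the names fit_line and predict are made up for this example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict the output for a new input using the fitted line."""
    slope, intercept = model
    return slope * x + intercept


# Labelled training data: each input (hours) comes with its output (score).
# Hypothetical values, chosen so that score = 50 + 2 * hours exactly.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]

model = fit_line(hours, scores)
prediction = predict(model, 6.0)  # an input the model has never seen
print(model, prediction)
```

The model only ever sees the (input, output) pairs, yet it can produce an output for 6 hours, which was not in the training data.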
2. Unsupervised Learning:
Unlabelled data is what is usually found in real world scenarios. In this type of learning, the data is not “labelled”. Since the output data is not provided, the relationship between the input and the output is unknown, so the algorithm looks for structure in the input data itself.
Examples of Unsupervised Learning include clustering techniques such as k-means.
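To make clustering a little more concrete, here is a minimal sketch of k-means on one-dimensional data in plain Python. The points and the starting centroids are made up for this example, and kmeans_1d is a hypothetical helper, not a library function.

```python
def kmeans_1d(points, initial_centroids, iterations=10):
    """Group 1-D points around centroids; no labels are used anywhere."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Unlabelled data: just inputs, no outputs (hypothetical values).
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(points, initial_centroids=[0.0, 5.0])
print(centroids, clusters)
```

The algorithm is never told which group a point belongs to; it discovers the two groups from the distances between the points.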
3. Reinforcement Learning:
It is the process of rewarding the algorithm when it makes a right decision and penalizing it when it makes a wrong one. Based on the reward or penalty it receives, the algorithm chooses its next move.
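The reward/penalty loop can be sketched with a toy example. The environment below is entirely hypothetical: it has two actions, where "right" always earns a reward of +1 and "left" always earns a penalty of -1. The agent uses an epsilon-greedy rule, which is one common way of choosing actions and an assumption of this sketch, not the only option.

```python
import random

# Hypothetical environment: a fixed reward or penalty per action.
REWARDS = {"left": -1.0, "right": 1.0}


def train(steps=200, epsilon=0.1, alpha=0.5, seed=0):
    """Learn a value estimate for each action from rewards and penalties."""
    rng = random.Random(seed)
    values = {action: 0.0 for action in REWARDS}
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.choice(list(REWARDS))
        else:
            # Exploit: pick the action with the best estimate so far.
            action = max(values, key=values.get)
        reward = REWARDS[action]
        # Nudge the estimate for this action toward the reward received.
        values[action] += alpha * (reward - values[action])
    return values


values = train()
best_action = max(values, key=values.get)
print(values, best_action)
```

After training, the estimate for "right" is positive and the estimate for "left" is not, so the agent prefers the rewarded action.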
3. What is Overfitting and how to handle it?
Overfitting is when a model fits the training data too closely, picking up noise and incidental details instead of the underlying pattern. It is almost as if the model is “memorizing” the training data. One way to detect overfitting is to compare the model’s performance on the train set and the test set. A model that suffers from overfitting generally performs very well on the train set and poorly on the test set. This is because the data in the test set is new and unknown, whereas the model has already “memorized” the values in the training set during training. Overfitting is also referred to as a high variance problem.
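The train/test comparison can be sketched with a model that literally memorizes its training data: a 1-nearest-neighbour classifier. The tiny dataset below is hand-made for illustration. The true rule is "label 1 if x >= 5, else 0", and two training labels are deliberately flipped to act as noise.

```python
def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    _, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label


def accuracy(train, data):
    """Fraction of points in data that 1-NN labels correctly."""
    correct = sum(1 for x, y in data if predict_1nn(train, x) == y)
    return correct / len(data)


# Hypothetical data as (input, label) pairs.
# The labels at x = 2 and x = 7 are noisy (flipped on purpose).
train_set = [(1, 0), (2, 1), (3, 0), (4, 0), (6, 1), (7, 0), (8, 1), (9, 1)]
test_set = [(0.5, 0), (1.9, 0), (2.2, 0), (3.5, 0),
            (6.8, 1), (7.3, 1), (8.6, 1), (9.5, 1)]

train_accuracy = accuracy(train_set, train_set)
test_accuracy = accuracy(train_set, test_set)
print(train_accuracy, test_accuracy)
```

The model scores perfectly on the data it memorized, noise included, but does noticeably worse on the new points. That gap between train and test performance is the usual warning sign.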
A few things can be done to handle overfitting:
- Adding more data
- Using regularization
- In Neural Networks, reducing the number of layers or using dropout
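As one illustration of regularization, here is a minimal sketch of an L2 (ridge) penalty on a model with a single weight and no intercept. The data and the name ridge_weight are made up for this example; the penalty strength lam is a value you would normally tune.

```python
def ridge_weight(xs, ys, lam):
    """Weight w that minimizes sum((y - w * x) ** 2) + lam * w ** 2.

    Setting the derivative to zero gives
    w = sum(x * y) / (sum(x * x) + lam).
    """
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + lam)


# Hypothetical data where y = 2 * x exactly.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_unregularized = ridge_weight(xs, ys, lam=0.0)
w_regularized = ridge_weight(xs, ys, lam=14.0)
print(w_unregularized, w_regularized)
```

With no penalty the weight fits the data exactly. With the penalty it is pulled toward zero, which limits how strongly the model can react to any one feature. On noisy data this shrinkage is what helps against overfitting.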
4. Split ratio of data (Train, Cross-Validation, Test)
Before the era of big data, a common split was 60% training set, 20% cross-validation set and 20% test set. But with datasets now often on the order of millions of examples, the split ratio has changed quite a bit. Now the ratio can be something like 98% training set, 1% cross-validation set and 1% test set, because 1% of a million examples is still 10,000 examples to evaluate on.
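Both splits can be sketched with a small helper. The function split_dataset below is hypothetical, written for this example. It shuffles the data first, which assumes the examples are independent (for time series you would usually split by time instead).

```python
import random


def split_dataset(items, train_frac, val_frac, seed=0):
    """Shuffle items and split them into train, cross-validation and test sets.

    Whatever is left after the train and cross-validation parts
    becomes the test set.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Small dataset: the classic 60/20/20 split.
small_train, small_val, small_test = split_dataset(range(1000), 0.60, 0.20)

# Large dataset: 98/1/1 still leaves 1,000 examples each for
# cross-validation and testing.
big_train, big_val, big_test = split_dataset(range(100_000), 0.98, 0.01)

print(len(small_train), len(small_val), len(small_test))
print(len(big_train), len(big_val), len(big_test))
```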
To conclude, I hope this article helps answer a few questions about ML. Let me know if you want part 2!