Let be labeled training data. The data is said to be linearly separable if there exists a hyperplane that correctly classifies all the examples :
In general, finding is impossible, but we search for some that separates the 2 classes. The objective function to optimize is :
This is called a batch objective since it relies on a cumulative fit to data. By our assumption : . There are however many solution hyperplanes if we consider scaling of . To solve this, we fix to be the smallest-norm vector that guarantees :
How can we solve this? Using the Perceptron recursive update. It has been shown that the perceptron converges to a solution in a finite number of steps.
The algorithm of the Perceptron is the following :
Conclusion : That’s it ! I hope this introduction to Online Learning was clear. Don’t hesitate to drop a comment if you have any question.
Like it? Buy me a coffee