Support Vector Machines (SVMs) are powerful machine learning algorithms used for classification and regression. They work by finding the boundary (hyperplane) that separates the different classes of data with the maximum possible margin.
Think of SVM as drawing the best possible line (or curve) between two groups of points, ensuring that the line is as far as possible from the nearest points on both sides. This makes SVMs very effective at creating robust decision boundaries.
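As a concrete starting point, here is a minimal sketch of that idea in code. It assumes scikit-learn and NumPy (neither is prescribed by the text) and invents a tiny two-class dataset:

```python
import numpy as np
from sklearn.svm import SVC  # scikit-learn's SVM classifier

# Hypothetical toy data: two well-separated 2D clusters
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class -1 ("red")
              [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])  # class +1 ("blue")
y = np.array([-1, -1, -1, 1, 1, 1])

# Fit a linear SVM: it finds the separating line farthest from both clusters
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.predict([[2.0, 2.0], [6.0, 6.5]]))  # -> [-1  1]
```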
Hyperplane: The decision boundary that separates different classes. In 2D it's a line, in 3D it's a plane, and in higher dimensions it's called a hyperplane.
Support vectors: The data points closest to the hyperplane. These are the critical points that define the position and orientation of the hyperplane. Only these points matter for finding the optimal boundary!
Margin: The distance between the hyperplane and the nearest support vectors. SVM tries to maximize this margin, creating a "safety buffer" that makes the classifier more robust to new data.
Kernel trick: A technique for handling non-linearly separable data. The kernel transforms the data into a higher dimension where a linear separation becomes possible, without actually computing the transformation explicitly.
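To make the kernel idea concrete, here is a small sketch (scikit-learn and NumPy assumed, data invented): a one-dimensional dataset whose classes interleave becomes linearly separable after mapping each point x to (x, x²), and an RBF-kernel SVM achieves the same effect on the raw data without ever building the mapped features:

```python
import numpy as np
from sklearn.svm import SVC

# 1D data: class -1 sits between the class +1 points, so no single
# threshold on x can separate them.
x = np.array([-4.0, -3.0, -1.0, 0.0, 1.0, 3.0, 4.0])
y = np.where(np.abs(x) < 2, -1, 1)

# Explicit feature map x -> (x, x^2): in this 2D space a straight line works.
X_mapped = np.column_stack([x, x ** 2])
linear_on_mapped = SVC(kernel="linear").fit(X_mapped, y)
print(linear_on_mapped.predict([[1.0, 1.0], [3.5, 12.25]]))  # -> [-1  1]

# The kernel trick gets the same separation without computing the mapping.
rbf_on_raw = SVC(kernel="rbf", gamma=0.5).fit(x.reshape(-1, 1), y)
```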
Start with labeled training data. Each point belongs to one of two classes (e.g., red or blue).
There are many possible lines that could separate the classes. SVM needs to find the best one.
SVM chooses the hyperplane that has the maximum distance to the nearest points of both classes. This creates the widest "street" between the classes.
The points that lie on the margin boundaries are the support vectors. These are the only points that matter - you could remove all other points and still get the same hyperplane!
If the data isn't linearly separable, apply a kernel function to implicitly map it into a higher-dimensional space where linear separation becomes possible.
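The role of the support vectors can be checked directly. A possible sketch (scikit-learn assumed, data invented): fit a linear SVM, inspect which points it kept as support vectors, then refit on those points alone and compare the resulting hyperplanes:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],   # class -1
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=10.0).fit(X, y)
print(clf.support_vectors_)        # only the points on the margin boundaries

# Refit using just the support vectors: the hyperplane stays essentially the same.
sv_clf = SVC(kernel="linear", C=10.0).fit(clf.support_vectors_, y[clf.support_])
print(clf.coef_, clf.intercept_)        # w and b learned from all points
print(sv_clf.coef_, sv_clf.intercept_)  # w and b from the support vectors alone
```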
Linear kernel. Formula: K(x, y) = x · y
Best for linearly separable data; fast and simple. Use it when your data can be separated by a straight line (or a hyperplane in higher dimensions).
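Evaluating the linear kernel is just a dot product. A quick sketch with NumPy, plus scikit-learn's corresponding option (both assumed here):

```python
import numpy as np
from sklearn.svm import SVC

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])
print(np.dot(x, z))          # K(x, z) = x · z = 4.5

clf = SVC(kernel="linear")   # selects the linear kernel in scikit-learn
```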
Polynomial kernel. Formula: K(x, y) = (x · y + c)^d
Handles curved decision boundaries; the degree d controls the flexibility. Good for data with polynomial relationships.
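A sketch of the polynomial kernel on made-up vectors (NumPy and scikit-learn assumed). Note that scikit-learn's version also scales the dot product by a gamma parameter, set to 1 here so it matches the formula above:

```python
import numpy as np
from sklearn.svm import SVC

x = np.array([1.0, 2.0])
z = np.array([3.0, 1.0])
c, d = 1.0, 2
print((np.dot(x, z) + c) ** d)   # K(x, z) = (x · z + c)^d = (5 + 1)^2 = 36.0

# c and d correspond to coef0 and degree; gamma=1.0 matches the plain formula
clf = SVC(kernel="poly", degree=2, coef0=1.0, gamma=1.0)
```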
RBF (radial basis function) kernel. Formula: K(x, y) = exp(-γ||x - y||²)
The most popular kernel for non-linear data. It can handle complex, circular, or irregular decision boundaries and is very flexible and powerful.
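A sketch of the RBF kernel on two made-up points (NumPy and scikit-learn assumed); γ controls how quickly similarity decays with distance:

```python
import numpy as np
from sklearn.svm import SVC

x = np.array([1.0, 2.0])
z = np.array([2.0, 0.0])
gamma = 0.5
print(np.exp(-gamma * np.sum((x - z) ** 2)))  # K(x, z) = exp(-0.5 * 5) ≈ 0.082

# Smaller gamma -> smoother, wider-reaching boundary; larger gamma -> wigglier
clf = SVC(kernel="rbf", gamma=0.5)
```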
SVM Optimization Problem:

Maximize: margin = 2 / ||w||
Subject to: y_i(w·x_i + b) ≥ 1 for all training points

Where:
- w = weight vector (defines the hyperplane's orientation)
- b = bias (defines the hyperplane's position)
- y_i = class label (-1 or +1)
- x_i = feature vector of data point i

The hyperplane equation: w·x + b = 0

Decision function: sign(w·x + b)
- If w·x + b > 0 → Class +1
- If w·x + b < 0 → Class -1
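The decision function can be evaluated by hand from a fitted model. Here is a sketch (scikit-learn assumed, data invented) that pulls w and b out of a trained linear SVM and checks that sign(w·x + b) matches the model's own prediction:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],   # class -1
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]   # hyperplane: w·x + b = 0

x_new = np.array([4.0, 4.0])
print(np.sign(w @ x_new + b))            # decision function by hand -> 1.0
print(clf.predict(x_new.reshape(1, -1))) # the model agrees -> [1]
```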
Healthcare: Classifying medical images (cancer vs. non-cancer), predicting disease risk from patient features, and identifying protein structures.
Spam filtering: Email filtering systems use SVMs to classify messages as spam or legitimate based on features such as keywords, sender information, and message patterns.
Face recognition: Identifying individuals from facial features. SVMs handle high-dimensional feature spaces effectively, which makes them well suited to face recognition.
Text classification: Categorizing documents, sentiment analysis, topic detection, and language identification based on text features.
Computer vision: Object detection, scene recognition, and image segmentation using pixel features and pattern recognition.
Finance: Stock market prediction, credit scoring, fraud detection, and risk assessment based on financial indicators.