$$ y_{new} = \operatorname{sign}(\beta_0 + x_{new}\beta) $$
For practical reasons, we label the two classes 1 and −1.
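With labels 1 and −1, the decision rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the notes' own code. The hyperplane values are hypothetical. Points exactly on the hyperplane get score 0 and are assigned to class 1 here, which is an arbitrary tie-breaking assumption.

```python
import numpy as np

def predict(x_new, beta0, beta):
    """Classify each row of x_new with y = sign(beta0 + x_new @ beta).

    Returns labels in {1, -1}. A score of exactly 0 (point on the
    hyperplane) is mapped to 1, an arbitrary tie-breaking choice.
    """
    scores = beta0 + np.asarray(x_new, dtype=float) @ np.asarray(beta, dtype=float)
    return np.where(scores >= 0, 1, -1)

# Hypothetical hyperplane x1 + x2 - 1 = 0, i.e. beta0 = -1, beta = (1, 1)
X_new = np.array([[2.0, 1.0], [0.0, 0.0], [0.5, 0.5]])
print(predict(X_new, -1.0, [1.0, 1.0]))  # [ 1 -1  1]
```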
If the two classes are linearly separable, many hyperplanes can separate them.
Which one is optimal?
Introducing the margin:
Maximize the distance C (margin) from the decision line to the nearest points in each class.
Unlike linear discriminant analysis and logistic regression, there is no probabilistic model here.
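To make the margin concrete, the sketch below computes C for a *fixed* hyperplane as the smallest signed distance $y_i(\beta_0 + x_i\beta)/\lVert\beta\rVert$ over the training points. It only evaluates C and does not solve the maximization. The toy data and both candidate hyperplanes are hypothetical. Both hyperplanes separate the classes, but the first leaves a larger margin.

```python
import numpy as np

def margin(X, y, beta0, beta):
    """Smallest signed distance from the hyperplane beta0 + x @ beta = 0
    to the points in X with labels y in {1, -1}.

    A positive value means every point is on its correct side; a negative
    value means at least one point is misclassified.
    """
    beta = np.asarray(beta, dtype=float)
    signed_dist = y * (beta0 + X @ beta) / np.linalg.norm(beta)
    return signed_dist.min()

# Hypothetical toy data: two points per class
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1, 1, -1, -1])

c1 = margin(X, y, 0.0, [1.0, 1.0])  # hyperplane x1 + x2 = 0, C = sqrt(2)
c2 = margin(X, y, 0.0, [1.0, 0.0])  # hyperplane x1 = 0, C = 1
print(c1, c2)
```

Both candidates classify all four points correctly. The maximal margin criterion prefers the first one because its nearest points lie farther from the boundary.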


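The vector β is orthogonal (perpendicular) to the hyperplane $\beta_0 + x\beta = 0$. For any two points $x_1$ and $x_2$ on the hyperplane, subtracting $\beta_0 + x_1\beta = 0$ from $\beta_0 + x_2\beta = 0$ gives $(x_2 - x_1)\beta = 0$, so β is orthogonal to every direction lying in the hyperplane.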
To find the distance, we need a unit normal vector (a vector in the same direction but with length 1). We define this as:
$$ \hat{n} = \frac{\beta}{\lVert\beta\rVert} $$
The unit vector $\hat{n}$ gives the direction of the shortest path from any point to the hyperplane.
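Here is a numerical check of these facts, as a sketch with a hypothetical hyperplane $3x_1 + 4x_2 - 4 = 0$. Projecting $x_0 - p$ onto $\hat{n}$, where $p$ is any point on the hyperplane, gives the signed distance from $x_0$ to the hyperplane. Because $p\beta = -\beta_0$, this projection simplifies to $(\beta_0 + x_0\beta)/\lVert\beta\rVert$. The specific points are chosen only for illustration.

```python
import numpy as np

beta0 = -4.0
beta = np.array([3.0, 4.0])          # normal vector of the hypothetical hyperplane 3*x1 + 4*x2 - 4 = 0
n_hat = beta / np.linalg.norm(beta)  # unit normal: same direction as beta, length 1

# A point on the hyperplane: p = -beta0 * beta / ||beta||^2 satisfies beta0 + p @ beta = 0
p = -beta0 * beta / (beta @ beta)

# Moving from p along (4, -3), which is orthogonal to beta, stays on the hyperplane
q = p + np.array([4.0, -3.0])

x0 = np.array([3.0, 2.0])                                  # arbitrary point off the hyperplane
dist_projection = (x0 - p) @ n_hat                         # projection of x0 - p onto n_hat
dist_formula = (beta0 + x0 @ beta) / np.linalg.norm(beta)  # closed-form signed distance
print(dist_projection, dist_formula)                       # both approximately 2.6
```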
