The decision function

$$ y_{new} = \operatorname{sign}(\beta_0 + x_{new}\beta) $$

For practical reasons, we label the two classes 1 and −1.
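A minimal sketch of this decision rule in NumPy; the coefficients `beta_0` and `beta` below are hypothetical values chosen for illustration, not fitted from data:

```python
import numpy as np

# Hypothetical coefficients: beta_0 is the intercept, beta the
# weight vector of the separating hyperplane.
beta_0 = -1.0
beta = np.array([2.0, -1.0])

def decide(x_new):
    """Classify a point as +1 or -1 via sign(beta_0 + x_new . beta)."""
    return int(np.sign(beta_0 + x_new @ beta))

print(decide(np.array([1.5, 0.5])))   # -1 + 3.0 - 0.5 = 1.5 > 0, so +1
print(decide(np.array([0.0, 0.5])))   # -1 + 0.0 - 0.5 < 0, so -1
```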

Many hyperplanes can separate the two classes

Which one would be optimal?

Introducing the 'margin'

Maximize the distance C (margin) from the decision line to the nearest points in each class.
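For a given hyperplane, the margin is the smallest (signed) distance from the training points to the decision line. A sketch of computing it, assuming toy data and a candidate hyperplane chosen by hand:

```python
import numpy as np

# Toy linearly separable data; labels are +1 and -1 as in the text.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# A candidate separating hyperplane beta_0 + x . beta = 0 (assumed values).
beta_0 = 0.0
beta = np.array([1.0, 1.0])

# Signed distance of each point to the hyperplane; for a correctly
# classified point, y_i * (beta_0 + x_i . beta) / ||beta|| is positive.
margins = y * (beta_0 + X @ beta) / np.linalg.norm(beta)

# The margin C of this hyperplane is the smallest such distance;
# the maximal-margin classifier picks the hyperplane maximizing it.
C = margins.min()
print(C)
```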

There is no probabilistic model here, unlike for linear discriminant analysis and logistic regression.


Step 1: The normal vector

The vector β is orthogonal (perpendicular) to the hyperplane

To find the distance, we need a unit normal vector (a vector in the same direction but with length 1). We define this as:

$$ \hat n=\frac{\beta}{||\beta||} $$

The unit vector $\hat n$ gives us the direction of the shortest path to the plane.
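The steps above can be sketched numerically; the values of `beta` and `beta_0` below are hypothetical, chosen so the arithmetic is easy to follow:

```python
import numpy as np

beta = np.array([3.0, 4.0])           # hypothetical hyperplane normal, ||beta|| = 5
beta_0 = -5.0
n_hat = beta / np.linalg.norm(beta)   # unit normal vector, length 1

x = np.array([3.0, 4.0])
# Signed distance from x to the plane beta_0 + x . beta = 0, measured along n_hat
d = (beta_0 + x @ beta) / np.linalg.norm(beta)

# Moving x back by d along n_hat lands exactly on the hyperplane,
# confirming that n_hat points along the shortest path to the plane.
x_proj = x - d * n_hat
print(beta_0 + x_proj @ beta)   # approximately 0
```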
