




Minimising the regularised cost function also shrinks the weights, because large weights are penalised.

Three different datasets, each fitted with different values of λ.
First row: λ = 0. The curve goes through all the points but wiggles a lot (overfitting).
Higher λ gives simpler models that fit the points worse. The right λ lies somewhere in between, trading accuracy against simplicity so that we do not overfit.
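
The trade-off above can be sketched numerically. This is a minimal illustration, not the setup behind the figure. It assumes an L2 (ridge) penalty, a degree-9 polynomial and synthetic noisy sine data; the function names `fit_ridge` and `summarise` are made up for this example. For each λ it reports the size of the weight vector and the training error.

```python
import numpy as np

def fit_ridge(x, y, degree, lam):
    """Polynomial weights minimising sum((Xw - y)^2) + lam * sum(w^2).

    Assumption: every weight, including the constant term, is penalised.
    """
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    # Ridge written as an ordinary least-squares problem: appending the rows
    # sqrt(lam) * I (with zero targets) adds lam * sum(w^2) to the squared error.
    A = np.vstack([X, np.sqrt(lam) * np.eye(degree + 1)])
    b = np.concatenate([y, np.zeros(degree + 1)])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

def summarise(x, y, degree, lambdas):
    """Return (lambda, weight norm, training MSE) for each lambda."""
    X = np.vander(x, degree + 1, increasing=True)
    rows = []
    for lam in lambdas:
        w = fit_ridge(x, y, degree, lam)
        mse = float(np.mean((X @ w - y) ** 2))
        rows.append((lam, float(np.linalg.norm(w)), mse))
    return rows

# Hypothetical data: 10 noisy samples of a sine wave.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)

results = summarise(x, y, degree=9, lambdas=[0.0, 1e-3, 1.0])
for lam, norm, mse in results:
    print(f"lambda={lam:g}  ||w||={norm:.3g}  train MSE={mse:.3g}")
```

With λ = 0 the degree-9 polynomial can pass through all 10 points, so the training error is essentially zero and the weights are large. As λ grows, the weights shrink and the training error rises. Picking λ would normally be done on held-out data, which this sketch leaves out.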