
Week 2 – Learning (02456 DTU)

📚 Focus of the Week

• Lecture topics (Prince ch. 5–7):
  • Loss functions
  • Fitting models
  • Gradients & initialization
• Exercises:
  • Notebook: 2.1 FNN AutoDiff Nanograd.ipynb → build your own autodiff engine.
  • Problems: 5.9, 6.5, 7.10.
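To give a feel for what "build your own autodiff engine" involves, here is a minimal scalar reverse-mode autodiff sketch in the spirit of the notebook. This is an illustrative toy, not the notebook's actual API; the class name `Value` and its methods are assumptions.

```python
# Minimal scalar reverse-mode autodiff (illustrative sketch; the actual
# notebook's API may differ).

class Value:
    """A scalar that records the compute graph and its local gradients."""

    def __init__(self, data, _parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = _parents
        self._backward = lambda: None   # filled in by each operation

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad       # d(out)/d(self)  = 1
            other.grad += out.grad      # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # product rule
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# d(x*y + x)/dx = y + 1 = 4 at (x=3, y=3)
x, y = Value(3.0), Value(3.0)
z = x * y + x
z.backward()
print(x.grad)  # 4.0
```

Note the `+=` in each local backward pass: a variable used in several places (like `x` above) accumulates gradient contributions from every path, which is exactly the multivariate chain rule.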

Part 1: Loss Functions

• Training objective: minimize the mismatch between the prediction f_\phi(x) and the target y.
• To do so, define a loss function L(\phi) that quantifies this mismatch over the training set.

Maximum Likelihood Estimation (MLE)

• Framework: choose the parameters that maximize the probability of the observed data:
  \phi^* = \arg\max_\phi \prod_i p(y_i \mid x_i, \phi)
• Equivalent to minimizing the negative log-likelihood (NLL), since \log is monotonic and turns the product into a sum.
• Advantages:
  • Consistency: converges to the true parameters as n \to \infty.
  • Efficiency: achieves the lowest possible asymptotic variance.
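The MLE/NLL equivalence can be checked numerically. The sketch below fits the mean of a Gaussian with fixed \sigma by scanning candidate values of \mu and picking the one with the lowest NLL; for a Gaussian this minimizer is exactly the sample mean. The data values are made up for illustration.

```python
# Sketch: maximizing likelihood == minimizing the negative log-likelihood.
# For a Gaussian with fixed sigma, the NLL over mu is minimized at the
# sample mean. (Data below is made up for illustration.)
import math

data = [1.8, 2.2, 2.0, 1.6, 2.4]

def nll(mu, sigma=1.0):
    # -log prod_i N(y_i | mu, sigma^2) = sum_i -log N(y_i | mu, sigma^2)
    return sum(0.5 * math.log(2 * math.pi * sigma**2)
               + (y - mu)**2 / (2 * sigma**2) for y in data)

# Scan a grid of candidate mu values; the minimizer matches the sample mean.
grid = [i / 100 for i in range(100, 301)]
mu_star = min(grid, key=nll)
print(mu_star, sum(data) / len(data))  # both 2.0
```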

Common Cases

  1. Regression (Gaussian likelihood):
     • y \sim \mathcal{N}(\mu = f_\phi(x), \sigma^2).
     • Loss → mean squared error (MSE).
  2. Classification (categorical likelihood):
     • y \sim \text{Cat}(\pi = f_\phi(x)), where \pi is the vector of softmax outputs.
     • Loss → cross-entropy.
  3. Multiple outputs: when the outputs are conditionally independent, the log-likelihood factorizes, so the losses combine additively.
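The two standard cases can be written out directly as code. This sketch shows (1) that the Gaussian NLL with \sigma = 1 is MSE up to an additive constant and a factor of 1/2, and (2) the categorical NLL as cross-entropy over softmax outputs; function names and example numbers are my own for illustration.

```python
# Sketch: the two standard NLLs written out explicitly (illustrative).
import math

# 1. Regression: Gaussian NLL with sigma = 1 is 0.5 * sum of squared
#    errors plus a constant that does not depend on the predictions.
def gaussian_nll(preds, targets):
    const = 0.5 * len(preds) * math.log(2 * math.pi)
    return const + 0.5 * sum((y - f)**2 for f, y in zip(preds, targets))

# 2. Classification: categorical NLL = cross-entropy between the
#    one-hot target and the softmax output.
def softmax(logits):
    m = max(logits)                      # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target_idx):
    return -math.log(softmax(logits)[target_idx])

# Subtracting the constant leaves 0.5 * sum((y - f)^2) = 0.5 * 0.5 = 0.25.
mse_part = gaussian_nll([1.0, 2.0], [1.5, 2.5]) - math.log(2 * math.pi)
print(mse_part)                          # 0.25
print(cross_entropy([2.0, 1.0, 0.1], 0))
```

Because the additive constant and the 1/2 factor do not change the minimizer, training with MSE and training with the full Gaussian NLL (at fixed \sigma) give the same \phi^*.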

Part 2: Fitting Models