The Ingredients of Artificial Neural Networks

To build and train a modern Deep Learning model, we need four primary ingredients:

  1. Data (D): Observed features X and targets Y.
  2. The Objective (L): A loss function to minimize, usually derived from the Negative Log-Likelihood.
  3. The Engine (∇): An optimizer (such as Gradient Descent) and an algorithm to compute the gradients it needs (Backpropagation).
  4. Architecture (fθ): The choice of non-linear function approximator (MLPs, CNNs, Transformers).
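
To make the four ingredients concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than part of the notes: the toy data, the one-parameter linear model `f`, the mean squared error loss, and the values of `eta` and the step count. The gradient is derived by hand here; in a real network, Backpropagation would compute it automatically.

```python
# Hypothetical toy example: all names and numbers are illustrative assumptions.

# 1. Data (D): features X and targets Y, generated from y = 2x.
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 4.0, 6.0, 8.0]


# 4. Architecture (f_theta): a one-parameter linear model.
def f(theta, x):
    return theta * x


# 2. Objective (L): mean squared error, which matches the Gaussian
#    negative log-likelihood up to constants when the noise variance is fixed.
def loss(theta):
    return sum((f(theta, x) - y) ** 2 for x, y in zip(X, Y)) / len(X)


# 3. Engine: the gradient of the loss with respect to theta (derived by hand),
#    used by plain gradient descent.
def grad(theta):
    return sum(2.0 * (f(theta, x) - y) * x for x, y in zip(X, Y)) / len(X)


theta = 0.0   # initial guess
eta = 0.05    # learning rate (assumed value)
for _ in range(100):
    theta -= eta * grad(theta)

print(theta, loss(theta))
```

With this data the loss is minimized at theta = 2, so the printed parameter should be very close to 2 and the loss close to 0.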

Going Downhill: Gradient Descent

Gradient Descent Algorithm

  1. Initialize: Choose an initial guess for the parameters w.
  2. Compute Gradient: Calculate the gradient of the objective function L(w) with respect to the parameters, denoted as ∇L(w).
  3. Update Parameters: Update the parameters in the direction opposite to the gradient:

$w \leftarrow w - \eta \, \nabla L(w)$

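where η is the learning rate, which controls the step size. Steps 2 and 3 are repeated until the parameters stop changing appreciably, for example until the gradient norm falls below a tolerance or a maximum number of iterations is reached.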
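
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production optimizer: the function name `gradient_descent`, the stopping rule based on the gradient norm, the default values of `eta`, `max_steps`, and `tol`, and the quadratic test objective are all assumptions made for the example.

```python
def gradient_descent(grad_fn, w0, eta=0.1, max_steps=1000, tol=1e-8):
    """Minimize an objective given a function that returns its gradient.

    grad_fn: maps a list of parameters w to the gradient of L at w.
    w0: initial guess for the parameters (step 1).
    """
    w = list(w0)
    for _ in range(max_steps):
        g = grad_fn(w)  # step 2: compute the gradient at the current w
        grad_norm = sum(gi * gi for gi in g) ** 0.5
        if grad_norm < tol:  # assumed stopping rule: gradient is nearly zero
            break
        # step 3: move against the gradient, scaled by the learning rate
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w


# Hypothetical objective: L(w) = (w[0] - 3)^2 + (w[1] + 1)^2,
# whose minimum is at w = [3, -1].
def grad_quadratic(w):
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


w_star = gradient_descent(grad_quadratic, [0.0, 0.0])
print(w_star)
```

On this objective each update shrinks the distance to the minimum by a constant factor of 1 - 2η = 0.8. A learning rate above 1.0 would make the iterates diverge here, which is why η usually has to be tuned to the problem.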