The Ingredients of Artificial Neural Networks

To build and train a modern Deep Learning model, we need four primary ingredients:

Data (D): Observed features X and targets Y.
The Objective (L): A loss function to minimize, usually derived from the Negative Log-Likelihood.
The Engine (∇): An optimizer (Gradient Descent) and an algorithm to find gradients (Backpropagation).
Architecture (fθ): The choice of non-linear function approximator (MLPs, CNNs, Transformers).
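As a rough illustration, the four ingredients can be mapped onto a few lines of plain Python. This is only a sketch under stated assumptions: the toy data, the linear model standing in for fθ, the names `f`, `loss`, `grad`, and `train`, and the hyperparameter values are all made up for this example, and the gradient is derived by hand rather than by backpropagation.

```python
# Data (D): toy features X and targets Y, generated from y = 2x + 1 (illustrative values).
X = [0.0, 1.0, 2.0, 3.0]
Y = [1.0, 3.0, 5.0, 7.0]

# Architecture (f_theta): a linear model stands in for an MLP, CNN, or Transformer.
def f(theta, x):
    w, b = theta
    return w * x + b

# Objective (L): mean squared error. Assuming Gaussian noise with fixed variance,
# this matches the negative log-likelihood up to a constant and a positive scale.
def loss(theta):
    return sum((f(theta, x) - y) ** 2 for x, y in zip(X, Y)) / len(X)

# Engine, part 1: the gradient of the loss, written out by hand here.
# Backpropagation automates this step for deeper models.
def grad(theta):
    n = len(X)
    dw = sum(2 * (f(theta, x) - y) * x for x, y in zip(X, Y)) / n
    db = sum(2 * (f(theta, x) - y) for x, y in zip(X, Y)) / n
    return dw, db

# Engine, part 2: plain gradient descent as the optimizer.
def train(theta=(0.0, 0.0), lr=0.05, steps=2000):
    for _ in range(steps):
        dw, db = grad(theta)
        theta = (theta[0] - lr * dw, theta[1] - lr * db)
    return theta

theta_hat = train()
print(theta_hat, loss(theta_hat))
```

On this noise-free data the fitted parameters should approach w = 2 and b = 1, the values used to generate Y.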

Going downhill or Gradient Descent

Gradient Descent is an optimization algorithm that minimizes a function L(w) by iteratively moving the parameters w in the direction of steepest descent, that is, opposite to the gradient.

Gradient Descent Algorithm

Initialize: Choose an initial guess for the parameters w.
Compute Gradient: Calculate the gradient of the objective function L(w) with respect to the parameters, denoted as ∇L(w).
Update Parameters: Update the parameters in the direction opposite to the gradient:

$w \leftarrow w - \eta \, \nabla L(w)$

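where η is the learning rate, which controls the size of each step.

Repeat: Return to the Compute Gradient step and iterate until the parameters stop changing appreciably or a fixed number of steps is reached.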
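The algorithm above can be sketched in a few lines of Python. This is a minimal example rather than a reference implementation: the function name `gradient_descent`, the objective L(w) = (w − 3)², and the default learning rate and step count are assumptions chosen for illustration.

```python
def gradient_descent(grad_fn, w0, lr=0.1, steps=100):
    """Apply the update w <- w - lr * grad_fn(w) for a fixed number of steps."""
    w = w0                         # Initialize: initial guess for the parameter
    for _ in range(steps):
        g = grad_fn(w)             # Compute Gradient: evaluate the gradient at w
        w = w - lr * g             # Update Parameters: step against the gradient
    return w

# Hypothetical objective with its minimum at w = 3.
def L(w):
    return (w - 3.0) ** 2

# Its gradient, derived by hand: dL/dw = 2 (w - 3).
def grad_L(w):
    return 2.0 * (w - 3.0)

w_star = gradient_descent(grad_L, w0=0.0)
print(w_star, L(w_star))
```

For this particular quadratic, each update multiplies the distance to the minimum by |1 − 2η|, so the iterates converge for 0 < η < 1 and diverge for η > 1. Calling `gradient_descent(grad_L, w0=0.0, lr=1.1)` shows the divergent case.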