The Ingredients of Artificial Neural Networks

To build and train a modern Deep Learning model, we need four primary ingredients:
- Data (D): Observed features X and targets Y.
- The Objective (L): A loss function to minimize, usually derived from the Negative Log-Likelihood (NLL).
- The Engine (∇): An optimizer (Gradient Descent) and an algorithm that efficiently computes the gradients it needs (Backpropagation).
- Architecture (fθ): The choice of non-linear function approximator (MLPs, CNNs, Transformers).
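To make the link between the Objective and the Negative Log-Likelihood concrete, here is a minimal Python sketch. It assumes, which the list above does not specify, that each target is modelled as Gaussian around the prediction with a fixed σ = 1. Under that assumption the summed NLL equals (n/2)·MSE plus a constant, so minimizing one minimizes the other. The names `gaussian_nll` and `mse` and the toy numbers are hypothetical, chosen only for illustration.

```python
import math

def gaussian_nll(y_true, y_pred, sigma=1.0):
    """Sum of -log N(y | mu=y_pred, sigma^2) over all samples."""
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        density = math.exp(-((y - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        total -= math.log(density)
    return total

def mse(y_true, y_pred):
    """Mean squared error over all samples."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

# Toy data (hypothetical): observed targets and two candidate sets of predictions.
y = [1.0, 2.0, 3.0]
good = [1.1, 1.9, 3.2]
bad = [0.5, 2.5, 2.0]

n = len(y)
const = 0.5 * n * math.log(2 * math.pi)  # does not depend on the predictions
for preds in (good, bad):
    # With sigma = 1 the two printed numbers agree: NLL = (n/2) * MSE + const.
    print(gaussian_nll(y, preds), 0.5 * n * mse(y, preds) + const)
```

For classification, the same reasoning with a Bernoulli or categorical likelihood yields the familiar cross-entropy loss.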

Going Downhill: Gradient Descent

- Gradient Descent is an iterative optimization algorithm.
- It minimizes a function L(w) by repeatedly stepping in the direction of steepest descent, which is the direction of the negative gradient, −∇L(w).
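As a quick numerical check of what "direction of steepest descent" means, the sketch below scans unit directions u around a point w and keeps the one whose small step lowers L the most. It uses a hypothetical objective, L(w) = w1^2 + 3*w2^2, picked only for illustration. The winning direction should line up with the normalized negative gradient, −∇L(w)/‖∇L(w)‖.

```python
import math

def loss(w):
    # Hypothetical example objective: an elongated bowl.
    return w[0] ** 2 + 3 * w[1] ** 2

def grad(w):
    # Analytic gradient of the example objective.
    return (2 * w[0], 6 * w[1])

w = (1.0, 1.0)
g = grad(w)
g_norm = math.hypot(*g)

# Scan 3600 unit directions around the circle; for a small step eps, the one
# that lowers the loss the most approximates the steepest-descent direction.
eps = 1e-4
best_dir, best_drop = None, -math.inf
for k in range(3600):
    theta = 2 * math.pi * k / 3600
    u = (math.cos(theta), math.sin(theta))
    drop = loss(w) - loss((w[0] + eps * u[0], w[1] + eps * u[1]))
    if drop > best_drop:
        best_dir, best_drop = u, drop

neg_grad_dir = (-g[0] / g_norm, -g[1] / g_norm)
print(best_dir, neg_grad_dir)  # the two directions nearly coincide
```

Along that best direction the loss falls at a rate of about ‖∇L(w)‖ per unit step, so the gradient's magnitude also indicates how steep the slope is.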

Gradient Descent Algorithm

- Initialize: Choose an initial guess for the parameters w.
- Compute Gradient: Calculate the gradient of the objective function L(w) with respect to the parameters, denoted ∇L(w).
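- Update Parameters: Move the parameters in the direction opposite to the gradient:
$w \leftarrow w - \eta \, \nabla L(w)$
where η > 0 is the learning rate, which controls the size of each step.
- Repeat: Return to the gradient computation until a stopping criterion is met (for example, a fixed number of iterations or a sufficiently small gradient).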
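The Initialize, Compute Gradient, and Update steps can be sketched as a short Python loop. Because the algorithm is iterative, the sketch repeats the gradient and update steps for a fixed number of iterations, which is an assumed stopping rule. The function names and the quadratic objective L(w) = (w1 - 3)^2 + 2*(w2 + 1)^2 are hypothetical; in a real network, ∇L(w) would come from Backpropagation rather than a hand-written formula.

```python
def gradient_descent(grad_fn, w0, lr, num_steps):
    """Plain gradient descent with a fixed number of iterations."""
    w = list(w0)                                    # Initialize: starting guess
    for _ in range(num_steps):
        g = grad_fn(w)                              # Compute gradient at current w
        w = [wi - lr * gi for wi, gi in zip(w, g)]  # Update: step against the gradient
    return w

def grad_L(w):
    # Hypothetical example objective L(w) = (w1 - 3)^2 + 2 * (w2 + 1)^2,
    # minimized at w = [3, -1]; this returns its analytic gradient.
    return [2 * (w[0] - 3), 4 * (w[1] + 1)]

print(gradient_descent(grad_L, [0.0, 0.0], lr=0.1, num_steps=100))  # close to [3, -1]
print(gradient_descent(grad_L, [0.0, 0.0], lr=0.6, num_steps=50))   # second coordinate diverges
```

With η = 0.1, each step multiplies the distance to the minimum by 0.8 in the first coordinate and 0.6 in the second, so the iterates converge. With η = 0.6, the factor along the steeper second coordinate becomes 1 − 0.6·4 = −1.4, so the iterates overshoot and oscillate with growing amplitude. This is what it means in practice for η to control the size of the steps.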