1D Stochastic Gradient Descent

Target function

Dataset

N=100

Training

iters=200

SGD update rule
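
The update rule itself did not survive on this page, so what follows is only a minimal sketch of what a 1D SGD step usually looks like, not the page's own formula. It assumes a linear model y = w*x + b, a squared-error loss on one randomly chosen sample per step, and a learning rate of 0.1. The target function (y = 2x - 1 with small Gaussian noise) and the names `make_dataset` and `sgd` are hypothetical and chosen for illustration. Only N=100 and iters=200 come from the page.

```python
import random


def make_dataset(n=100, true_w=2.0, true_b=-1.0, noise=0.1, seed=0):
    """Sample n points from an ASSUMED target y = true_w * x + true_b + noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = true_w * x + true_b + rng.gauss(0.0, noise)
        data.append((x, y))
    return data


def sgd(data, iters=200, lr=0.1, seed=0):
    """Run SGD on a 1D linear model and return the (w, b) history.

    Assumes the per-sample loss 0.5 * (w*x + b - y)**2, whose gradients are
    err * x with respect to w and err with respect to b.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    history = [(w, b)]  # state before any update
    for _ in range(iters):
        x, y = rng.choice(data)   # one random sample per step
        err = (w * x + b) - y     # prediction error on that sample
        w -= lr * err * x         # w <- w - lr * dL/dw
        b -= lr * err             # b <- b - lr * dL/db
        history.append((w, b))
    return history


data = make_dataset(n=100)
history = sgd(data, iters=200)
w_final, b_final = history[-1]
print(f"final slope w = {w_final:.3f}, final bias b = {b_final:.3f}")
```

The returned `history` holds one (w, b) pair per iteration plus the initial state, which is the kind of series a "slope over time" or "bias over time" plot would draw from.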


    

[Figure panels: Data + Lines (scrub iterations); Slope w over time; Bias b over time]