Why are the contours of the L1 loss squares and the contours of the L2 loss circles?
Answers
By a sparse model, we mean a model in which many of the weights are 0. Let us therefore reason about why L1 regularization is more likely to produce weights that are exactly 0.
Consider a model consisting of the weights $(w_1, w_2, \ldots, w_m)$.
With L1 regularization, you penalize the model by a loss function $L_1(w) = \sum_i |w_i|$.
With L2 regularization, you penalize the model by a loss function $L_2(w) = \frac{1}{2} \sum_i w_i^2$.
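These definitions already answer the contour question directly. In two dimensions ($m = 2$), the level sets of the two penalties are

$$\{(w_1, w_2) : |w_1| + |w_2| = c\}, \qquad \{(w_1, w_2) : \tfrac{1}{2}(w_1^2 + w_2^2) = c\}.$$

The first is the set of points whose coordinates sum to $c$ in absolute value, i.e. a square with its corners on the axes (a diamond); the second is a circle of radius $\sqrt{2c}$.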
If using gradient descent, you iteratively change the weights in the opposite direction of the gradient, with a step equal to the step size $\eta$ multiplied by the gradient. This means that a steeper gradient makes us take a larger step, while a flatter gradient makes us take a smaller step. Let us look at the gradients of these two loss functions.
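The gradient of $L_1$ with respect to $w_i$ is $\operatorname{sign}(w_i)$, whose magnitude stays 1 no matter how small $w_i$ already is, whereas the gradient of $L_2$ is $w_i$ itself, which shrinks together with the weight. Below is a minimal Python sketch (the function names `l1_step` and `l2_step`, the step size, and the starting weights are purely illustrative) that applies only the regularization part of the update and shows why L1 can drive a weight to exactly 0 while L2 merely shrinks it.

```python
import numpy as np

def l1_step(w, eta):
    # d/dw_i of sum_j |w_j| is sign(w_i): a step of constant size eta,
    # no matter how small w_i already is.
    # Clip to 0 instead of crossing it (soft-threshold convention).
    return np.where(np.abs(w) <= eta, 0.0, w - eta * np.sign(w))

def l2_step(w, eta):
    # d/dw_i of 0.5 * sum_j w_j**2 is w_i: the step shrinks with the weight,
    # so the weight decays geometrically but never reaches exactly 0.
    return w - eta * w

w_l1 = np.array([0.05, -1.0])
w_l2 = w_l1.copy()
for _ in range(50):
    w_l1 = l1_step(w_l1, eta=0.01)
    w_l2 = l2_step(w_2 := w_l2, eta=0.01) if False else l2_step(w_l2, eta=0.01)

print(w_l1)  # [ 0.  -0.5]        -> the small weight is exactly 0
print(w_l2)  # roughly [ 0.03 -0.60] -> shrunk, but never exactly 0
```

The clipping in `l1_step` reflects one common convention (soft-thresholding): a weight is not allowed to cross 0 on a pure penalty step, so a weight whose magnitude falls below $\eta$ lands exactly at 0 rather than oscillating around it.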