Why are the contours of the L1 loss squares and the contours of the L2 loss circles?
Answer
By a sparse model, we mean a model in which many of the weights are 0. Let us therefore reason about why L1 regularization is more likely to create zero weights.
Consider a model consisting of the weights $(w_1, w_2, \ldots, w_m)$.
With L1 regularization, you penalize the model by the loss function $L_1(w) = \sum_i |w_i|$.
With L2 regularization, you penalize the model by the loss function $L_2(w) = \frac{1}{2}\sum_i w_i^2$.
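In two dimensions these definitions already answer the title question: the level set $|w_1| + |w_2| = c$ is a square rotated 45° (one line segment per quadrant, since the absolute values are piecewise linear), while $\frac{1}{2}(w_1^2 + w_2^2) = c$ is a circle of radius $\sqrt{2c}$. Here is a minimal sketch that plots both families of contours, assuming NumPy and Matplotlib are available:

```python
import numpy as np
import matplotlib.pyplot as plt

# Evaluate both penalties on a grid of 2D weight vectors (w1, w2).
w1, w2 = np.meshgrid(np.linspace(-2, 2, 400), np.linspace(-2, 2, 400))
l1 = np.abs(w1) + np.abs(w2)   # L1(w) = sum_i |w_i|
l2 = 0.5 * (w1**2 + w2**2)     # L2(w) = (1/2) sum_i w_i^2

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.contour(w1, w2, l1, levels=5)  # |w1| + |w2| = c: rotated squares
ax1.set_title("L1 contours")
ax2.contour(w1, w2, l2, levels=5)  # (w1^2 + w2^2)/2 = c: circles
ax2.set_title("L2 contours")
for ax in (ax1, ax2):
    ax.set_aspect("equal")
plt.show()
```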
If using gradient descent, you iteratively move the weights in the direction opposite to the gradient, taking a step equal to the step size $\eta$ multiplied by the gradient. This means a steeper gradient makes us take a larger step, while a flatter gradient makes us take a smaller step. Let us look at the gradients: $\frac{\partial L_1}{\partial w_i} = \mathrm{sign}(w_i)$, whose magnitude is constant, and $\frac{\partial L_2}{\partial w_i} = w_i$, whose magnitude shrinks as $w_i$ approaches 0. So L1 regularization pushes a weight toward 0 with the same force no matter how small the weight already is, and the weight can reach exactly 0, while the L2 push becomes ever weaker near 0, so the weight decays toward 0 but rarely becomes exactly 0.
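To make this concrete, here is a minimal sketch (the helper names `descend_l1` and `descend_l2` are hypothetical, chosen for illustration) that runs plain gradient descent on a single weight under each penalty, assuming NumPy:

```python
import numpy as np

def descend_l1(w, eta=0.1, steps=60):
    """Gradient descent on L1(w) = |w|: the gradient is sign(w), so every
    step has the same size eta regardless of how small w is."""
    for _ in range(steps):
        step = eta * np.sign(w)
        # Clamp at 0 so the constant-size step does not overshoot past it.
        w = w - step if abs(step) < abs(w) else 0.0
    return w

def descend_l2(w, eta=0.1, steps=60):
    """Gradient descent on L2(w) = w**2 / 2: the gradient is w itself, so
    the step shrinks as w approaches 0 and w decays geometrically."""
    for _ in range(steps):
        w = w - eta * w
    return w

print(descend_l1(1.0))  # 0.0     -> reaches exactly zero (sparse)
print(descend_l2(1.0))  # ~0.0018 -> small but never exactly zero
```

The clamp in `descend_l1` mirrors what proximal (soft-thresholding) treatments of L1 do in practice: without it, the constant-size step would oscillate around 0 instead of settling there.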