L1 regularization is a regularization technique applicable to many methods in statistical learning. It consists of adding a penalty to the loss function proportional to the sum of the absolute values of something. In parametric models, the “something” is the parameters. It is called “L1” regularization because the penalty is the Manhattan distance (i.e., the L1 norm) of the parameter vector.
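As a minimal sketch, assuming a squared-error base loss (the function name and the penalty strength `lam` below are illustrative, not drawn from any particular library):

```python
import numpy as np

def l1_penalized_loss(y_true, y_pred, weights, lam):
    """Mean squared error plus an L1 penalty on the model weights."""
    mse = np.mean((y_true - y_pred) ** 2)
    # The L1 term: lam times the Manhattan (L1) norm of the parameter vector
    return mse + lam * np.sum(np.abs(weights))
```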
L1 regularization promotes model sparsity by pushing parameter weights towards zero, often to exactly zero. It can help with collinearity, but may do so by completely eliminating a feature that carries meaningful information. It is most suitable when one suspects that a small set of features explains most of the variance in the response variable. L1 and L2 regularization can be used together (the elastic net); this is often a good approach in linear models.
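For instance, a sketch using scikit-learn’s `Lasso` (the synthetic data and the `alpha=0.1` penalty strength are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first two features carry signal; the other eight are pure noise
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)  # coefficients of the noise features are typically exactly 0.0
```

scikit-learn also provides `ElasticNet`, which combines the L1 and L2 penalties mentioned above.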
In non-parametric models, L1 regularization has a case-by-case interpretation, and is not always applicable. For example, there is no straightforward way to use L1 regularization in a k-NN regression or a single decision tree. However, gradient boosting models do have a form of L1 regularization: the parameter, called `alpha` in XGBoost (and `lambda_l1` in LightGBM), applies an L1 penalty to the leaf weights of the trees.
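A sketch assuming the XGBoost library (the synthetic data, `n_estimators`, and the `reg_alpha` value are illustrative choices):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=500)

# reg_alpha is XGBoost's L1 penalty on the leaf weights; 0 disables it
model = xgb.XGBRegressor(n_estimators=100, reg_alpha=1.0)
model.fit(X, y)
```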
L1 regularization is often treated as synonymous with the Least Absolute Shrinkage and Selection Operator (LASSO), though strictly speaking they are not the same thing: LASSO refers to a particular application of L1 regularization (an L1 penalty on the coefficients of a least-squares linear model), whereas “L1 regularization” refers to any regularizer based on the sum of the absolute values of something.