Logistic regression is considered to be one of the simplest classifiers (alongside k-nearest neighbors). It can be seen as interpreting the output of a linear regression model as the log-odds (logit) of the probability that $y = 1$ given the data.

Recall that linear regression is defined as

$$
\mathbf{z} = X\boldsymbol{\theta},
$$

where $X$ and $\boldsymbol{\theta}$ have been augmented to incorporate the bias term. If we assume that $\mathbf{z}$ is the log odds that $y = 1$, then we have

$$
\mathbf{p} = \sigma(X\boldsymbol{\theta}) = \frac{1}{1 + e^{-X\boldsymbol{\theta}}}.
$$

In other words, the model's prediction is

$$
\hat{\mathbf{y}} = \frac{1}{1 + e^{-X\boldsymbol{\theta}}}.
$$
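The sigmoid here is exactly the inverse of the log-odds transformation, which a quick numeric sketch (using an arbitrary probability) can confirm:

```python
import numpy as np

p = 0.8                                     # an arbitrary probability
log_odds = np.log(p / (1 - p))              # logit(p)
p_recovered = 1 / (1 + np.exp(-log_odds))   # sigmoid(logit(p))
print(np.isclose(p, p_recovered))           # True
```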

# Optimizing logistic regression

Unlike [[Linear regression|linear regression]], there is no general closed-form solution for logistic regression. Instead, we typically use [[Gradient descent|gradient descent]]. To do this, we will need to find the gradient of the loss with respect to the parameters, i.e., $\frac{\partial L}{\partial \boldsymbol{\theta}}$. When optimizing logistic regression, we typically use [[02 Binary cross-entropy loss|binary cross-entropy loss]]. Recall that, for one example, this is defined as

$$
L(\mathbf{y}, \hat{\mathbf{y}}) = -\mathbf{y} \cdot \log(\hat{\mathbf{y}}) - (\mathbf{1}-\mathbf{y})\cdot \log(\mathbf{1}-\hat{\mathbf{y}}).
$$
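For concreteness, here is a small numeric sketch of this loss (the labels and predicted probabilities below are made up):

```python
import numpy as np

y = np.array([1.0, 0.0, 1.0])        # made-up labels
y_hat = np.array([0.9, 0.2, 0.6])    # made-up predicted probabilities

# Dot-product form of the loss above
bce = -y @ np.log(y_hat) - (1 - y) @ np.log(1 - y_hat)
print(bce)  # ≈ 0.84
```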

Applying the chain rule, we can decompose this gradient as

$$
\frac{\partial L}{\partial \boldsymbol{\theta}} = \frac{\partial L}{\partial\hat{\mathbf{y}}} \frac{\partial\hat{\mathbf{y}}}{\partial \mathbf{z}} \frac{\partial \mathbf{z}}{\partial \boldsymbol{\theta}}.
$$

Working through each factor (the sigmoid's derivative, $\hat{\mathbf{y}}(\mathbf{1}-\hat{\mathbf{y}})$, cancels the denominators coming from the cross-entropy term), this simplifies to

$$
\frac{\partial L}{\partial \boldsymbol{\theta}} = X^{T}(\hat{\mathbf{y}}-\mathbf{y}).
$$
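As a quick sanity check of this expression, we can compare it against a finite-difference estimate of the gradient on small random data (a sketch; the shapes and seed are arbitrary, and the bias column is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                          # arbitrary small design matrix
y = rng.integers(0, 2, size=(5, 1)).astype(float)    # arbitrary binary labels
theta = rng.normal(size=(3, 1))

def loss(theta: np.ndarray) -> float:
    y_hat = 1 / (1 + np.exp(-X @ theta))
    return float(-(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)).sum())

analytic = X.T @ (1 / (1 + np.exp(-X @ theta)) - y)  # X^T (y_hat - y)

# Central finite differences, one parameter at a time
eps = 1e-6
numeric = np.zeros_like(theta)
for i in range(theta.size):
    e = np.zeros_like(theta)
    e[i] = eps
    numeric[i] = (loss(theta + e) - loss(theta - e)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-4))     # True
```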

So we can implement our solver as

```python
import numpy as np


def fit_logistic_regression(X: np.ndarray, y: np.ndarray, lr: float, epochs: int) -> np.ndarray:
    y = y.reshape(-1, 1)
    ones_column: np.ndarray = np.ones((X.shape[0], 1))
    X_b: np.ndarray = np.concatenate((X, ones_column), axis=1)  # bias column
    theta: np.ndarray = np.random.randn(X_b.shape[1], 1)
    for _ in range(epochs):
        y_pred: np.ndarray = 1 / (1 + np.exp(-X_b @ theta))  # sigmoid(X_b @ theta)
        gradients: np.ndarray = X_b.T @ (y_pred - y)          # X^T (y_hat - y)
        theta -= lr * gradients                               # gradient descent step
    return theta
```

# Implementing in PyTorch

If we're allowed to use a library like [[PyTorch]], this job becomes much easier thanks to [[Autograd (PyTorch)|Autograd]]:

```python
import torch
import torch.nn as nn


class LogisticRegression(nn.Module):
    def __init__(self, dim: int) -> None:
        super().__init__()
        self.weights: nn.Parameter = nn.Parameter(torch.randn(dim + 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ones_column: torch.Tensor = torch.ones((x.shape[0], 1))
        X_b: torch.Tensor = torch.cat((x, ones_column), dim=1)  # append bias column
        return torch.sigmoid(X_b @ self.weights)
```
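To actually fit this module we still need a training loop. The sketch below uses `nn.BCELoss` and plain SGD; `X` (shape `n × dim`) and `y` (shape `n × 1`, float) are assumed to be tensors we already have, and the hyperparameters are purely illustrative:

```python
# X: float tensor of shape (n, dim); y: float tensor of shape (n, 1) -- assumed to exist.
model = LogisticRegression(dim=X.shape[1])
criterion = nn.BCELoss()                                  # binary cross-entropy on probabilities
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # illustrative learning rate

for _ in range(1000):                                     # illustrative number of epochs
    optimizer.zero_grad()
    y_pred = model(X)                                     # sigmoid(X_b @ theta)
    loss = criterion(y_pred, y)
    loss.backward()                                       # Autograd computes dL/dtheta
    optimizer.step()                                      # gradient descent update
```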