Categories: Machine Learning
Tags:

Logistic Regression: A Quick Overview

Logistic Regression is a statistical and machine learning method used for binary classification problems, where the target variable has two possible outcomes (e.g., yes/no, 0/1, spam/not spam). It can also be extended to multiclass classification.


Core Concepts

  1. Purpose:
    • Predict the probability of the target variable belonging to a particular class.
    • Output probabilities are converted to class labels using a threshold (commonly 0.5).
  2. Key Assumption:
    • The relationship between the independent variables (X) and the log-odds of the dependent variable (y) is linear.
  3. Sigmoid Function:
    • Logistic regression uses the sigmoid function to map predicted values to probabilities (0 to 1): σ(z) = 1 / (1 + e^(−z)), where z = β0 + β1X1 + β2X2 + ⋯ + βnXn.
  4. Log-Odds and Decision Boundary:
    • The log-odds, log(p / (1 − p)), are modeled as a linear combination of the predictors.
    • A decision boundary is set (e.g., p > 0.5) to assign a class label.
  5. Loss Function:
    • Logistic regression uses the log-loss (negative log-likelihood) to optimize model parameters: Log-Loss = −(1/N) Σᵢ₌₁ᴺ [yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ)]
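The sigmoid and log-loss above can be sketched directly in NumPy; the sample values below are illustrative, and the `eps` clipping is a common numerical safeguard rather than part of the formula itself:

```python
import numpy as np

def sigmoid(z):
    """Map a linear combination z = b0 + b1*x1 + ... to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y_true, y_pred, eps=1e-15):
    """Negative log-likelihood averaged over N samples.
    Probabilities are clipped away from 0 and 1 to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# A confident, correct prediction incurs a low loss; a confident, wrong one a high loss.
y_true = np.array([1.0, 0.0, 1.0])
y_pred = sigmoid(np.array([2.0, -1.5, 0.5]))
print(log_loss(y_true, y_pred))
```

Note that σ(0) = 0.5, which is why the common 0.5 threshold corresponds to the decision boundary z = 0.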

Steps in Logistic Regression

  1. Fit the Model:
    • Use maximum likelihood estimation to find the parameters (β\beta) that best fit the data.
  2. Predict:
    • Compute the probability using the sigmoid function.
    • Assign labels based on the threshold.
  3. Evaluate:
    • Use metrics like accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC-ROC) to evaluate performance.

Advantages

  • Simple to implement and interpret.
  • Works well for linearly separable data.
  • Provides probabilistic outputs.

Limitations

  • Assumes linearity in the log-odds.
  • Not ideal for complex, non-linear relationships without transformation or interaction terms.
  • Can struggle with multicollinearity and imbalanced datasets.

Would you like an example or explanation of logistic regression applications?

The given array represents a confusion matrix for a multi-class classification problem with 10 classes (numbered 0 through 9). Each row corresponds to the actual class, and each column corresponds to the predicted class.

How to Interpret the Matrix

  • Diagonal Elements:
    • The diagonal elements (e.g., [31, 47, 34, ..., 26]) represent the cases where the actual class matches the predicted class (correct predictions).
    • For instance, 31 in the first row and first column indicates that 31 samples of class 0 were correctly classified as class 0.
  • Off-Diagonal Elements:
    • These elements represent misclassifications. For example:
      • The value 1 at position [0][5] means 1 sample of class 0 was incorrectly classified as class 5.
      • The value 4 at position [3][8] means 4 samples of class 3 were incorrectly classified as class 8.

Breakdown of the Matrix

  1. Class 0 (Row 0):
    • 31 samples were correctly predicted as class 0.
    • 1 sample was misclassified as class 5.
  2. Class 1 (Row 1):
    • 47 samples were correctly predicted as class 1.
    • 1 sample was misclassified as class 8.
  3. Class 2 (Row 2):
    • 34 samples were correctly predicted as class 2.
    • 1 sample was misclassified as class 3.
    • 1 sample was misclassified as class 8.
  4. Class 3 (Row 3):
    • 35 samples were correctly predicted as class 3.
    • 1 sample was misclassified as class 7.
    • 4 samples were misclassified as class 8.
    • 2 samples were misclassified as class 9.

And so on for the remaining rows.

Total Number of Predictions

To calculate the total number of predictions:

  • Sum of all elements in the matrix: This represents the total number of predictions made by the model (including correct and incorrect predictions).

Metrics That Can Be Derived

From this matrix, you can compute various metrics for each class:

  1. True Positives (TP): The diagonal element for the class.
  2. False Positives (FP): Sum of the column for the class minus the diagonal element.
  3. False Negatives (FN): Sum of the row for the class minus the diagonal element.
  4. True Negatives (TN): Sum of all elements in the matrix minus TP, FP, and FN for the class.
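The four definitions above translate directly into row/column sums. The 10-class matrix itself is not reproduced here, so the small 3-class matrix below uses made-up values purely for illustration:

```python
import numpy as np

# Illustrative confusion matrix (rows = actual class, columns = predicted class).
# These values are invented; they are not the 10-class matrix discussed above.
cm = np.array([
    [31, 0, 1],
    [1, 47, 0],
    [0, 2, 34],
])

total = cm.sum()  # total number of predictions, correct and incorrect
for k in range(cm.shape[0]):
    tp = cm[k, k]                 # diagonal element for class k
    fp = cm[:, k].sum() - tp      # column sum minus the diagonal
    fn = cm[k, :].sum() - tp      # row sum minus the diagonal
    tn = total - tp - fp - fn     # everything not involving class k's errors or hits
    print(f"class {k}: TP={tp} FP={fp} FN={fn} TN={tn}")
```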

Using these, you can compute metrics like:

  • Precision
  • Recall
  • F1-Score
  • Accuracy
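Those four metrics follow from the per-class TP/FP/FN counts. Continuing with the same invented 3-class matrix as a stand-in for the real one:

```python
import numpy as np

# Illustrative confusion matrix (rows = actual, cols = predicted); values are invented.
cm = np.array([
    [31, 0, 1],
    [1, 47, 0],
    [0, 2, 34],
])

# Accuracy: correct predictions (the diagonal) over all predictions
accuracy = np.trace(cm) / cm.sum()

for k in range(cm.shape[0]):
    tp = cm[k, k]
    fp = cm[:, k].sum() - tp
    fn = cm[k, :].sum() - tp
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted class k, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual class k, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    print(f"class {k}: precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")

print(f"accuracy={accuracy:.3f}")
```

The guards against zero denominators matter for classes the model never predicts (or that never occur), where precision or recall is otherwise undefined.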

Would you like help calculating these metrics for a specific class or the entire dataset?