Monday, June 29, 2026

Logistic Regression in Python: Complete Classification Guide (2026)

Logistic regression is one of the most widely used classification algorithms. Despite the name, it is a classification model, not a regression model: it predicts the probability of class membership using the sigmoid function. This guide covers how it works, binary and multiclass examples in scikit-learn, the key hyperparameters, and common questions.

How Logistic Regression Works

Logistic regression applies a sigmoid function to a linear combination of features, mapping output to a probability between 0 and 1. A threshold (usually 0.5) converts the probability to a binary class label.
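The mapping described above can be sketched in a few lines of NumPy. The weights, bias, and sample values here are made-up numbers chosen for illustration, not fitted parameters.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias, and samples (illustrative only, not fitted)
w = np.array([0.8, -0.5])
b = 0.1
X = np.array([[2.0, 1.0],
              [-1.0, 3.0],
              [1.0, 1.0]])

z = X @ w + b                        # linear combination of features
probs = sigmoid(z)                   # probability of the positive class
labels = (probs >= 0.5).astype(int)  # apply the 0.5 threshold

print(probs)
print(labels)  # [1 0 1]
```

Raising the threshold above 0.5 trades recall for precision, and lowering it does the opposite.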

Binary Classification Example

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score, roc_curve
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Load the data and hold out 20% for testing
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test  = scaler.transform(X_test)

model = LogisticRegression(max_iter=1000, random_state=42)
model.fit(X_train, y_train)

# Hard labels, and the predicted probability of the positive class (label 1)
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

auc = roc_auc_score(y_test, y_prob)
print(classification_report(y_test, y_pred))
print(f'AUC-ROC: {auc:.4f}')

# Plot the ROC curve against the chance diagonal
fpr, tpr, _ = roc_curve(y_test, y_prob)
plt.plot(fpr, tpr, label=f'AUC = {auc:.3f}')
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()

Multiclass Classification

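from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Three or more classes are handled automatically; the multi_class
# argument is deprecated in recent scikit-learn releases, so it is omitted
model = LogisticRegression(max_iter=500)
model.fit(X_train, y_train)
print(f'Accuracy: {model.score(X_test, y_test):.4f}')
print(classification_report(y_test, model.predict(X_test)))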

Important Hyperparameters

Parameter      Options                   When to use
C              0.001 to 100              Lower = more regularisation
penalty        l1, l2, elasticnet        l1 for feature selection
solver         lbfgs, saga, liblinear    saga for l1 + large datasets
class_weight   balanced, None            balanced for imbalanced data
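To see the effect of C from the table, the sketch below fits the same model at three values and compares the size of the learned coefficients. It assumes the breast cancer dataset and the default L2 penalty; the specific C values are arbitrary picks from the range above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

coef_norms = {}
for C in (0.01, 1.0, 100.0):
    # C is the inverse of regularisation strength: smaller C shrinks coefficients more
    model = LogisticRegression(C=C, max_iter=5000)
    model.fit(X_scaled, y)
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print(f'C={C:<6} coefficient L2 norm = {coef_norms[C]:.3f}')
```

In practice C is usually chosen by cross-validation rather than by inspecting coefficient norms; this only illustrates the direction of the effect.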

FAQ

Does logistic regression need feature scaling?

Not strictly, but it is strongly recommended. The solvers converge faster and more reliably on standardised features, and because regularisation penalises coefficient size, features on very different scales would otherwise be penalised unequally.
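One way to keep scaling consistent during cross-validation is to wrap the scaler and the model in a pipeline, so the scaler is refit on each training fold and never sees the held-out fold. A minimal sketch, assuming the breast cancer dataset and 5 folds:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The pipeline scales and then fits the classifier inside each fold
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5, scoring='roc_auc')

print(f'Mean AUC over 5 folds: {scores.mean():.4f}')
```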

Logistic regression vs SVM vs Random Forest?

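Use logistic regression when you need probability outputs and interpretable coefficients. Consider Random Forest when predictive accuracy is the priority and the relationships are likely non-linear. Consider an SVM for high-dimensional data where the classes are separated by a clear margin.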
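Rather than relying on rules of thumb alone, it can help to compare the three on your own data. The sketch below is one way to do that with 5-fold cross-validation; the dataset, the default hyperparameters, and the accuracy metric are all assumptions made for illustration, and the ranking may differ on other data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Scale-sensitive models get a scaler; tree ensembles do not need one
candidates = {
    'Logistic Regression': make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    'SVM (RBF kernel)': make_pipeline(StandardScaler(), SVC()),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X, y, cv=5, scoring='accuracy')
    results[name] = scores.mean()
    print(f'{name:<20} mean accuracy = {results[name]:.4f}')
```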
