Linear regression is one of the foundational techniques of machine learning. Understanding it well makes many other algorithms easier to learn. This guide covers simple regression, multiple regression, the key assumptions, regularised variants (Ridge and Lasso), and evaluation in Python.

What is Linear Regression?

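Linear regression models the linear relationship between a dependent variable (what you predict) and one or more independent variables (features). It fits a line (a plane or hyperplane when there are several features) that minimises the sum of squared differences between the predicted and actual values across all data points.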
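To make "minimises prediction error" concrete, here is a minimal sketch that fits a line by least squares using only NumPy. The toy data, the variable names, and the choice of `np.linalg.lstsq` are assumptions made for illustration; they are not part of the examples that follow.

```python
import numpy as np

# Hypothetical toy data: y is roughly 2 + 3*x plus a little Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, size=x.shape)

# Design matrix with a column of ones, so the first coefficient is the intercept.
A = np.column_stack([np.ones_like(x), x])

# Least-squares solution: the coefficients that minimise the sum of squared residuals.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = coef

# Sum of squared errors at the fitted coefficients.
residuals = y - A @ coef
sse = float(np.sum(residuals ** 2))

# Nudging the slope away from the solution should make the error larger.
nudged = coef + np.array([0.0, 0.1])
sse_nudged = float(np.sum((y - A @ nudged) ** 2))

print(f'intercept={intercept:.3f}, slope={slope:.3f}')
print(f'SSE at solution={sse:.3f}, SSE after nudging slope={sse_nudged:.3f}')
```

The fitted intercept and slope should land close to the values used to generate the data, and any other choice of coefficients gives a larger sum of squared errors.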

Simple Linear Regression in Python

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic data: one feature, true slope 0.15, true intercept 50, Gaussian noise (sd 20)
np.random.seed(42)
X = np.random.uniform(500, 3000, 200).reshape(-1, 1)  # scikit-learn expects a 2-D feature array
y = 50 + 0.15 * X.flatten() + np.random.normal(0, 20, 200)

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

# Predict once and reuse the result for the metrics and the plot
y_pred = model.predict(X_test)

print(f'Slope: {model.coef_[0]:.4f}')
print(f'Intercept: {model.intercept_:.2f}')
print(f'R-squared: {r2_score(y_test, y_pred):.4f}')
print(f'RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.2f}')

# Sort by X so the fitted line is drawn from left to right
order = X_test.flatten().argsort()
plt.scatter(X_test, y_test, alpha=0.5, label='Actual')
plt.plot(X_test[order], y_pred[order], color='red', label='Predicted')
plt.legend()
plt.title('Linear Regression')
plt.show()

Multiple Linear Regression

import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'size': [1000,1500,2000,1200,1800,900,2200,1600],
    'bedrooms': [2,3,4,2,3,1,4,3],
    'age_years': [10,5,2,15,8,20,1,6],
    'price_lakh': [45,68,95,50,82,35,110,75]
})

X = df[['size','bedrooms','age_years']]
y = df['price_lakh']

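# Standardise the features (mean 0, standard deviation 1) so that
# coefficient magnitudes can be compared across features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Each coefficient is the change in price for a one-standard-deviation change in that feature
model = LinearRegression().fit(X_scaled, y)
for feat, coef in zip(X.columns, model.coef_):
    print(f'{feat:12}: {coef:+.4f}')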
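Coefficients fitted on standardised features are expressed per standard deviation, which is handy for comparing features but awkward for reporting. The sketch below shows one way to convert them back to the original units and cross-checks the result against a model fitted on the raw features. The names `scaled_model`, `raw_model`, `coef_original` and `intercept_original` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Same toy housing data as in the example above.
df = pd.DataFrame({
    'size': [1000, 1500, 2000, 1200, 1800, 900, 2200, 1600],
    'bedrooms': [2, 3, 4, 2, 3, 1, 4, 3],
    'age_years': [10, 5, 2, 15, 8, 20, 1, 6],
    'price_lakh': [45, 68, 95, 50, 82, 35, 110, 75],
})
X = df[['size', 'bedrooms', 'age_years']]
y = df['price_lakh']

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
scaled_model = LinearRegression().fit(X_scaled, y)

# Undo the scaling: divide each coefficient by its feature's standard deviation,
# and let the intercept absorb the shift caused by subtracting the feature means.
coef_original = scaled_model.coef_ / scaler.scale_
intercept_original = scaled_model.intercept_ - np.sum(coef_original * scaler.mean_)

# Cross-check against a model fitted directly on the unscaled features.
raw_model = LinearRegression().fit(X, y)

for feat, c_back, c_raw in zip(X.columns, coef_original, raw_model.coef_):
    print(f'{feat:12}: converted={c_back:+.4f}  raw fit={c_raw:+.4f}')
print(f'intercept   : converted={intercept_original:+.4f}  raw fit={raw_model.intercept_:+.4f}')
```

For plain least squares the two sets of numbers should agree up to floating-point error, because scaling changes the units of the coefficients but not the fitted predictions. With only eight rows, treat the coefficient values themselves as illustrative.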

Key Assumptions

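Linearity: the expected value of y is a linear function of the features
Independence: observations (and therefore their errors) are independent of one another
Homoscedasticity: residuals have constant variance across the range of fitted values
Normality: residuals are approximately normally distributed (this matters mainly for confidence intervals and p-values, less for prediction)
No multicollinearity: features are not highly correlated with each other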
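Of these assumptions, multicollinearity is one of the easier ones to check numerically. A common diagnostic is the variance inflation factor (VIF): regress each feature on the remaining features and compute 1 / (1 - R^2). The sketch below implements that with scikit-learn alone. The helper `variance_inflation_factors` and the synthetic data are hypothetical, written for this illustration, and the other assumptions still need their own checks (for example, a plot of residuals against fitted values).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def variance_inflation_factors(X):
    """Return one VIF per column: 1 / (1 - R^2) from regressing that column on the others."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Hypothetical data: x2 is almost a copy of x1, x3 is unrelated to both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
X_demo = np.column_stack([x1, x2, x3])

vifs = variance_inflation_factors(X_demo)
for name, vif in zip(['x1', 'x2', 'x3'], vifs):
    print(f'{name}: VIF = {vif:.2f}')
```

A VIF near 1 means a feature is barely explained by the others. Values above roughly 5 to 10 are often treated as a warning sign, though that threshold is a rule of thumb rather than a hard limit.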

Ridge and Lasso (Regularised Regression)

from sklearn.linear_model import Ridge, Lasso

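# Reuses X_train, X_test, y_train, y_test from the simple regression example above
ridge = Ridge(alpha=1.0).fit(X_train, y_train)
lasso = Lasso(alpha=0.1).fit(X_train, y_train)

# Format with an f-string: r2_score may return a plain Python float, which has no .round() method
print(f'Ridge R2: {r2_score(y_test, ridge.predict(X_test)):.4f}')
print(f'Lasso R2: {r2_score(y_test, lasso.predict(X_test)):.4f}')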
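Regularisation penalises coefficient size, so it is sensitive to the scale of each feature, and the best `alpha` is rarely known in advance. One possible setup, sketched below, puts `StandardScaler` and `RidgeCV` in a pipeline so that scaling is learned from the training data only and `alpha` is chosen by cross-validation. The synthetic data, the alpha grid and all variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data with features on very different scales.
rng = np.random.default_rng(42)
n = 300
size = rng.uniform(500, 3000, n)   # hundreds to thousands
bedrooms = rng.integers(1, 6, n)   # integers from 1 to 5
age = rng.uniform(0, 30, n)        # years
X_demo = np.column_stack([size, bedrooms, age])
y_demo = 20 + 0.04 * size + 5 * bedrooms - 0.8 * age + rng.normal(0, 5, n)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Candidate regularisation strengths, spaced evenly on a log scale.
alphas = np.logspace(-3, 3, 13)

# The scaler is fitted on the training set only, then reused to transform the test set.
pipe = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas))
pipe.fit(X_tr, y_tr)

# make_pipeline names each step after its lowercased class name.
best_alpha = pipe.named_steps['ridgecv'].alpha_
test_r2 = r2_score(y_te, pipe.predict(X_te))

print(f'Chosen alpha: {best_alpha:g}')
print(f'Test R2: {test_r2:.4f}')
```

The same pattern should work with `LassoCV` in place of `RidgeCV` if feature selection is the goal.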

FAQ

When should I use Ridge vs Lasso?

Use Ridge (L2) when you expect all features to contribute and want to shrink coefficient magnitudes without removing any feature. Use Lasso (L1) when you want automatic feature selection: it can drive some coefficients to exactly zero.
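The difference is easy to see on data where only a few features matter. In the sketch below, ten features are generated but only the first three influence the target. The data, the alpha values and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 10 features, but only the first 3 influence y.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
true_coef = np.array([5.0, -3.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y_demo = X_demo @ true_coef + rng.normal(0, 1.0, 200)

# Put every feature on the same scale before applying a penalty.
X_std = StandardScaler().fit_transform(X_demo)

ridge_demo = Ridge(alpha=1.0).fit(X_std, y_demo)
lasso_demo = Lasso(alpha=0.5).fit(X_std, y_demo)

# Count coefficients that are exactly zero in each model.
ridge_zeros = int(np.sum(ridge_demo.coef_ == 0))
lasso_zeros = int(np.sum(lasso_demo.coef_ == 0))

print('Ridge coefficients:', np.round(ridge_demo.coef_, 3))
print('Lasso coefficients:', np.round(lasso_demo.coef_, 3))
print(f'Exact zeros -> Ridge: {ridge_zeros}, Lasso: {lasso_zeros}')
```

Ridge should keep a small non-zero weight on every feature, while Lasso should set most of the irrelevant ones to exactly zero and shrink the three informative ones. How many coefficients Lasso removes depends strongly on `alpha`.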

What is a good R-squared?

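It depends on the domain. In the physical sciences, an R2 above 0.95 is often expected; in business and social science, 0.6-0.8 is often acceptable. Always benchmark against a simple baseline model, such as one that always predicts the mean of the training targets.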
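One way to set up such a baseline is scikit-learn's `DummyRegressor`, which ignores the features and always predicts the training mean. The sketch below compares it with a linear model on synthetic data generated in a similar way to the simple regression example. The data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: one feature, linear signal plus Gaussian noise.
rng = np.random.default_rng(42)
X_demo = rng.uniform(500, 3000, 200).reshape(-1, 1)
y_demo = 50 + 0.15 * X_demo.ravel() + rng.normal(0, 20, 200)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Baseline: always predict the mean of the training targets.
baseline = DummyRegressor(strategy='mean').fit(X_tr, y_tr)
linear = LinearRegression().fit(X_tr, y_tr)

baseline_pred = baseline.predict(X_te)
linear_pred = linear.predict(X_te)

baseline_r2 = r2_score(y_te, baseline_pred)
linear_r2 = r2_score(y_te, linear_pred)
baseline_rmse = np.sqrt(mean_squared_error(y_te, baseline_pred))
linear_rmse = np.sqrt(mean_squared_error(y_te, linear_pred))

print(f'Baseline (mean)  : R2={baseline_r2:.4f}, RMSE={baseline_rmse:.2f}')
print(f'Linear regression: R2={linear_r2:.4f}, RMSE={linear_rmse:.2f}')
```

The mean baseline scores an R2 of zero or slightly below on held-out data, so the gap between the two rows shows how much the features actually contribute.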

Linear Regression in Python: Complete Guide with Examples (2026)
