Monday, June 29, 2026

Linear Regression in Python: Complete Guide with Examples (2026)

Linear regression is one of the foundations of machine learning, and understanding it well makes many other algorithms easier to learn. This guide covers simple and multiple regression, the key assumptions, regularised variants (Ridge and Lasso), and evaluation in Python.

What is Linear Regression?

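Linear regression models the relationship between a dependent variable (the value you predict) and one or more independent variables (the features) as a straight line, or as a hyperplane when there are several features. Ordinary least squares, the method behind scikit-learn's LinearRegression, chooses the coefficients that minimise the sum of squared differences between predicted and actual values across all data points.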
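To make the fitting step concrete, here is a minimal sketch of a least-squares fit for one feature using NumPy alone. The synthetic data, the seed, and the variable names are assumptions chosen for illustration. It computes the slope and intercept from the textbook closed-form expressions, then cross-checks them against np.linalg.lstsq.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 3 + 2x plus Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=100)

# Closed-form least-squares estimates for a single feature:
# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
x_centered = x - x.mean()
slope = np.sum(x_centered * (y - y.mean())) / np.sum(x_centered ** 2)
intercept = y.mean() - slope * x.mean()

# The same fit via a general least-squares solver.
# The column of ones lets the solver estimate the intercept.
A = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

print(f'Closed form: intercept={intercept:.3f}, slope={slope:.3f}')
print(f'lstsq:       intercept={beta[0]:.3f}, slope={beta[1]:.3f}')
```

Both routes should agree up to floating-point error, and the estimates should land close to the true values of 3 and 2 used to generate the data.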

Simple Linear Regression in Python

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic data: 200 values of X drawn uniformly from 500 to 3000,
# with y linear in X plus Gaussian noise
np.random.seed(42)
X = np.random.uniform(500, 3000, 200).reshape(-1, 1)  # scikit-learn expects a 2D feature array
y = 50 + 0.15 * X.flatten() + np.random.normal(0, 20, 200)

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

# Predict once and reuse the result for the metrics and the plot
y_pred = model.predict(X_test)

print(f'Slope: {model.coef_[0]:.4f}')
print(f'Intercept: {model.intercept_:.2f}')
print(f'R-squared: {r2_score(y_test, y_pred):.4f}')
print(f'RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.2f}')

# Compare held-out actual values with the fitted line
plt.scatter(X_test, y_test, alpha=0.5, label='Actual')
plt.plot(X_test, y_pred, color='red', label='Predicted')
plt.legend()
plt.title('Linear Regression')
plt.show()

Multiple Linear Regression

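import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Small illustrative housing dataset
df = pd.DataFrame({
    'size': [1000,1500,2000,1200,1800,900,2200,1600],
    'bedrooms': [2,3,4,2,3,1,4,3],
    'age_years': [10,5,2,15,8,20,1,6],
    'price_lakh': [45,68,95,50,82,35,110,75]
})

X = df[['size','bedrooms','age_years']]
y = df['price_lakh']

# Standardise the features so their coefficients are comparable
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Each coefficient is the change in price_lakh per one standard deviation
# of that feature, holding the other features fixed
model = LinearRegression().fit(X_scaled, y)
for feat, coef in zip(X.columns, model.coef_):
    print(f'{feat:12}: {coef:+.4f}')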

Key Assumptions

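  • Linearity: the expected value of y is a linear function of the features
  • Independence: observations are independent of one another
  • Homoscedasticity: residuals have constant variance across the range of fitted values
  • Normality: residuals are approximately normally distributed (this matters mainly for confidence intervals and hypothesis tests)
  • No multicollinearity: features are not highly correlated with one another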
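Some of these assumptions can be checked numerically. The sketch below is one possible approach, not a complete diagnostic suite. It uses synthetic data in which two features are deliberately near-duplicates, and two hypothetical helper functions (fit_ols and vif) written for this example. It computes a variance inflation factor (VIF) per feature as a multicollinearity check, and compares residual spread between low and high fitted values as a rough homoscedasticity check.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): x2 is nearly a copy of x1,
# so those two features are strongly collinear; x3 is independent of both.
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 + 1.5 * x1 - 0.5 * x3 + rng.normal(scale=0.5, size=n)


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns coefficients and residuals."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta, y - A @ beta


def vif(X):
    """VIF per column: 1 / (1 - R^2) from regressing that column on the others."""
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        _, resid = fit_ols(others, X[:, j])
        r2 = 1.0 - resid.var() / X[:, j].var()
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)


vifs = vif(X)
print('VIF per feature:', np.round(vifs, 2))

# Rough homoscedasticity check: residual spread for the lower half of
# fitted values versus the upper half. A ratio near 1 suggests constant variance.
_, resid = fit_ols(X, y)
fitted = y - resid
order = np.argsort(fitted)
spread_ratio = resid[order[n // 2:]].std() / resid[order[:n // 2]].std()
print(f'Residual spread ratio (high/low fitted): {spread_ratio:.2f}')
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic collinearity. Here the two near-duplicate features should show large values, while the independent feature stays close to 1.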

Ridge and Lasso (Regularised Regression)

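from sklearn.linear_model import Ridge, Lasso
from sklearn.metrics import r2_score

# Reuses X_train, X_test, y_train, y_test from the simple regression example.
# alpha sets the penalty strength. The penalty depends on feature scale,
# so standardise features first when they are on different scales.
ridge = Ridge(alpha=1.0).fit(X_train, y_train)
lasso = Lasso(alpha=0.1).fit(X_train, y_train)

print(f'Ridge R2: {r2_score(y_test, ridge.predict(X_test)):.4f}')
print(f'Lasso R2: {r2_score(y_test, lasso.predict(X_test)):.4f}')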

FAQ

When should I use Ridge vs Lasso?

Use Ridge (L2) when most features are expected to contribute and you want to shrink coefficient magnitudes without removing any feature. Use Lasso (L1) when you want automatic feature selection: it can drive some coefficients to exactly zero.
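The difference is easiest to see on data where most features are irrelevant. The sketch below uses a synthetic dataset (an assumption for illustration) with ten features, of which only the first two influence y. The alpha values are arbitrary choices for the demonstration, not recommended settings.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): 10 features on a common scale,
# only the first two actually influence y.
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

# Count coefficients that are exactly zero under each penalty
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
n_zero_lasso = int(np.sum(lasso.coef_ == 0))

print('Ridge coefficients:', np.round(ridge.coef_, 3))
print('Lasso coefficients:', np.round(lasso.coef_, 3))
print(f'Exact zeros: ridge={n_zero_ridge}, lasso={n_zero_lasso}')
```

Ridge keeps every coefficient non-zero, although the irrelevant ones are small. Lasso sets irrelevant coefficients to exactly zero, at the cost of also shrinking the two real effects towards zero.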

What is a good R-squared?

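It depends on the domain. In the physical sciences, an R2 above 0.95 is often expected, while in business and social science 0.6 to 0.8 is often acceptable. Always benchmark against a simple baseline model, such as one that predicts the mean of y for every observation.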
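The baseline comparison is built into the definition of R-squared, as a hand-written version makes clear. The sketch below uses a hypothetical helper r_squared and made-up numbers. It shows that a model predicting the mean scores 0, and that a model worse than the mean scores below 0.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """R-squared = 1 - SS_res / SS_tot, where SS_tot is the error of a mean-only model."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration
y_true = np.array([3.0, 5.0, 7.0, 9.0])

baseline_pred = np.full_like(y_true, y_true.mean())  # always predict the mean
good_pred = np.array([2.8, 5.1, 7.2, 8.9])           # close to the truth
bad_pred = np.array([9.0, 7.0, 5.0, 3.0])            # trend is reversed

r2_baseline = r_squared(y_true, baseline_pred)
r2_good = r_squared(y_true, good_pred)
r2_bad = r_squared(y_true, bad_pred)

print(f'Mean baseline: {r2_baseline:.3f}')
print(f'Good model:    {r2_good:.3f}')
print(f'Bad model:     {r2_bad:.3f}')
```

Read this way, R-squared is the fraction of the mean-only model's squared error that your model removes, so any value should be judged against what simple alternatives achieve on the same data.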
