Pandas is one of the most widely used Python libraries for data analysis and a staple of everyday data science work. This tutorial takes you from zero to proficient in Pandas, with code examples you can run as you read.

What is Pandas?

Pandas is an open-source Python library for data manipulation and analysis. It provides two primary data structures: the DataFrame (like a table/spreadsheet) and the Series (a single column). With Pandas, you can load, clean, transform, and analyze data with just a few lines of code.
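
To make the difference between the two structures concrete, here is a minimal sketch. The names, ages, and cities are made-up sample values used only for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labeled array
ages = pd.Series([28, 34, 25], index=['Alice', 'Bob', 'Charlie'], name='age')

# A DataFrame is a two-dimensional table; each of its columns is a Series
people = pd.DataFrame(
    {'age': [28, 34, 25], 'city': ['Oslo', 'Lima', 'Pune']},  # illustrative values
    index=['Alice', 'Bob', 'Charlie'],
)

print(type(people['age']).__name__)  # a single column comes back as a Series
print(ages['Bob'])                   # look up a value by its label
print(people.shape)                  # (rows, columns)
```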

Installing Pandas

pip install pandas numpy matplotlib

Creating DataFrames

import pandas as pd
import numpy as np

# From a dictionary
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [28, 34, 25, 31],
    'salary': [65000, 82000, 55000, 90000],
    'department': ['Data', 'Engineering', 'Data', 'Product']
}
df = pd.DataFrame(data)
print(df)

Reading Data from Files

# CSV
df = pd.read_csv('data.csv')

# Excel (.xlsx files need an engine such as openpyxl: pip install openpyxl)
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')

# JSON
df = pd.read_json('data.json')

# From SQL database
import sqlalchemy
engine = sqlalchemy.create_engine('postgresql://user:pass@host/db')
df = pd.read_sql('SELECT * FROM customers', engine)
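
The readers above need files or a database to exist. As a sketch that runs without either, the snippet below feeds read_csv an in-memory string. The CSV rows and the df_csv name are invented for illustration; parse_dates is a real read_csv parameter.

```python
import io
import pandas as pd

# Made-up CSV content standing in for a file on disk
csv_text = """name,age,hire_date
Alice,28,2021-03-15
Bob,,2019-11-02
"""

# read_csv accepts any file-like object, not only a path
df_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=['hire_date'])

print(df_csv.dtypes)          # hire_date is parsed as a datetime column
print(df_csv['age'].isna())   # the empty age field becomes NaN
```

Because one age is missing, Pandas stores the age column as floating point rather than integer.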

Exploring Your Data

print(df.head())           # First 5 rows
print(df.tail())           # Last 5 rows
print(df.shape)            # (rows, columns)
print(df.dtypes)           # Column data types
df.info()                  # Overview + non-null counts (prints directly; returns None)
print(df.describe())       # Statistical summary
print(df.columns.tolist()) # Column names
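
One detail worth knowing about the inspection methods: info() writes its report itself and returns None, while the others return objects you can keep working with. A small sketch, using a hypothetical df_demo frame built from the sample data:

```python
import io
import pandas as pd

df_demo = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [28, 34, 25, 31],
    'department': ['Data', 'Engineering', 'Data', 'Product'],
})

# info() writes to a buffer (stdout by default) and returns None
buffer = io.StringIO()
returned = df_demo.info(buf=buffer)
report = buffer.getvalue()

# value_counts() returns a Series counting each distinct value
dept_counts = df_demo['department'].value_counts()

print(report)
print(dept_counts)
print(df_demo['age'].mean())
```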

Selecting Data

# Select one column (returns Series)
names = df['name']

# Select multiple columns (returns DataFrame)
subset = df[['name', 'salary']]

# Select by row index (iloc)
first_row = df.iloc[0]        # First row
rows_1_to_3 = df.iloc[1:4]   # Rows 1, 2, 3

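# Select by label (loc); with the default index the labels are 0, 1, 2, ...
row_by_label = df.loc[0]
# loc also accepts a boolean mask
row = df.loc[df['name'] == 'Alice']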

# Filter rows with conditions
high_earners = df[df['salary'] > 70000]
data_team = df[df['department'] == 'Data']
data_high_earners = df[(df['department'] == 'Data') & (df['salary'] > 60000)]
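
The difference between iloc and loc is easiest to see when the index is not the default 0, 1, 2. The sketch below assumes a hypothetical index of 10, 20, 30, 40 and also shows isin() as a compact way to filter on several values.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
     'salary': [65000, 82000, 55000, 90000],
     'department': ['Data', 'Engineering', 'Data', 'Product']},
    index=[10, 20, 30, 40],  # hypothetical non-default labels
)

by_position = df_demo.iloc[0]       # first row, whatever its label is
by_label = df_demo.loc[10]          # row whose index label is 10
label_slice = df_demo.loc[10:30]    # label slices include the end label
position_slice = df_demo.iloc[0:2]  # position slices exclude the end position

# isin() keeps a multi-value filter readable
data_or_product = df_demo[df_demo['department'].isin(['Data', 'Product'])]
print(data_or_product)
```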

Cleaning Data

# Check missing values
print(df.isnull().sum())

# Drop rows with any missing values
df_clean = df.dropna()

# Fill missing values
df['age'] = df['age'].fillna(df['age'].median())
df['department'] = df['department'].fillna('Unknown')

# Remove duplicates
df = df.drop_duplicates()

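# Rename columns (kept in a separate variable so later examples can still use 'name' and 'salary')
df_renamed = df.rename(columns={'name': 'employee_name', 'salary': 'annual_salary'})

# Change data types
df['age'] = df['age'].astype(int)
if 'hire_date' in df.columns:  # the sample DataFrame above has no such column
    df['hire_date'] = pd.to_datetime(df['hire_date'])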
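
The sample DataFrame has no missing values or duplicates, so the steps above have nothing to fix. The sketch below applies the same steps to a deliberately messy frame; the raw data, including the hire dates, is invented for illustration.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Bob', 'Diana'],
    'age': [28, np.nan, np.nan, 32],
    'department': ['Data', 'Engineering', 'Engineering', None],
    'hire_date': ['2021-03-15', '2019-11-02', '2019-11-02', '2020-07-01'],
})

clean = raw.drop_duplicates().copy()  # removes the repeated Bob row
clean['age'] = clean['age'].fillna(clean['age'].median()).astype(int)
clean['department'] = clean['department'].fillna('Unknown')
clean['hire_date'] = pd.to_datetime(clean['hire_date'])

print(clean)
print(clean.isnull().sum())  # no missing values remain
```

Filling before casting matters here: astype(int) raises an error if the column still contains NaN.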

Transforming Data

# Add new columns
df['salary_monthly'] = df['salary'] / 12
df['senior'] = df['age'].apply(lambda x: 'Senior' if x >= 30 else 'Junior')

# Vectorized string methods (.str accessor)
df['name_upper'] = df['name'].str.upper()
df['name_length'] = df['name'].str.len()

# Replace values (values not listed in the mapping are left unchanged)
df['department'] = df['department'].replace({'Data': 'Data Science'})

# Sorting
df_sorted = df.sort_values('salary', ascending=False)
df_sorted2 = df.sort_values(['department', 'salary'], ascending=[True, False])
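
The Senior/Junior column above uses apply with a lambda, which calls a Python function once per row. As one possible alternative, NumPy's where computes the same labels in a single vectorized step. The df_demo frame is a hypothetical copy of the sample data.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [28, 34, 25, 31],
})

# Vectorized: evaluate the condition on the whole column at once
df_demo['senior'] = np.where(df_demo['age'] >= 30, 'Senior', 'Junior')

# Same labels via apply, for comparison
via_apply = df_demo['age'].apply(lambda x: 'Senior' if x >= 30 else 'Junior')

print(df_demo)
print((df_demo['senior'] == via_apply).all())  # both approaches agree
```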

Grouping and Aggregation

# Group by department and calculate stats
dept_stats = df.groupby('department').agg({
    'salary': ['mean', 'min', 'max', 'count'],
    'age': 'mean'
}).round(2)
print(dept_stats)

# Simple group by
avg_salary = df.groupby('department')['salary'].mean()

# Multiple aggregations
result = df.groupby('department').agg(
    avg_salary=('salary', 'mean'),
    count=('name', 'count'),
    max_age=('age', 'max')
).reset_index()
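
Passing a dictionary with a list of functions to agg produces two-level column names, which can be awkward to select from later. A sketch of one common way to flatten them, using a hypothetical df_demo copy of the sample data:

```python
import pandas as pd

df_demo = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [28, 34, 25, 31],
    'salary': [65000, 82000, 55000, 90000],
    'department': ['Data', 'Engineering', 'Data', 'Product'],
})

stats = df_demo.groupby('department').agg({'salary': ['mean', 'max'], 'age': 'mean'})

# Columns are pairs such as ('salary', 'mean'); join each pair into one name
stats.columns = ['_'.join(col) for col in stats.columns]
stats = stats.reset_index()

print(stats)
```

The named-aggregation form shown above avoids this step, because you choose the output column names up front.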

Merging and Joining DataFrames

# Sample data
employees = pd.DataFrame({'emp_id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']})
salaries = pd.DataFrame({'emp_id': [1, 2, 4], 'salary': [65000, 82000, 90000]})

# Inner join (only matching rows)
result = pd.merge(employees, salaries, on='emp_id', how='inner')

# Left join (all from left, matching from right)
result = pd.merge(employees, salaries, on='emp_id', how='left')

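# Stack DataFrames vertically (both frames should share the same columns)
new_hires = pd.DataFrame({'emp_id': [5, 6], 'name': ['Eve', 'Frank']})  # hypothetical extra rows
combined = pd.concat([employees, new_hires], ignore_index=True)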
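
To see which rows each join type keeps, it can help to run an outer join with indicator=True, a merge parameter that adds a _merge column recording where each row came from. The audit and unmatched names are illustrative.

```python
import pandas as pd

employees = pd.DataFrame({'emp_id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']})
salaries = pd.DataFrame({'emp_id': [1, 2, 4], 'salary': [65000, 82000, 90000]})

inner = pd.merge(employees, salaries, on='emp_id', how='inner')  # ids 1 and 2
left = pd.merge(employees, salaries, on='emp_id', how='left')    # ids 1, 2, 3

# Outer join keeps every id; _merge says which side each row came from
audit = pd.merge(employees, salaries, on='emp_id', how='outer', indicator=True)
unmatched = audit[audit['_merge'] != 'both']

print(audit)
print(unmatched)
```

In the left join, Charlie's salary is NaN because emp_id 3 has no match in salaries.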

Pivot Tables

# Like Excel pivot tables
pivot = df.pivot_table(
    values='salary',
    index='department',
    columns='senior',
    aggfunc='mean',
    fill_value=0
).round(2)
print(pivot)
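
A self-contained version of the pivot above, as a sketch with a hypothetical df_demo frame, shows what fill_value does: department and seniority combinations with no rows are filled with 0 instead of NaN.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'department': ['Data', 'Engineering', 'Data', 'Product'],
    'senior': ['Junior', 'Senior', 'Junior', 'Senior'],
    'salary': [65000, 82000, 55000, 90000],
})

pivot_demo = df_demo.pivot_table(
    values='salary',
    index='department',
    columns='senior',
    aggfunc='mean',
    fill_value=0,  # 0 here means "no employees in this cell", not a zero salary
)

print(pivot_demo)
```

If a zero could be mistaken for a real value in your data, leaving out fill_value keeps the empty cells as NaN.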

Exporting Data

# To CSV
df.to_csv('output.csv', index=False)

# To Excel
df.to_excel('output.xlsx', sheet_name='Results', index=False)

# To JSON
df.to_json('output.json', orient='records')

# Multiple sheets in Excel
with pd.ExcelWriter('report.xlsx') as writer:
    df.describe().to_excel(writer, sheet_name='Summary')  # keep the index: it holds the statistic names
    df.to_excel(writer, sheet_name='Details', index=False)
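
A quick way to check an export is to read it straight back. The sketch below does a CSV round trip through an in-memory buffer, so no file is written; df_demo and the other variable names are illustrative.

```python
import io
import json
import pandas as pd

df_demo = pd.DataFrame({'name': ['Alice', 'Bob'], 'salary': [65000, 82000]})

# Write CSV into a buffer, then read it back
buffer = io.StringIO()
df_demo.to_csv(buffer, index=False)
header_line = buffer.getvalue().splitlines()[0]
buffer.seek(0)
round_trip = pd.read_csv(buffer)

# With no path argument, to_json returns the JSON as a string
records_json = df_demo.to_json(orient='records')
records = json.loads(records_json)

print(header_line)
print(round_trip.equals(df_demo))
print(records)
```

Without index=False, the CSV would gain an extra unnamed column holding the row index.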

FAQ

What is pandas used for in data science?

Pandas is used for loading data from files and databases, cleaning messy data, transforming and reshaping datasets, calculating statistics, merging multiple data sources, and preparing data for machine learning.

Should I learn NumPy before Pandas?

You can learn them simultaneously. Pandas is built on NumPy, so knowing NumPy basics helps, but you can become productive in Pandas without deep NumPy knowledge.

How do I speed up slow Pandas operations?

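Use vectorized operations instead of Python loops, and use .apply() with caution, since it calls a Python function once per row or element. Libraries such as swifter or modin can add parallel processing. For very large datasets, consider Polars, which is often several times faster than Pandas, although the speedup depends on the workload.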
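
As a rough illustration of the first tip, the sketch below computes the same monthly salary with a Python loop and with a vectorized expression, then times both. The data is randomly generated, and the timings will vary by machine, so treat them as indicative only.

```python
import time
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df_big = pd.DataFrame({'salary': rng.integers(40_000, 120_000, size=10_000)})

# Python-level loop: one division per element
start = time.perf_counter()
monthly_loop = [s / 12 for s in df_big['salary']]
loop_seconds = time.perf_counter() - start

# Vectorized: one operation on the whole column
start = time.perf_counter()
monthly_vec = df_big['salary'] / 12
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s, vectorized: {vec_seconds:.4f}s")
print(np.allclose(monthly_loop, monthly_vec))  # both give the same values
```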

Pandas Tutorial: Complete Guide to Python Data Analysis (2026)
