Sunday 17 March 2024

Common Pandas Operations with Examples



If you want to use Pandas in your data science project, start by learning the following common Pandas operations:

  1. Creating Data
  2. Accessing Data
  3. Data Selection
  4. Data Manipulation
  5. Data Aggregation and Summarization
Let's explore these operations with examples:

1. Creating Data

Let's create a Series and a DataFrame so that we can apply various Pandas operations to them.

Creating a Series

import pandas as pd

# Prepare data variable
data = [1, 2, 3, 4, 5]

# Creating a Series
my_series = pd.Series(data)

# Print a Series
print(my_series)

Output:

0    1
1    2
2    3
3    4
4    5
dtype: int64
  
Creating a DataFrame

# Create a dictionary that maps each column name to its list of values
data = {
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, 42, 44],
    'Bank': ['SBI', 'HDFC', 'ICICI', 'KOTAK']
}

# Create a DataFrame
df = pd.DataFrame(data)

# Print the DataFrame
print(df)

Output:

     Name  Age   Bank
0  Kuntal   40    SBI
1   Sunit   38   HDFC
2   Tapan   42  ICICI
3    Amit   44  KOTAK


2. Accessing Data

Popular ways to access data are:
  • .head() : To view the first few rows.
  • .tail() : To view the last few rows.
  • [ ] : Access a column by its label, or a range of rows with a slice (similar to list indexing).
Examples:

# View the first 2 rows of the DataFrame "df"
print(df.head(2))

Output:

     Name  Age  Bank
0  Kuntal   40   SBI
1   Sunit   38  HDFC

# Print the last 2 rows of the DataFrame "df"
print(df.tail(2))

Output:

    Name  Age   Bank
2  Tapan   42  ICICI
3   Amit   44  KOTAK
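
The [ ] operator from the list above can be tried in the same way. The snippet below is a minimal sketch that rebuilds the same sample DataFrame so it runs on its own; the variable names (names, name_and_bank, first_two_rows) are only illustrative.

```python
import pandas as pd

# Same sample data as in the "Creating a DataFrame" example
df = pd.DataFrame({
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, 42, 44],
    'Bank': ['SBI', 'HDFC', 'ICICI', 'KOTAK']
})

# A single column label returns that column as a Series
names = df['Name']

# A list of column labels returns a DataFrame with only those columns
name_and_bank = df[['Name', 'Bank']]

# A slice inside [ ] selects rows by position, like slicing a list
first_two_rows = df[0:2]

print(names)
print(name_and_bank)
print(first_two_rows)
```

Note that df['Name'] looks up a column, while df[0:2] picks rows. For anything beyond these simple cases, .loc[ ] and .iloc[ ] are usually clearer.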


3. Data Selection

Use the following indexers and techniques for data selection:

  • .loc[ ]: Select rows/columns based on labels.
  • .iloc[ ]: Select rows/columns based on positions (zero-based indexing).
  • Boolean filtering for conditional selection.
Examples:

print(df.loc[df['Age'] > 40])  # Select rows where 'Age' > 40

Output:

    Name  Age   Bank
2  Tapan   42  ICICI
3   Amit   44  KOTAK

print(df.iloc[1:3])  # Select rows from position 1 (inclusive) to 3 (exclusive)

Output:

    Name  Age   Bank
1  Sunit   38   HDFC
2  Tapan   42  ICICI
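
Both .loc[ ] and .iloc[ ] can also select columns at the same time as rows, and boolean conditions can be combined. The following is a small sketch of those cases using the same sample data; the variable names are illustrative only.

```python
import pandas as pd

# Same sample data as in the "Creating a DataFrame" example
df = pd.DataFrame({
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, 42, 44],
    'Bank': ['SBI', 'HDFC', 'ICICI', 'KOTAK']
})

# .loc uses labels; a label slice includes BOTH ends (rows 1 and 2 here)
subset = df.loc[1:2, ['Name', 'Bank']]

# .iloc uses positions; the end position is excluded (rows 0-1, columns 0-1)
top_left = df.iloc[0:2, 0:2]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
older_non_sbi = df[(df['Age'] >= 40) & (df['Bank'] != 'SBI')]

print(subset)
print(top_left)
print(older_non_sbi)
```

The main thing to remember is the slicing difference: df.loc[1:2] returns two rows because the end label is included, whereas df.iloc[1:2] returns only one.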


4. Data Manipulation


The following data manipulation operations can be performed on a DataFrame:
  • Add, remove, or rename columns.
  • Create new columns based on calculations.
  • Handle missing values.
Example:

To rename a column, use the following code:

# Rename the "Age" column to "Years Old" in the DataFrame "df"
df.rename(columns={'Age': 'Years Old'}, inplace=True)
print(df)

Output:

     Name  Years Old   Bank
0  Kuntal         40    SBI
1   Sunit         38   HDFC
2   Tapan         42  ICICI
3    Amit         44  KOTAK
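
The other operations from the list (handling missing values, creating a column from a calculation, and removing a column) can be sketched in a similar way. The example below builds its own small DataFrame called people so that it runs independently; the missing age and the 'Age Next Year' column are made up purely for illustration.

```python
import pandas as pd

# Illustrative data: one age is deliberately missing (None) to show fillna()
people = pd.DataFrame({
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, None, 44],
    'Bank': ['SBI', 'HDFC', 'ICICI', 'KOTAK']
})

# Handle missing values: replace the gap with the median of the known ages
people['Age'] = people['Age'].fillna(people['Age'].median())

# Create a new column based on a calculation (hypothetical column name)
people['Age Next Year'] = people['Age'] + 1

# Remove a column; drop() returns a new DataFrame, so reassign the result
people = people.drop(columns=['Bank'])

print(people)
```

Filling with the median is only one possible choice. Depending on the data, dropping incomplete rows with dropna() may be more appropriate.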


5. Data Aggregation and Summarization

  • Calculate descriptive statistics (mean, median, standard deviation, etc.).
  • Group data and perform aggregate operations (sum, count, average).
Example:

print(df.describe())  

Output:

       Years Old
count   4.000000
mean   41.000000
std     2.581989
min    38.000000
25%    39.500000
50%    41.000000
75%    42.500000
max    44.000000

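Here, 25%, 50% and 75% are percentiles: each one is the value below which that percentage of the data falls. For example, 25% of the values are at or below 39.5, and the 50% value (41.0) is the median. By default, describe() summarizes only the numeric columns, which is why the Name and Bank columns do not appear.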
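
Grouping, the second item in the list above, needs a column with repeated values. The sketch below uses a hypothetical variation of the sample data in which three people share the same bank; the accounts DataFrame and its bank assignments are assumptions made only for this illustration.

```python
import pandas as pd

# Hypothetical data: banks are repeated so that grouping has rows to combine
accounts = pd.DataFrame({
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, 42, 44],
    'Bank': ['SBI', 'HDFC', 'SBI', 'SBI']
})

# Individual descriptive statistics for a single column
mean_age = accounts['Age'].mean()
median_age = accounts['Age'].median()
print(mean_age, median_age)

# Group rows by bank, then compute count, sum and average of 'Age' per group
summary = accounts.groupby('Bank')['Age'].agg(['count', 'sum', 'mean'])
print(summary)
```

The result has one row per bank, with the requested aggregates as columns. Here the SBI group combines three rows, while the HDFC group contains only one.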

