Sunday, 17 March 2024

Common Pandas Operations with Examples


If you want to use Pandas in your data science project, start with the following common operations:

  1. Creating Data
  2. Accessing Data
  3. Data Selection
  4. Data Manipulation
  5. Data Aggregation and Summarization
Let's explore these operations with examples:

1. Creating Data

Let's create a Series and a DataFrame so that we can apply the various Pandas operations to them.

Creating a Series

import pandas as pd

# Prepare data variable
data = [1, 2, 3, 4, 5]

# Creating a Series
my_series = pd.Series(data)

# Print a Series
print(my_series)

Output:

0    1
1    2
2    3
3    4
4    5
dtype: int64
  
Creating a DataFrame

# Create dictionaries for each column
data = {
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, 42, 44],
    'Bank': ['SBI', 'HDFC', 'ICICI', 'KOTAK']
}

# Create a DataFrame
df = pd.DataFrame(data)

# Print the DataFrame
print(df)

Output:

     Name  Age   Bank
0  Kuntal   40    SBI
1   Sunit   38   HDFC
2   Tapan   42  ICICI
3    Amit   44  KOTAK
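
A Series can also carry its own labels, and a DataFrame can be built row by row instead of column by column. The following is a minimal sketch of both; the variable names `ages`, `rows`, and `df_rows` are made up for this illustration and are not used elsewhere in the post.

```python
import pandas as pd

# A Series with explicit string labels instead of the default 0..n-1 index
ages = pd.Series([40, 38, 42], index=['Kuntal', 'Sunit', 'Tapan'])
print(ages['Sunit'])  # look up a value by its label

# A DataFrame built from a list of dictionaries (one dictionary per row)
rows = [
    {'Name': 'Kuntal', 'Age': 40},
    {'Name': 'Sunit', 'Age': 38},
]
df_rows = pd.DataFrame(rows)
print(df_rows.shape)  # (number of rows, number of columns)
```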


2. Accessing Data

Popular ways to access data are:
  • .head() : To view the first few rows (5 by default).
  • .tail() : To view the last few rows (5 by default).
  • [ ] : Select a column by its label, several columns by a list of labels, or a range of rows by a slice.
Examples:

# To view the first 2 rows of DataFrame "df"
print(df.head(2))

Output:

     Name  Age  Bank
0  Kuntal   40   SBI
1   Sunit   38  HDFC

# To print the last 2 rows of DataFrame "df"
print(df.tail(2))

Output:

    Name    Age Bank
2   Tapan   42  ICICI
3   Amit    44  KOTAK
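
Bracket indexing is listed above but not demonstrated, so here is a small sketch of its three common forms. It rebuilds the same `df` so that it runs on its own; the names `names`, `name_and_bank`, and `first_two` are chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, 42, 44],
    'Bank': ['SBI', 'HDFC', 'ICICI', 'KOTAK']
})

# A single column label returns a Series
names = df['Name']

# A list of column labels returns a DataFrame with just those columns
name_and_bank = df[['Name', 'Bank']]

# A slice selects rows by position (the end of the slice is exclusive)
first_two = df[0:2]

print(names)
print(name_and_bank)
print(first_two)
```

Note that a single label inside the brackets always refers to a column, not a row; to pick rows by label or position, the .loc[ ] and .iloc[ ] indexers are usually the clearer choice.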


3. Data Selection

Use the following indexers and techniques to select data:

  • .loc[ ]: Select rows/columns based on labels.
  • .iloc[ ]: Select rows/columns based on positions (zero-based indexing).
  • Boolean filtering for conditional selection.
Examples:

print(df.loc[df['Age'] > 40])  # Select rows where 'Age' > 40

Output:

    Name  Age   Bank
2  Tapan   42  ICICI
3   Amit   44  KOTAK

print(df.iloc[1:3])  # Select rows from position 1 (inclusive) to 3 (exclusive)

Output:

    Name  Age   Bank
1  Sunit   38   HDFC
2  Tapan   42  ICICI
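
Both indexers can also pick columns at the same time, and boolean conditions can be combined. The sketch below rebuilds the same `df` so that it runs on its own; the result names (`subset_loc`, `subset_iloc`, `older_not_kotak`) are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, 42, 44],
    'Bank': ['SBI', 'HDFC', 'ICICI', 'KOTAK']
})

# .loc selects by label; a label slice includes BOTH ends (rows 1 and 2 here)
subset_loc = df.loc[1:2, ['Name', 'Age']]

# .iloc selects by integer position; the end of the slice is exclusive
subset_iloc = df.iloc[1:3, 0:2]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
older_not_kotak = df[(df['Age'] > 40) & (df['Bank'] != 'KOTAK')]

print(subset_loc)
print(subset_iloc)
print(older_not_kotak)
```

With the default 0, 1, 2, ... index, the first two selections return the same rows and columns, which makes the inclusive versus exclusive slice endings easy to compare.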


4. Data Manipulation


The following manipulation operations are common:
  • Add, remove, or rename columns.
  • Create new columns based on calculations.
  • Handle missing values.
Example:

To rename a column, write the following code:

# Rename the "Age" column to "Years Old" in "df"
df.rename(columns={'Age': 'Years Old'}, inplace=True)
print(df)

Output:

     Name  Years Old   Bank
0  Kuntal         40    SBI
1   Sunit         38   HDFC
2   Tapan         42  ICICI
3    Amit         44  KOTAK
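
The other operations in the list above (handling missing values, creating a calculated column, and removing a column) can be sketched as follows. This example builds its own DataFrame with the original 'Age' column name, and one age is deliberately left missing purely for illustration; the column name 'Age in 5 Years' and the choice to fill the gap with the mean are assumptions of this sketch, not a recommendation for every dataset.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, None, 44],  # one value left missing on purpose
    'Bank': ['SBI', 'HDFC', 'ICICI', 'KOTAK']
})

# Handle missing values: fill the gap with the mean of the known ages
mean_age = df['Age'].mean()  # missing values are skipped by default
df['Age'] = df['Age'].fillna(mean_age)

# Create a new column based on a calculation
df['Age in 5 Years'] = df['Age'] + 5

# Remove a column; drop() returns a new DataFrame and leaves df unchanged
df_no_bank = df.drop(columns=['Bank'])

print(df)
print(df_no_bank)
```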


5. Data Aggregation and Summarization

  • Calculate descriptive statistics (mean, median, standard deviation, etc.).
  • Group data and perform aggregate operations (sum, count, average).
Example:

print(df.describe())  

Output:

       Years Old
count   4.000000
mean   41.000000
std     2.581989
min    38.000000
25%    39.500000
50%    41.000000
75%    42.500000
max    44.000000

Here, 25%, 50% and 75% are percentiles: each one is the value below which that percentage of the data falls. For example, the 50% row (the median) is 41, so half of the ages are below 41.
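
Grouping is listed above but not demonstrated. Here is a minimal sketch using a hypothetical `accounts` table in which one bank appears more than once, since grouping is only interesting when groups have several rows; the table and its values are invented for this example. It also shows how to compute a single percentile directly.

```python
import pandas as pd

# Hypothetical data: 'SBI' appears three times so the group has several rows
accounts = pd.DataFrame({
    'Bank': ['SBI', 'HDFC', 'SBI', 'SBI'],
    'Age': [40, 38, 42, 44],
})

# Group rows by bank, then compute several aggregates of 'Age' per group
summary = accounts.groupby('Bank')['Age'].agg(['sum', 'count', 'mean'])
print(summary)

# A single percentile can be computed with quantile() (0.5 is the median)
median_age = accounts['Age'].quantile(0.5)
print(median_age)
```

The result of agg() has one row per bank and one column per aggregate, so a single figure can be read with, for example, summary.loc['SBI', 'mean'].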

