Popular Pandas Operations with Examples
If you want to use Pandas in your data science project, start by learning the following common Pandas operations:
- Creating Data
- Accessing Data
- Data Selection
- Data Manipulation
- Data Aggregation and Summarization
Let's explore these operations with examples:
1. Creating Data
Let's create a Series and a DataFrame so that we can apply various Pandas operations to them.
Creating a Series
import pandas as pd
# Prepare data variable
data = [1, 2, 3, 4, 5]
# Creating a Series
my_series = pd.Series(data)
# Print a Series
print(my_series)
Output:
0    1
1    2
2    3
3    4
4    5
dtype: int64
Creating a DataFrame
# Create dictionaries for each column
data = {
'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
'Age': [40, 38, 42, 44],
'Bank': ['SBI', 'HDFC', 'ICICI', 'KOTAK']
}
# Create a DataFrame
df = pd.DataFrame(data)
# Print the DataFrame
print(df)
Output:
Name Age Bank
0 Kuntal 40 SBI
1 Sunit 38 HDFC
2 Tapan 42 ICICI
3 Amit 44 KOTAK
2. Accessing Data
Popular ways to access data are:
- .head() : To view the first few rows.
- .tail() : To view the last few rows.
- [ ] : Select columns by label (for example, df['Name']) or rows by a slice (for example, df[0:2]).
Examples:
# To view first 2 rows from DataFrame "df"
df.head(2)
Output:
Name Age Bank
0 Kuntal 40 SBI
1 Sunit 38 HDFC
# To print last 2 rows from DataFrame "df"
df.tail(2)
Output:
Name Age Bank
2 Tapan 42 ICICI
3 Amit 44 KOTAK
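The [ ] operator from the list above is not shown in the examples, so here is a minimal sketch of it. It rebuilds the same sample DataFrame so that it runs on its own; the variable names (names, name_and_bank, first_two) are only illustrative.

```python
import pandas as pd

# Same sample data as the DataFrame created in section 1
df = pd.DataFrame({
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, 42, 44],
    'Bank': ['SBI', 'HDFC', 'ICICI', 'KOTAK'],
})

# A single column label returns a Series
names = df['Name']

# A list of column labels returns a DataFrame
name_and_bank = df[['Name', 'Bank']]

# A slice inside [ ] selects rows by position (end is excluded)
first_two = df[0:2]

print(names)
print(name_and_bank)
print(first_two)
```

Note that a plain label inside [ ] always refers to a column, while a slice refers to rows. For anything more involved, .loc and .iloc are usually clearer.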
3. Data Selection
Use the following indexers and techniques for data selection:
- .loc[ ]: Select rows/columns based on labels.
- .iloc[ ]: Select rows/columns based on positions (zero-based indexing).
- Boolean filtering for conditional selection.
Examples:
print(df.loc[df['Age'] > 40]) # Select rows where 'Age' > 40
Output:
Name Age Bank
2 Tapan 42 ICICI
3 Amit 44 KOTAK
print(df.iloc[1:3]) # Select rows from position 1 (inclusive) to 3 (exclusive)
Output:
Name Age Bank
1 Sunit 38 HDFC
2 Tapan 42 ICICI
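The examples above select whole rows. As a rough sketch, the same indexers can also pick rows and columns together, and Boolean conditions can be combined. The sample DataFrame is rebuilt here so the snippet is self-contained; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, 42, 44],
    'Bank': ['SBI', 'HDFC', 'ICICI', 'KOTAK'],
})

# .loc uses labels; a label slice includes BOTH endpoints (rows 1 and 2)
by_label = df.loc[1:2, ['Name', 'Bank']]

# .iloc uses integer positions; the end of the slice is excluded
by_position = df.iloc[1:3, [0, 2]]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
filtered = df[(df['Age'] > 38) & (df['Bank'] != 'KOTAK')]

print(by_label)
print(by_position)
print(filtered)
```

With the default integer index, df.loc[1:2] and df.iloc[1:3] return the same rows here, which is a common source of off-by-one confusion when switching between the two.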
4. Data Manipulation
The following data manipulation operations can be performed:
- Add, remove, or rename columns.
- Create new columns based on calculations.
- Handle missing values.
Example:
To rename a column, use the following code:
# Rename the "Age" column to "Years Old" in "df"
df.rename(columns={'Age': 'Years Old'}, inplace=True)
print(df)
Output:
Name Years Old Bank
0 Kuntal 40 SBI
1 Sunit 38 HDFC
2 Tapan 42 ICICI
3 Amit 44 KOTAK
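The other operations in the list (adding and removing columns, calculated columns, missing values) could look roughly like the sketch below. It rebuilds the sample data with the original "Age" column name, and one age is deliberately set to missing; that gap, the "Age in 5 Years" column, and the choice to fill with the mean are assumptions made only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, None, 44],  # hypothetical: Tapan's age is missing
    'Bank': ['SBI', 'HDFC', 'ICICI', 'KOTAK'],
})

# Handle missing values: fill the gap with the mean of the known ages
df['Age'] = df['Age'].fillna(df['Age'].mean())

# Create a new column based on a calculation
df['Age in 5 Years'] = df['Age'] + 5

# Remove a column; drop() returns a new DataFrame and leaves df unchanged
df_without_bank = df.drop(columns=['Bank'])

print(df)
print(df_without_bank)
```

Filling with the mean is only one option. Depending on the data, dropping incomplete rows with dropna() or filling with a fixed value may be more appropriate.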
5. Data Aggregation and Summarization
- Calculate descriptive statistics (mean, median, standard deviation, etc.).
- Group data and perform aggregate operations (sum, count, average).
Example:
print(df.describe())
Output:
Years Old
count 4.000000
mean 41.000000
std 2.581989
min 38.000000
25% 39.500000
50% 41.000000
75% 42.500000
max 44.000000
Here, 25%, 50% and 75% are percentiles: each one is the value below which that percentage of the data falls. For example, 25% of the ages are below 39.5, and the 50% value (41.0) is the median.
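The second bullet, grouping and aggregating, has no example above, so here is a minimal sketch. Because every bank in the original sample is unique, this snippet uses hypothetical data in which banks repeat; the values are made up purely so that the groups contain more than one row.

```python
import pandas as pd

# Hypothetical data: banks repeat so that grouping is meaningful
df = pd.DataFrame({
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, 42, 44],
    'Bank': ['SBI', 'HDFC', 'SBI', 'SBI'],
})

# Individual descriptive statistics for one column
mean_age = df['Age'].mean()
median_age = df['Age'].median()

# Group rows by bank, then compute several aggregates of 'Age' per group
summary = df.groupby('Bank')['Age'].agg(['count', 'sum', 'mean'])

print(mean_age, median_age)
print(summary)
```

The result of agg() is a DataFrame indexed by bank, with one column per aggregate, so a single value can be read with, for example, summary.loc['SBI', 'mean'].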