Popular Pandas Operations with Examples
If you want to use Pandas in your data science project, start by learning the following common Pandas operations:
- Creating Data
- Accessing Data
- Data Selection
- Data Manipulation
- Data Aggregation and Summarization
Let's explore these operations with examples:
1. Creating Data
Let's create a Series and a DataFrame so that we can apply various Pandas operations to them.
Creating a Series
import pandas as pd
# Prepare data variable
data = [1, 2, 3, 4, 5]
# Creating a Series
my_series = pd.Series(data)
# Print a Series
print(my_series)
Output:
0 1
1 2
2 3
3 4
4 5
dtype: int64
Creating a DataFrame
# Create dictionaries for each column
data = {
'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
'Age': [40, 38, 42, 44],
'Bank': ['SBI', 'HDFC', 'ICICI', 'KOTAK']
}
# Create a DataFrame
df = pd.DataFrame(data)
# Print the DataFrame
print(df)
Output:
Name Age Bank
0 Kuntal 40 SBI
1 Sunit 38 HDFC
2 Tapan 42 ICICI
3 Amit 44 KOTAK
2. Accessing Data
Popular ways to access data are:
- .head() : To view the first few rows.
- .tail() : To view the last few rows.
- [ ] : Access columns by label, or slice rows by position (similar to list slicing).
Examples:
# To view first 2 rows from DataFrame "df"
df.head(2)
Output:
Name Age Bank
0 Kuntal 40 SBI
1 Sunit 38 HDFC
# To print last 2 rows from DataFrame "df"
df.tail(2)
Output:
Name Age Bank
2 Tapan 42 ICICI
3 Amit 44 KOTAK
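The [ ] operator is listed above but not demonstrated. The following is a minimal sketch of its most common uses, assuming the same sample data as the DataFrame created earlier; the variable names ages, name_bank, and middle_rows are only illustrative.

```python
import pandas as pd

# Same sample data as the DataFrame created earlier
df = pd.DataFrame({
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, 42, 44],
    'Bank': ['SBI', 'HDFC', 'ICICI', 'KOTAK']
})

# A single column label returns a Series
ages = df['Age']

# A list of column labels returns a DataFrame
name_bank = df[['Name', 'Bank']]

# A slice selects rows by position (the end is exclusive)
middle_rows = df[1:3]

print(ages)
print(name_bank)
print(middle_rows)
```

Note that a single label inside [ ] refers to a column, while a slice refers to rows; for anything more involved, .loc[ ] and .iloc[ ] are usually clearer.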
3. Data Selection
Use the following techniques for data selection:
- .loc[ ]: Select rows/columns based on labels.
- .iloc[ ]: Select rows/columns based on positions (zero-based indexing).
- Boolean filtering for conditional selection.
Examples:
print(df.loc[df['Age'] > 40]) # Select rows where 'Age' > 40
Output:
Name Age Bank
2 Tapan 42 ICICI
3 Amit 44 KOTAK
print(df.iloc[1:3]) # Select rows from position 1 (inclusive) to 3 (exclusive)
Output:
Name Age Bank
1 Sunit 38 HDFC
2 Tapan 42 ICICI
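.loc[ ] and .iloc[ ] can also select rows and columns at the same time, and Boolean conditions can be combined. Here is a short sketch, assuming the same sample data as before; the variable names are only illustrative.

```python
import pandas as pd

# Same sample data as the DataFrame created earlier
df = pd.DataFrame({
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, 42, 44],
    'Bank': ['SBI', 'HDFC', 'ICICI', 'KOTAK']
})

# .loc: rows with labels 0 through 2 (both ends inclusive), two named columns
by_label = df.loc[0:2, ['Name', 'Age']]

# .iloc: first two rows and first two columns (the end is exclusive)
by_position = df.iloc[0:2, 0:2]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
filtered = df[(df['Age'] > 38) & (df['Bank'] != 'KOTAK')]

print(by_label)
print(by_position)
print(filtered)
```

The main difference to remember is that a .loc[ ] slice includes its end label, while an .iloc[ ] slice excludes its end position.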
4. Data Manipulation
The following data manipulation operations can be performed:
- Add, remove, or rename columns.
- Create new columns based on calculations.
- Handle missing values.
Example:
To rename a column, use the following code:
# Rename the "Age" column to "Years Old" in "df"
df.rename(columns={'Age': 'Years Old'}, inplace=True)
print(df)
Output:
Name Years Old Bank
0 Kuntal 40 SBI
1 Sunit 38 HDFC
2 Tapan 42 ICICI
3 Amit 44 KOTAK
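The other operations in the list (adding a calculated column, removing a column, and handling missing values) can be sketched in a similar way. This example builds its own DataFrame with the original "Age" column name; the missing value and the "Age in 5 Years" column are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

# Sample data with one hypothetical missing age (np.nan) for illustration
df = pd.DataFrame({
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, np.nan, 44],
    'Bank': ['SBI', 'HDFC', 'ICICI', 'KOTAK']
})

# Handle missing values: fill the gap with the mean of the known ages
df['Age'] = df['Age'].fillna(df['Age'].mean())

# Create a new column based on a calculation
df['Age in 5 Years'] = df['Age'] + 5

# Remove a column; drop() returns a new DataFrame and leaves df unchanged
without_bank = df.drop(columns=['Bank'])

print(df)
print(without_bank)
```

Filling with the mean is only one option; depending on the data, dropping the incomplete rows with dropna() may be a better choice.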
5. Data Aggregation and Summarization
- Calculate descriptive statistics (mean, median, standard deviation, etc.).
- Group data and perform aggregate operations (sum, count, average).
Example:
print(df.describe())
Output:
Years Old
count 4.000000
mean 41.000000
std 2.581989
min 38.000000
25% 39.500000
50% 41.000000
75% 42.500000
max 44.000000
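Here, 25%, 50%, and 75% are percentiles: each is the value below which that percentage of the data falls. For example, 25% of the ages are below 39.5. By default, describe() summarizes only numeric columns, which is why only "Years Old" appears in the output.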
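Grouping and aggregating can be sketched as follows. The sample data has no repeated values to group on, so this example adds a hypothetical "City" column; the city names are made up for illustration and are not part of the original data.

```python
import pandas as pd

# The 'City' column is hypothetical, added only so there is something to group by
df = pd.DataFrame({
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, 42, 44],
    'City': ['Kolkata', 'Delhi', 'Kolkata', 'Delhi']
})

# Individual descriptive statistics for one column
mean_age = df['Age'].mean()
median_age = df['Age'].median()

# Group rows by city, then compute count, sum, and average of 'Age' per group
summary = df.groupby('City')['Age'].agg(['count', 'sum', 'mean'])

print(mean_age, median_age)
print(summary)
```

The result of groupby(...).agg(...) is a new DataFrame with one row per group and one column per aggregate.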