Sunday 17 March 2024

Introduction to Python Pandas



Pandas is a fundamental library for data manipulation in Python! Here's a quick introduction with some examples to get you started:

What is pandas?


Pandas is a powerful open-source library for data analysis in Python. It provides high-performance, easy-to-use data structures and data manipulation tools for working with tabular data like spreadsheets and databases.

It offers a wide range of features to:
  • Import data from various sources like CSV files, Excel spreadsheets, databases, and more.
  • Clean and prepare data by handling missing values, duplicates, and other inconsistencies.
  • Explore and analyze data using descriptive statistics, grouping, filtering, and sorting operations.
  • Perform data transformations by creating new columns, modifying existing data, and reshaping the data structure.
  • Visualize data using integration with libraries like Matplotlib and Seaborn for creating informative charts and graphs.
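As a rough illustration of the import, clean, transform, and explore steps listed above, here is a minimal sketch. The CSV text, the column names (name, city, sales), and the choice to fill missing sales with 0 are all made up for this example; in practice you would pass a file path to pd.read_csv.

```python
import io
import pandas as pd

# Hypothetical CSV content (assumption for illustration only).
csv_text = """name,city,sales
Asha,Pune,100
Ravi,Delhi,
Asha,Pune,100
Mira,Delhi,250
"""

# Import: read_csv accepts a file path or any file-like object.
raw = pd.read_csv(io.StringIO(csv_text))

# Clean: drop exact duplicate rows, then fill the missing sales value with 0.
clean = raw.drop_duplicates().fillna({"sales": 0})

# Transform: add a new column derived from an existing one.
clean["sales_thousands"] = clean["sales"] / 1000

# Explore: keep only the Delhi rows and sort them by sales, highest first.
delhi = clean[clean["city"] == "Delhi"].sort_values("sales", ascending=False)
print(delhi)
```

Whether a missing value should be filled, dropped, or left alone depends on the data; filling with 0 here is only one possible choice.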

Why use pandas?


Some of the key tasks you can perform with pandas are listed below:
  • Data Cleaning and Manipulation: Pandas offers functions to clean messy data, handle missing values, and reshape your data for analysis.
  • Data Analysis: You can perform various data analysis tasks like calculating statistics, grouping data, and aggregating results.
  • Data Visualization: Pandas integrates well with plotting libraries like Matplotlib and Seaborn for data visualization.
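The analysis tasks mentioned above (calculating statistics, grouping, and aggregating) might look like the following sketch. The orders table and its region and amount columns are hypothetical sample data, not part of any real dataset.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration.
orders = pd.DataFrame({
    "region": ["North", "South", "North", "South", "North"],
    "amount": [120, 80, 200, 40, 100],
})

# Calculating statistics on a single column.
mean_amount = orders["amount"].mean()
max_amount = orders["amount"].max()

# Grouping rows by region, then aggregating each group's amounts.
summary = orders.groupby("region")["amount"].agg(["sum", "mean"])

print(mean_amount)  # 108.0
print(summary)
```

Here summary has one row per region, with the total and the average amount for that region.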

Core Data Structures:


  • Series: A one-dimensional labeled array, where each value has an index label (similar to a single spreadsheet column).
  • DataFrame: A two-dimensional labeled data structure with rows and columns (like a spreadsheet).
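To make the relationship between the two structures concrete, here is a small sketch. The temperature and rain values and the weekday labels are invented for illustration.

```python
import pandas as pd

# A Series pairs each value with a label (its index).
temps = pd.Series([21.5, 19.0, 23.2], index=["Mon", "Tue", "Wed"])
tue_temp = temps["Tue"]  # look up a value by its label

# A DataFrame is a table whose columns share the same row index.
table = pd.DataFrame({"temp": temps, "rain": [False, True, False]})

# Selecting a single column from a DataFrame gives back a Series.
column = table["temp"]

print(tue_temp)               # 19.0
print(type(column).__name__)  # Series
print(table.shape)            # (3, 2)
```

One way to think about it: a DataFrame behaves like a collection of Series that share the same row labels.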

Getting started with pandas:

  • Installation: If you don't have pandas installed, use pip:
pip install pandas  
  • Import pandas:
import pandas as pd

Example 1: Creating a Pandas Series

# Prepare the data as a Python list
data = [1, 2, 3, 4, 5]

# Creating a Series
my_series = pd.Series(data)
print(my_series)

Output:
0    1
1    2
2    3
3    4
4    5
dtype: int64
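Once a Series exists, operations apply to all of its values at once. The following sketch reuses the same five numbers from Example 1; the variable names doubled, total, and evens are just illustrative.

```python
import pandas as pd

my_series = pd.Series([1, 2, 3, 4, 5])

# Arithmetic is applied element by element.
doubled = my_series * 2

# Aggregations reduce the Series to a single value.
total = my_series.sum()

# A boolean condition selects only the matching values.
evens = my_series[my_series % 2 == 0]

print(doubled.tolist())  # [2, 4, 6, 8, 10]
print(total)             # 15
print(evens.tolist())    # [2, 4]
```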

Example 2: Creating a Pandas DataFrame

# Prepare the data as a dictionary of lists (one list per column)
data = {
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, 42, 44],
    'Bank': ['SBI', 'HDFC', 'ICICI', 'KOTAK']
}

# Create a DataFrame
df = pd.DataFrame(data)

# Print the DataFrame
print(df)

Output:
     Name  Age   Bank
0  Kuntal   40    SBI
1   Sunit   38   HDFC
2   Tapan   42  ICICI
3    Amit   44  KOTAK
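With the DataFrame from Example 2 in hand, a few common next steps are selecting a column, filtering rows, and reading a single row. The sketch below rebuilds the same data so it runs on its own; the variable names ages, over_40, and first_row are only illustrative.

```python
import pandas as pd

# Same data as Example 2, rebuilt so this snippet is self-contained.
df = pd.DataFrame({
    'Name': ['Kuntal', 'Sunit', 'Tapan', 'Amit'],
    'Age': [40, 38, 42, 44],
    'Bank': ['SBI', 'HDFC', 'ICICI', 'KOTAK'],
})

# Select one column (returns a Series).
ages = df['Age']

# Filter rows with a boolean condition.
over_40 = df[df['Age'] > 40]

# Read a single row by its position.
first_row = df.iloc[0]

print(ages.mean())                # 41.0
print(over_40['Name'].tolist())   # ['Tapan', 'Amit']
print(first_row['Bank'])          # SBI
```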
