DataFrames and the Pandas library are central to how we work with tabular data in Python. Whether you’re cleaning raw data, performing complex transformations, or analyzing trends, DataFrames offer a versatile and efficient way to handle structured datasets.
At its core, a DataFrame is a two-dimensional, size-mutable, and labeled data structure—think of it as a table with rows and columns. It’s similar to an Excel spreadsheet or a relational database table but with the flexibility and power of Python at your fingertips.
The Pandas library, which provides the DataFrame structure, is a must-have tool for data analysts, data scientists, and anyone dealing with structured data. With its ability to handle diverse data types, perform powerful operations, and integrate seamlessly with various file formats (like CSV, Excel, SQL, and JSON), Pandas simplifies and accelerates data manipulation tasks.
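Pandas is not part of Python itself or its standard library. It is a third-party package, usually installed with pip install pandas, and it has become a core part of Python’s ecosystem for data analysis and manipulation.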
Pandas is a powerful and flexible Python library that provides tools for working with structured data, such as tables or spreadsheets. It’s particularly well-suited for data cleaning, transformation, and exploratory data analysis.
A DataFrame is the primary data structure in Pandas. It has labeled rows (the index) and labeled columns, and each column is a Pandas Series with its own data type. Rows and columns can be added or removed after the DataFrame is created, which is what size-mutable means.
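The structural properties described above (two dimensions, labeled rows and columns, one Series per column) can be inspected directly. This is a minimal sketch that reuses the sample employee data from this guide; the variable name ages is only illustrative.

```python
import pandas as pd

# Sample data from the dictionary example in this guide
df = pd.DataFrame({
    "Name": ["Yuva", "Ganapati", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

print(df.shape)          # (3, 3): number of rows, number of columns
print(list(df.columns))  # column labels
print(list(df.index))    # row labels; the default is 0, 1, 2, ...
print(df.dtypes)         # one data type per column

# Selecting one column gives a Series that shares the DataFrame's row index
ages = df["Age"]
print(type(ages).__name__)
print(ages.index.equals(df.index))
```

Because every column shares the same index, operations that combine columns line up row by row without any manual bookkeeping.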
Here’s a quick guide to getting started with Pandas and DataFrames:
import pandas as pd
Create a DataFrame from a Dictionary:
data = {
    'Name': ['Yuva', 'Ganapati', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
}
df = pd.DataFrame(data)  # dictionary keys become the column labels
print(df)
Output:
       Name  Age  Salary
0      Yuva   25   50000
1  Ganapati   30   60000
2   Charlie   35   70000
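A dictionary of columns is one way to build a DataFrame; a list of row records is a common alternative when data arrives one row at a time. The sketch below, using the same sample values, assumes you want the two approaches to produce the same table and checks that they do.

```python
import pandas as pd

# Row-oriented: each dictionary is one row, and its keys become column labels
records = [
    {"Name": "Yuva", "Age": 25, "Salary": 50000},
    {"Name": "Ganapati", "Age": 30, "Salary": 60000},
    {"Name": "Charlie", "Age": 35, "Salary": 70000},
]
df_rows = pd.DataFrame(records)

# Column-oriented: the dictionary-of-lists form used in this guide
df_cols = pd.DataFrame({
    "Name": ["Yuva", "Ganapati", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

print(df_rows.equals(df_cols))  # True: same values, labels, and dtypes
```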
Access Columns:
print(df['Name'])  # Access the 'Name' column (returned as a Series)
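Beyond a single column, you will often need several columns at once, or specific rows and cells. A minimal sketch with the same sample data: a single label returns a Series, a list of labels returns a DataFrame, and .loc / .iloc select by label and by integer position respectively. The variable names here are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Yuva", "Ganapati", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

name_col = df["Name"]            # one label      -> Series
subset = df[["Name", "Salary"]]  # list of labels -> DataFrame

# Rows and single cells: .loc selects by label, .iloc by integer position
row_one = df.loc[1]              # the row whose index label is 1, as a Series
salary = df.loc[1, "Salary"]     # one cell by row label and column label
name = df.iloc[2, 0]             # one cell by row position and column position

print(type(name_col).__name__, type(subset).__name__)
print(salary, name)              # 60000 Charlie
```

With the default 0, 1, 2 index, .loc and .iloc happen to select the same rows; they diverge once rows are filtered or the index is relabeled.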
Filter Rows:
print(df[df['Age'] > 25])  # keep only rows where Age is greater than 25
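The filter above works in two steps: the comparison builds a boolean Series with one True/False value per row, and indexing the DataFrame with it keeps the True rows. This sketch makes that explicit and also shows combining conditions; the salary threshold is an arbitrary value chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Yuva", "Ganapati", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

# Step 1: the comparison yields a boolean Series, one value per row
mask = df["Age"] > 25
print(mask.tolist())             # [False, True, True]

# Step 2: indexing with the mask keeps the True rows and their original labels
older = df[mask]
print(older.index.tolist())      # [1, 2]

# Combine conditions with & (and) or | (or); each condition needs parentheses
mid = df[(df["Age"] > 25) & (df["Salary"] < 70000)]
print(mid["Name"].tolist())      # ['Ganapati']
```

The parentheses matter because & binds more tightly than > and < in Python; without them the expression is evaluated in the wrong order and raises an error.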
Add a New Column:
df['Bonus'] = df['Salary'] * 0.1  # a 10% bonus, computed for every row at once
print(df)
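New columns can be built from existing ones with ordinary arithmetic, because Pandas applies the operation to every row without an explicit loop. This sketch adds the Bonus column from the guide, then a hypothetical Total column, and contrasts in-place assignment with assign(), which returns a new DataFrame. Total and AgeNextYear are illustrative names, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Yuva", "Ganapati", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

# Vectorized arithmetic: each expression is evaluated for all rows at once
df["Bonus"] = df["Salary"] * 0.1
df["Total"] = df["Salary"] + df["Bonus"]  # hypothetical derived column

# assign() returns a NEW DataFrame with the extra column and leaves df unchanged
next_year = df.assign(AgeNextYear=df["Age"] + 1)  # hypothetical column name

print(df)
print(next_year["AgeNextYear"].tolist())  # [26, 31, 36]
print("AgeNextYear" in df.columns)        # False
```

Direct assignment (df['Bonus'] = ...) is convenient for step-by-step scripts, while assign() suits method chains where you prefer not to modify the original DataFrame.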
Summary Statistics:
print(df.describe()) # Summary statistics of numeric columns
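describe() returns its results as a DataFrame, with one column per numeric column and one row per statistic, so individual figures can be looked up by label. A minimal sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Yuva", "Ganapati", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

stats = df.describe()  # the text column "Name" is left out by default
print(stats)

# Look up a single statistic by row label (statistic) and column label
print(stats.loc["mean", "Age"])    # 30.0
print(stats.loc["std", "Salary"])  # sample standard deviation of Salary

# Individual statistics are also available directly on a column
print(df["Salary"].max(), df["Age"].median())
```

Note that the std row is the sample standard deviation (divisor n - 1), which is the Pandas default.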
Read CSV File:
df = pd.read_csv('data.csv')  # 'data.csv' is a placeholder; use the path to your own file
Save DataFrame to CSV:
df.to_csv('output.csv', index=False)  # index=False skips writing the row labels (0, 1, 2, ...)
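To see what index=False actually changes, the sketch below does a CSV round trip in memory. It assumes io.StringIO in place of real files so it runs without data.csv or output.csv on disk; read_csv and to_csv accept file-like objects as well as paths.

```python
import io
import pandas as pd

df = pd.DataFrame({
    "Name": ["Yuva", "Ganapati", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

# Write to an in-memory text buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Read it back: with index=False the round trip reproduces the original table
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.equals(df))  # True

# With the default index=True, the row labels come back as an extra column
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))  # ['Unnamed: 0', 'Name', 'Age', 'Salary']
```

This is why index=False is the usual choice when the index is just the default 0, 1, 2 numbering: otherwise every save and reload adds another unnamed column.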
Indian Institute of Embedded Systems – IIES