Working with Excel Files in Python Using Pandas
In this article, we will explore how to work with Excel files using the popular Python library pandas. We’ll delve into the details of reading and manipulating Excel data, focusing on a specific scenario where rows from one Excel file need to be moved to the end of another.
Introduction
Python is an excellent language for data analysis, thanks in part to its ability to interact seamlessly with various libraries and frameworks, including pandas. Pandas is built on top of NumPy and provides high-performance, easy-to-use data structures and functions for working with labeled datasets. In this article, we will use pandas to read Excel files and manipulate the data.
Reading Excel Files
Pandas can read multiple formats of Excel files, including .xls, .xlsx, and .xlsm. The read_excel() function is used to read an Excel file, taking several parameters that allow us to customize the reading process:
sheetname: The name of the sheet in the Excel file you want to read.parse_cols: A comma-separated list of column names to be parsed from the Excel file. In this case, we’re only interested in columns B through O.index_col: The column that contains the row indices for each value in the dataframe.header: The number of the row containing the column names (0-indexed).skiprows: The number of rows to skip at the beginning of the file.
Manipulating Data
Now, let’s examine how we can manipulate data within a pandas dataframe. For this example, suppose we have two Excel files with headers in Row 4 and data in Rows 5-15 for columns B:O, followed by another set of headers in Row 21 and data in Rows 22-30 for the same range of columns.
Our initial Python code attempts to capture both sets of data but includes the second header and its corresponding data. To achieve our desired outcome, we need to read the data from each file separately, then concatenate them while excluding the unwanted rows:
# Define the directory path where all Excel files are stored
path = r'C:\Users\sarah\Desktop\Original'
# Create an empty list to store the dataframes
list_ = []
# Iterate through each Excel file in the specified directory
for file_ in glob.glob(path + "/*.xls"):
# Read the top section of the file (headers and data from Rows 5-15, columns B:O)
top = pd.read_excel(file_, sheetname="Data1", parse_cols="B:O",
header=4, skip_rows=3, nrows=14)
# Read the bottom section of the file (headers and data from Rows 22-30, columns B:L)
bot = pd.read_excel(file_, sheetname="Data1", parse_cols="B:L",
header=21, skip_rows=20, nrows=14)
# Concatenate top and bottom sections while excluding unwanted rows
list_.append(top.join(bot, lsuffix='_t', rsuffix='_b'))
Explanation
Here’s a step-by-step breakdown of how the code works:
- Reading Data from Each File: We use
pd.read_excel()to read data from each Excel file. Thesheetnameparameter specifies which sheet in the file we want to read, whileparse_colsensures that only columns B through O are parsed. - Manipulating Row Counts: By using
nrows=14, we specify that we want rows 5-18 (i.e., Rows 5-15 for top section and Rows 19-30 for bottom section) from each file, excluding the header row. - Concatenating Dataframes: We use the
join()function to concatenate two dataframes while adding suffixes_tand_bto indicate which dataframe came first (top or bottom). - Creating the Final DataFrame: After appending all top and bottom sections to a list, we create the final dataframe by concatenating them using
pd.concat(list_).
Additional Considerations
While this solution meets our needs for a one-time operation with files having the same structure, it may not be suitable for more complex scenarios where Excel file formats vary significantly. If you’re dealing with different types of data sources or need to perform more frequent operations on your datasets, consider using alternative libraries like OpenPyXL or xlrd.
Conclusion
In this article, we’ve explored how to manipulate rows from one pandas dataframe and append it to another based on an Excel file structure. By leveraging the read_excel() function and customizing our reading parameters, you can successfully move data from multiple Excel files into a single dataframe for further analysis or processing.
Last modified on 2025-01-20