Creating a Function to Describe Multiple Dataframes
=====================================================
In this article, we will discuss creating a function that can describe multiple dataframes. The function should take a list of dataframe names as input and return the description of each dataframe.
Background
----------
The describe() method in pandas generates descriptive statistics for the numeric columns of a DataFrame (a 2-dimensional labeled data structure with columns of potentially different types). For each column it returns the count, mean, standard deviation, minimum, the 25th, 50th, and 75th percentiles, and the maximum.
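As a quick illustration of what describe() returns, here is a minimal sketch on a small, made-up DataFrame. The column names price and volume and their values are assumptions for the example, not data from the SKU files.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame(
    {"price": [10.0, 12.0, 14.0, 16.0], "volume": [100, 150, 200, 250]}
)

# One row per statistic, one column per numeric column of df
summary = df.describe()
print(summary)

# The index of the result names the statistics that were computed
print(list(summary.index))
```

The result is itself a DataFrame, so a single value can be looked up with summary.loc["mean", "price"].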
The Problem
-----------
We want a function that takes a list of dataframe names as input, where each name matches a CSV file in the SKUs folder, and returns the description of each dataframe in the list.
My Initial Attempt
------------------
In my initial attempt, I tried storing all the dataframe names as columns in a separate dataframe (x) and then passing this to the function. However, this approach does not work because it only shows the description of one dataframe.
    import pandas as pd

    def des(df):
        columns = df.columns
        for column in columns:
            column = pd.read_csv('SKUs\\' + column + '.csv')
            column['Date'] = pd.to_datetime(column['Date'].astype(str), dayfirst=True, format='%d%m%y', infer_datetime_format=True)
            column.dropna(inplace=True)
            # Returns on the first pass through the loop, so only one file is ever described
            return(column.describe())
This function only shows the description of one dataframe, because it returns a single describe() result rather than collecting one per file. We need to find a better approach.
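That symptom can be reproduced without pandas or any files. A common cause is a return that runs before the loop has finished, and the plain-Python sketch below contrasts it with collecting results and returning after the loop. The function names first_only and all_items are hypothetical and exist only for this illustration.

```python
def first_only(names):
    for name in names:
        # The return runs during the first iteration and ends the function
        return name.upper()


def all_items(names):
    results = []
    for name in names:
        # Collect one result per item instead of returning right away
        results.append(name.upper())
    # The return runs once, after every item has been processed
    return results


print(first_only(["ugcaa", "fapg1"]))
print(all_items(["ugcaa", "fapg1"]))
```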
The Solution
------------
The solution is to collect the description of each DataFrame in a list and then concatenate those descriptions using pd.concat(). This lets us pass multiple dataframe names to the function and get all of their descriptions back.
    import pandas as pd

    def des(dataframes):
        dfs = []
        for df in dataframes:
            # Each name maps to a CSV file in the SKUs folder
            df1 = pd.read_csv('SKUs\\' + df + '.csv')
            df1['Date'] = pd.to_datetime(df1['Date'].astype(str), format='%d%m%y')
            df1.dropna(inplace=True)
            dfs.append(df1.describe())
        # Label each description with the name it came from
        return pd.concat(dfs, axis=1, keys=dataframes)
Explanation
-----------
This function works as follows:
- We create an empty list, dfs, to store the descriptions of the DataFrames.
- We iterate over each name in the input list, dataframes.
- For each name, we read the matching CSV file using pd.read_csv(). We also convert the ‘Date’ column to datetime format and drop any rows with missing values using dropna().
- We append the description of the current DataFrame to the dfs list using append().
- Finally, we concatenate all the descriptions in dfs side by side using pd.concat().
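The steps above can be tried end to end without the real SKUs folder. The following is a minimal sketch, not the exact function from this article: the name describe_csvs, the folder parameter, the temporary directory, the Units column, and the sample values are all assumptions made for the example. It reads the Date column as text so that leading zeros survive before parsing.

```python
import tempfile
from pathlib import Path

import pandas as pd


def describe_csvs(names, folder):
    """Describe each <name>.csv in folder and combine the results side by side."""
    summaries = []
    for name in names:
        # Read Date as text so values such as 010124 keep their leading zero
        frame = pd.read_csv(Path(folder) / f"{name}.csv", dtype={"Date": str})
        frame["Date"] = pd.to_datetime(frame["Date"], format="%d%m%y")
        frame = frame.dropna()
        summaries.append(frame.describe())
    # keys adds an outer column level holding the name of each file
    return pd.concat(summaries, axis=1, keys=names)


with tempfile.TemporaryDirectory() as folder:
    # Hypothetical sample files standing in for the real SKU exports
    Path(folder, "UGCAA.csv").write_text("Date,Units\n010124,10\n020124,20\n030124,30\n")
    Path(folder, "FAPG1.csv").write_text("Date,Units\n010124,5\n020124,\n030124,15\n")
    result = describe_csvs(["UGCAA", "FAPG1"], folder)

print(result)
```

The second sample file has one missing Units value, so dropna() removes that row and the count for FAPG1 is lower than for UGCAA.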
Example Use Case
----------------
Here’s an example of how you can use this function:
    data = {'UGCAA':[],'FAPG1':[],'ACSO5':[],'LGHF2':[],'LGMP8':[],'GGAF1':[]}
    df = pd.DataFrame(data)
    des_df = des(list(df.columns))
    print(des_df)
In this example, we create a DataFrame df whose six column names are the dataframe names; the columns themselves hold no data. We then call the des() function and pass it the list of column names. The function reads the matching CSV files and returns their descriptions combined into a single DataFrame.
Conclusion
----------
Describing multiple dataframes with a single function is a common need in data analysis. By using pd.concat(), we can combine the descriptions of all the DataFrames into a single DataFrame, which makes it easy to compare the characteristics of different datasets.
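To make the comparison concrete, here is a hedged sketch of how a combined description might be sliced. It builds the combined frame from small in-memory DataFrames rather than CSV files; the Units and Price columns and all values are made up for the example.

```python
import pandas as pd

# Hypothetical in-memory data standing in for two SKU files
frames = {
    "UGCAA": pd.DataFrame({"Units": [10, 20, 30], "Price": [1.0, 2.0, 3.0]}),
    "FAPG1": pd.DataFrame({"Units": [5, 15], "Price": [4.0, 6.0]}),
}

# keys becomes the outer column level, so each description stays labeled
combined = pd.concat(
    [frame.describe() for frame in frames.values()],
    axis=1,
    keys=list(frames),
)

# The full description of one SKU
print(combined["UGCAA"])

# One column compared across every SKU: rows are statistics, columns are SKUs
units = combined.xs("Units", axis=1, level=1)
print(units.loc[["mean", "std"]])
```

Selecting on the outer level gives one dataset at a time, while xs() on the inner level lines up the same column from every dataset.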
In this article, we built a function that takes a list of dataframe names as input and returns the description of each dataframe in the list. We also looked at why the initial attempt returned only one description and walked through an example of how to use the function.
By using this technique, you can easily analyze multiple datasets and make informed decisions based on your findings.
Last modified on 2024-12-02