Transforming Hierarchical Data with Level Columns in Python
Introduction
In this article, we will explore how to transform hierarchical data, represented as a flat list of dictionaries, into a nested structure in which every node holds its own children. Each dictionary in the input represents one node of the hierarchy and records the node's level and name.
We will use Python and show two solutions: one that relies only on the standard library and one that uses pandas.
Understanding the Input Data
The input data has the following structure:
[
    {
        "level": 0,
        "name": "python"
    },
    {
        "level": 1,
        "name": "food"
    },
    {
        "level": 2,
        "name": "banana"
    },
    {
        "level": 3,
        "name": "protein"
    },
    {
        "level": 2,
        "name": "apple"
    },
    {
        "level": 1,
        "name": "fuel"
    }
]
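Each dictionary contains two key-value pairs: level and name. The level is the distance of the node from the root, and the name is the node's label. The order of the list matters: the parent of a node is the nearest preceding node whose level is exactly one less.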
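Both solutions in this article rely on the list being well formed. As a minimal sketch, assuming that "well formed" means the first node has level 0 and no node sits more than one level below the node before it, a check could look like this. The name validate_levels is hypothetical and not part of the original code.

```python
def validate_levels(nodes):
    """Raise ValueError if the flat list cannot describe a tree.

    Assumed rules (not stated in the article): the first node has level 0,
    levels are never negative, and a node is at most one level deeper than
    the node that precedes it.
    """
    previous = None
    for position, node in enumerate(nodes):
        level = node["level"]
        if previous is None:
            if level != 0:
                raise ValueError("the first node must have level 0")
        elif level < 0 or level > previous + 1:
            raise ValueError(
                f"node {position} ({node['name']!r}) has level {level} "
                f"after a node at level {previous}"
            )
        previous = level


sample = [
    {"level": 0, "name": "python"},
    {"level": 1, "name": "food"},
    {"level": 2, "name": "banana"},
    {"level": 1, "name": "fuel"},
]
validate_levels(sample)  # returns None; no exception for a well-formed list
```

Running a check like this before building the tree turns a confusing error deep inside the transformation into a clear message about the offending node.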
Our goal is to transform this data into a nested structure like this:
[
    {
        "level": 0,
        "name": "python",
        "children": [
            {
                "level": 1,
                "name": "food",
                "children": [
                    {
                        "level": 2,
                        "name": "banana",
                        "children": [
                            {
                                "level": 3,
                                "name": "protein"
                            }
                        ]
                    },
                    {
                        "level": 2,
                        "name": "apple"
                    }
                ]
            },
            {
                "level": 1,
                "name": "fuel"
            }
        ]
    }
]
This structure is a tree: each node keeps its level and name, and any node that has children lists them under a children key. Every child sits exactly one level below its parent.
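To make the shape of the nested structure concrete, here is a small sketch that walks it and prints an indented outline. The helper name iter_outline and the shortened sample tree are illustrative assumptions, not part of the original code.

```python
def iter_outline(nodes, depth=0):
    """Yield (depth, name) pairs in document order for a nested node list."""
    for node in nodes:
        yield depth, node["name"]
        # Leaf nodes have no 'children' key, so fall back to an empty list.
        yield from iter_outline(node.get("children", []), depth + 1)


nested = [
    {
        "level": 0,
        "name": "python",
        "children": [
            {
                "level": 1,
                "name": "food",
                "children": [{"level": 2, "name": "banana"}],
            },
            {"level": 1, "name": "fuel"},
        ],
    }
]

for depth, name in iter_outline(nested):
    print("    " * depth + name)
```

The depth reported by the walk should match each node's stored level, which gives a quick way to check a transformed tree against the flat input.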
Recursive Function Approach
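One way to achieve this transformation is by using recursion. We will define a recursive function get_last_elt_at_lvl that takes two parameters: rec (a list of sibling nodes, starting with the list of roots) and lvl (the number of levels still to descend). It returns the most recently added node at that depth, which is the parent of the next node to insert.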
Code
import json
input_list = [
    {
        "level": 0,
        "name": "python"
    },
    {
        "level": 1,
        "name": "food"
    },
    {
        "level": 2,
        "name": "banana"
    },
    {
        "level": 3,
        "name": "protein"
    },
    {
        "level": 2,
        "name": "apple"
    },
    {
        "level": 1,
        "name": "fuel"
    }
]
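def get_last_elt_at_lvl(rec, lvl):
    # Return the most recently added node lvl levels below rec, or None.
    if lvl == 0:
        return rec[-1] if rec else None
    # Search the siblings from last to first for one that has children.
    for i in range(len(rec) - 1, -1, -1):
        # .get avoids a KeyError on nodes that have no 'children' key yet.
        if rec[i].get('children'):
            r = get_last_elt_at_lvl(rec[i]['children'], lvl - 1)
            if r:
                return r
    return None

output_list = []
for d in input_list:
    if d["level"] == 0:
        output_list.append(d)
    else:
        # The parent is the latest node one level above d.
        last_elt = get_last_elt_at_lvl(output_list, d["level"] - 1)
        children = last_elt.setdefault('children', [])
        children.append(d)

print(json.dumps(output_list, indent=4))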
Explanation
The get_last_elt_at_lvl function finds the node that the next item should be attached to. The base case is lvl == 0: the function returns the last dictionary in the list it was given, which is the most recently added node at that depth. Otherwise it walks the list from the last sibling to the first, and for the first sibling that has children it calls itself on those children with lvl - 1. If no sibling leads to a node at the requested depth, it returns None.
The main loop builds the nested structure one item at a time. An item at level 0 is appended to output_list as a root. For any other item, the loop calls get_last_elt_at_lvl with the current output list and the item's level minus one, which is the parent's level, and appends the item to that parent's children list, creating the list with setdefault if it does not exist yet. Note that the input dictionaries themselves are inserted into the tree, so they gain a children key as a side effect.
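The recursive search starts from the roots again for every item. As an alternative sketch, not the method used above, the same tree can be built in a single pass by remembering the most recent node at each level in a list. The names build_tree and path are hypothetical, and the sketch assumes a well-formed input in which no item skips a level.

```python
import json


def build_tree(flat_nodes):
    """Build the nested structure in one pass over a well-formed flat list."""
    roots = []
    path = []  # path[k] is the most recently added node at level k
    for item in flat_nodes:
        node = dict(item)  # shallow copy, so the input dictionaries stay unchanged
        level = node["level"]
        del path[level:]  # forget nodes at this level or deeper
        if level == 0:
            roots.append(node)
        else:
            # The parent is the latest node one level up.
            path[level - 1].setdefault("children", []).append(node)
        path.append(node)
    return roots


flat = [
    {"level": 0, "name": "python"},
    {"level": 1, "name": "food"},
    {"level": 2, "name": "banana"},
    {"level": 3, "name": "protein"},
    {"level": 2, "name": "apple"},
    {"level": 1, "name": "fuel"},
]

print(json.dumps(build_tree(flat), indent=4))
```

Because each item only looks at one list entry, the work grows linearly with the number of items, and copying each dictionary keeps the original list free of children keys.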
Using External Libraries (Pandas)
Another approach is to use an external library such as pandas. pandas works on flat tables rather than nested dictionaries, so the natural result here is a table that pairs each node with its children.
Code
import pandas as pd
input_data = [
    {"level": 0, "name": "python"},
    {"level": 1, "name": "food"},
    {"level": 2, "name": "banana"},
    {"level": 3, "name": "protein"},
    {"level": 2, "name": "apple"},
    {"level": 1, "name": "fuel"}
]
# Create a DataFrame from the input data
df = pd.DataFrame(input_data)
# Use the row position as a node id
df['id'] = df.index
# Find each row's parent: the latest earlier row one level up (-1 for roots)
last_at_level = {}
parents = []
for node_id, lvl in zip(df['id'], df['level']):
    parents.append(last_at_level.get(lvl - 1, -1))
    last_at_level[lvl] = node_id
df['parent'] = parents
# Merge each row with its children
merged_df = df.merge(df, left_on='id', right_on='parent', suffixes=('', '_child'))
# Select only the parent name and the child name
output_df = merged_df[['name', 'name_child']]
# Rename the 'name_child' column to 'sublevel'
output_df.columns = ['name', 'sublevel']
print(output_df)
Explanation
In this code, we first create a pandas DataFrame from the input data and store each row's position in an id column.
Next, we work out each row's parent. Because the input is ordered, the parent of a row is the most recent earlier row whose level is one less; a short loop records that row's id in a parent column and uses -1 for root rows. We then merge the DataFrame with itself, matching id on the left to parent on the right. The result is a new DataFrame where each row contains both the parent data and the corresponding child data, with the child columns carrying the _child suffix.
Finally, we select only the two name columns from the merged DataFrame and rename name_child to sublevel using the columns attribute. The output is a flat table of parent and child pairs rather than a nested structure.
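The title mentions level columns, so here is a minimal sketch of one way to produce them with pandas: one column per level, with each row holding the full path from the root to that node. The column names level_0, level_1 and so on are an assumption made for illustration, and the sketch relies on the input being ordered as described earlier.

```python
import numpy as np
import pandas as pd

input_data = [
    {"level": 0, "name": "python"},
    {"level": 1, "name": "food"},
    {"level": 2, "name": "banana"},
    {"level": 3, "name": "protein"},
    {"level": 2, "name": "apple"},
    {"level": 1, "name": "fuel"},
]
df = pd.DataFrame(input_data)
levels = np.arange(int(df["level"].max()) + 1)

# One column per level: the name where the row sits at that level, NaN elsewhere.
wide = pd.DataFrame(
    {f"level_{k}": df["name"].where(df["level"] == k) for k in levels}
)

# Forward-fill so every row inherits its ancestors' names, then blank out
# the columns that are deeper than the row's own level.
deeper = levels[None, :] > df["level"].to_numpy()[:, None]
paths = wide.ffill().mask(deeper)

print(paths)
```

Each row of paths reads left to right as the chain of ancestors ending in the node itself, which is convenient for grouping or filtering by any level of the hierarchy.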
Conclusion
In this article, we have demonstrated two ways to work with hierarchical data stored as a flat list of dictionaries with a level field. The recursive approach uses only the standard library and produces the nested structure directly. The pandas approach pairs each node with its children in a flat table.
Choose the recursive approach when you need nested JSON and no extra dependencies, and the pandas approach when the data will be filtered, joined, or aggregated further as a table.
Last modified on 2025-04-13