Hierarchical table to flat table
What's all this then?
Say you have a table that looks like this. It represents hierarchical data, where one line has lines under it that are under its hierarchy (imagine a Pivot Table in Excel that's been expanded)
Level | Place |
---|---|
Country | Australia |
State/Province | Western Australia |
City | Perth |
City | Albany |
City | Broome |
City | Geraldton |
Country | Indonesia |
State/Province | Banten |
City | Cilegon |
City | Serang |
City | Tangerang |
City | Tangerang Selatan |
For lots of (good) reasons, you might need to convert it to something like this:
Country | State/Province | City |
---|---|---|
Australia | Western Australia | Perth |
Australia | Western Australia | Albany |
Australia | Western Australia | Broome |
Australia | Western Australia | Geraldton |
Indonesia | Banten | Cilegon |
Indonesia | Banten | Serang |
Indonesia | Banten | Tangerang |
Indonesia | Banten | Tangerang Selatan |
This is like "reversing" the pivot table process. We can do this with Power Query!
Strategy
The strategy is, for every level, isolate just that level, and the join onto the main table. For example, you'll isolate just the countries, and then join onto the main table.
The countries table will look like this:
Level | Place |
---|---|
Country | Australia |
Country | Indonesia |
Joining onto the main table, it will look like this:
Level | Place | Country |
---|---|---|
Country | Australia | Australia |
State/Province | Western Australia | |
City | Perth | |
City | Albany | |
City | Broome | |
City | Geraldton | |
Country | Indonesia | Indonesia |
State/Province | Banten | |
City | Cilegon | |
City | Serang | |
City | Tangerang | |
City | Tangerang Selatan |
If you Fill Down on the Country column, all of the rows under each country will be tagged with that country:
Level | Place | Country |
---|---|---|
Country | Australia | Australia |
State/Province | Western Australia | Australia |
City | Perth | Australia |
City | Albany | Australia |
City | Broome | Australia |
City | Geraldton | Australia |
Country | Indonesia | Indonesia |
State/Province | Banten | Indonesia |
City | Cilegon | Indonesia |
City | Serang | Indonesia |
City | Tangerang | Indonesia |
City | Tangerang Selatan | Indonesia |
The final step to clean up is to filter out the Country rows, leaving just State/Province and City rows:
Level | Place | Country |
---|---|---|
State/Province | Western Australia | Australia |
City | Perth | Australia |
City | Albany | Australia |
City | Broome | Australia |
City | Geraldton | Australia |
State/Province | Banten | Indonesia |
City | Cilegon | Indonesia |
City | Serang | Indonesia |
City | Tangerang | Indonesia |
City | Tangerang Selatan | Indonesia |
Now it's looking closer to the final table! Do this a few more times and it will be there.
This is the Power Query code to perform one iteration of the "flattening".
(source as table, hierarchyColumn as text, attributeName as text, joinColumns as list, attributeColumns as list ) =>
let
addIndex = Table.AddIndexColumn(source, "Index", 0, 1, Int16.Type),
base = Table.Buffer(addIndex),
attributeRowsOnly = Table.SelectRows(
base,
each Record.Field(_, hierarchyColumn) = attributeName
),
expandedAttributeColumns = List.Transform(attributeColumns, each Text.Combine({attributeName, _}, ".")),
join = Table.NestedJoin(
base,
joinColumns,
attributeRowsOnly,
joinColumns,
attributeName
),
expand = Table.ExpandTableColumn(
join,
attributeName,
attributeColumns,
expandedAttributeColumns
),
// Table.ExpandRecordColumn() will jumble up the order, so make sure to sort before filling down
sort = Table.Sort(expand, {"Index"}),
fillDown = Table.FillDown(sort,expandedAttributeColumns),
filterOutAttributeRows = Table.SelectRows(
fillDown,
each Record.Field(_, hierarchyColumn) <> attributeName
),
removeIndex = Table.RemoveColumns(filterOutAttributeRows, {"Index"})
in
removeIndex
unpivot(
base,
"Level",
"Country",
{"Level", "Place"},
{"Place"}
)