Dyota's blog

Hierarchical table to flat table

What's all this then?

Say you have a table that looks like this. It represents hierarchical data, where one line has lines under it that are under its hierarchy (imagine a Pivot Table in Excel that's been expanded)

Level Place
Country Australia
    State/Province Western Australia
        City Perth
        City Albany
        City Broome
        City Geraldton
Country Indonesia
    State/Province Banten
        City Cilegon
        City Serang
        City Tangerang
        City Tangerang Selatan

For lots of (good) reasons, you might need to convert it to something like this:

Country State/Province City
Australia Western Australia Perth
Australia Western Australia Albany
Australia Western Australia Broome
Australia Western Australia Geraldton
Indonesia Banten Cilegon
Indonesia Banten Serang
Indonesia Banten Tangerang
Indonesia Banten Tangerang Selatan

This is like "reversing" the pivot table process. We can do this with Power Query!

Strategy

The strategy is, for every level, isolate just that level, and the join onto the main table. For example, you'll isolate just the countries, and then join onto the main table.

The countries table will look like this:

Level Place
Country Australia
Country Indonesia

Joining onto the main table, it will look like this:

Level Place Country
Country Australia Australia
    State/Province Western Australia
        City Perth
        City Albany
        City Broome
        City Geraldton
Country Indonesia Indonesia
    State/Province Banten
        City Cilegon
        City Serang
        City Tangerang
        City Tangerang Selatan

If you Fill Down on the Country column, all of the rows under each country will be tagged with that country:

Level Place Country
Country Australia Australia
    State/Province Western Australia Australia
        City Perth Australia
        City Albany Australia
        City Broome Australia
        City Geraldton Australia
Country Indonesia Indonesia
    State/Province Banten Indonesia
        City Cilegon Indonesia
        City Serang Indonesia
        City Tangerang Indonesia
        City Tangerang Selatan Indonesia

The final step to clean up is to filter out the Country rows, leaving just State/Province and City rows:

Level Place Country
    State/Province Western Australia Australia
        City Perth Australia
        City Albany Australia
        City Broome Australia
        City Geraldton Australia
    State/Province Banten Indonesia
        City Cilegon Indonesia
        City Serang Indonesia
        City Tangerang Indonesia
        City Tangerang Selatan Indonesia

Now it's looking closer to the final table! Do this a few more times and it will be there.

This is the Power Query code to perform one iteration of the "flattening".

(source as table,  hierarchyColumn as text, attributeName as text, joinColumns as list, attributeColumns as list ) => 
    let 
        addIndex = Table.AddIndexColumn(source, "Index", 0, 1, Int16.Type),
        
        base = Table.Buffer(addIndex),

        attributeRowsOnly = Table.SelectRows(
            base, 
            each Record.Field(_, hierarchyColumn) = attributeName
        ), 
        
        expandedAttributeColumns = List.Transform(attributeColumns, each Text.Combine({attributeName, _}, ".")),
        
        join = Table.NestedJoin(
            base, 
            joinColumns,
            attributeRowsOnly, 
            joinColumns,
            attributeName
        ),

        expand = Table.ExpandTableColumn(
            join, 
            attributeName, 
            attributeColumns, 
            expandedAttributeColumns
        ),

        // Table.ExpandRecordColumn() will jumble up the order, so make sure to sort before filling down
        sort = Table.Sort(expand, {"Index"}),
        
        fillDown = Table.FillDown(sort,expandedAttributeColumns),

        filterOutAttributeRows = Table.SelectRows(
            fillDown,
            each Record.Field(_, hierarchyColumn) <> attributeName
        ),

        removeIndex = Table.RemoveColumns(filterOutAttributeRows, {"Index"})
    in
        removeIndex
unpivot(
        base,
        "Level", 
        "Country", 
        {"Level", "Place"},
        {"Place"}
    )

#powerquery