Dyota's blog

Power Query: Regex-ish

I had a table like this. It was a formatted Excel file, and the way it came out in Power Query was like this. I needed to filter for just the rows which contain single projects, and not for the "total rows" (where the figures are added up for that section/region).

I start with this...

Column1 Number
1. Region $ 200
123456-12345 Project A $ 100
143436-12448 Project B $ 50
185436-13443 Project C $ 50
2. Region $ 300
143556-00345 Project D $ 270
293496-12848 Project E $ 30

And need to end in this...

Column1 Number
123456-12345 Project A $ 100
143436-12448 Project B $ 50
185436-13443 Project C $ 50
143556-00345 Project D $ 270
293496-12848 Project E $ 30

The solution was a makeshift Regex.

Table.SelectRows(
    previous, 
    each 
        let 
            // character codes for numbers 0 - 9
            numberCharacters = {48..57},
            
            string = Text.Trim([Column1]),
            
            characterList = List.Transform(
                // break up the string into a list of characters
                Text.ToList(string),

                // tag each number as a 1, and non-number as a 0
                each 
                    let 
                        thisCharacter = Character.ToNumber(_),
                        isNumber = List.Contains(numberCharacters, thisCharacter)
                    in
                        if isNumber then "1" else "0"
            ),

            // this is a string of 1s and 0s that marks which characters are numbers and which aren't
            signature = Text.Combine(characterList),

            // project numbers conform to this pattern
            projectNumberSignature = "111111011111"
        in
            // the first 12 characters must match the target signature
            Text.Start(signature, 12) = projectNumberSignature
)

#powerquery