Power Query Record as Dictionary: Part 1
#PowerQuery #ThinkTank #Record.Field #List.Accumulate #level300
With this publication I begin the series of posts about Power Query performance and challenges should be solved to gain the best results from the queries.
Initially these posts were written by Mikhail Muzykin for Telegram channel “For those from the tank” (https://t.me/pbi_pq_from_tank). Mikhail (also known as @buchlotnik) is an acknowledged Power Query master, slightly obsessed 😊 with the Power Query performance tweaks and the beauty of M language. That channel’s name has the second meaning, close to “I will explain this to you (again), in simple words”. Indeed, Mikhail’s simple and clear style absolutely reflects the name of the channel.
Here is the first post of the “Power Query Think Tank” series (I suppose this name perfectly describes the goal and style of the posts). It was slightly rewritten and enriched with pictures for the best reading experience.
Let’s start.
List.Accumulate + Record.Field vs Table.Join, or Power Query Dictionaries with Records
Let us have two tables: table “A” with two columns:
- Name
- Occupation
A = #table(
{"Name","Occupation"},
{
{"John","Player"},
{"Pete","Player"},
{"Nick","Coach"},
{"Aristotle","Medic"}
}
)
and table “B” with the next columns:
- Position
- Salary
- Bonus
B = #table(
{"Position","Salary","Bonus"},
{
{"Player",1000,100},
{"Coach",2000,5000},
{"Medic",100,1}
}
)
We want to get the table with the next set of columns:
- Name
- Occupation
- Salary
- Bonus
Of course, the first solution to get it is using Join:
let
A = #table({"Name","Occupation"}, {{"John","Player"}, {"Pete","Player"}, {"Nick","Coach"}, {"Aristotle","Medic"}}),
B = #table({"Position","Salary","Bonus"}, {{"Player",1000,100}, {"Coach",2000,5000}, {"Medic",100,1}}),
C = Table.Join(A, {"Occupation"}, B, {"Position"}),
D = Table.RemoveColumns(C, {"Position"})
in
D
But you always can say that we are cheating (a little) with Table.Join
here:
- columns, used as join keys, have different names while it often will be the same names
- there are always (almost) unnecessary columns in the right table, which must be deleted before join, etc.
Well, then we’ll use Table.NestedJoin
:
let
A = #table({"Name","Occupation"}, {{"John","Player"}, {"Pete","Player"}, {"Nick","Coach"}, {"Aristotle","Medic"}}),
B = #table({"Position","Salary","Bonus"}, {{"Player",1000,100}, {"Coach",2000,5000}, {"Medic",100,1}}),
C = Table.NestedJoin(A,{"Occupation"}, B,"Position", "tmp"),
D = Table.ExpandTableColumn(B, "tmp", {"Salary","Bonus"})
in
D
I think you can easily recognize this pattern.
So, what we are talking about there? We just want to provide a more performant alternative to join:
let A = #table({"Name","Occupation"}, {{"John","Player"}, {"Pete","Player"}, {"Nick","Coach"}, {"Aristotle","Medic"}}), B = #table({"Position","Salary","Bonus"}, {{"Player",1000,100}, {"Coach",2000,5000}, {"Medic",100,1}}), Dict = List.Accumulate( Table.ToRows(b), [], (s, c) => Record.AddField( s, c{0}, List.Skip(c) ) ), Generator = (X as list) as list => X & Record.Field(Dict, X{1}), E = Table.FromList( Table.ToRows(A), Generator, {"Name","Occupation","Salary","Bonus"} ) in E
What changed?
- A new strange step
Dict
withList.Accumilate
and an empty record as its second argument.
Here we:
- took a lookup table (
B
),- split it into list of rows (
Table.ToRows
), de facto – list of lists, where each inner list is the list of values from the table row),
- consequent added fields to the empty record: name of the field is the value from the first column (
c{0}
), value of the field is all other values from the row (List.Skip(c)
) in form of list.
- split it into list of rows (
In the other words, we have created a record, representing a dictionary (or collection), where names are lookup keys for corresponding values.
Purpose of this record is very simple: calling a field from this record by its name we’ll get this field value. This kind of lookup for values works very fast.
- The next step – named
Generator
– represents the function, which we’ll use for lookup on the next step.- Its only argument is the list
X
, and the function output is also a list.
- Function takes this list
X
and add some elements to it. Which ones? This is the values of lookup record (Dict
) from the field with the name taken from the second element ofX
:X{1}
. We take the second element because it is the second column of the table which holds the “Occupation” value for each row. If you still confused, more detailed explanation is just a few paragraphs below.
- Its only argument is the list
- At the end on the step
E
we pass functiond
toTable.FromList
– the fastest table constructor function.- First argument is our initial table
A
, transformed to the list of lists (rows values).
- First argument is our initial table
- Second argument – function
Generator
, and there’s the main magic happens. It takes row values as list (for example,{"John","Player"}
), lookup for the second element ("Player"
– do you remember thatX{1}
?) in the names of recordDict
fields, and adds value of this field"Player"
:{1000,100}
to the initial values of the row ({"John","Player"}
):
{"John","Player",1000,100}
- Third argument of
Table.FromList
sets column names for the resulting table
Looks unusual, isn’t it? But it could be really faster with the big tables. Try it!
Follow me:Share this: