Custom Lists Generator in Power Query and Power BI
List.Generate is the powerful unction of M language (the language of Power Query aka “Get & Transform” for Excel and Power BI query editor), used for lists generation using custom rules. Unlike in other list generators (like List.Repeat or List.Dates), the algorythm (and rules) of creation of successive element could be virtually any. This allows to use List.Generate to implement relatively complex get & transform tasks.
Although there are few excellent posts about this function uses (for example, Chris Webb, Gil Raviv, PowerPivotPro, KenR), I always I always lacked a more “clear” description — «How it actually works?» or «Why don’t it work?» and, at last, «What did the developers kept in mind when create this function?»
As usual, MSDN help article is laconical:
Generates a list of values given four functions that generate the initial value initial , test against a condition condition , and if successful select the result and generate the next value next . An optional parameter, selector , may also be specified.
Will you receive a list of four elements? Do you want to use an optional selector? Really? Why not?
In abandoned Power Query Formula Reference (August 2015) we can find the more clear description:
Generates a list from a value function, a condition function, a next function, and an optional transformation function on the values.
At least it is obvious that this function takes 4 arguments, all of type function:
List.Generate(initial as function, condition as function, next as function, optional selector as nullable function) as list
Actually List.Generate uses quite simple loop algorythm. When creating an element of a new list, List.Generate evaluates a some variable (lets call it CurrentValue), which then passed from one argument-function to another in a loop:
- Start value CurrentValue is the result of initial function evaluation.
- Pass CurrentValue to condition function, check the condition and return true or false.
- Ifcondition = false then stop list generation.
- Ifcondition = true then create next element of the list with this rule:
- If selector is passed to List.Generate and not is null, then pass CurrentValue to selector and evaluate its result.
- Else (no selector at all or it is null) then the next element is equal to CurrentValue.
- Evaluate next function with CurrentValue argument, and assign it’s result to the CurrentValue, so the new CurrentValue is evaluated next(CurrentValue).
- Loop to Step 2.
As you can see from this not-so-technical description, the important difference of List.Generate from other iterator functions of M language is that almost all of others working in “For Each…Next” style (they have a fixed list to loop over), while List.Generate uses other logic – “Do While…Loop”, checking the condition before loop iteration. Subsequently, the number of elements in created list is limited only with “While” condition.
If we’ll write down the algorithm described above in other, non-functional language (like Visual Basic), it will look like that:
Dim NewList As New List
CurrentValue = initial()
Do While condition(CurrentValue)
If selector = null Then
CurrentValue = next(CurrentValue)
- initial funciton has no arguments and its evaluated value is equal to its excression value.
Even when you try to write the initial function with arguments you cannot pass any argument to it, because it called somewhere inside of List.Generate.
To be honest I do not understand why initial IS a function but not a simple expression or value. May be there are reasons for it.
- initial function evaluated first
But, if the first call of condition function will return false, a list will not be created despite the initial function was evaluated.
In case of condition result is true then evaluated initial (or evaluated selector) will be the first element of the list. That’s why initial and next usually return same-structured values of the same type.
- condition, next and selector got evaluated CurrentValue as an argument, but they don’t have to use it. Actually these three functions clould ignore CurrentValue, and use some other logic behind.
But, to be honest, I can’t imagine a situation when condition (or next ) do not use CurrentValue, because it leads to endless loop or list won’t be created.
- selector evaluated despite of the result of next evaluation on the current loop iteration.
- next always evaluated BEFORE the subsequent list element will be created (2nd and following).
When you create a list using some API calls (for example, you send GET or POST requests to API in initial and next functions), you should consider the following:
- API will be called at least once (when initial is evaluated).
- The number of API calls will always be at least one more than number of elements in created list (this excessive call is the result of the last next function evaluation, which didn’t passe the condition)
It is convenient when both initial and next return value of type record. This greatly simplifies the addition of counters and passing additional arguments for these functions (for example, one of record fileds is main data, second is counter, etc.).
Resuming, List.Generate is the powerfull tool, looking more complicated than in fact. Hope this post made it more friendly and comprehensible. 🙂