Custom Lists Generator in Power Query and Power BI

List.Generate is the powerful unction of M language (the language of Power Query aka “Get & Transform” for Excel and Power BI query editor), used for lists generation using custom rules. Unlike in other list generators (like List.Repeat  or List.Dates), the algorythm (and rules) of creation of successive element could be virtually any. This allows to use List.Generate to implement relatively complex get & transform tasks.

Although there are few excellent posts about this function uses (for example, Chris Webb, Gil Raviv, PowerPivotPro, KenR), I always I always lacked a more “clear” description — «How it actually works?» or «Why don’t it work?» and, at last, «What did the developers kept in mind when create this function?»

As usual, MSDN help article is laconical:

Generates a list of values given four functions that generate the initial value initial , test against a condition condition , and if successful select the result and generate the next value next . An optional parameter, selector , may also be specified.

Will you receive a list of four elements? Do you want to use an optional selector? Really? Why not?

In abandoned Power Query Formula Reference (August 2015) we can find the more clear description:

Generates a list from a value function, a condition function, a next function, and an optional transformation function on the values.

At least it is obvious that this function takes 4 arguments, all of type function:

Actually List.Generate uses quite simple loop algorythm. When creating an element of a new list, List.Generate evaluates a some variable (lets call it CurrentValue), which then passed from one argument-function to another in a loop:

  1. Start value CurrentValue  is the result of initial  function evaluation.
  2. Pass CurrentValue to condition function, check the condition and return true or false.
  3. Ifcondition = false then stop list generation.
  4. Ifcondition = true then create next element of the list with this rule:
    • If selector is passed to List.Generate and not is null, then pass CurrentValue to selector and evaluate its result.
    • Else (no selector at all or it is null) then the next element is equal to CurrentValue.
  5. Evaluate next function with CurrentValue argument, and assign it’s result to the CurrentValue, so the new CurrentValue is evaluated next(CurrentValue).
  6. Loop to Step 2.

As you can see from this not-so-technical description, the important difference of List.Generate from other iterator functions of M language is that almost all of others working in “For Each…Next” style (they have a fixed list to loop over), while List.Generate uses other logic – “Do While…Loop”, checking the condition before loop iteration. Subsequently, the number of elements in created list is limited only with “While” condition.

If we’ll write down the algorithm described above in other, non-functional language (like Visual Basic), it will look like that:

Please note:

  1. initial funciton has no arguments and its evaluated value is equal to its excression value.
    Even when you try to write the initial  function with arguments you cannot pass any argument to it, because it called somewhere inside of List.Generate.
    To be honest I do not understand why initial  IS a function but not a simple expression or value. May be there are reasons for it.
  2. initial function evaluated first
    But, if the first call of condition function will return false, a list will not be created despite the initial function was evaluated.
    In case of condition result is true then evaluated initial (or evaluated selector) will be the first element of the list. That’s why initial and next usually return same-structured values of the same type.
  3. condition, next  and selector  got evaluated CurrentValue as an argument, but they don’t have to use it. Actually these three functions clould ignore CurrentValue, and use some other logic behind.
    But, to be honest, I can’t imagine a situation when condition  (or next ) do not use CurrentValue, because it leads to endless loop or list won’t be created.
  4. selector evaluated despite of the result of next evaluation on the current loop iteration.
  5. next  always evaluated BEFORE the subsequent list element will be created (2nd and following).

When you create a list using some API calls (for example, you send GET or POST requests to API in initial and next functions), you should consider the following:

  1. API will be called at least once (when initial is evaluated).
  2. The number of API calls will always be at least one more than number of elements in created list (this excessive call is the result of the last next  function evaluation, which didn’t passe the condition)

It is convenient when both initial and next  return value of type record. This greatly simplifies the addition of counters and passing additional arguments for these functions (for example, one of record fileds is main data, second is counter, etc.).

Resuming, List.Generate is the powerfull tool, looking more complicated than in fact. Hope this post made it more friendly and comprehensible. 🙂


Power Query Ideas on Uservoice and Power BI Communities

Recently I posted about UsedRange trap when importing from Excel sheet in Power Query (you can read this post there).

To add an option for users who can meet the same problems as me I’ve added corresponding Power Query improvement ideas on excel.uservoice.com and https://ideas.powerbi.com

Please add your votes to these improvements – I’m sure they can not only save you a few rows of code but also can save you from potential pitfalls when working with raw Excel data.

Thank you!

 


Excel sheet as a source to Power Query and Power BI: a pitfall of UsedRange

When you import data from an Excel workbook to the Power Query or Power BI from entire sheet, be careful, there is a pitfall.

After linking to an external Excel file there are three options of data extracting available:

  1. From table (table-formatted range of cells in the sheet),
  2. From custom named range of cells,
  3. From entire sheet

In the first case, a Table object is already structured data with columns’ names, automatically transformed to PQ tables. In the second case, Power Query shall give the named range generic titles (Column1, Column2, etc.) and then work as before.

However, it is often the case that data are not structured in a formatted table or named range, and it can be difficult to transform them to such view before import. There can be many reasons for that, e.g., cells format is to be saved (merged cells are no longer merged after transformation) or there are too many files to transform them manually.

Data on a “raw” sheet

Fortunately, Power Query can extract data from the whole sheet. To get data from unformatted sheet you do not need to perform any special actions: just connect to the file, find the needed sheet (it will have “Sheet” value in the [Kind] column) and get data by retrieving its content from the [Data] column:

Excel sheets available as data sources just as tables or named ranges

The question is: what data range will be retrieved in that case? There are 17 179 869 184 cells on an Excel sheet (16 384 columns and 1 048 576 rows). If Power Query try to get them all, there will be huge memory consumption and performance leak. However, we can ensure that usually number of imported rows and columns is about the same as the number of rows and columns with the data may be slightly bigger.

So how Power Query defines a data range on a sheet? The answer is out there if you familiar with VBA macros and have enough experience with an Excel object model (but I think you will not be glad with this answer).

Continue Reading


Dynamically select measures to be shown on a Power BI visual via slicer

A few days ago a client asked me if it is possible to dynamically change series displayed on Power BI chart. My first (instinctive) answer was “Yes, of course, you can use a slicer to select which series you want to show, just put desired column in a slicer visual”. But then he added details: he wanted to select a measure to display on a chart, not to filter a value from column. My second (instinctive) answer was “No, you can’t. You can only filter a column, and can’t place measures in a slicer”.

But after a little chat I started to wonder whether it is really impossible. If we put a measure in a “Value” well of chart fields, it will be shown as a series (for example, some [Total Amount] measure). What my client is actually wants? He want to choose some elements on the slicer and, if one element selected, to show a measure. If that element is unchecked, then don’t show a measure.

Actually, those slicer’s elements are unique values from some column. A slicer applies a filter to that column. Can we catch whether a column is filtered? Yes, of course, we can do it with DAX. And if some desired value is selected, we just need to show a measure as a series. As that measure is already in the “Value”  well of a chart, then, in other words, we just have to “do nothing”. So, we only need to somehow hide a measure if a desired slicer’s element didn’t selected.

Continue Reading


On lazy value evaluation order in Power Query and Power BI

‘M’ (a Power Query Formula language) is a very powerful and interesting, but sometimes it could be a little confusing because of its lazy behavior.

Yesterday a friend of mine asked me for help with custom API connector in Power Query / Power BI. This connector has to work as follows:

  1. Post report requests
  2. Wait for reports completion
  3. Download reports

On the step 2 my friend used Function.InvokeAfter() built-in Power Query function, which supposed to pause code for a few seconds. But his code didn’t worked as expected – It looks like there are no requests posted at all.

We’ve found a solution to his task, but to make a long story short, there is what I have found from this case.

Let us look at this short code:

As we can imagine from ‘M’ evaluation rules, to get a” we need to calculate b”  first. We can suppose that before evaluation of value a” there goes value b” evaluation. So, we’ll get b” (current time), then extract current time from the same current time and get zero.

No way. The result is about -2 seconds: -00:00:02.0183210

Why? It is not so obvious, but there is a very easy answer: ‘M’ evaluates a value from left to right. When ‘M’ check for the a expression, it calculate first part:

Then it found b” and evaluate it:

The result of “b” is the local datetime with 2 seconds delay, so it is two seconds later than “a”. Of course, a – b approximately equals to -2 seconds.

It is easy to check:

There I changed evaluation order in the a” expression, so now b” is evaluated first, then second part (  DateTimeZone.LocalNow() ) is evaluated. As this evaluation is very quick, we have no delay and got the same time as in b”. The new result is 0, zero.

So, what I have found here is that relatively complex expressions in ‘M’ evaluates from left to right.

There is another interesting result. Let us see the next code:

The result of this code should be a list with two values. What I expected from previous example? There should be something like {-2, 0}, like results of previous calculations.

No way. The result is {-2, 2}:

Why there is a different result?

The reason is the lazyness of ‘M’: when the first element of the list evaluated, then it works like in example above: evaluate “a” then evaluate “b” (plus 2 sec), extract “b” from “a”. But for the second element ‘M’ did not make any evaluations of “a” or “b”. They already evaluated, so, as “b” is bigger than “a” for 2 seconds, extraction gives me +2.

If I change the order of the list elements:

The result will be {0, 0}:

Now it is expected – values didn’t recalculated

Now I can easily explain why: when evaluating the first element, b-a, the “b” evaluated first, then “a” immediately evaluated and it equals to “b”, and we get a zero as the result of extraction. Already calculated “a” and “b” then swap their places and give us the same result.

It looks as a very interesting finding for me. I think I have to keep it in mind when I’ll try to implement some time-delayed evaluations in queries.


What does it mean to be crossfiltered?

The history of “reinventing the bicycle” using DAX

Defining evaluation context and context transition rules is the most important and confusing part of DAX. Sometimes when you think you’ve already managed it, DAX turns to you with other side, hook, uppercut – and you’re knocked down.

Last week one of my Facebook friends asked me to explain why his measures working this way and not that way. It was quite easy questions and there is no sense to place them here. But my friend is a very curious man, one question led to another, and suddenly I found that I can’t explain a very simple, on the first look, concept. The question was about filtering under context transition in calculated column (here you can imagine a very big grin on DAX’s face).

For this post I reworked and simplified data model.

There are two tables, named Managers  and Sales . Managers  has only two columns: [Manager]  and [Department] .

'Managers' table

‘Managers’ table

‘Sales’ has a little bit more columns (although it doesn’t matter here): [Order] , [Manager] , [Amount]  and [Order type] .

'Sales' table

‘Sales’ table

As you can see, they are linked one-to-many by [Manager]  column. Quite easy.

Data Model

Data Model

First of all my friend asked me, how context transition works. “Hey, it’s easy!” – I said, and Continue Reading


Quick Filter for Unique Values in Power Query and Power BI

There are many ways to get a value from parameters table and then pass it to query. One of this methods uses direct selection of unique parameter name. I think it worth a post.

Items selection: brief reminder

As I described in my first ever post “Absolute and Relative References in Power Query”, when we would like to refer to a single item in a list or a cell in a Power Query table, we can use Name{Argument}  syntax: TableName{Row}  or ListName{Element} . If Name  is a table or list, and Argument  is number, then it is simple: we asking for a row or element of such position.

The most commonly used syntax for single cell addressing in tables is

But it is often omitted that if Name  is a table, Argument  could be not a number but a record:

Quick filter for unique values

Value passed as Record  in this expression works like a filter for a matching field in the table. For example, [empl_name = John] .

How it works and what is in it for practical uses?

If

  • our table has a column named empl_name

and

  • row with value “John” in this column could be found,

and

  • this row is unique (i.e. there is only one row with value “John” in column empl_name , matching to the Record ),

then entire record (i.e. row) will be returned as the result of this expression. In other words, a unique matching record from the table.

But there is one important restriction: if there is no unique matching row in the table, an error is raised.

So, when Table{Record}  returns a desired result, it has type of record and we can reach a single item from this record by referring to desired field in it:

We can see this method in action when we build query to a table from Excel file. Power Query will create such string of code automatically:

How Power Query refers to a table in Excel workbook

How Power Query refers to a table in Excel workbook

Implications

For example, we would like to implement table for passing user-defined parameters to Power Query (I recommend this great post “Building a Parameter Table” by Ken Puls):

 

In this example in the first row we create a parameters table with columns param and value that is supposed to be a parameters table for use in other queries.

Hardcoded parameters table (just for sample)

Hardcoded parameters table (just for sample)

Then we got a parameter’s value by applying “record filter” and selecting desired field ( [value]):

Now we can quickly get “Name” parameter’ value from the table above

And “Department” parameter’ value from the table above. Any unique parameter.

What are the benefits in this approach?

  1. First of all, we do not need to remember a desired parameter row when coding (what if user swapped rows in parameters table?).
  2. Then, we do not need to filter parameters table each time to get a desired value.
  3. And also we can check for parameters table integrity – if there will be several rows with the same parameter name, or missing parameter value, or missing field, then we’ll got an error and can handle it.

And, don’t forget that this Table{Record} method has a lot of other uses – when we really need to get a record as a result of expression. Also we can pass a more complex (with 2 or more fields) record as filter.

And Argument record could be a result from other queries. Lets name this query as “QueryRecord”:

 

We'll take a record from this table

We’ll take a record from this table

The result of above query is a record with two fields: [Name = "John", Department = "Sales"] .

I leaved only needed fileds in this record

I leaved only needed fileds in this record

Then we pass it as an filter argument to other table:

We will filter this table with the result of previous query

We will filter this table with the result of previous query

The result is a desired record:

Hey-ho, it works!!!

Hey-ho, it works!!!

Another one “Get parameter” function

And a cherry on the cake – quick parameter selection function. You can easy change it for your needs:

 

You can also find a complete description of item selection in “Microsoft Power Query for Excel Formula Language Specification” (see “6.4.1 Item Access” on page 59 in August 2015 edition).


SUMPRODUCT() and For Each loops in Power Query: Implementations of List.Accumulate. Part 2

In my mind List.Accumulate is one of most undocumented and most underestimated functions in Power Query  ‘M’ language. In this post I would like to continue explanation of this very interesting iterative function.

Data types

In my previous post about List.Accumulate function in Power Query ‘M’ language I’ve mentioned that seed value, passed to function, could be any. As it was described in Gil Raviv post,

The seed argument (or as we call it the initial state), should be of the same type as the first argument of your accumulator function and of the same type as of the function’s output.

Now I want to add more details to an explanation of the List.Accumulate arguments’ behavior:

  1. We can pass a seed value of any allowed in Power Query data type to function. What we can pass as a seed: table, list, number, date, record, null, duration… anything.
  2. Actually (and here I’d like to correct Gil) function’s output type is independent from the seed type.
  3. Function output type depends only from data type returned by accumulator subfunction. Real output type could be any:
  4. The first (state) argument in accumulator sub-function has to be the same type as seed value only at first iteration and only if state argument is used in accumulator (I really see no sense in manipulations without using state argument, but you can do it, that’s why I noted it). After the first iteration the accumulator result will totally depend on what this sub-function does (as shown above).
  5. The list elements could be used in accumulator sub-function. Could be not. You can use the list only as iteration counter – accumulator will perform its manipulations as many times as the count of list elements. This means that you can use List.Accumulate as For Each … Next” loop (which has as many steps as a count of elements in list).

Let us try to look at how those features could be implemented.

SUMPRODUCT () in Power Query

For a starter let’s try to convert the famous Excel function, SUMPRODUCT, to Power Query function.

Why we really need it?

There are a lot of ways to implement SUMPRODUCT in Power Query. For example, we can add custom column with formula like this: = [Amount] * [Price] and then just get the sum of it. It’s easy and clear, except we need to add this helper column to get just one number.

But let suppose we have got two lists by some query manipulations and these lists are not columns in any table. We can transform them to tables, and then somehow combine those tables, perhaps with Index column, and… looks not as easy as it was with columns in one table?

SUMPRODUCT itself

With List.Accumulate we can do it with just one row of code:

Obviously it equals 42.

How it works

  1. Here we got two lists of the same size, MyList1 and MyList2
  2. We’ll use one of lists as an list argument of List.Accumulate. We actually will use its elements in calculation, but also the count of list elements will be used as a counter to iterations inside our function.
  3. As the seed argument we used a record: [Sum=0, Index =0], where Sum will be used as field for calculation and Index will be used as list’s current position counter.
  4. We can describe the accumulator function as follows:
    1. For Each current element In MyList1
    2. Multiply current on MyList2{state[Index]} (where state[Index] is the value of Index from previous step or from seed if it is first step)
    3. Add the result to the state[Sum] (where state[Sum] is the previous/seed value of Sum)
    4. Increase previous/seed value of Index by 1
    5. Next
  5. At the end of the loop we have the result as a record of two fields: Sum and Index, as a result of Accumulate. We need only Sum field value, so we just add this filed name in square brackets at the end of row to get its value. Actually it means “give me the value of the [Sum] field from List.Accumulate result”.

SUMPRODUCT() UDF for PowerQuery/’M’

You can easily transform code from above to the one-row function for future uses:

Of course, you can enhance it with error handling and other feautures, as you want.

Enjoy! 🙂

(to be continued)


BIN2DEC in Power Query: Implementations of List.Accumulate. Part 1

After @datachant have posted on using List.Accumulate in Power Query, I was intrigued of practical implementations of this function.

As you can see from his post, List.Accumulate takes a list as an input parameter, and then, starting from initial seed value, invokes the same accumulator function on each element of list, consequently.

You just need to remember than List.Accumulate can use two parameters in accumulator function, state and current: state (i.e. current result of previous calculations) of accumulator  and a current element from given list. Final result will be a value that we got on the last iteration of accumulator function. Please note that seed value actually used as the first state value in accumulator.

The power of List.Accumulate is in iterations: we will execute accumulator function as many times as the number of elements in list. So, if we have an algorithm that needed repeating calculations on list members or as much times as list members count, List.Accumulate is the answer.

I found a small algorithm that shows a practical use and core concepts of List.Accumulate: conversion from binary number to decimal. In this algorithm bits of the binary number are used one by one, starting with the most significant (leftmost) bit. Beginning with the value 0, you need to repeatedly double the prior value and add the next bit to produce the next value.

So, if we would like to convert binary 11012 to decimal number 1310, we need to split binary number to the list of bits:

{1,1,0,1}

Then we take seed value of 0 and perform next operations: double the prior value and add the next bit, repeat while we have bits:

(seed, or first state) *2 + 1(current bit) = 1
(state) * 2 + 1(current bit) = 3
3(state) * 2 + 0(current bit) = 6
6(state) * 2 + 1(current bit) = 13

In this algorithm we can see all parameters for List.Accumulate: given list of bits from binary code, initial seed value of 0, state of accumulation transferred from the one step to the next step, and current element of a list (current bit).

To implement this algorithm in Power Query List.Accumulate we just need to get a list from given input, for example it is a text string: “101010”

The result could be nothing than 42, it’s quite universal 🙂

I hope that this example will help you to find more practical implementations of List.Accumulate. And if you remember that initial list could be used just as iterations counter, and a seed value could be any… to be continued.


Stacking non-nested groups of repeating columns in Power Query

There are plenty examples on how to unpivot nested tables (some brilliant ones can be found here and here). But what if a table we need to make flat has no nested headers for repeating groups of columns? It ought to be nested, but it doesn’t. And standard unpivoting methods then do not works then.

Stacking (or unpivoting) non-nested groups of columns is not a common task, but sometimes it appears. It could be due to bad constructed software report or hand-made user table in Excel or web-page scrapping. But in some case, we meet with this horror.

Once a client asked me to help with intelligence on his database. Of course, I agreed. But when I saw this “database” which he proudly presented to me, I was glad that the customer does not see my face and did not hear any comments passed my lips.

There was an Excel worksheet with more than 100 (one hundred) columns with more than 10000 rows, with all diversity of data in it. About 80 columns was in repeating groups, named “Date”, “Sum”, “Date”, “Sum” …. “Date”, “Sum” – without nesting or grouping.

For some reasons, when a VBA solution couldn’t be applied (especially if you work in Power BI and want to pull data in it without voodoo dances), the only way to flatten this table was using the Power Query for Excel 2010-2013 (or “Get Data” in Power BI, “Get & Transform” in Excel 2016).

Working on that task I discovered that there are at least four ways of doing it. You can choose the method that best suits your case.

As-is and to-be

What we have in source and what we need to get shown on the picture below:

Stacking Paired Columns

The source and the target: not so easy…

Actually we need to stack repeating groups one under other, keeping in place the common data from first columns.

Continue Reading