We can
connect almost any data source in Power Query, but PowerPivot data model is not
included in that extensive list of sources out of the box.
But with
the help of the fabulous DAX Studio we can do it (although in my opinion it is
still inconvenient and tricky) at least locally – from the same workbook in Excel
or from Power BI Desktop.
All you
need is to open your Excel workbook, run DAX Studio add-in and connect it to
this workbook. Then you can just connect to the PowerPivot model as to SQL
Server Analysis Services cube.
But this is
an undocumented and extremely limited feature not supported by Microsoft, which
can only be used under your own risk.
In this post I describe how to implement the classical incremental refresh scenario for the cloud data sources in Power BI Service for Pro accounts. Step by step. It worth to read.
Foreword
As I wrote
in the previous post, we can implement a semi-incremental refresh for Pro
accounts in Power BI, using just dataflows.
The method
I described has a main lack: although the “historical” dataflow remains
unchanged and will never load data from the source again, the “refreshing”
dataflow will load the whole “fresh” part of data repetitively until you change
the date intervals manually.
Initially –
there’s small amount of data in the “fresh” part…
…but, after
some consequential refreshes, it could become significant, and not so fresh.
You can
again split “fresh” it in the two parts – “new historical” and “fresh”, and so
on. But this is only SEMI-incremental
refresh, and, of course, is not a good solution.
It seemed
that implement a complete, classic incremental update using just dataflows is
impossible.
But, after some investigations, I found a solution which helps to implement the classical incremental refresh scenario, where the fresh data part remains small and fresh, and historical part become updated without querying a data source.
Here I not only introduce Power BI dataflows and describe the semi-incremental refresh concept, but also show how it works in Power BI Service. Have a fun!
To my surprise, it has almost 1500 views in two days – not so bad 🙂 I understand that this happened because of hype topic, but, well… now I know how to remove some limitations and perform a CLASSICAL incremental refresh for some types of data sources. Blog post follows.
Incremental
refresh is a high-demand option in Power BI. Microsoft already provided it for
the Premium capacities, but for the Pro accounts it is still in waiting list.
However,
with introduction of dataflows in Power BI Service, an incremental refresh implementation
becomes available for Pro accounts too.
I won’t to
describe dataflows in detail here since there is a lot of blogs and resources
about it (but I’ll provide a few links in the bottom of the post).
Concept
All you
need to know now on how to implement an incremental refresh, is that
Dataflow in Power BI Service is a set
of web-based Power Query queries (named as ‘entities’).
Each dataflow could be refreshed manually
or by the its own schedule.
The result of evaluation of a
dataflow’s entities then stored in Azure Data Lake Storage Gen2 as tables (more
precisely as CSV files).
Then you can use dataflows (their entities)
as a data sources in your Power BI dataset.
Let’s start
from this point.
What is the
incremental refresh at all? In simple words, it means that in the single data
import action we are refreshing (updating) only the part of data instead of
loading all the data again and again. In the other words, we are dividing data in
two parts (partitions): first part does not need refresh and should remain
untouched, second part must be refreshed to bring in updates and corrections.
Let’s say you have a few numerical columns [A], [B] and [C] in your table and want to sum them to the new column in Power Query or Query Editor in Power BI.
In Power Query we have special buttons for this:
For example, we want to sum columns [A] and [C]. Just click (holding Ctrl button) column headers you want to sum, then go to “Add Column” – “Standard” – “Add”, and you’ll get a new column named “Addition” with the row-by-row sum of desired columns:
If we want to add three columns at a time, then we’ll also get a desired result:
But if in this table we want so sum columns [A] and [B], we are not expecting a pitfall, aren’t we?
The reason of this behaviour is simple and it reveals itself when we look at our data a little bit close: there is a null in column [B] in that row. In Power Query formula language (M) the expression null + value always returns a null (see this excellent post of Ben Gribaudo about null type and operations with null values).
But why we get a correct result when we sum up three columns? It is because Power Query uses different formulas when we sum two columns or three and more columns:
List.Sum function used in this case ignores null values and sums up only numerical values. Indeed, it gives more intuitive result, but on the contrary has not such intuitive syntax of simple addition.
I do not know what is the reason of such difference, and already complained to the development team. But if you rely on the buttons there, then you have to be aware of such behaviour.
What is the possible solutions there? It depends on what you want to get as a result, but in any case you should take a look at the formula bar and decide what to correct there:
If the logic of your calculations assume that value + null = null, then you should use simple + symbol between column names.
If you want to get value + null = value, then you should use List.Sum finction, like in that example: List.Sum({[A], [B], [C]})
THE SAME BEHAVIOR Power Query shows when you’ll try to multiply two columns and three or more columns: with two columns there will be the simple * symbol, with three or more columns there will be List.Product function used.
Ok, it is a really short post which I planned to (and ought to) write a long time ago…
Recently I needed to do the very simple thing in Power Query. I have the column of numbers and need to check if the values in this column are less than N and then put a corresponding text value in the new column. The function for the new column is something like this:
M
1
=if[Values]<5then"A"else"B"
Actually some values are not a numbers but nulls:
Data contains nulls and comparison return an error
And this simple calculation returns an error for these values!
Why? There is the catch, which is hidden in the depth of documentation (actually on the page 67 of the “Power Query Formula Language Specification (October 2016)” PDF which you can obtain there.
This is a very short post, just to make a reminder and possibly expand knowledge for me and my readers.
Sometimes, specially when working with calendar tables, we need to calculate ISO Week Number for certain date. There is no native functions in Power Query / M language / Power BI to get ISO Week number, so to obtain the desired result you need to write your own function.
Thanks to Catherine Monier, Microsoft Excel MVP, for providing the link to the “Date to ISO Week” M function already written for us. There also another function for converting ISO Week date (format input like: 2017-W02-7) to the normal date:
This function is not so hard to develop, I think, but it is always better when somebody gives you ready-to-use solution, isn’t it? 🙂Follow me:
List.Generate is the powerful unction of M language (the language of Power Query aka “Get & Transform” for Excel and Power BI query editor), used for lists generation using custom rules. Unlike in other list generators (like List.Repeat or List.Dates), the algorythm (and rules) of creation of successive element could be virtually any. This allows to use List.Generate to implement relatively complex get & transform tasks.
Although there are few excellent posts about this function uses (for example, Chris Webb, Gil Raviv, PowerPivotPro, KenR), I always I always lacked a more “clear” description — «How it actually works?» or «Why don’t it work?» and, at last, «What did the developers kept in mind when create this function?»
Generates a list of values given four functions that generate the initial value initial , test against a condition condition , and if successful select the result and generate the next value next . An optional parameter, selector , may also be specified.
Will you receive a list of four elements? Do you want to use an optional selector? Really? Why not?
In abandoned Power Query Formula Reference (August 2015) we can find the more clear description:
Generates a list from a value function, a condition function, a next function, and an optional transformation function on the values.
At least it is obvious that this function takes 4 arguments, all of type function:
Actually List.Generate uses quite simple loop algorythm. When creating an element of a new list, List.Generate evaluates a some variable (lets call it CurrentValue), which then passed from one argument-function to another in a loop:
Start value CurrentValue is the result of initial function evaluation.
Pass CurrentValue to condition function, check the condition and return true or false.
Ifcondition = false then stop list generation.
Ifcondition = true then create next element of the list with this rule:
If selector is passed to List.Generate and not is null, then pass CurrentValue to selector and evaluate its result.
Else (no selector at all or it is null) then the next element is equal to CurrentValue.
Evaluate next function with CurrentValue argument, and assign it’s result to the CurrentValue,so the new CurrentValue is evaluated next(CurrentValue).
Loop to Step 2.
As you can see from this not-so-technical description, the important difference of List.Generate from other iterator functions of M language is that almost all of others working in “For Each…Next” style (they have a fixed list to loop over), while List.Generate uses other logic – “Do While…Loop”, checking the condition before loop iteration. Subsequently, the number of elements in created list is limited only with “While” condition.
If we’ll write down the algorithm described above in other, non-functional language (like Visual Basic), it will look like that:
Visual Basic
1
2
3
4
5
6
7
8
9
10
11
12
DimNewList AsNewList
CurrentValue=initial()
DoWhilecondition(CurrentValue)
Ifselector=nullThen
NewList.Add(CurrentValue)
Else
NewList.Add(selector(CurrentValue))
EndIf
CurrentValue=next(CurrentValue)
Loop
Please note:
initial funciton has no arguments and its evaluated value is equal to its excression value.
Even when you try to write the initial function with arguments you cannot pass any argument to it, because it called somewhere inside of List.Generate.
To be honest I do not understand why initial IS a function but not a simple expression or value. May be there are reasons for it.
initial functionevaluated first
But, if the first call of condition function will return false, a list will not be created despite the initial function was evaluated.
In case of condition result is true then evaluated initial(or evaluated selector)will be the first element of the list. That’s why initial and next usually return same-structured values of the same type.
condition, next and selector got evaluated CurrentValueas an argument, but they don’t have to use it. Actually these three functions clould ignore CurrentValue, and use some other logic behind.
But, to be honest, I can’t imagine a situation when condition (or next ) do not use CurrentValue, because it leads to endless loop or list won’t be created.
selector evaluated despite of the result of next evaluation on the current loop iteration.
next always evaluated BEFORE the subsequent list element will be created (2nd and following).
When you create a list using some API calls (for example, you send GET or POST requests to API in initial and next functions), you should consider the following:
API will be called at least once (when initial is evaluated).
The number of API calls will always be at least one more than number of elements in created list (this excessive call is the result of the last next function evaluation, which didn’t passe the condition)
It is convenient when both initial and next return value of type record. This greatly simplifies the addition of counters and passing additional arguments for these functions (for example, one of record fileds is main data, second is counter, etc.).
Resuming, List.Generate is the powerfull tool, looking more complicated than in fact. Hope this post made it more friendly and comprehensible. 🙂
Please add your votes to these improvements – I’m sure they can not only save you a few rows of code but also can save you from potential pitfalls when working with raw Excel data.
When you import data from an Excel workbook to the Power Query or Power BI from entire sheet, be careful, there is a pitfall.
After linking to an external Excel file there are three options of data extracting available:
From table (table-formatted range of cells in the sheet),
From custom named range of cells,
From entire sheet
In the first case, a Table object is already structured data with columns’ names, automatically transformed to PQ tables. In the second case, Power Query shall give the named range generic titles (Column1, Column2, etc.) and then work as before.
However, it is often the case that data are not structured in a formatted table or named range, and it can be difficult to transform them to such view before import. There can be many reasons for that, e.g., cells format is to be saved (merged cells are no longer merged after transformation) or there are too many files to transform them manually.
Data on a “raw” sheet
Fortunately, Power Query can extract data from the whole sheet. To get data from unformatted sheet you do not need to perform any special actions: just connect to the file, find the needed sheet (it will have “Sheet” value in the [Kind] column) and get data by retrieving its content from the [Data] column:
Excel sheets available as data sources just as tables or named ranges
The question is: what data range will be retrieved in that case? There are 17 179 869 184 cells on an Excel sheet (16 384 columns and 1 048 576 rows). If Power Query try to get them all, there will be huge memory consumption and performance leak. However, we can ensure that usually number of imported rows and columns is about the same as the number of rows and columns with the data may be slightly bigger.
So how Power Query defines a data range on a sheet? The answer is out there if you familiar with VBA macros and have enough experience with an Excel object model (but I think you will not be glad with this answer).