Reporting on SharePoint lists with many to many relationships using Power Query
SharePoint lists are often the repository for reportable items and whether these take the form of incident reports, inventory, inspection forms or other data we are often asked to surface that information in a usable fashion for the business. I ran across a scenario this week where a client had a basic list that they wanted to slice by counts occurring within certain time periods. Easy enough, except when I looked at the list the data structure was similar to the following demo set:
What we have here is a many to many relationship. Both the Building and TeamMembers fields are choice columns for this exercise and they support multiple values:
- Buildings: A-E
- Team Members: Alice, Eric, Joe, Mike, Nancy, Shannon, Taylor
The report goal is to identify, per calendar week, how many inspections of a particular building were conducted by a particular team member. If we mock that up quickly, we get a pivot with Buildings on X, Team Members on Y and a slicer based on our week. This is by no means the only way of delivering this data but it's the layout we're going to shoot for in this demo.
So let's see how we accomplish this in Power Query. If you haven’t downloaded this yet, go here and get the latest copy. So, opening a new Excel workbook, we go to the Power Query tab and use the From Other Sources drop down we get the option to query our SharePoint list.
Enter the name of the site that contains your target list (the site address, not the list itself) and when prompted for the type of authentication this data source requires, select the appropriate one. In this case I’m using Windows Authentication as this is an on premise SharePoint 2013 site with claims. In the navigation pane that will open up, double click on the appropriate list and you’ll be taken into the query editor window. Now we have some ETL to do before this is usable for our client. As you can see, the query loads every single field possible from the SharePoint list. With very few exceptions these are useless for our report. Click the header of the first column, scroll over and SHIFT+click the last column header. This should select everything.
Now, with all of those selected, let’s CTRL+click on the columns we want to keep, thereby deselecting them. We will need InspectionDate, ID, Building and TeamMembers. Once those are deselected, click the Remove Columns button.
There we are, looking a lot cleaner!
Ok, next, we want to expand the records so that this is something we can pivot on. Starting with Building and then with TeamMembers we click the button to expand the table. Make sure Expand is selected, not Aggregate.
As soon as we’ve done this we now see that each of our rows has been duplicated for as many unique combinations of Building and TeamMember that exist in that entry. We can use the ID column to see how many times a particular row (in our SharePoint list) has been repeated. This is the secret sauce step! Power Query has allowed us to transform several records with relationships that cannot be pivoted on into a flat list with unique data per row.
Let’s rename the columns so that they’re easier for the client to read.
Before we go on, note the Applied Steps area. This will be the topic of another blog post but effectively what we are doing in the query editor is creating a repeatable set of data transformation expressions in Power Query's language: "M". If we need more granular control we can actually edit the steps directly or we can simply use this to delete a step or visually refer to where we are in the process. Anyhow, we’re almost there. What we don’t have yet is the summary by week. So for the time being, let’s give this query a name and deselect saving to either our worksheet or to the data model. Once these are changed, hit Apply and Close in the top right hand corner.
On another sheet in our workbook I’ve created an ultra-simplified date dimension table. It only contains the relevant period for this demo and it’s not pulling from an external data connection. We can imagine, however, that in production we would pull from a corporate date dimension table in SQL.
Anyhow, let’s go back to the Power Query tab with our cursor still on the date dimension table. We’re going to create a new query using the content of this date dimension table. We do this by clicking the From Table option. We use From Table when our data is already housed within the workbook.
There’s nothing we need to do to this query right now. So let’s give it a new name then hit Apply & Close.
So now we have two queries that need to mash together. Let’s use some more Power Query magic by using the Merge option.
In our merge interface we are going to be asked to provide several things. The target table (Inspections), the table to merge in (Date Dimension) and then we need to select our matching column to merge on. In this case we select the appropriate date columns from each.
Again we are looking at unpacking a table to get the desired values but notice that now we have two entities within the array (each value in the Date Dimension table). Since we already have a date column in our table we don’t care to duplicate that data. Deselect the date and simply bring across the CalendarWeek value.
One thing to highlight while we’re here and it will be the subject of another blog post. If you click on the date column and go to Date/Time Filters you’ll see some new entries here that were added in the last two weeks with the June Power Query update. This is huge. Report on the last N periods is a regularly encountered request. We can now build logic right into our query to deliver only the results for the last three weeks, the last 2 months, whatever the client needs.
Anyhow, back to the demo. Rename the resulting column for our users, retitle the query and check the option to push this to our data model. Then Apply & Close.
At this point we can go into the Power Pivot tab, create a new pivot table and we end up with the result our client was looking for. Discussing Power Pivot or how we create the actual pivot table layout is beyond the scope of this post but remember in Excel 2013 you still need to enable the add-in to get access to this tab.
We won’t spend any time prettying it up since that’s not really the point of this blog post either. But we will test that adding new values to the SharePoint list. Let’s create a couple new entries.
If we then come back to our pivot table and refresh the data, we see that we’ve now got our new results included. A manager can now use this file without ever needing to know about the business logic hidden in the queries behind the scene. All the steps we just produced, creating both queries, cleaning the data, merging it and storing it in the model all occurs transparently without their involvement.
Power Query is extremely...powerful (sorry!) and will only get more so as Microsoft continues to add additional functionality. A caveat about the current demo is that you cannot upload this spreadsheet into SharePoint (either in Power BI or on premise) and have the scheduled data refresh update the data overnight. Microsoft has committed, though, to adding scheduled data refresh capabilities for Power Query and has already started on this with support for Power Query scheduled refresh from SQL and Oracle data sources and hopefully we will see support for SharePoint lists in the near future.
It’s Time To Transform
Let us show you how much easier your work life can be with Bonzai Intranet on your team.