Best practices when working with Power Query

This article provides some tips and tricks to help you get the most out of your data wrangling in Power Query.

Choose the right connector

Power Query offers a large number of data connectors. These connectors range from file-based data sources such as TXT, CSV, and Excel files, to databases such as Microsoft SQL Server, and to popular SaaS services such as Microsoft Dynamics 365 and Salesforce. If your data source doesn't appear in the Get Data window, you can always use the ODBC or OLE DB connector to connect to it.

Using the best connector for the task gives you the best experience and performance. For example, using the SQL Server connector instead of the ODBC connector when connecting to a SQL Server database not only gives you a much better Get Data experience, but also features that can improve performance, such as query folding. For more information about query folding, see Power Query query folding.
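As a rough illustration, here's a minimal sketch of a query built on the native SQL Server connector; the server name, database, and table are hypothetical. Because the native connector supports query folding, a filter step like the one below can be translated back into a SQL statement and evaluated by the server rather than by Power Query.

```powerquery-m
// Minimal sketch: connect with the native SQL Server connector (hypothetical server and database)
let
    Source = Sql.Database("myserver", "AdventureWorks"),
    // Navigate to a table exposed by the connector
    Customers = Source{[Schema = "Sales", Item = "Customer"]}[Data],
    // With query folding, this filter can be pushed down to SQL Server as a WHERE clause
    Filtered = Table.SelectRows(Customers, each [TerritoryID] = 1)
in
    Filtered
```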

Every data connector follows a standard experience, as explained in Getting data. This standardized experience includes a stage called data preview. At this stage, you get a user-friendly window where you can choose the data you want to get from your data source, if the connector allows it, and a simple preview of that data. You can even select multiple datasets from your data source through the Navigator window, as shown in the following figure.

Filter early

It's always recommended to filter your data in the early stages of your query, or as early as possible. Some connectors can take advantage of your filters through query folding, as described in Power Query query folding. It's also a best practice to filter out any data that isn't relevant for your case. That way you can better focus on your task, by only showing data that's relevant in the data preview section.

You can use the auto filter menu, which displays a distinct list of the values found in the column, to select the values that you want to keep or filter out. You can also use the search bar to find the values in your column.

You can also take advantage of type-specific filters, such as In the previous for a Date, DateTime, or even DateTimeZone column.

These type-specific filters can help you create a dynamic filter that always retrieves data from the previous x number of seconds, minutes, hours, days, weeks, months, quarters, or years, as shown in the following figure.

Note

For more information about filtering your data based on values from a column, see Filter by values.
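As an illustration, the In the previous filter described earlier is just a Table.SelectRows step in M. The following minimal sketch, which assumes a hypothetical Sales query with an OrderDate column, keeps only rows from the previous seven days:

```powerquery-m
// Minimal sketch: dynamic "in the previous 7 days" filter (Sales and OrderDate are hypothetical names)
let
    Source = Sales,
    Filtered = Table.SelectRows(Source, each Date.IsInPreviousNDays([OrderDate], 7))
in
    Filtered
```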

Use the correct data types

Some features in Power Query are contextual to the data type of the selected column. For example, when you select a date column, the options under the Date and time column group in the Add column menu become available. But if the column doesn't have a data type set, these options are grayed out.

A similar situation arises with the type-specific filters because they are specific to certain data types. If the correct data type is not defined for your column, these type-specific filters will not be available.

It's crucial that you always work with the correct data types for your columns. When working with structured data sources such as databases, the data type information is taken from the table schema found in the database. With unstructured data sources such as TXT and CSV files, however, it's important that you set the correct data types for the columns coming from that data source. By default, Power Query offers automatic data type detection for unstructured data sources. You can read more about this feature and how it can help you in Data types.

Note

For more information about the importance of data types and how to use them, see Data Types.
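If the automatic detection doesn't produce the types you need, you can set them explicitly with a Table.TransformColumnTypes step. The sketch below assumes a hypothetical sales.csv file whose columns all arrive as text:

```powerquery-m
// Minimal sketch: import a CSV file and set explicit data types (file path and column names are hypothetical)
let
    Source = Csv.Document(File.Contents("C:\Data\sales.csv"), [Delimiter = ",", Encoding = 65001]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    Typed = Table.TransformColumnTypes(
        Promoted,
        {{"Date", type date}, {"Units", Int64.Type}, {"Price", type number}}
    )
in
    Typed
```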

Examine your data

Before you begin preparing your data and adding new transformation steps, it is recommended that you enable the Power Query data profiling tools so that you can easily discover information about your data.

These data profiling tools help you better understand your data. The tools provide small visualizations that show you information on a per-column basis, such as:

  • Column quality - Provides a small bar chart and three indicators showing how many values in the column fall under the categories of valid, error, or empty values.
  • Column distribution - Provides a set of visuals underneath the column names that show the frequency and distribution of the values in each column.
  • Column profile - Provides a more thorough view of the column and the statistics associated with it.

You can also interact with these features to help you prepare your data.

Document your work

It is recommended that you document your queries by renaming or adding a description to your steps, queries, or groups as needed.

Even though Power Query automatically creates a step name for you in the Applied steps pane, you can also rename your steps or add a description to any of them.

A modular approach

It is entirely possible to create a single query that contains all of the transformations and calculations you need. However, if the query contains a large number of steps, it is a good idea to break the query into multiple queries, with one query referencing the next. The goal of this approach is to simplify transformation phases and decouple them into smaller parts so that they are easier to understand.

For example, suppose you have a query that has the nine steps shown in the following figure.

You can split this query into two at the Merge with Prices table step. That way it's easier to understand the steps that were applied to the Sales query before the merge. To do this, right-click the Merge with Prices table step and select the Extract previous option.

You're then prompted with a dialog box to give your new query a name. This effectively splits your query into two queries. One query has all the steps before the merge. The other query has an initial step that references your new query, plus the rest of the steps that you had in your original query from the Merge with Prices table step onward.

You could also leverage query referencing as you see fit. But it's a good idea to keep your queries at a level that doesn't seem daunting at first glance with so many steps.

Note

For more information about query references, see Understanding the Queries Pane.
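In M terms, the second query produced by Extract previous simply starts from a Source step that references the first query. The following sketch assumes hypothetical query names (Sales before merge and Prices) and a hypothetical ProductID key column:

```powerquery-m
// Minimal sketch: a referencing query that continues where "Sales before merge" left off
let
    Source = #"Sales before merge",
    // Merge with the Prices query on a shared key column
    Merged = Table.NestedJoin(Source, {"ProductID"}, Prices, {"ProductID"}, "Prices", JoinKind.LeftOuter),
    // Expand only the column needed from the merged table
    Expanded = Table.ExpandTableColumn(Merged, "Prices", {"Price"})
in
    Expanded
```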

Create groups

A good way to organize your work is to take advantage of the use of groups in the Queries area.

The sole purpose of groups is to help you organize your work by serving as folders for your queries. You can create groups within groups should you ever need to. Moving queries across groups is as easy as drag and drop.

Try to give your groups a meaningful name that makes sense to you and your case.

Note

For more information on all of the available features and components found in the Queries pane, see Understanding the Queries pane.

Future-proofing queries

Making sure that you create a query that won't have any issues during a future refresh is a top priority. Several features in Power Query help you make your query resilient to changes and able to refresh even when some components of your data source change.

It's a best practice to define the scope of your query as to what it should do and what it should account for in terms of structure, layout, column names, data types, and any other component that you consider relevant to the scope.

Some examples of transformations that can help you make your query resilient to changes are listed here (a combined sketch in M follows the list):

  • If your query has a dynamic number of rows with data, but a fixed number of rows that serve as a footer that should be removed, you can use the Remove bottom rows feature.

  • If your query has a dynamic number of columns, but you only need to select specific columns from your dataset, you can use the Choose columns feature.

  • If your query has a dynamic number of columns and you need to unpivot only a subset of your columns, you can use the Unpivot only selected columns feature.

    Note

    For more information about the options to unpivot your columns, see Unpivot columns.

  • If your query has a step that changes the data type of a column, but some cells yield errors because the values don't conform to the desired data type, you can remove the rows that returned error values.

    Note

    For more information about dealing with errors, see Dealing with errors.
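The following minimal sketch combines these patterns. The query name (RawReport), the column names, and the footer size are assumptions, and Table.UnpivotOtherColumns is used here as one way to keep the unpivot step resilient when new columns appear:

```powerquery-m
// Minimal sketch of the resilience patterns above (all names are hypothetical)
let
    Source = RawReport,
    // Remove a fixed 2-row footer, regardless of how many data rows arrive
    NoFooter = Table.RemoveLastN(Source, 2),
    // Keep only the columns you need; MissingField.UseNull avoids errors if one disappears
    Selected = Table.SelectColumns(NoFooter, {"Country", "Product", "2023", "2024"}, MissingField.UseNull),
    // Unpivot everything except the known key columns, so newly added year columns are handled automatically
    Unpivoted = Table.UnpivotOtherColumns(Selected, {"Country", "Product"}, "Year", "Sales"),
    // Drop any rows whose cells contain error values (for example, failed type conversions)
    NoErrors = Table.RemoveRowsWithErrors(Unpivoted)
in
    NoErrors
```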

Using parameters

Creating dynamic and flexible queries is a best practice. Parameters in Power Query help you make your queries more dynamic and flexible. A parameter serves as a way to easily store and manage a value that can be reused in many different ways. But it's more commonly used in two scenarios:

  • Step argument - You can use a parameter as an argument to multiple transformations controlled by the user interface.

  • Custom function argument - You can create a new function from a query and reference parameters as arguments of your custom function.

The main advantages of creating and using parameters are:

  • Centralized view of all your parameters through the Manage parameters window.

  • Reusability of the parameter in several steps or queries.

  • Allows easy and simple creation of custom functions.

You can even use parameters in some arguments of the data connectors. For example, you could create a parameter for your server name when connecting to your SQL Server database. Then you could use that parameter inside the SQL Server database dialog box.

If you change the server location, all you need to do is update the server name parameter and your queries will be updated.
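For example, a connection query that uses a hypothetical text parameter named ServerName (created through Manage parameters) instead of a hard-coded server might look like this minimal sketch:

```powerquery-m
// Minimal sketch: ServerName is a text parameter defined in Manage parameters; the database name is hypothetical
let
    Source = Sql.Database(ServerName, "AdventureWorks")
in
    Source
```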

Create reusable functions

If you find yourself in a situation where you need to apply the same set of transformations to different queries or values, you can create a Power Query custom function that can be reused as many times as you need. A Power Query custom function is a mapping from a set of input values to a single output value, and is created from native M functions and operators.

Suppose you have multiple queries or values that need the same set of transformations. You could create a custom function that can later be invoked against the queries or values of your choice. This custom function saves you time and helps you manage your transformations in one central place, which you can modify at any time.

Power Query custom functions can be created from existing queries and parameters. For example, imagine a query with multiple codes as a text string and you want to create a function that decodes those values.

You start with a parameter that has a value that acts as an example.

You use that parameter to create a new query where you apply the transformations that you need. In this case, you want to split the code PTY-CM1090-LAX into multiple components:

  • Origin = PTY
  • Destination = LAX
  • Airline = CM
  • FlightID = 1090

You can then transform that query into a function by right-clicking the query and selecting Create Function. Finally, you can invoke your custom function in any of your queries or values, as shown in the following figure.

After a few more transformations, you can see that you've reached the desired output and leveraged the logic for such a transformation from a custom function.
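For reference, here's a minimal sketch of what such a custom function might look like in M. The parsing logic and the field names are assumptions based on the PTY-CM1090-LAX example:

```powerquery-m
// Minimal sketch of a custom function that decodes a flight code such as "PTY-CM1090-LAX"
let
    DecodeFlightCode = (code as text) as record =>
        let
            Parts = Text.Split(code, "-"),                        // {"PTY", "CM1090", "LAX"}
            Middle = Parts{1},
            Airline = Text.Select(Middle, {"A".."Z", "a".."z"}),  // keep letters only, e.g. "CM"
            FlightID = Text.Select(Middle, {"0".."9"})            // keep digits only, e.g. "1090"
        in
            [Origin = Parts{0}, Destination = Parts{2}, Airline = Airline, FlightID = FlightID]
in
    DecodeFlightCode
```

You could then invoke it against a column, for example with Table.AddColumn(Source, "Decoded", each DecodeFlightCode([FlightCode])), where FlightCode is a hypothetical column name, and expand the resulting record column.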

Note

For more information about creating and using custom functions in Power Query, see the Custom Functions article.