[LAST CHANCE] Dealing With Excel's Pivot Tables

Guten Tag! đź‘‹ 

Many greetings from Munich, Germany. Today is the last day of my 15%-off sale. If you want to make data cleaning easier for yourself, then don’t miss out and apply the promo code “PART2RELEASE” at the course checkout page.

Now with that said, let’s dive into yet another freebie for those of you who are still on the fence.

Pivot tables are everywhere

Have you ever encountered a pivot table in Excel? You know, something like this:

This is a common way to show summary statistics by different attributes like regions, countries, genders or age. And even though this is a nice overview, it is terrible to untangle when someone provides you this kind of file for your data analysis in R.

Thankfully, there are two packages that make your life easier when you have to work with such pivot tables.

Meet {tidyxl}

{tidyxl} is a fantastic package when it comes to working with Excel files in R. But it’s a bit weird at first. You see, when you load the Excel file using {tidyxl}, then things don’t look anywhere close to what you have seen in the Excel file.

The thing is: {tidyxl} doesn’t import the data in the rectangular shape that you have seen when you’ve opened the file. Instead, it rearranges all cells by value type and coordinates. Here’s a neat animation from the docs that visualizes that:

Meet {unpivotr}

Now, you might be (rightfully) wondering: How does this cumbersome format help me? Well, in and of itself this new format is not super useful. But when you combine that with the {unpivotr} package, then things become pretty wild.

For starters, this package allows you to have a look at your data in a rectangular fashion.

But that’s not all. This package can also “behead” your rectangular table. See how the the countries are all the way on the left and there’s always a whole bunch of NA values below it? Well, with the left-up beheading mode, you can remove that whole column.

Notice how countries are gone in the rectangular view? Well, in the tidyxl-view, there’s a new column now.

And the magic is that after all rounds of beheading are over, your data is in a tidy format.

Don’t believe me? Let’s do another round then. This time, we use the ”left” mode because we always have exactly one region on the left.

Once again, after beheading, there’s another column in the tidyxl-view:

Choose the cleaned data by type

So if you fast-forward through the part of doing that with the upper labels as well (using the ”up-left” and “up” mode), then you can select all your new columns and the numeric column (because you want to get the numeric cells.) That way, you have all your data in a nice, tidy format:

 

Of course, there’s more that {tidyxl} and {unpivotr} can do for you. If you’re ready to discover their tricks and level up your data cleaning skills, I’m happy to have you onboard the Data Cleaning Master Class. 👇️

And remember: Last chance to use the 15% promo code “PART2RELEASE”

Happy to have you onboard,
Albert

Reply

or to participate.