Loading Excel files with missing numbers into R

Guten Tag! đź‘‹ 

Many greetings from Munich, Germany. Yesterday, I’ve shown you how easy it can be to turn weird notations like NA, NULL or -9999 into actual missing values when you read a csv-file. With Excel files, things work similar but a tiny bit different.

Let’s try to read an Excel file with the {openxlsx} package.

Notice that this output doesn’t show us the data formats. That’s because this is a data.frame instead of a tibble. For a nicer output, it’s convenient to make this into one.

So here, we can now see that things are once again treated like characters. Luckily for us, the read.xlsx() function has an argument called na.strings that works pretty much the same as the na argument from read_csv().

This did half of the work for us. The output now correctly turns NULL and -9999 into NA as well. But the column’s data format? It’s still character.

Thus, we have to manually convert this from character to double. Here, we can do that with parse_number().

For every data type (in particular datetimes) there are helpful functions like parse_number() to do the conversion for us. For dates & times, there’s actually quite a few due to the variety of that format. If you’re curious, you can level up your data cleaning skills and find out more in Part 2 of the Data Cleaning Master Class. 👇️

And don’t forget: The 15%-off promo code “PART2RELEASE” is still available for 5 days.

Happy to have you onboard,
Albert

Reply

or to participate.