Loading files with missing numbers

Guten Tag! 👋

Many greetings from Munich, Germany. When you read in a CSV file, it often happens that the rows in the file denote missing values with notations like NA, NULL or even -9999. In particular, text markers like NULL, which read_csv() does not treat as missing by default, will cause your columns to be loaded as character columns.

This means that while the numbers look like they could do what numbers do, they can’t. For example, even if you take only the first two rows with slice() you still cannot do calculations with the numbers.
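Here is a minimal sketch of that situation. The file, the column names and the values are made up for illustration, and I use a temporary file so that the snippet runs on its own.

```r
library(readr)
library(dplyr)

# Hypothetical example file; column names and values are invented.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("id,weight", "1,12.5", "2,7.25", "3,NULL", "4,-9999"),
  csv_path
)

# Default import: the "NULL" marker makes read_csv() guess text for `weight`.
raw <- read_csv(csv_path)
weight_class <- class(raw$weight)

# The first two rows look numeric, but they are still character strings.
first_two <- slice(raw, 1:2)

# Arithmetic on a character column fails; capture the message instead of stopping.
err_msg <- tryCatch(
  first_two$weight * 2,
  error = function(e) conditionMessage(e)
)

print(weight_class)
print(err_msg)
```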

See? You will always get an error saying "non-numeric argument to binary operator". That’s a tell-tale sign that something isn’t encoded in the data format you expect.

But the good news is this: You can actually tell read_csv() what constitutes a missing value. With the na argument, you can specify a vector of strings that should be treated as NA.

So with that you can do the calculations you want.
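A minimal sketch of how this could look, again with a made-up file and made-up column names. Which strings you pass to na depends on your own data; the vector below is just an assumption for this example.

```r
library(readr)
library(dplyr)

# Hypothetical example file; column names and values are invented.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("id,weight", "1,12.5", "2,7.25", "3,NULL", "4,-9999"),
  csv_path
)

# Tell read_csv() which strings should be treated as missing values.
clean <- read_csv(csv_path, na = c("", "NA", "NULL", "-9999"))

# With the markers turned into NA, `weight` can be parsed as a number.
weight_class <- class(clean$weight)
n_missing <- sum(is.na(clean$weight))

# Calculations on the first two rows now work.
doubled <- clean %>%
  slice(1:2) %>%
  mutate(weight_x2 = weight * 2)

print(weight_class)
print(n_missing)
print(doubled)
```

Note that passing na replaces the default markers, so if you still want empty strings and "NA" to count as missing, keep them in the vector.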

Many read_*() functions from the {readr} package have a similar na argument, so the same trick works for these functions as well. But things get a tiny bit more complicated with other files that are not covered by {readr} (like Excel files). Let’s talk about this tomorrow.
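Before we get to Excel, here is a small sketch of the same trick with read_tsv() for tab-separated files. As before, the file and its contents are invented for illustration.

```r
library(readr)

# Hypothetical tab-separated file; column names and values are invented.
tsv_path <- tempfile(fileext = ".tsv")
writeLines(
  c("id\tweight", "1\t12.5", "2\tNULL", "3\t7.25"),
  tsv_path
)

# read_tsv() accepts the same na argument as read_csv().
tsv_clean <- read_tsv(tsv_path, na = c("", "NA", "NULL"))

tsv_class <- class(tsv_clean$weight)
tsv_mean <- mean(tsv_clean$weight, na.rm = TRUE)

print(tsv_class)
print(tsv_mean)
```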

This was just a tiny nugget from the many things I will teach you in the 21 new video lessons of the Data Cleaning Master Class. If you’re ready to clean data faster and more efficiently so that you get to your insights sooner, sign up for the course today 👇️

And don’t forget: The 15%-off promo code “PART2RELEASE” is still available for 6 days.

Happy to have you onboard,
Albert
