Loading files with missing numbers
Guten Tag! 👋
Many greetings from Munich, Germany. When you read in CSV files, it often happens that the rows in the file denote missing values with notations like NA, NULL or even -9999. In particular, text markers such as NULL will cause your columns to be loaded as character columns, because read_csv() only treats empty fields and NA as missing by default.
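Here is a minimal sketch of the problem. The data is made up for illustration (the station, temperature and rainfall columns are my assumptions, not a real file), and I() tells {readr} 2.0 or later to treat the string as literal data instead of a file path.

```r
library(readr)

# Hypothetical data: "NULL" and -9999 both stand for "missing".
csv_text <- I(
  "station,temperature,rainfall
A,21.5,3
B,NULL,0
C,19.0,-9999
D,NA,12
"
)

raw <- read_csv(csv_text, show_col_types = FALSE)

# temperature comes back as character because of the NULL marker.
# rainfall stays numeric, but -9999 is silently kept as a real number.
col_classes <- sapply(raw, class)
col_classes
```

Notice that the -9999 case is the sneaky one: nothing breaks, the sentinel just ends up in your calculations.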
This means that while the numbers look like they could do what numbers do, they can’t. For example, even if you take only the first two rows with slice(), you still cannot do calculations with the numbers.
You will always get an error saying "non-numeric argument to binary operator". That’s a tell-tale sign that something isn’t encoded in the data format you expect.
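A small sketch of how that failure looks in practice, again with made-up data. The tryCatch() wrapper is only there so the script keeps running and we can look at the message; it is not part of the original workflow.

```r
library(readr)
library(dplyr)

# Hypothetical data with a "NULL" marker in the temperature column.
raw <- read_csv(
  I("station,temperature\nA,21.5\nB,NULL\nC,19.0\n"),
  show_col_types = FALSE
)

# Even the first two rows are still text, not numbers.
first_two <- slice(raw, 1:2)

# Arithmetic on a character column fails; capture the error message.
result <- tryCatch(
  first_two$temperature * 2,
  error = function(e) conditionMessage(e)
)
result
```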
But the good news is this: You can actually tell read_csv() what constitutes a missing value. With the na argument, you can specify a vector of strings that should be treated as NA.
So with that you can do the calculations you want.
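A minimal sketch of the fix, using the same kind of made-up data as before. One assumption worth flagging: setting na replaces the default c("", "NA"), so I list those two again alongside the extra markers.

```r
library(readr)

# Hypothetical data: "NULL" and -9999 both stand for "missing".
clean <- read_csv(
  I("station,temperature,rainfall\nA,21.5,3\nB,NULL,0\nC,19.0,-9999\n"),
  # Keep the defaults ("" and "NA") and add the file's own markers.
  na = c("", "NA", "NULL", "-9999"),
  show_col_types = FALSE
)

# Both columns are numeric now, so calculations work.
mean_temperature <- mean(clean$temperature, na.rm = TRUE)
total_rainfall <- sum(clean$rainfall, na.rm = TRUE)
```

Keep in mind that the na strings apply to every column, so only list -9999 if it never occurs as a genuine value anywhere in the file.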
Many read_*() functions from the {readr} package have similar na arguments. So the same trick works for these functions as well. But things get a tiny bit more complicated with other files that are not covered by {readr} (like Excel files). Let’s talk about this tomorrow.
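To illustrate the {readr} side of that, here is a short sketch with read_tsv() and read_delim(). The data and the semicolon delimiter are assumptions chosen for the example.

```r
library(readr)

# Markers to treat as missing (defaults plus the hypothetical "NULL").
na_markers <- c("", "NA", "NULL")

# Tab-separated input.
tsv_data <- read_tsv(
  I("station\ttemperature\nA\t21.5\nB\tNULL\n"),
  na = na_markers,
  show_col_types = FALSE
)

# Semicolon-separated input.
delim_data <- read_delim(
  I("station;temperature\nA;21.5\nB;NULL\n"),
  delim = ";",
  na = na_markers,
  show_col_types = FALSE
)
```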
This was just a tiny nugget from the many things I will teach you in the 21 new video lessons of the Data Cleaning Master Class. If you’re ready to clean data faster and more efficiently so that you get to your insights sooner, sign up for the course today 👇️
And don’t forget: The 15%-off promo code “PART2RELEASE” is still available for 6 days.
Happy to have you onboard,
Albert