Masterclass: Data Cleaning with R

Guten Tag! 👋 

Many greetings from Munich, Germany. Regular readers know that whenever I’m writing outside of my regular Wednesday 7pm (German time) schedule, something’s up. And the same thing is true this time 🥳 

I’m super excited to announce my new project “Data Cleaning with R Masterclass”. This new video course will teach you everything you need to know about data cleaning so that you can leave that messy step in your data analysis behind you quickly. That way, you can focus on the actual analysis.

The course will consist of five parts, each of them focusing on one specific aspect or data type in your data cleaning efforts. And as is typical in my courses, all parts combine the theory of how a function works with real-world examples of how to actually use it in practice.

And if this is something you’re interested in, sign up for the email list I’ve set up for this.

But if you need more information first, check out the tentative schedule for the five parts of this course 👇️ 

Extracting the right data from massive data sets & productivity hacks

In the first part of this course, students learn the basics of data wrangling like computing new columns and summarizing others. Also, this part will quickly dive deep into more advanced tricks and productivity hacks.
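To give a taste of what "computing new columns and summarizing others" can look like, here is a minimal sketch. It assumes the dplyr package (which is not named above, so treat it as my choice for this illustration) and a made-up `orders` table with hypothetical column names.

```r
library(dplyr)

# Hypothetical toy data, invented for this illustration
orders <- tibble(
  region   = c("north", "north", "south"),
  price    = c(10, 20, 5),
  quantity = c(2, 1, 4)
)

summary_tbl <- orders %>%
  # Compute a new column from two existing ones
  mutate(revenue = price * quantity) %>%
  # Summarize that column per region
  group_by(region) %>%
  summarise(total_revenue = sum(revenue), .groups = "drop")

print(summary_tbl)
```

The same two verbs, `mutate()` and `summarise()`, cover a surprising share of everyday wrangling, which is why they make a good starting point.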

Mastering file formats and getting messy data into a good format

Data comes in all shapes and formats. Novice R users often struggle with reading a specific data set and then getting it from a messy state into a usable format. That’s what we’ll learn in Part 2 of this course. In particular, we will apply this to Excel files, which every data scientist inevitably has to deal with 🫠 


Dealing with text data and messy labels

Working with text data is an often neglected part of data science training. This is unfortunate as it unlocks so many doors for you. You can

  • find specific entries of your data more quickly,

  • clean up labels before they go into your data viz or

  • rename a bunch of files in seconds.

All of these things will be covered in Part 3 of the course.
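As a rough preview of those three tasks, here is a small sketch. It assumes the stringr package (my assumption, it is not named above), and the labels and file names are invented for the example.

```r
library(stringr)

# Hypothetical labels and file names, invented for this illustration
labels    <- c("sales_north_2023", "sales_south_2023", "costs_north_2023")
old_files <- c("report 1.csv", "report 2.csv")

# 1. Find specific entries: keep only labels that mention "north"
north_labels <- labels[str_detect(labels, "north")]

# 2. Clean up labels for a chart: underscores to spaces, then title case
clean_labels <- str_to_title(str_replace_all(labels, "_", " "))

# 3. Build new file names: replace spaces with underscores
new_files <- str_replace_all(old_files, " ", "_")

# The actual renaming is left commented out so nothing on disk changes:
# file.rename(old_files, new_files)

print(north_labels)
print(clean_labels)
print(new_files)
```

Note that the sketch only builds the new file names. Applying them with `file.rename()` is a separate, deliberate step.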

Mastering times & dates

Working with times and dates is always painful. There’s no way around that. It is simply a super complicated format. But to alleviate your pain, the lubridate package gives you a whole set of powerful functions that make working with times and dates bearable. But this package comes with a bit of a learning curve. So we’ll flatten that curve in Part 4 of the course.
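Here is a minimal sketch of the kind of thing lubridate makes bearable. The dates are made up, and this is just an illustration, not the course material itself.

```r
library(lubridate)

# Hypothetical input strings in two different notations
iso_date    <- ymd("2024-01-31")   # year-month-day
german_date <- dmy("01.03.2024")   # day.month.year, common in Germany

# Pull out components
date_year  <- year(german_date)
date_month <- month(german_date)

# Adding a month to Jan 31 is ambiguous; %m+% rolls back to the
# last valid day of the target month instead of returning NA
one_month_later <- iso_date %m+% months(1)

# Round a date down to the first day of its month
month_start <- floor_date(iso_date, unit = "month")

print(one_month_later)
print(month_start)
```

The `%m+%` operator is a good example of the learning curve: plain `+ months(1)` on January 31 gives `NA`, because February 31 does not exist.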


Level up your workflows

I am a strong proponent of learning the functional programming (FP) paradigm. It’s not something that is strictly necessary to clean up data using the tools we’ve learned so far but it makes your workflow so much smoother. And the {purrr} package gives you all the tools you need to master FP. And since all of this relies heavily on working with lists, learning FP will also help you a lot when working with JSON data.
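To make that a bit more concrete, here is a small sketch of {purrr} working on a nested list. The `records` list is invented for this example and only mimics the shape that parsed JSON often has.

```r
library(purrr)

# Hypothetical records, shaped like a parsed JSON array of objects
records <- list(
  list(name = "Anna", scores = c(90, 85)),
  list(name = "Ben",  scores = c(70, 95, 80))
)

# Extract one field from every element; map_chr() guarantees a character vector
student_names <- map_chr(records, "name")

# Apply a function to every element; map_dbl() guarantees a numeric vector
mean_scores <- map_dbl(records, function(r) mean(r$scores))

print(student_names)
print(mean_scores)
```

The typed variants like `map_chr()` and `map_dbl()` fail loudly if an element does not fit the expected type, which is handy when the data is messy.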

That’s it. That’s the curriculum. As you’ve seen, it is a jam-packed, ambitious learning path, and completing it will make you so much more efficient at data cleaning. So if you want to master data cleaning with R, stay informed and sign up for the email list I’ve set up for this here.

And as always, thank you for your support and I’ll see you tomorrow 👋 

