Joining Data Sets with R

Guten Tag! đź‘‹ 

Many greetings from Munich, Germany. Let’s be honest:

Data is always spread out all over the place and you rarely get one single file that has everything you need. Instead, you’ll have to take what you can find and make the nice all-in-one dataset yourself.

That’s why I want to share a little nugget from my video course about combining data sets today 👇️ 

JSON DERULO (sorry about that pun)

Sometimes you have to get data from an API using some internal or external web service. This is often the case when you want to use live data like stock prices or weather forecasts.

More often than not, you will receive a JSON-file from that. And if it’s not a file, it will still be in a JSON format. Here’s how that would look in R.

Combining JSON rows

At the end of the day, you can think of a JSON-file as just a bunch of nested lists. To get an overview of the structure, it helps to take a look at glimpse().

In this case, it’s just a list of lists of 2. And it seems like every inner list corresponds to one row. So, we can just combine the rows with bind_rows()

Adding columns as is

Now, assuming that you want to enrich your data, you want to get additional columns into your dataset. If you are certain that

  • the first row in json_data corresponds to the first row in a second data set and

  • you are sure the same is true for all the other rows,

then you can just stick the data sets side-by-side with bind_cols().

Joins are what you´ll need the most

But more often than not, you will have a second data sets where things are in a different order or there are simply much more extra information that you don’t need. For example, here’s a larger version of second_tib.

It doesn’t make much sense to stick second_tib next to our json_data because the amounts of rows do not even match. Instead, we only want to grab the rows that have the ids we are interested in. That’s where a join function like left_join() comes in.

And depending on the type of join you use, your output might look a slightly bit different.

That’s it for today. There’s lots more to say about JSONs and joins and all the other stuff I cover in my course. If you’re curious, you can level up your data cleaning skills and find out more in Part 2 of the Data Cleaning Master Class. 👇️

And don’t forget: The 15%-off promo code “PART2RELEASE” is still available for 4 days.

Happy to have you onboard,
Albert

Reply

or to participate.