3MW (Reporting charts with {ggplot2})

Guten Tag!

Many greetings from Munich, Germany. Today, we’re kicking off our new series on using R for reporting. Let’s talk about creating charts with {ggplot2}. 🥳 

What can {ggplot2} do for me?

{ggplot2} is the premier package in R to create charts. Not only is it a powerful package on its own, it also comes with its own ecosystem of add-on packages. Can’t create a particular chart with {ggplot2}? Chances are good that there’s another gg-package that extends {ggplot2} to get the job done.

Mini Tutorial

To create a chart with ggplot, we load the tidyverse and then start out with a blank canvas using the ggplot() function.

A blank canvas
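Here is a minimal sketch of that first step. It assumes you load {ggplot2} directly. Loading the tidyverse with library(tidyverse) attaches it as well. The name canvas is just an illustrative choice.

```r
# Assumption: {ggplot2} is installed; library(tidyverse) would attach it too.
library(ggplot2)

# ggplot() without any layers only sets up an empty plot object.
canvas <- ggplot()

# Printing the object draws the blank canvas.
canvas
```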

Next, we put geometric objects on top of that, layer by layer. Let’s add points. This requires the geom_point() layer. Pretty descriptive name, right? For other things like lines or rectangles, there are geom_line() and geom_rect().
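As a sketch, adding the point layer could look like this. Layers are attached to the canvas with the + operator. The variable name p is only for illustration.

```r
library(ggplot2)

# Add a point layer to the blank canvas with `+`.
# No data or aesthetics are supplied yet, so there is nothing to draw.
p <- ggplot() +
  geom_point()

p
```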

Well, nothing happened. How the hell should R know what you want to plot? That’s why we need to specify x- and y-aesthetics via the so-called mapping. For now, just create the values as vectors using c() and pass them to the mapping inside of aes().
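A minimal sketch of such a mapping follows. The three x- and y-values are made-up numbers for illustration, not data from this newsletter.

```r
library(ggplot2)

# Fake data points: the numbers are arbitrary and only serve as an example.
p <- ggplot() +
  geom_point(
    mapping = aes(
      x = c(1, 2, 3),
      y = c(2, 4, 3)
    )
  )

p
```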

Well, that turned out nicely. But what the heck is that aes() thingy? Let’s find out what it does by also adding a color aesthetic.
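One way to try this, again with made-up values, is to hand color names to aes(). In this sketch the color names are treated as plain labels, so ggplot picks its own colors and builds a legend.

```r
library(ggplot2)

# The color vector sits INSIDE aes(), so ggplot treats the strings as data
# (three categories) and assigns its own default colors to them.
p <- ggplot() +
  geom_point(
    mapping = aes(
      x = c(1, 2, 3),
      y = c(2, 4, 3),
      color = c("red", "blue", "green")
    )
  )

p
```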

Ughh, that doesn’t look quite right. How about trying to specify colors outside of the aes()?
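A sketch of the same chart with the colors moved outside of aes() follows. The values are still the illustrative fake data. Note that the color vector needs one entry per point (or a single value).

```r
library(ggplot2)

# The color vector sits OUTSIDE aes(), so ggplot uses the colors literally:
# one fixed color per point, and no legend.
p <- ggplot() +
  geom_point(
    mapping = aes(
      x = c(1, 2, 3),
      y = c(2, 4, 3)
    ),
    color = c("red", "blue", "green")
  )

p
```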

Oh nice. That worked. So what’s the deal with the different output? Let me explain:

  • Inside of the aes() you don’t specify things yourself. You say “ggplot, here’s my data. Pretty please, do a useful color thingy with that.”
    And ggplot is like “Uhh okay, I guess I can take the names that you gave me and make a color legend out of that. And when I do, I will assign a unique color to each of those names you gave me.”
    Secretly, ggplot will also think to itself “Boy, I hope there are no real color names inside of the names. I don’t know how to handle this at this point.”

  • Outside of the aes() you specify things yourself. You say, “Listen up, ggplot. For each of the points, here are some colors. You better use them just like I specified them.” and then ggplot will just do what you say.

With that, we have already understood a fundamental concept of ggplot and we could replace our fake data points with actual data points from a data set. So, let’s get some data we’re probably already familiar with:
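The text only tells us the data set is called penguins. The sketch below assumes it comes from the {palmerpenguins} package, which is an assumption on my part and may differ from the original setup.

```r
# Assumption: the penguins data comes from {palmerpenguins}.
# If needed, install it first with install.packages("palmerpenguins").
library(palmerpenguins)

# Take a quick look at the first rows and the column names.
head(penguins)
names(penguins)
```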

And then we can use it in ggplot.
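As a sketch, the fake vectors can be swapped for columns of penguins. The choice of bill_length_mm and flipper_length_mm is illustrative, and the {palmerpenguins} source is still an assumption. Expect a warning about a few removed rows when printing, because some measurements are missing.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the penguins data

# Same pattern as before, but the vectors are now columns of penguins.
p <- ggplot() +
  geom_point(
    mapping = aes(
      x = penguins$bill_length_mm,
      y = penguins$flipper_length_mm
    )
  )

p
```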

Now, we could either change the appearance of these points based on data (inside of aes()) or based on a fixed value (outside of aes()).
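Both variants could be sketched like this. The species column, the color "dodgerblue4", and the names p_mapped and p_fixed are illustrative choices, and the data source is still assumed to be {palmerpenguins}.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the penguins data

# Variant 1: color depends on data, so it goes INSIDE aes().
p_mapped <- ggplot() +
  geom_point(
    mapping = aes(
      x = penguins$bill_length_mm,
      y = penguins$flipper_length_mm,
      color = penguins$species
    )
  )

# Variant 2: color is one fixed value, so it goes OUTSIDE aes().
p_fixed <- ggplot() +
  geom_point(
    mapping = aes(
      x = penguins$bill_length_mm,
      y = penguins$flipper_length_mm
    ),
    color = "dodgerblue4"
  )

p_mapped
p_fixed
```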

And finally, to avoid having to duplicate the penguins$ part, we can use the data argument of any geom_*() layer. Once the data is passed that way, the column names can be used directly inside of aes().
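A sketch of the tidier version: the data set goes into the data argument, and aes() refers to the columns by name. As before, the chosen columns and the {palmerpenguins} source are assumptions for illustration.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the penguins data

# With `data = penguins`, aes() can use bare column names.
p <- ggplot() +
  geom_point(
    data = penguins,
    mapping = aes(
      x = bill_length_mm,
      y = flipper_length_mm,
      color = species
    )
  )

p
```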

More Resources

Nice! With that we have learned the basic principles of ggplot. If you want to learn more about {ggplot2} and the corresponding ecosystem, check out my extensive playlist on YouTube:

Or if you’re already an intermediate ggplot user, then my data visualization course will make you an expert in creating meaningful charts with R:

As always, if you have any questions, or just want to reach out, feel free to contact me by replying to this email or finding me on LinkedIn.

See you next week,
Albert 👋
