3MW (Rainclouds with ggplot. A great way to visualize distributions)

Guten Tag!

Many greetings from Ulm, Germany. Like every week, let me share with you my newest video tutorial before we get started with this week’s newsletter.

By default, ggplot can draw a nice color bar legend for you. But that may introduce so many colors into your plot that it becomes hard to read values from the chart. Instead, a discrete color bar can be helpful. In this video, I give you a step-by-step guide on how to build such legends for your plots. As always, you can check out the video on YouTube.

As promised in the last newsletter, today I will show you how to combine several of the visualizations that we learned about last week into one very powerful visualization. It is known as a raincloud plot and it looks like this.

Basically, this chart is a combination of a density chart, a box plot and a dot histogram. Alternatively, you could also show the exact data instead of a dot histogram.

Both of these combinations can give you a great deal of information about the distribution of the data. So let's start building this. (All of today’s code can be found on GitHub.)

Begin with box plots

First, we are going to create a box plot for each penguin species in our data set. In this step, we can make the width of the box plots narrower (since we know that we will need some room and don't have any use for a huge box).
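As a minimal sketch of this first step, the code below assumes that the data comes from the {palmerpenguins} package and that bill length is the variable of interest. Both are assumptions for illustration, as is the width of 0.1.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the `penguins` data set

# Drop rows without a measurement to avoid missing-value warnings
penguins_clean <- subset(penguins, !is.na(bill_length_mm))

p_box <- ggplot(
  penguins_clean,
  aes(x = bill_length_mm, y = species, fill = species)  # assumed variable
) +
  geom_boxplot(width = 0.1)  # narrow boxes leave room for the other layers

p_box
```

Any small width should work here. The point is only to keep the boxes slim enough that the dots and densities fit around them later.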

Add the dots

Next, we can add the dot histogram with geom_dots() from {ggdist}. Conveniently, this function has an argument called side that we can set to "bottom" to make our dots stack downwards. Also, we can use position_nudge() to move all dots down a little bit so that they do not overlap with the box plots.
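Here is a sketch of what this step could look like. The data set, the variable and the nudge amount of -0.075 are assumptions for illustration. Only side = "bottom" and the use of position_nudge() come from the description above.

```r
library(ggplot2)
library(ggdist)
library(palmerpenguins)  # assumption: source of the `penguins` data set

penguins_clean <- subset(penguins, !is.na(bill_length_mm))

p_dots <- ggplot(
  penguins_clean,
  aes(x = bill_length_mm, y = species, fill = species)
) +
  geom_boxplot(width = 0.1) +
  geom_dots(
    side = "bottom",                       # stack the dots downwards
    position = position_nudge(y = -0.075)  # assumed offset below the box
  )

p_dots
```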

Add the density plots

Now, let us finish our chart by adding the density plot. Here, I am not going to use geom_density() or stat_density() but stat_slab() from {ggdist}. You can think of the latter function as more or less the same as stat_density() but it behaves in a more convenient way for our purposes here.
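A sketch of the three layers together might look like the following. Again, the data set, the variable and the nudge amounts are assumptions. In particular, nudging the slab upwards by the same amount as the dots are nudged downwards is just one way to keep it clear of the box plot.

```r
library(ggplot2)
library(ggdist)
library(palmerpenguins)  # assumption: source of the `penguins` data set

penguins_clean <- subset(penguins, !is.na(bill_length_mm))

p_rain <- ggplot(
  penguins_clean,
  aes(x = bill_length_mm, y = species, fill = species)
) +
  geom_boxplot(width = 0.1) +
  geom_dots(
    side = "bottom",
    position = position_nudge(y = -0.075)
  ) +
  stat_slab(
    position = position_nudge(y = 0.075)  # assumed offset above the box
  )

p_rain
```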

Avoid overlaps

We're almost done. But we need to do some fine-tuning. Right now, the dots and densities overlap. That's not very nice. We can fix that by setting the height arguments in stat_slab() and geom_dots().
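As a sketch, the fine-tuning could look like this. The height values of 0.55 and 0.75 are illustrative assumptions, not prescribed numbers, so you will probably want to tweak them for your own data. The data set, variable and nudge amounts are assumptions as well.

```r
library(ggplot2)
library(ggdist)
library(palmerpenguins)  # assumption: source of the `penguins` data set

penguins_clean <- subset(penguins, !is.na(bill_length_mm))

p_final <- ggplot(
  penguins_clean,
  aes(x = bill_length_mm, y = species, fill = species)
) +
  geom_boxplot(width = 0.1) +
  geom_dots(
    side = "bottom",
    height = 0.55,                         # assumed: limits how far dots reach down
    position = position_nudge(y = -0.075)
  ) +
  stat_slab(
    height = 0.75,                         # assumed: limits how far the slab reaches up
    position = position_nudge(y = 0.075)
  )

p_final
```

Smaller heights shrink each layer towards its own species row, which is what keeps one species' dots from running into the density of the species below it.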

That's it. We've created our raincloud plot. Of course, you can now style this however you like. In the above images, I've just changed the colors and used colored labels in the title instead of using a y-axis and legend.

Modifications

Finally, let me show you the nifty trick that I used to create the "bar code" instead of using a dot histogram. Here's the code.

Basically, I have replaced geom_dots() by geom_point() and set its shape to "|". The rest is just setting the color, using transparency and shifting the points a bit downward. In the end, this code gives you an image like this.
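A minimal sketch of this variation could look as follows. The size, alpha and nudge values are assumptions for illustration, and so is mapping the color to species instead of setting one fixed color. The data set and variable are assumed as well.

```r
library(ggplot2)
library(ggdist)
library(palmerpenguins)  # assumption: source of the `penguins` data set

penguins_clean <- subset(penguins, !is.na(bill_length_mm))

p_barcode <- ggplot(
  penguins_clean,
  aes(x = bill_length_mm, y = species, fill = species)
) +
  geom_boxplot(width = 0.1) +
  geom_point(
    aes(color = species),                # assumed: color by species
    shape = "|",                         # a vertical bar as the point symbol
    size = 5,                            # assumed size
    alpha = 0.5,                         # transparency reveals overlapping values
    position = position_nudge(y = -0.2)  # assumed shift below the box
  ) +
  stat_slab(
    height = 0.75,
    position = position_nudge(y = 0.075)
  )

p_barcode
```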

So that's how you create raincloud plots. If you want to read more about these, I recommend checking out Cédric Scherer's excellent blog post.

Hope you’ve enjoyed this week’s newsletter. If you want to reach out to me, just reply to this mail or find me on Twitter, uhhh I mean X.

See you next week,
Albert 👋
