- 3 Minutes Wednesdays
- Posts
- 3MW (Plot any function with stat_function 📈)

# 3MW (Plot any function with stat_function 📈)

Guten Tag!

Many greetings from Ulm, Germany. I get lots of questions on how to use the `stat_function()`

layer in `ggplot()`

. Typical use cases of `stat_function()`

are drawing a line from a fitted model or comparing your data to some theoretical distribution like the Gaussian distribution.

So today, let me show you how `stat_function()`

works. But first, let me throw in my regular announcements.

# Best chart for comparisons?

People know bar plots. They’re everywhere. And when people want to compare values that should go into **two separate** bar plots, people are often like “*hmmmm, why don’t I all of this into one chart?* 🤔”

That’s how paired bar charts are born. And just like regular bar charts they’re everywhere. But they suck. They suck baaaaad. Instead, a dumbbell plot is often a better alternative. In my newest YT video, I show you how they work and how you can build them with ggplot.

# Course progress

This week, I created a lesson on how to make great dumbbell plots (even better than the one from my YT video) and slope charts.

And the first 15 beta testers have been selected this week. 🥳 Once their feedback is incorporated in the the first lessons, I will likely do a pre-sale (at a lower price). If you don’t want to miss out on the lower price, don’t forget to sign up for the course mailing list.

Now, let’s jump into this week’s issue. As always, you can find all of the code on GitHub.

**Plotting a single function**

Imagine that you have some points that lie one a curved path like this.

As you can see, this data basically uses the function `x^3`

and adds some random noise to this. If we fit a polynomial model we would hopefully get something like `x^3`

as our model formula back and could plot this into the chart with our data.

Here, we use the `stat_function()`

layer for this. It uses a `fun`

argument to specify the function that needs to be plotted. And the rest is mostly aesthetics that you use for the line chart.

**Compare two densities**

Okay, how about comparing two density charts? Let’s take our favorite Penguins data and group the species into “Gentoo” and “Others”.

Then, we can just look throw our grouped data into a ggplot that uses `geom_density()`

to compare the distribution of these two groups.

# Add theoretical densities

Cool. We can see that Gentoo penguins seem to be heavier than the other species. Now, what if we wanted to also compare these density plots to some theoretical distribution like the Gaussian distribution?

First, we need to estimate the mean and standard deviation from the data. These are the parameters that define the Gaussian distribution.

Now, for each of those rows we’d have to call the `dnorm()`

function to draw the densities of the Gaussian distribution. But in each case, we’d have to use different `mean`

and `sd`

arguments like so:

To do this in `stat_function()`

, we iterate over the rows of `params_grouped_penguins`

and in each iteration we use values from the `mean_body_mass_g`

and `sd_body_mass_g`

column inside of `dnorm()`

. Basically, this would look like this:

But how do we get the correct values from the columns `mean_body_mass_g`

and `sd_body_mass_g`

into the function? The `pmap()`

function makes all of this possible with a nifty trick.

**Create a list of layers with **`pmap()`

`pmap()`

will iterate over every row and inside of every iteration, you will have to pass the current row to a new list `l`

. Then, you can access the columns via the names in `l`

.

This way, you get a list of `stat_function()`

layers. And the cool thing is that you can just add any list of layers to a previous ggplot like so:

**What if this fails (like for facets)?**

Unfortunately, this same strategy cannot be used on a facetted plot like this.

The problem with `stat_function()`

is that it will plot **all three** lines in **each** facet. But that’s not what we want, right? We want one line in each window.

Thus, we have to go the long route. This means that first we compute the mean and standard deviation for each species like before.

But then we also have to compute the x and y coordinates of the lines ourselves. This means that for each species, we need an x- and y-vector. We can add that to our previous `params_penguins`

tibble with `mutate()`

and `pmap()`

.

And finally, we can unnest these cells by calling the `unnest()`

function.

And now that this tedious step is done, we can just throw in a new `geom_line()`

layer into our plot with `coords_penguins`

as data set.

Hope you’ve enjoyed this week’s newsletter. If you want to reach out to me, just reply to this mail or find me on LinkedIn.

See you next week,

Albert 👋

If you like my content, you may also enjoy these:

## Reply