3MW (Combine R & Python)

Guten Tag!

Many greetings from Ulm, Germany. Did you know that you can combine R & Python? Both languages are premier tools to work with data. And with a Quarto document, you can combine code chunks from both languages.

Even better: With {reticulate} you can pass data back and forth between those two languages. Let’s check out how that works in this week’s newsletter. But first, let me give you my usual announcements.

Make any ggplot interactive

You can make any ggplot interactive using the {ggiraph} package. For example, in a little project last week, I decided to recreate a Tableau dashboard using only Quarto, {ggplot2} and {ggiraph}.

In my newest YouTube video, I show you how {ggiraph} works so that you can create dashboards like this yourself.

Course progress

I got very nice reviews on my upcoming data visualization course this week:

The course was super interesting and helpful and showed me how it really takes just a few steps to take a graph in R from good to great […]

[…] Wishing you the very best with the course. I would rate in top 10 percent of courses I took while learning dataviz

Pre-sale will launch later this week, and people who are on the course mailing list will get first access at a reduced price. If you want to join that list, you can still do that on the course page.

Now, let’s jump into this week’s issue. As always, you can find all of the code on GitHub.

Why two languages?

Using R & Python gives you the best of both worlds. Not a big fan of tidymodels? No problem, just use the Python ML toolkits. You could still use R to clean your data first or to visualize your results later.

Or you could use Python to do web scraping if you don’t like {rvest} or {RSelenium} but still use R to clean up the messy web stuff that you scraped.

A ggplot first

To demo how {reticulate} works, let’s create the same chart twice, once in R and once in Python. The “preprocessing” of the data is very simple and handled within R.

From this, we could create a ggplot like we normally would. Here, its only purpose is to show how our Python plot should look later on.

Now, the cool thing is that there is a ggplot port to Python. It’s called plotnine. This helps us to focus on managing the interaction between Python & R rather than translating our plot code.

Get R data ready for Python

Now, the first thing we need to do is make sure that our data is in a data.frame format rather than in a tibble format. This helps Python understand the data set. After that, we can load {reticulate}.

Using Python

Now let’s check whether we have access to penguins_data in Python. In a Python code chunk, R variables are available through the r object, so our data should be reachable as r.penguins_data.
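
The pattern is plain attribute access on an object named r. Here is a minimal sketch that also runs outside Quarto: it builds a stand-in r by hand, which you would skip in a real Python chunk because {reticulate} provides the real one. The column names and the three rows are placeholders, not the actual data. With pandas installed, the real object would arrive as a pandas DataFrame rather than a dict.

```python
from types import SimpleNamespace

# Stand-in for the `r` object that {reticulate} injects into Python chunks.
# Only needed to run this sketch outside Quarto; the values are placeholders.
r = SimpleNamespace(
    penguins_data={
        "species": ["Adelie", "Chinstrap", "Gentoo"],
        "body_mass_g": [3700, 3800, 5000],
    }
)

# This is the line you would actually write in a Python chunk:
penguins_data = r.penguins_data

# List the column names to confirm the data arrived.
print(list(penguins_data.keys()))
```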

If you try to execute this code chunk for the first time, {reticulate} may ask you if you want to set up a virtual environment. Apart from having Python installed on your computer, this may require additional tools to be installed.

But be careful. Additional things may still be required. In my case, I also needed to install python3-virtualenv. As always, setting up the Python environment can be a bit annoying. But once you get through that, you should see the following output after executing the code chunk.

Using plotnine

Now, to create the plot in Python we need to install the plotnine library for our {reticulate} environment. To do so, we call py_install() in an R code chunk.

Once that is handled, we can throw our ggplot code from before into a Python code chunk in which we import all the functions that we need. Take a look and then I’ll explain the differences from our previous ggplot code.

As you can see, the code and the output are mostly the same. Cool! But there are small things you have to watch out for:

  • Multi-line code should be wrapped in parentheses.

  • shape = 21 needs to be removed in geom_point() because the “marks” in plotnine are encoded a little bit differently.

  • All aesthetics need to be in quotation marks. In Python, you cannot live without these.

  • In theme(), all . need to be replaced by _.

  • Python’s style guide (PEP 8) recommends putting operators like + at the start of a line rather than at the end. But it works either way since everything is enclosed in parentheses.

  • By default the background of the plot is transparent in plotnine. Thus, you will have to set the background to white manually.
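
Several of these points are plain Python syntax rules rather than plotnine quirks. Here is a minimal sketch that shows them without plotnine installed. Note that aes() and theme() below are hypothetical stand-ins that only record their arguments, not the real plotnine functions, and the column names are assumed for illustration.

```python
# Hypothetical stand-ins: they only record their arguments so that the
# syntax rules can be shown without plotnine installed.
def aes(**mapping):
    return mapping


def theme(**settings):
    return settings


# Column names are strings. A bare name such as bill_length_mm would be
# looked up as a Python variable and raise a NameError.
mapping = aes(x="bill_length_mm", y="flipper_length_mm")

# legend.position is not a valid Python keyword argument name,
# so the dots in the R argument names become underscores.
settings = theme(legend_position="top")

# Without the outer parentheses, Python would end the statement after the
# first line. Inside them, the expression may span several lines, with the
# operator at the front of each continuation line.
layers = (
    ["ggplot"]
    + ["geom_point"]
    + ["theme_minimal"]
)

print(mapping)
print(settings)
print(layers)
```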

Other than that, the code is nearly identical. And if you don’t want to import every single function that you may want to use, you can import plotnine under a shorter name like pn and then prefix all functions with pn.

From Python to R

Now, what if you wanted to access some variable from Python in R? Well, that’s just as easy. Just prefix the variable name with py$. Here’s an example.
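
On the Python side, nothing special is needed: any variable defined at the top level of a Python chunk is reachable from R. A minimal sketch, with a made-up variable name and placeholder values:

```python
# Python chunk: compute something and keep it in a top-level variable.
# The values are placeholders for illustration only.
bill_lengths = [40.0, 45.0, 50.0]
mean_bill_length = sum(bill_lengths) / len(bill_lengths)

print(mean_bill_length)
```

In a later R chunk, py$mean_bill_length would then give you that number as a regular R numeric.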

Hope you’ve enjoyed this week’s newsletter. If you want to reach out to me, just reply to this mail or find me on LinkedIn.

See you next week,
Albert 👋
