3MW (Database Connections With R)

Enjoy 3 Minute Wednesdays at no cost.

This newsletter is brought to you for free. If you want to advance your R skills and support my work at the same time, then you will like my paid offerings:

Or use my affiliate code “RAPP10” to get 10% off the fantastic data science courses from Athlyticz:

Guten Tag!

Many greetings from Munich, Germany. When you’re working in a company, chances are you won’t be handed a simple CSV file to work with. Instead, you’ll likely need to grab your data from a company-wide SQL database hosted in the cloud or on internal servers.

So in today’s newsletter, I’ll show you how to connect to these databases using R.

What The Heck is a Driver?

First, we need a so-called driver. In simple terms, this is the tool that handles the connection between your computer and the database. Think of it as a translator that often comes in one of two flavors:

  • ODBC drivers: These drivers work with the open database connection format

  • JDBC drivers: These drivers use Java (that’s what the “J” stands for in JDBC)

Here, we’ll use the {odbc} package to go down the ODBC route.

Which Driver Do You Need?

So now that we know that we want ODBC drivers, we also need to decide which specific driver we need. You see, this kind of thing depends on the database your or your company uses. For example it could be an Oracle or PostgreSQL database that will need an Oracle or PostgreSQL driver

Typically, your tech department will tell you what you need. Your job is only to understand the big picture that I’m trying to explain here.

Example: Setting Up an SQLite Database

For today’s example, we’ll simulate a database using SQLite, which supports in-memory databases. This means that we can simply spin up a temporary database locally on our computer.

After having installed a SQLite driver for your computer, you can set up a connection from R to a temporary database using {DBI} and {odbc}:

Whooah. That’s a lot of function arguments. Once again, the connection details (like database name, user, password and port) is something that your tech department should have ready for you.

Creating and Populating a Table

So now that we have a database connection stored in a variable called con, we can create a table in our SQLite database. Here, we’ll use the famous Palmer Penguins dataset:

Querying Data from the Database

To fetch data, you’ll use dbGetQuery() from the {DBI} package. Here’s how to select all rows using SQL code:

Alternatively, you can (and probably should) use a bit more SQL to filter down the data that you actually want.

Why Use SQL for Filtering?

And you may wonder: “Why should I invest time into learning SQL when I can just grab the full data using the snippet you’ve just shown?“ And indeed, you could download the entire table and filter it inside of R. But filtering directly in the database is often much faster because:

  • SQL databases are optimized for these operations

  • You avoid transferring large datasets over the network

Learning a few basic SQL keywords can go a long way in improving your data workflow.

Nice! With that we covered:

  • The basics of database connections in R

  • The difference between ODBC and JDBC

  • How to set up and query a database using {DBI} and {odbc}

Armed with these skills, you’ll hopefully be ready to connect to real company databases and work with large-scale data. As always, if you have any questions, or just want to reach out, feel free to contact me by replying to this mail or finding me on LinkedIn or on Bluesky.

See you next week,
Albert 👋

Enjoyed this newsletter? Here are other ways I can help you:

Reply

or to participate.