Can we actually bring AI automations into production?
Guten Tag! 👋
Many greetings from Munich, Germany. I see more and more data scientists being thrown into loosely data-related activities. The reason for that is (of course) AI. Ugh.
Yes, AI is the gift that keeps on giving. And if you're anything like me, you probably have a love-hate relationship with AI. It is such a super cool tool and I use it every day. But the hype around it is just maddening.
Aaaannnyway, the reason I'm writing about AI here is that data people are being thrown into the deep end of engineering.
“It's related to data”, they said. “So it must be a good task for data scientists”, they said.
Well, over the last three years I’ve been building AI-powered automations. You know, the (by now) classic automation that
takes in a document,
extracts all the data that is desired and then
passes the structured data along to clients or some other part of the company.
That is a cool thing to work on. I love the automations that come out of this.
But time and time again, I realized that the tasks that come with building such an automation are not really what data scientists learn in their training.
But don’t fret. Let’s just change that.
Over the next couple of weeks I want to give you a high-level overview of what it can entail to bring an AI document extraction into production.
Revisiting the AI basics
If you’ve been a long-time reader, you know that long ago I published a little Quarto book called “R with AI”. In there, I explained how to use the {ellmer} package to use AI from R. And just like meeting the love of your life, it can all start with a simple chat:
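A minimal sketch of such a chat, assuming the {ellmer} package is installed and that Anthropic is the provider (the system prompt is made up purely for illustration):

```r
library(ellmer)

# Assumption: chat_anthropic() picks up the API key from the
# ANTHROPIC_API_KEY environment variable.
chat <- chat_anthropic(
  # Illustrative system prompt, not taken from the original post
  system_prompt = "You are a friendly but terse assistant."
)
```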

Unlike the love of your life, AI keeps asking you for an API key. So the above code only works if you have set an ANTHROPIC_API_KEY environment variable. With this chat, you can call one of its methods to talk to the AI. Like so:
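Here is a small sketch of that step. The key check and the question are my own additions for illustration. The call itself uses the chat() method of the chat object:

```r
library(ellmer)

# Fail early with a readable message if the key is missing
if (!nzchar(Sys.getenv("ANTHROPIC_API_KEY"))) {
  stop("Set ANTHROPIC_API_KEY first, e.g. in your .Renviron file.")
}

chat <- chat_anthropic()

# Send a message and get the reply back as text (illustrative question)
chat$chat("Tell me a one-sentence fun fact about Munich.")
```

Putting the key into your .Renviron file means it never has to appear in a script.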

So how about structured data extraction?
Luckily for us, ellmer also makes it very easy to extract data from the text we give it. All we have to do is use the chat_structured() method of the chat. This in combination with
some text to extract from and
a specification of what to extract
will get the job done. The latter is done by chaining/nesting many type_*() functions to establish elaborate structures to extract. Here, we’re just going to extract all the numbers we can find by combining type_number() with type_array().
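As a rough sketch of what this can look like (the example text and the wording of the description are invented for illustration, and the arguments are named because their order has differed between ellmer versions):

```r
library(ellmer)

chat <- chat_anthropic()

# Illustrative input text, not from the original post
text <- "We sold 12 apples for 3.50 euros and 7 pears for 4 euros."

numbers <- chat$chat_structured(
  text,
  type = type_array(
    items = type_number(),
    # The description acts as the prompt for this piece of data
    description = "All numbers that appear in the text"
  )
)

numbers
```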

Note that the descriptions you see inside the type_*() functions are the prompts for the things you want to extract.
Let’s do one more
By combining type_array() and type_object() we can let the AI extract data.frames. In this case, it allows us to avoid regular expressions:
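A sketch of this idea, with made-up text and field names (name, age and city are my assumptions, chosen only to show the pattern):

```r
library(ellmer)

chat <- chat_anthropic()

# Illustrative free text that would otherwise call for regular expressions
text <- "
  Anna is 34 and lives in Munich.
  Ben, 28, moved to Hamburg last year.
  Clara (41) is from Cologne.
"

# An array of objects: each object becomes one row, each field one column
people_type <- type_array(
  items = type_object(
    name = type_string("First name of the person"),
    age  = type_integer("Age in years"),
    city = type_string("City the person lives in or comes from")
  ),
  description = "All people mentioned in the text"
)

people <- chat$chat_structured(text, type = people_type)

# With ellmer's default conversion this should come back as a data.frame
people
```

Since the extraction is done by a model, it is worth checking the column types and the number of rows before passing the result along.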

Cool. So AI allows us to avoid writing regular expressions. Revolutionary stuff!
But in all seriousness, this is a pretty powerful mechanism. So it’s no wonder that I got to spend the last couple of years building automations around this stuff. More on that next time.
Best,
Albert
Whenever you’re ready, there are three ways I can help you:
Automate Your Data Reports: This course helps data analysts eliminate manual copy-paste reporting by automating PDF reports end-to-end, saving hours every cycle and preventing costly mistakes. (Using the lovely Typst language 😍)
Generate Insights in Minutes, not Hours: This comprehensive course teaches you to handle data faster, smarter, and more efficiently.
Bespoke Data Science Solutions: I’ve helped clients build their own data science solutions. Whether building custom web apps, PDF reports, AI automations or teaching workshops, I’ve got you covered. You can reach out to me via this form (or simply hit reply to this email).