LLMs + R

Generative AI for the UseR

Sharon Machlis at R-Ladies Paris [30 Nov 2025]

Slides: machlis.com/R-Ladies-Paris-2025
Repo: github.com/smach/RLadiesParis

3 Uses of LLMs With R

  • Help you write R code
  • Help you analyze data
  • Add LLM functionality to your R code and apps

But First, a Warning! ⚠️

Always Keep in Mind When Using LLMs

They are predictive systems that work on probabilities! That means:

  • They don’t always give the same responses
  • They answer based on what’s most probable, not what’s verified to be correct
  • Even though they can do things that are hard for people, they can fail at tasks that are easy for us

How do we deal with this?

Check their results and keep a human in the loop!

Use 1: Helping you write R code

Basic

  • Ask a chatbot like ChatGPT or Claude for help
  • Copy and paste (or type) its answers into your R script

PROS:

  • Included in your free account or subscription
  • You tend to learn more; it’s easy to ask the AI to explain the code

CONS:

  • Slower workflow
  • Results can be bad without context
  • Giving it context can take time

How to Give Context?

btw R package

Easy way to copy & paste docs into a chatbot session

Example: R Packages/Functions You Want it to Know

library(btw)
library(ggplot2)
btw(?theme)

Example: Data You Want it to Know

my_data <- rio::import("data/liste_des_musees_franciliens.csv")
btw(my_data)

Use AI Inside Your IDE

PROS:

  • Much more convenient
  • Can read your files; Positron Assistant can also read R session objects!

CONS:

  • If you’re using a pay-for-use API, you need to watch usage
  • If you’re using a subscription, you may bump up against usage limits
  • Not a great experience (yet) in RStudio
  • Possible privacy concerns for sensitive work
  • You may learn less if you don’t look at the code and ask for explanations

Options in Positron

  • Positron Assistant - built into Positron and designed for data science
  • Claude Code (extension works in Positron, or use terminal version)
  • OpenAI Codex
  • GitHub Copilot
  • And others

Positron Assistant

Slide from Tom Mock at Posit’s R in Pharma talk

Tom’s Presentation

If you want to find out more:

AI in Positron

Watch the posit::conf(2025) presentation at

https://youtu.be/9ZW2tx5fHjk

As of now, Positron Assistant only supports “Anthropic for chat and GitHub Copilot for inline code completions.” An Anthropic API key is needed, and it’s pay per use; you can run up a bill quickly ($4 in 30 minutes). Other providers are planned.

Other IDE Options

(Not necessarily optimized for R)

An App to Help You Write Code: ggbot2!

OpenAI API key required to run.

More on ggbot2

https://www.infoworld.com/article/4072500/how-to-run-an-r-data-visualization-chatbot-you-can-talk-to.html

Use 2: Helping You Analyze Data

Databot!

Joe Cheng explains Databot (video)

How to Install Databot - Positron only

https://positron.posit.co/databot.html

In Settings, search for databot.researchPreviewAcknowledgment and type Acknowledged

Command Palette (Ctrl-Shift-P) > Open Databot

Use 3: Add AI Functionality to Your Code

Querychat: Ask Natural-Language Questions About Your Data

  • Include a description of the columns in your data and some sample questions.
  • Ask natural-language questions of your data.
  • Querychat translates each question into SQL you can inspect, then runs the SQL on your data.

Posit demo of a dashboard with querychat: https://jcheng.shinyapps.io/sidebot/

Querychat Demo Code

For this and all the following code using ellmer and ragnar, you’ll need an OpenAI API key to run it yourself.

library(querychat)
my_data <- rio::import("data/UN_sample_data.csv")
my_dictionary <- "data/UN_sample_dictionary.md"
my_greeting <- "data/UN_sample_greeting.md"

querychat_config <- querychat_init(
  data_source = my_data,
  data_description = readLines(my_dictionary),
  greeting = readLines(my_greeting),
  create_chat_func = purrr::partial(ellmer::chat_openai, model = "gpt-4.1")
)
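Once the config exists, it can be wired into a small Shiny app. A minimal sketch follows; `querychat_ui()`, `querychat_server()`, and the `df` reactive it returns reflect my understanding of querychat’s API, so verify against the package docs before relying on them.

```r
library(shiny)
library(querychat)

ui <- fluidPage(
  querychat_ui("chat"),     # the chat widget
  tableOutput("filtered")   # the data as filtered by the generated SQL
)

server <- function(input, output, session) {
  # querychat_server() returns (among other things) a reactive
  # data frame reflecting the SQL generated from the user's question
  qc <- querychat_server("chat", querychat_config)
  output$filtered <- renderTable(head(qc$df(), 20))
}

shinyApp(ui, server)
```

Running this requires the `querychat_config` object created above plus an OpenAI API key in your environment.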

RAG

Why RAG?

Maybe your data is too big to fit into an LLM context window.

Why RAG?

Maybe your data does fit, but that can get expensive! (And slow. And it’s unclear whether all LLMs can handle information well at their full context windows.)

What’s RAG?

Retrieval Augmented Generation

  • Retrieval: Retrieve the stored text chunks that are most similar to a user’s query
  • Augmented: Send those chunks to an LLM as context along with the question
  • Generation: The LLM uses that text to generate its response

RAG Steps

  • Split text into smaller chunks
  • Add embeddings (numeric vectors representing the text’s semantic meaning) to each chunk
  • Save the chunks in a database
  • Retrieve chunks that are most similar to a query
  • Send query + chunks to an LLM to answer the question
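The steps above can be sketched with the ragnar package. This is a minimal outline, not a complete pipeline: the function names below (`ragnar_store_create()`, `embed_openai()`, `read_as_markdown()`, `markdown_chunk()`, `ragnar_store_insert()`, `ragnar_store_build_index()`, `ragnar_retrieve()`) reflect my understanding of ragnar’s API and may differ in your version, and the file paths are placeholders.

```r
library(ragnar)

# Create a DuckDB-backed store with an embedding function
# (assumes an OpenAI API key is set in the environment)
store <- ragnar_store_create(
  "data/my_store.duckdb",
  embed = embed_openai(model = "text-embedding-3-small")
)

# Read a document, split it into chunks, and add embeddings + chunks to the store
chunks <- read_as_markdown("docs/my_document.md") |>
  markdown_chunk()
ragnar_store_insert(store, chunks)
ragnar_store_build_index(store)

# Retrieve the chunks most similar to a question; these would then be
# sent to an LLM as context along with the question itself
relevant <- ragnar_retrieve(store, "What does the document say about pricing?", top_k = 5)
```

See the how-to linked below for a complete, tested walkthrough.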

Complete RAG How-To with the ragnar 📦

https://www.infoworld.com/article/4020484/generative-ai-rag-comes-to-the-r-tidyverse.html

Or my 2-hour Workshop for Ukraine video + code (for a €20 donation to help Ukraine)

Still More You Can Do!

ellmer is the main tidyverse package for connecting to LLMs in R

  • Extract structured data
  • Tool calling: You give the LLM “tools” (R functions), it decides when to use the tool, your system runs the R code and sends the results back to the LLM

Extracting Structured Data

library(ellmer)

# This is for extracting MULTIPLE items of each type from a text block
desired_data_structure_for_arrays <- type_array(
  type_object(
    date = type_string(
      paste("Date in yyyy-mm-dd format. If year isn't mentioned, date that makes the most sense relative to today's date of", Sys.Date())
    ),
    speaker_name = type_string(),
    description = type_string(
      "Brief description of the program"
    )
  )
)

Extracting Structured Data (cont.)

my_text <- readr::read_file("data/RLadiesEvents.txt")
my_prompt <- "Extract the data for each workshop mentioned in the text below."
chat <- ellmer::chat_openai(model = "gpt-4.1", system_prompt = my_prompt)

extracted <- chat$chat_structured(
  my_text,
  type = desired_data_structure_for_arrays
)

Tool Calling

Why needed?

library(ellmer)
chat <- chat_openai(model = "gpt-4.1")
chat$chat("How many days are left in this year?")

Start with conventional R functions

days_left_in_year <- function(current_date = as.character(Sys.Date())) {
  current_date <- as.Date(current_date)
  # Use current_date, not Sys.Date(), so a supplied starting date is honored
  first_day_of_next_year <- lubridate::ceiling_date(current_date, unit = "year")
  days_left <- first_day_of_next_year - current_date
  return(as.integer(days_left))
}
days_left_in_year()

get_current_date <- function() {
  return(as.character(Sys.Date()))
}

Turn it into an ellmer tool

days_left_in_year_tool <- tool(
  days_left_in_year,
  name = "days_left_in_year",
  description = "Returns the number of days left in the current year based on a given starting date. Starting date defaults to today's date",
  arguments = list(
    current_date = type_string(
      "Date as a string in yyyy-mm-dd format",
      required = FALSE
    )
  )
)

get_current_date_tool <- tool(
  get_current_date,
  name = "get_current_date",
  description = "Gets the current date"
)

Register the tools and re-run the chat

chat$register_tool(days_left_in_year_tool)
chat$register_tool(get_current_date_tool)
chat$chat("How many days are left in this year?")

Additional Resources: Hadley Wickham Video

Hadley Wickham’s No-Bullshit Guide to LLMs at UseR!

Additional Resources: Joe Cheng Video

Harnessing LLMs for Data Analysis, Joe Cheng, CTO at Posit

Workshop for Ukraine: Hadley Wickham

Hadley Wickham’s Using LLMs with ellmer (for a €20 donation to help Ukraine)

Additional Packages

  • vitals for evaluating LLM results. Video
  • mini007 for creating agents in R. Video
  • axolotr is an alternative to ellmer for using LLMs in R. Video
  • chores aims to help automate repetitive tasks using AI
  • side, a “highly experimental” coding agent for RStudio

Additional Conference Videos

posit::conf(2025): You’ll need to search the playlist for the AI-related talks. https://www.youtube.com/playlist?list=PL9HYL-VRX0oTixlfDPCS5RW_F1pccERRe

R+AI Conference Videos on YouTube
May only be available for a few weeks https://www.youtube.com/playlist?list=PL4IzsxWztPdkm4lcgBilHQjBoFFysP81e

Find Me:

Bluesky: @smachlis.bsky.social

Mastodon: @smach@masto.machlis.com

GitHub: @smach

LinkedIn: in/sharonmachlis

Website: machlis.com

Sharon Machlis