LLMs + R

Generative AI for the UseR

Sharon Machlis at R-Ladies Paris [30 Nov 2025]

Slides: machlis.com/R-Ladies-Paris-2025
Repo: github.com/smach/RLadiesParis

3 Uses of LLMs With R

  • Help you write R code
  • Help you analyze data
  • Add LLM functionality to your R code and apps

But First, a Warning! ⚠️

Always Keep in Mind When Using LLMs

They are predictive systems that work on probabilities! That means:

  • They don’t always give the same responses
  • They answer based on what’s most probable, not what’s verified to be correct
  • Even though they can do things that are hard for people, they can fail at tasks that are easy for us

How do we deal with this?

Check their results and keep a human in the loop!

Use 1: Helping you write R code

Basic

  • Ask a chatbot like ChatGPT or Claude for help
  • Copy and paste (or type) its answers into your R script

PROS:

  • Included in your free account or subscription
  • You tend to learn more; it’s easy to ask the AI to explain the code

CONS:

  • Slower workflow
  • Results can be bad without context
  • Giving it context can take time

How to Give Context?

btw R package

Easy way to copy & paste docs into a chatbot session

Example: R Packages/Functions You Want it to Know

library(btw)
library(ggplot2)
btw(?theme)

Example: Data You Want it to Know

my_data <- rio::import("data/liste_des_musees_franciliens.csv")
btw(my_data)

Use AI Inside Your IDE

PROS:

  • Much more convenient
  • Can read your files; Positron Assistant can also read R session objects!

CONS:

  • If you’re using a pay-for-use API, you need to watch usage
  • If you’re using a subscription, you may bump up against usage limits
  • Not a great experience (yet) in RStudio
  • Possible privacy concerns for sensitive work
  • You may learn less if you don’t look at the code and ask for explanations

Options in Positron

  • Positron Assistant - built into Positron and designed for data science
  • Claude Code (extension works in Positron, or use terminal version)
  • OpenAI Codex
  • GitHub Copilot
  • And others

Positron Assistant

Slide from Tom Mock at Posit’s R in Pharma talk

Tom’s Presentation

If you want to find out more:

AI in Positron

Watch the posit::conf(2025) presentation at

https://youtu.be/9ZW2tx5fHjk

As of now, Positron Assistant only supports “Anthropic for chat and GitHub Copilot for inline code completions.” An Anthropic API key is needed, and it’s pay per use; you can run up a bill quickly ($4 in 30 minutes). Other providers are planned.

Other IDE Options

(Not necessarily optimized for R)

An App to Help You Write Code: ggbot2!

OpenAI API key required to run.

More on ggbot2

https://www.infoworld.com/article/4072500/how-to-run-an-r-data-visualization-chatbot-you-can-talk-to.html

Use 2: Helping You Analyze Data

Databot!

Joe Cheng explains Databot (video)

How to Install Databot - Positron only

https://positron.posit.co/databot.html

In Settings, search for databot.researchPreviewAcknowledgment and type Acknowledged

Command Palette (Ctrl-Shift-P) > Open Databot

Use 3: Add AI Functionality to Your Code

Querychat: Ask Natural-Language Questions About Your Data

  • Include a description of the columns in your data and some sample questions.
  • Ask natural-language questions of your data.
  • Querychat translates each question into SQL you can inspect, then runs the SQL on your data.

Posit demo of a dashboard with querychat: https://jcheng.shinyapps.io/sidebot/

Querychat Demo Code

For this and all the following code using ellmer and ragnar, you’ll need an OpenAI API key to run it yourself.

library(querychat)
my_data <- rio::import("data/UN_sample_data.csv")
my_dictionary <- "data/UN_sample_dictionary.md"
my_greeting <- "data/UN_sample_greeting.md"

querychat_config <- querychat_init(
  data_source = my_data,
  data_description = readLines(my_dictionary),
  greeting = readLines(my_greeting),
  create_chat_func = purrr::partial(ellmer::chat_openai, model = "gpt-4.1")
)
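Once the config exists, it can be wired into a small Shiny app. A minimal sketch follows; `querychat_ui()`, `querychat_server()`, and the `df` reactive it returns reflect my understanding of querychat’s API, so verify against the package docs before relying on them.

```r
library(shiny)
library(querychat)

ui <- fluidPage(
  querychat_ui("chat"),     # the chat widget
  tableOutput("filtered")   # the data as filtered by the generated SQL
)

server <- function(input, output, session) {
  # querychat_server() returns (among other things) a reactive
  # data frame reflecting the SQL generated from the user's question
  qc <- querychat_server("chat", querychat_config)
  output$filtered <- renderTable(head(qc$df(), 20))
}

shinyApp(ui, server)
```

Running this requires the `querychat_config` object created above plus an OpenAI API key in your environment.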

RAG

Why RAG?

Maybe your data is too big to fit into an LLM context window.

Why RAG?

Maybe your data does fit, but that can get expensive! (And slow. And it’s unclear whether all LLMs can handle information well at their full context windows.)

What’s RAG?

Retrieval Augmented Generation

  • Retrieval: Retrieve the stored text chunks that are most similar to a user’s query
  • Augmented: Send those chunks to an LLM as context along with the question
  • Generation: The LLM uses that text to generate its response

RAG Steps

  • Split text into smaller chunks
  • Add embeddings (numeric vectors representing the text’s semantic meaning) to each chunk
  • Save the chunks in a database
  • Retrieve chunks that are most similar to a query
  • Send query + chunks to an LLM to answer the question
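The steps above can be sketched with the ragnar package. This is a minimal outline, not a complete pipeline: the function names below (`ragnar_store_create()`, `embed_openai()`, `read_as_markdown()`, `markdown_chunk()`, `ragnar_store_insert()`, `ragnar_store_build_index()`, `ragnar_retrieve()`) reflect my understanding of ragnar’s API and may differ in your version, and the file paths are placeholders.

```r
library(ragnar)

# Create a DuckDB-backed store with an embedding function
# (assumes an OpenAI API key is set in the environment)
store <- ragnar_store_create(
  "data/my_store.duckdb",
  embed = embed_openai(model = "text-embedding-3-small")
)

# Read a document, split it into chunks, and add embeddings + chunks to the store
chunks <- read_as_markdown("docs/my_document.md") |>
  markdown_chunk()
ragnar_store_insert(store, chunks)
ragnar_store_build_index(store)

# Retrieve the chunks most similar to a question; these would then be
# sent to an LLM as context along with the question itself
relevant <- ragnar_retrieve(store, "What does the document say about pricing?", top_k = 5)
```

See the how-to linked below for a complete, tested walkthrough.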

Complete RAG How-To with the ragnar 📦

https://www.infoworld.com/article/4020484/generative-ai-rag-comes-to-the-r-tidyverse.html

Or my 2-hour Workshop for Ukraine video + code (for a €20 donation to help Ukraine)

Still More You Can Do!

ellmer is the main tidyverse package for connecting to LLMs in R

  • Extract structured data
  • Tool calling: You give the LLM “tools” (R functions), it decides when to use the tool, your system runs the R code and sends the results back to the LLM

Extracting Structured Data

library(ellmer)

# This is for extracting MULTIPLE items of each type from a text block
desired_data_structure_for_arrays <- type_array(
  type_object(
    date = type_string(
      paste("Date in yyyy-mm-dd format. If year isn't mentioned, date that makes the most sense relative to today's date of", Sys.Date())
    ),
    speaker_name = type_string(),
    description = type_string(
      "Brief description of the program"
    )
  )
)

Extracting Structured Data (cont.)

my_text <- readr::read_file("data/RLadiesEvents.txt")
my_prompt <- "Extract the data for each workshop mentioned in the text below."
chat <- ellmer::chat_openai(model = "gpt-4.1", system_prompt = my_prompt)

extracted <- chat$chat_structured(
  my_text,
  type = desired_data_structure_for_arrays
)

Tool Calling

Why needed?

library(ellmer)
chat <- chat_openai(model = "gpt-4.1")
chat$chat("How many days are left in this year?")

Start with conventional R functions

days_left_in_year <- function(current_date = as.character(Sys.Date())) {
  current_date <- as.Date(current_date)
  # Use current_date, not Sys.Date(), so a supplied starting date is honored
  first_day_of_next_year <- lubridate::ceiling_date(current_date, unit = "year")
  days_left <- first_day_of_next_year - current_date
  return(as.integer(days_left))
}
days_left_in_year()

get_current_date <- function() {
  return(as.character(Sys.Date()))
}

Turn it into an ellmer tool

days_left_in_year_tool <- tool(
  days_left_in_year,
  name = "days_left_in_year",
  description = "Returns the number of days left in the current year based on a given starting date. Starting date defaults to today's date",
  arguments = list(
    current_date = type_string(
      "Date as a string in yyyy-mm-dd format",
      required = FALSE
    )
  )
)

get_current_date_tool <- tool(
  get_current_date,
  name = "get_current_date",
  description = "Gets the current date"
)

Register the tools and re-run the chat

chat$register_tool(days_left_in_year_tool)
chat$register_tool(get_current_date_tool)
chat$chat("How many days are left in this year?")

Additional Resources: Hadley Wickham Video

Hadley Wickham’s No-Bullshit Guide to LLMs at UseR!

Additional Resources: Joe Cheng Video

Harnessing LLMs for Data Analysis, Joe Cheng, CTO at Posit

Workshop for Ukraine: Hadley Wickham

Hadley Wickham’s Using LLMs with ellmer (for a €20 donation to help Ukraine)

Additional Packages

  • vitals for evaluating LLM results. Video
  • mini007 for creating agents in R. Video
  • axolotr is an alternative to ellmer for using LLMs in R. Video
  • chores aims to help automate repetitive tasks using AI
  • side, a “highly experimental” coding agent for RStudio

Additional Conference Videos

posit::conf(2025): You’ll need to search the playlist for the AI-related talks. https://www.youtube.com/playlist?list=PL9HYL-VRX0oTixlfDPCS5RW_F1pccERRe

R+AI Conference Videos on YouTube
May only be available for a few weeks https://www.youtube.com/playlist?list=PL4IzsxWztPdkm4lcgBilHQjBoFFysP81e

Find Me:

Bluesky: @smachlis.bsky.social

Mastodon: @smach@masto.machlis.com

GitHub: @smach

LinkedIn: in/sharonmachlis

Website: machlis.com

Sharon Machlis