In this tutorial, you will learn how to run ChatGPT in R, a popular programming language among data scientists. We will use the OpenAI API, which can be called from R to run ChatGPT. OpenAI provides official libraries for Python and Node.js, but R users need not feel let down by the lack of official documentation for R, as this tutorial provides the necessary information.

What is ChatGPT?
Most of us are already familiar with ChatGPT, so it might not need further introduction. ChatGPT is a smart chatbot that provides human-like responses. It understands your query like a human and responds accordingly. Its name consists of two words - "Chat" and "GPT". GPT refers to the Generative Pre-trained Transformer architecture for natural language processing. It is trained on a large corpus of text data and can generate responses to user inputs in a conversational manner, making it useful for various applications such as customer support, language learning, and entertainment.
Terminologies related to ChatGPT
It is important to understand some terminologies related to ChatGPT, because they determine how much you pay and how you use ChatGPT.
In simple words, a prompt is a question you want to ask ChatGPT. It is also called a search query. Think of it like this - you have a very smart machine that can answer anything. You can ask it to write an essay, a programming code, or anything else you can think of. But the machine requires specific instructions from you on what exactly you want it to do. Hence it is important that the prompt is clear and specific about the response you want.
Tokens are subwords or words. See some examples below:
- lower splits into two tokens: "low" and "er"
- smartest splits into two tokens: "smart" and "est"
- unhappy splits into two tokens: "un" and "happy"
- transformer splits into three tokens: "trans", "form", "er"
- bear is a single token
If you noticed, words are split into tokens because they can have different suffixes and prefixes. "Low" can become "lower" or "lowest", so it is important that the model understands these words are related.
These tokens determine your usage and billing. The OpenAI team says you can roughly estimate one token as about four characters of English text, but in reality this varies a lot.
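As a rough illustration of the four-characters-per-token heuristic, the sketch below estimates a token count from character length. This is only an approximation (the helper function is our own, not OpenAI's); the real count comes from OpenAI's tokenizer.

```r
# Rough token estimate using the ~4 characters per token heuristic.
# This is only an approximation; the real count comes from OpenAI's tokenizer.
estimate_tokens <- function(text) {
  ceiling(nchar(text) / 4)
}

estimate_tokens("R code to remove duplicates using dplyr.")  # about 10 tokens
```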
In the previous section you learned what a token means. Now it is essential to know the different types of tokens in the world of ChatGPT.
- Prompt Tokens: Number of tokens used in your prompt (question)
- Completion Tokens: Number of tokens used in writing response (answer/output)
Total Tokens = Prompt Tokens + Completion Tokens
If you use the GPT-3.5 model in the API, you pay $0.002 per 1,000 tokens. If you use GPT-4, you pay more and the pricing structure is a bit more complex; see the table below. The context size refers to the maximum number of tokens (prompt plus completion) the model can handle in a single request. For example, if you want GPT-4 to read a long document before answering your question, you need a larger context size.
Model | Prompt | Completion |
---|---|---|
8K context | $0.03 / 1K tokens | $0.06 / 1K tokens |
32K context | $0.06 / 1K tokens | $0.12 / 1K tokens |
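To make the billing concrete, here is a small sketch that computes the cost of a request from the token counts and the per-1K-token rates above. The helper function is our own, and the rates are those quoted in this article; check OpenAI's pricing page for current values.

```r
# Compute the cost of a request from token counts and per-1K-token rates.
# Rates default to the GPT-4 8K context prices quoted above; they may change.
request_cost <- function(prompt_tokens, completion_tokens,
                         prompt_rate = 0.03, completion_rate = 0.06) {
  prompt_tokens / 1000 * prompt_rate + completion_tokens / 1000 * completion_rate
}

# GPT-4 (8K context): 500 prompt tokens and 300 completion tokens
request_cost(500, 300)                                                # $0.033

# GPT-3.5: flat $0.002 per 1K tokens applied to all 800 tokens
request_cost(500, 300, prompt_rate = 0.002, completion_rate = 0.002)  # $0.0016
```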
Steps to run ChatGPT in R
You can sign up for an account on OpenAI's platform by visiting platform.openai.com. Once you're there, you can create an account using your Google or Microsoft email address. After creating your account, the most important step is to get a secret API key to access the API. Once you have your API key, store it securely for future reference.
```
sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
Before we can start using ChatGPT in R, we need to install the necessary libraries. The two libraries we will be using are httr and jsonlite. The "httr" library allows us to post our question to the OpenAI API and fetch the response, while the "jsonlite" library helps convert R objects to JSON format.
```r
install.packages("httr")
install.packages("jsonlite")
```
In the code below, you need to specify two things: `apiKey` and `prompt`. The first refers to the OpenAI API key you generated in the previous step; the second is the question you want to ask ChatGPT.
```r
library(httr)
library(jsonlite)

apiKey <- "sk-xxxxxxxxxxxxxxxx"
prompt <- "R code to remove duplicates using dplyr. Do not write explanations on replies."

response <- POST(
  url = "https://api.openai.com/v1/chat/completions",
  add_headers(Authorization = paste("Bearer", apiKey)),
  content_type_json(),
  encode = "json",
  body = list(
    model = "gpt-3.5-turbo",
    temperature = 1,
    messages = list(list(
      role = "user",
      content = prompt
    ))
  )
)

content(response)
```
```
$id
[1] "chatcmpl-7DaAPWmVVc3f9VA5FKWTzeKMWSyii"

$object
[1] "chat.completion"

$created
[1] 1683471645

$model
[1] "gpt-3.5-turbo-0301"

$usage
$usage$prompt_tokens
[1] 25

$usage$completion_tokens
[1] 5

$usage$total_tokens
[1] 30

$choices
$choices[[1]]
$choices[[1]]$message
$choices[[1]]$message$role
[1] "assistant"

$choices[[1]]$message$content
[1] "df %>% distinct()"

$choices[[1]]$finish_reason
[1] "stop"

$choices[[1]]$index
[1] 0
```
```r
cat(content(response)$choices[[1]]$message$content)
```
Run the code above to print the output in a more presentable manner.
```
df %>% distinct()
```
- GPT-4 : To use GPT-4, mention `gpt-4` instead of `gpt-3.5-turbo` in the code above.
- temperature : In OpenAI's API, the `temperature` argument controls the creativity or randomness of the generated text. It lies between 0 and 2. A higher temperature makes the model more likely to generate surprising and unexpected responses, whereas a lower temperature makes it more conservative and predictable. For example, at a temperature of 0.5 the generated text will be more focused, whereas at 1.5 it will be more random.
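To build intuition for what temperature does, here is a minimal sketch (our own illustration, not part of the API) showing how dividing a model's raw scores (logits) by the temperature before applying softmax sharpens or flattens the resulting probability distribution over candidate words:

```r
# Illustrative sketch only: how temperature reshapes a probability distribution.
# The logits below are made-up scores for three candidate next words.
softmax_with_temperature <- function(logits, temperature) {
  scaled <- logits / temperature
  exp(scaled) / sum(exp(scaled))
}

logits <- c(2, 1, 0.5)

round(softmax_with_temperature(logits, 0.5), 3)  # low temperature: sharper, more predictable
round(softmax_with_temperature(logits, 1.5), 3)  # high temperature: flatter, more random
```

With a low temperature, almost all of the probability mass concentrates on the top-scoring word; with a high temperature, the distribution spreads out, so less likely words get picked more often.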
R Function for ChatGPT
Here we create a user-defined function in R for ChatGPT, which is a more robust way of calling ChatGPT from R. It wraps the R code shown in the previous section into a function and gives you the flexibility to change the model's arguments easily.
```r
chatGPT <- function(prompt,
                    modelName = "gpt-3.5-turbo",
                    temperature = 1,
                    apiKey = Sys.getenv("chatGPT_API_KEY")) {

  if(nchar(apiKey) < 1) {
    apiKey <- readline("Paste your API key here: ")
    Sys.setenv(chatGPT_API_KEY = apiKey)
  }

  response <- POST(
    url = "https://api.openai.com/v1/chat/completions",
    add_headers(Authorization = paste("Bearer", apiKey)),
    content_type_json(),
    encode = "json",
    body = list(
      model = modelName,
      temperature = temperature,
      messages = list(list(
        role = "user",
        content = prompt
      ))
    )
  )

  if(status_code(response) > 200) {
    stop(content(response))
  }

  trimws(content(response)$choices[[1]]$message$content)
}

cat(chatGPT("square of 29"))
```
When you run the function above for the first time, it will ask you to enter your API key. It saves the key in the `chatGPT_API_KEY` environment variable, so it won't ask for the key the next time you run the function. `Sys.setenv()` stores the API key, whereas `Sys.getenv()` retrieves the stored API key.
```r
Sys.setenv(chatGPT_API_KEY = "APIKey")  # Set API Key
Sys.getenv("chatGPT_API_KEY")           # Get API Key
```
How to customize ChatGPT in R
By setting the `system` role you can control the behavior of ChatGPT. It is useful for providing context to ChatGPT before starting the conversation, and it can also be used to set the tone of the conversation, for example if you want ChatGPT to be funny. To make this change in R, add one more list in the `messages` portion of the code; the remaining code stays the same as shown in the previous section of the article.
In the code below, we are telling ChatGPT to act like a Chief Purchasing Officer of an automotive company. Students will ask domain specific questions related to the company/industry.
```r
messages = list(
  list(
    "role" = "system",
    "content" = "You are John Smith, the Chief Purchasing Officer of Surya Motors. Your company operates as per Toyota Production System. You are being interviewed by students"
  ),
  list(role = "user", content = "what are your roles and responsibilities?")
)
```
R Function to allow ChatGPT to Remember Prior Conversations
By default, OpenAI's API does not remember previous questions when answering subsequent ones. This means that if you asked "What is 2+2?" and then followed up with "What is the square of the previous answer?", it would not be able to respond because it does not recall the previous prompt.
You may be wondering about this, since the ChatGPT website already has this functionality. Yes, it exists on the website, but not in the OpenAI API. To give ChatGPT the ability to remember previous conversations, you can use the following R function.
```r
chatGPT <- function(prompt,
                    modelName = "gpt-3.5-turbo",
                    temperature = 1,
                    max_tokens = 2048,
                    top_p = 1,
                    apiKey = Sys.getenv("chatGPT_API_KEY")) {

  # Parameters
  params <- list(
    model = modelName,
    temperature = temperature,
    max_tokens = max_tokens,
    top_p = top_p
  )

  if(nchar(apiKey) < 1) {
    apiKey <- readline("Paste your API key here: ")
    Sys.setenv(chatGPT_API_KEY = apiKey)
  }

  # Add the new message to the chat session messages
  chatHistory <<- append(chatHistory, list(list(role = "user", content = prompt)))

  response <- POST(
    url = "https://api.openai.com/v1/chat/completions",
    add_headers("Authorization" = paste("Bearer", apiKey)),
    content_type_json(),
    body = toJSON(c(params, list(messages = chatHistory)), auto_unbox = TRUE)
  )

  if (response$status_code > 200) {
    stop(content(response))
  }

  response <- content(response)
  answer <- trimws(response$choices[[1]]$message$content)
  chatHistory <<- append(chatHistory, list(list(role = "assistant", content = answer)))

  # return
  return(answer)
}
```
```r
chatHistory <- list()
cat(chatGPT("2+2"))
cat(chatGPT("square of it"))
cat(chatGPT("add 3 to result"))
```
Output
```
> chatHistory <- list()
> cat(chatGPT("2+2"))
4
> cat(chatGPT("square of it"))
The square of 4 is 16.
> cat(chatGPT("add 3 to result"))
Adding 3 to the result of 16 gives 19.
```
- chatHistory : It is important to create the list as shown above, and the name of the list must be `chatHistory`.
- max_tokens : The maximum number of tokens to generate in the response.
- top_p : The probability threshold used in the "nucleus sampling" algorithm. Nucleus sampling is a type of probabilistic text generation in which the algorithm selects the next word from a restricted set of the most probable words, based on their cumulative probability.
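To make nucleus sampling concrete, here is a minimal sketch (our own illustration with made-up word probabilities, not the API's internal code) of how `top_p` restricts the candidate pool before sampling:

```r
# Illustrative sketch of nucleus (top-p) sampling with made-up word probabilities.
# Keep the smallest set of most probable words whose cumulative probability
# reaches top_p, renormalize, then sample the next word from that restricted set.
nucleus_sample <- function(probs, top_p = 0.9) {
  ord    <- order(probs, decreasing = TRUE)
  cum    <- cumsum(probs[ord])
  keep_n <- which(cum >= top_p)[1]              # smallest set reaching top_p
  kept   <- ord[seq_len(keep_n)]
  probs_kept <- probs[kept] / sum(probs[kept])  # renormalize over kept words
  sample(names(probs)[kept], 1, prob = probs_kept)
}

probs <- c(cat = 0.5, dog = 0.3, fish = 0.15, zebra = 0.05)
nucleus_sample(probs, top_p = 0.9)  # "zebra" can never be chosen here
```

With `top_p = 0.9`, the cumulative probabilities are 0.5, 0.8, 0.95, so only "cat", "dog", and "fish" survive the cut; the unlikely tail ("zebra") is excluded entirely.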
R Function to generate image
Like GPT for text generation, OpenAI has a model called DALL-E to generate or edit images. DALL-E can create highly realistic images that have never been captured in the real world, based purely on your prompt. It can be used for various purposes such as social media marketing, images for blog posts, etc. The code below takes your instruction (prompt) as input and creates an image accordingly.
```r
chatGPT_image <- function(prompt,
                          n = 1,
                          size = c("1024x1024", "256x256", "512x512"),
                          response_format = c("url", "b64_json"),
                          apiKey = Sys.getenv("chatGPT_API_KEY")) {

  if(nchar(apiKey) < 1) {
    apiKey <- readline("Paste your API key here: ")
    Sys.setenv(chatGPT_API_KEY = apiKey)
  }

  size <- match.arg(size)
  response_format <- match.arg(response_format)

  response <- POST(
    url = "https://api.openai.com/v1/images/generations",
    add_headers(Authorization = paste("Bearer", apiKey)),
    content_type_json(),
    encode = "json",
    body = list(
      prompt = prompt,
      n = n,
      size = size,
      response_format = response_format
    )
  )

  if(status_code(response) > 200) {
    stop(content(response))
  }

  parsed0 <- httr::content(response, as = "text", encoding = "UTF-8")
  parsed <- jsonlite::fromJSON(parsed0, flatten = TRUE)
  parsed
}

img <- chatGPT_image("saint sitting on wall street")
img$data$url
```
The above code returns the URL of the generated image, which you can paste into a browser (Google Chrome / Edge) to see the generated image. To view the image in RStudio, refer to the code below.
```r
library(magick)
saint <- image_read(img$data$url)
print(saint)
```
- n : Number of images to generate
- size : Image size
- response_format : Whether you want the image returned as a URL or as a base64-encoded string
How to validate API Key
The function below can be used as a utility to check whether an API key is valid. It may be useful in case you are building an application and want to validate the API key before the user starts asking questions in the interface.
```r
apiCheck <- function(apiKey = Sys.getenv("chatGPT_API_KEY")) {
  if(nchar(apiKey) < 1) {
    apiKey <- readline("Paste your API key here: ")
    Sys.setenv(chatGPT_API_KEY = apiKey)
  }

  x <- httr::GET(
    "https://api.openai.com/v1/models",
    httr::add_headers(Authorization = paste0("Bearer ", apiKey))
  )

  status <- httr::status_code(x)
  if (status == 200) {
    message("Correct API Key. Yeeee!")
  } else {
    stop("Incorrect API Key. Oops!")
  }
}

apiCheck()
```
RStudio Add-in for ChatGPT
To have an interactive Shiny app like the ChatGPT website, you can use the RStudio add-in for ChatGPT by installing the gptstudio package. To install the package and launch the add-in, run the commands below.
```r
install.packages("gptstudio")
gptstudio:::addin_chatgpt()
```
In the Shiny app, you can also select your programming style and proficiency level.

Shiny App for ChatGPT
If you want to build your own ChatGPT clone in Shiny, you can visit this tutorial - ChatGPT clone in Shiny. It will help you build your own customised chatbot for your audience.
ChatGPT prompts for R
Following is a list of some standard ChatGPT prompts you can use for R coding. If you only want R code as output and do not want an explanation of the code from ChatGPT, you can add this line to your prompt: `Do not write explanations on replies`.
- Prompt: Explain the following code [Insert code]
- Prompt: The following code is poorly written. Can you optimise it? [Insert code]
- Prompt: Can you simplify the following code? [Insert code]
- Prompt: Can you please convert the following code from Python to R? [Insert code]
- Prompt: I have a dataset of [describe dataset]. Please write R code for data exploration.
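The prompts above can be turned into small reusable templates. The sketch below uses a helper function of our own devising (`build_prompt` is not from any package) to fill the `[Insert code]` placeholder with your code snippet before sending it to the `chatGPT()` function defined earlier:

```r
# Hypothetical helper: substitute a code snippet into a prompt template.
# build_prompt() is our own illustrative function, not part of any package.
build_prompt <- function(template, code) {
  sub("[Insert code]", code, template, fixed = TRUE)
}

template <- "The following code is poorly written. Can you optimise it? [Insert code]"
snippet  <- "x <- c(); for(i in 1:10) x <- c(x, i^2)"

prompt <- build_prompt(template, snippet)
cat(prompt)
# The assembled prompt can then be passed to the chatGPT() function defined earlier:
# cat(chatGPT(prompt))
```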