In this tutorial, you will learn how to run ChatGPT in R, a popular programming language among data scientists. We will use the OpenAI API, which can be called from R to run ChatGPT. OpenAI provides official libraries for Python and Node.js, but R users need not feel let down by the lack of official documentation for R, as this tutorial provides the necessary information.

What is ChatGPT?
Most of us are already familiar with ChatGPT, so it might not need further introduction. ChatGPT is a smart chatbot that provides human-like responses. It understands your query like a human and responds accordingly. Its name consists of two words - "Chat" and "GPT". GPT refers to the Generative Pre-trained Transformer architecture for natural language processing. It is trained on a large corpus of text data and can generate responses to user inputs in a conversational manner, making it useful for various applications such as customer support, language learning, and entertainment.
Terminologies related to ChatGPT
It is important to understand some terminologies related to ChatGPT, because they determine how much you pay and how you use ChatGPT.
In simple words, a prompt is a question you want to ask ChatGPT. It is also called a search query. Think of it like this - you have a very smart machine that can answer anything. You can ask it to write an essay, a programming code, or anything else you can think of. But the machine requires specific instructions from you on what exactly you want it to do. Hence it is important that the prompt is clear and specific about the response you want.
Tokens are subwords or words. See some examples below:
- lower splits into two tokens: "low" and "er"
- smartest splits into two tokens: "smart" and "est"
- unhappy splits into two tokens: "un" and "happy"
- transformer splits into three tokens: "trans", "form", "er"
- bear is a single token
If you noticed, words are split into tokens because they can have different suffixes and prefixes. "Low" can become "lower" or "lowest", so it is important that the model understands these words are related.
These tokens determine your usage and billing. The OpenAI team says you can roughly estimate one token as about four characters of English text, but in reality this varies a lot.
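As a rough illustration of the four-characters-per-token heuristic, the sketch below estimates a token count from character length. This is only an approximation (the helper function is our own, not OpenAI's); the real count comes from OpenAI's tokenizer.

```r
# Rough token estimate using the ~4 characters per token heuristic.
# This is only an approximation; the real count comes from OpenAI's tokenizer.
estimate_tokens <- function(text) {
  ceiling(nchar(text) / 4)
}

estimate_tokens("R code to remove duplicates using dplyr.")  # about 10 tokens
```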
In the previous section you learned what a token means. Now it is essential to know the different types of tokens in the world of ChatGPT.
- Prompt Tokens: Number of tokens used in your prompt (question)
- Completion Tokens: Number of tokens used in writing response (answer/output)
Total Tokens = Prompt Tokens + Completion Tokens
If you use the GPT-3.5 model in the API, you pay $0.002 per 1,000 tokens. If you use GPT-4, you pay more and the pricing structure is a bit more complex; see the table below. The context size refers to the maximum number of tokens (prompt plus completion) the model can handle in a single request. For example, if you want GPT-4 to read a long document before answering your question, you need a larger context size.
Model | Prompt | Completion |
---|---|---|
8K context | $0.03 / 1K tokens | $0.06 / 1K tokens |
32K context | $0.06 / 1K tokens | $0.12 / 1K tokens |
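To make the billing concrete, here is a small sketch that computes the cost of a request from the token counts and the per-1K-token rates above. The helper function is our own, and the rates are those quoted in this article; check OpenAI's pricing page for current values.

```r
# Compute the cost of a request from token counts and per-1K-token rates.
# Rates default to the GPT-4 8K context prices quoted above; they may change.
request_cost <- function(prompt_tokens, completion_tokens,
                         prompt_rate = 0.03, completion_rate = 0.06) {
  prompt_tokens / 1000 * prompt_rate + completion_tokens / 1000 * completion_rate
}

# GPT-4 (8K context): 500 prompt tokens and 300 completion tokens
request_cost(500, 300)                                                # $0.033

# GPT-3.5: flat $0.002 per 1K tokens applied to all 800 tokens
request_cost(500, 300, prompt_rate = 0.002, completion_rate = 0.002)  # $0.0016
```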
Steps to run ChatGPT in R
You can sign up for an account on OpenAI's platform by visiting platform.openai.com. Once you're there, you can create an account using your Google or Microsoft email address. After creating your account, the most important step is to get a secret API key to access the API. Once you have your API key, store it securely for future reference.
```
sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
Before we can start using ChatGPT in R, we need to install the necessary libraries. The two libraries we will be using are httr and jsonlite. The "httr" library allows us to post our question to the OpenAI API and fetch the response, while the "jsonlite" library helps convert R objects to JSON format.
```r
install.packages("httr")
install.packages("jsonlite")
```
In the code below, you need to specify two things: `apiKey` and `prompt`. The first refers to the OpenAI API key you generated in the previous step; the second is the question you want to ask ChatGPT.
```r
library(httr)
library(jsonlite)

apiKey <- "sk-xxxxxxxxxxxxxxxx"
prompt <- "R code to remove duplicates using dplyr. Do not write explanations on replies."

response <- POST(
  url = "https://api.openai.com/v1/chat/completions",
  add_headers(Authorization = paste("Bearer", apiKey)),
  content_type_json(),
  encode = "json",
  body = list(
    model = "gpt-3.5-turbo",
    temperature = 1,
    messages = list(list(
      role = "user",
      content = prompt
    ))
  )
)

content(response)
```
```
$id
[1] "chatcmpl-7DaAPWmVVc3f9VA5FKWTzeKMWSyii"

$object
[1] "chat.completion"

$created
[1] 1683471645

$model
[1] "gpt-3.5-turbo-0301"

$usage
$usage$prompt_tokens
[1] 25

$usage$completion_tokens
[1] 5

$usage$total_tokens
[1] 30

$choices
$choices[[1]]
$choices[[1]]$message
$choices[[1]]$message$role
[1] "assistant"

$choices[[1]]$message$content
[1] "df %>% distinct()"

$choices[[1]]$finish_reason
[1] "stop"

$choices[[1]]$index
[1] 0
```
```r
cat(content(response)$choices[[1]]$message$content)
```
Run the code above to print the output in a more presentable manner.
```
df %>% distinct()
```
- GPT-4 : To use GPT-4, mention `gpt-4` instead of `gpt-3.5-turbo` in the code above.
- temperature : In OpenAI's API, the `temperature` argument controls the creativity or randomness of the generated text. It lies between 0 and 2. A higher temperature makes the model more likely to generate surprising and unexpected responses, whereas a lower temperature makes it more conservative and predictable. For example, at a temperature of 0.5 the generated text will be more focused, whereas at 1.5 it will be more random.
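To build intuition for what temperature does, here is a minimal sketch (our own illustration, not part of the API) showing how dividing a model's raw scores (logits) by the temperature before applying softmax sharpens or flattens the resulting probability distribution over candidate words:

```r
# Illustrative sketch only: how temperature reshapes a probability distribution.
# The logits below are made-up scores for three candidate next words.
softmax_with_temperature <- function(logits, temperature) {
  scaled <- logits / temperature
  exp(scaled) / sum(exp(scaled))
}

logits <- c(2, 1, 0.5)

round(softmax_with_temperature(logits, 0.5), 3)  # low temperature: sharper, more predictable
round(softmax_with_temperature(logits, 1.5), 3)  # high temperature: flatter, more random
```

With a low temperature, almost all of the probability mass concentrates on the top-scoring word; with a high temperature, the distribution spreads out, so less likely words get picked more often.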
R Function for ChatGPT
Here we create a user-defined function in R for ChatGPT, which is a more robust way of calling ChatGPT from R. It wraps the R code shown in the previous section into a function and gives you the flexibility to change the model's arguments easily.
```r
chatGPT <- function(prompt,
                    modelName = "gpt-3.5-turbo",
                    temperature = 1,
                    apiKey = Sys.getenv("chatGPT_API_KEY")) {

  if(nchar(apiKey) < 1) {
    apiKey <- readline("Paste your API key here: ")
    Sys.setenv(chatGPT_API_KEY = apiKey)
  }

  response <- POST(
    url = "https://api.openai.com/v1/chat/completions",
    add_headers(Authorization = paste("Bearer", apiKey)),
    content_type_json(),
    encode = "json",
    body = list(
      model = modelName,
      temperature = temperature,
      messages = list(list(
        role = "user",
        content = prompt
      ))
    )
  )

  if(status_code(response) > 200) {
    stop(content(response))
  }

  trimws(content(response)$choices[[1]]$message$content)
}

cat(chatGPT("square of 29"))
```
When you run the function above for the first time, it will ask you to enter your API key. It saves the key in the `chatGPT_API_KEY` environment variable, so it won't ask for the key the next time you run the function. `Sys.setenv()` stores the API key, whereas `Sys.getenv()` retrieves the stored API key.
```r
Sys.setenv(chatGPT_API_KEY = "APIKey")  # Set API Key
Sys.getenv("chatGPT_API_KEY")           # Get API Key
```
How to customize ChatGPT in R
By setting the `system` role you can control the behavior of ChatGPT. It is useful for providing context to ChatGPT before starting the conversation, and it can also be used to set the tone of the conversation, for example if you want ChatGPT to be funny. To make this change in R, add one more list in the `messages` portion of the code; the remaining code stays the same as shown in the previous section of the article.
In the code below, we are telling ChatGPT to act like a Chief Purchasing Officer of an automotive company. Students will ask domain specific questions related to the company/industry.
```r
messages = list(
  list(
    "role" = "system",
    "content" = "You are John Smith, the Chief Purchasing Officer of Surya Motors. Your company operates as per Toyota Production System. You are being interviewed by students"
  ),
  list(role = "user", content = "what are your roles and responsibilities?")
)
```
R Function to allow ChatGPT to Remember Prior Conversations
By default, OpenAI's API does not remember previous questions when answering subsequent ones. This means that if you asked "What is 2+2?" and then followed up with "What is the square of the previous answer?", it would not be able to respond because it does not recall the previous prompt.
You may be wondering about this, since the ChatGPT website already has this functionality. Yes, it exists on the website, but not in the OpenAI API. To give ChatGPT the ability to remember previous conversations, you can use the following R function.
```r
chatGPT <- function(prompt,
                    modelName = "gpt-3.5-turbo",
                    temperature = 1,
                    max_tokens = 2048,
                    top_p = 1,
                    apiKey = Sys.getenv("chatGPT_API_KEY")) {

  # Parameters
  params <- list(
    model = modelName,
    temperature = temperature,
    max_tokens = max_tokens,
    top_p = top_p
  )

  if(nchar(apiKey) < 1) {
    apiKey <- readline("Paste your API key here: ")
    Sys.setenv(chatGPT_API_KEY = apiKey)
  }

  # Add the new message to the chat session messages
  chatHistory <<- append(chatHistory, list(list(role = "user", content = prompt)))

  response <- POST(
    url = "https://api.openai.com/v1/chat/completions",
    add_headers("Authorization" = paste("Bearer", apiKey)),
    content_type_json(),
    body = toJSON(c(params, list(messages = chatHistory)), auto_unbox = TRUE)
  )

  if (response$status_code > 200) {
    stop(content(response))
  }

  response <- content(response)
  answer <- trimws(response$choices[[1]]$message$content)
  chatHistory <<- append(chatHistory, list(list(role = "assistant", content = answer)))

  # return
  return(answer)
}
```
```r
chatHistory <- list()
cat(chatGPT("2+2"))
cat(chatGPT("square of it"))
cat(chatGPT("add 3 to result"))
```
Output
```
> chatHistory <- list()
> cat(chatGPT("2+2"))
4
> cat(chatGPT("square of it"))
The square of 4 is 16.
> cat(chatGPT("add 3 to result"))
Adding 3 to the result of 16 gives 19.
```
- chatHistory : It is important to create the list as shown above, and the name of the list must be `chatHistory`.
- max_tokens : The maximum number of tokens to generate in the response.
- top_p : The probability threshold used in the "nucleus sampling" algorithm. Nucleus sampling is a type of probabilistic text generation in which the algorithm selects the next word from a restricted set of the most probable words, based on their cumulative probability.
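To make nucleus sampling concrete, here is a minimal sketch (our own illustration with made-up word probabilities, not the API's internal code) of how `top_p` restricts the candidate pool before sampling:

```r
# Illustrative sketch of nucleus (top-p) sampling with made-up word probabilities.
# Keep the smallest set of most probable words whose cumulative probability
# reaches top_p, renormalize, then sample the next word from that restricted set.
nucleus_sample <- function(probs, top_p = 0.9) {
  ord    <- order(probs, decreasing = TRUE)
  cum    <- cumsum(probs[ord])
  keep_n <- which(cum >= top_p)[1]              # smallest set reaching top_p
  kept   <- ord[seq_len(keep_n)]
  probs_kept <- probs[kept] / sum(probs[kept])  # renormalize over kept words
  sample(names(probs)[kept], 1, prob = probs_kept)
}

probs <- c(cat = 0.5, dog = 0.3, fish = 0.15, zebra = 0.05)
nucleus_sample(probs, top_p = 0.9)  # "zebra" can never be chosen here
```

With `top_p = 0.9`, the cumulative probabilities are 0.5, 0.8, 0.95, so only "cat", "dog", and "fish" survive the cut; the unlikely tail ("zebra") is excluded entirely.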
R Function to generate image
Like GPT for text generation, OpenAI has a model called DALL-E to generate or edit images. DALL-E can create highly realistic images that have never been captured in the real world, based purely on your prompt. It can be used for various purposes such as social media marketing, images for blog posts, etc. The code below takes your instruction (prompt) as input and creates an image accordingly.
```r
chatGPT_image <- function(prompt,
                          n = 1,
                          size = c("1024x1024", "256x256", "512x512"),
                          response_format = c("url", "b64_json"),
                          apiKey = Sys.getenv("chatGPT_API_KEY")) {

  if(nchar(apiKey) < 1) {
    apiKey <- readline("Paste your API key here: ")
    Sys.setenv(chatGPT_API_KEY = apiKey)
  }

  size <- match.arg(size)
  response_format <- match.arg(response_format)

  response <- POST(
    url = "https://api.openai.com/v1/images/generations",
    add_headers(Authorization = paste("Bearer", apiKey)),
    content_type_json(),
    encode = "json",
    body = list(
      prompt = prompt,
      n = n,
      size = size,
      response_format = response_format
    )
  )

  if(status_code(response) > 200) {
    stop(content(response))
  }

  parsed0 <- httr::content(response, as = "text", encoding = "UTF-8")
  parsed <- jsonlite::fromJSON(parsed0, flatten = TRUE)
  parsed
}

img <- chatGPT_image("saint sitting on wall street")
img$data$url
```
The above code returns the URL of the generated image, which you can paste into a browser (Google Chrome / Edge) to see the generated image. To view the image in RStudio, refer to the code below.
```r
library(magick)
saint <- image_read(img$data$url)
print(saint)
```
- n : Number of images to generate
- size : Image size
- response_format : Whether you want the image returned as a URL or as a base64-encoded string
How to validate API Key
The function below can be used as a utility to check whether an API key is valid. It may be useful in case you are building an application and want to validate the API key before the user starts asking questions in the interface.
```r
apiCheck <- function(apiKey = Sys.getenv("chatGPT_API_KEY")) {
  if(nchar(apiKey) < 1) {
    apiKey <- readline("Paste your API key here: ")
    Sys.setenv(chatGPT_API_KEY = apiKey)
  }

  x <- httr::GET(
    "https://api.openai.com/v1/models",
    httr::add_headers(Authorization = paste0("Bearer ", apiKey))
  )

  status <- httr::status_code(x)
  if (status == 200) {
    message("Correct API Key. Yeeee!")
  } else {
    stop("Incorrect API Key. Oops!")
  }
}

apiCheck()
```
RStudio Add-in for ChatGPT
To have an interactive Shiny app like the ChatGPT website, you can use the RStudio add-in for ChatGPT by installing the gptstudio package. To install the package and launch the add-in, run the commands below.
```r
install.packages("gptstudio")
gptstudio:::addin_chatgpt()
```
In the Shiny app, you can also select your programming style and proficiency level.

Shiny App for ChatGPT
If you want to build your own ChatGPT clone in Shiny, you can visit this tutorial - ChatGPT clone in Shiny. It will help you build your own customised chatbot for your audience.
ChatGPT prompts for R
Following is a list of some standard ChatGPT prompts you can use for R coding. If you only want R code as output and do not want an explanation of the code from ChatGPT, you can add this line to your prompt: `Do not write explanations on replies`.
- Prompt: Explain the following code [Insert code]
- Prompt: The following code is poorly written. Can you optimise it? [Insert code]
- Prompt: Can you simplify the following code? [Insert code]
- Prompt: Can you please convert the following code from Python to R? [Insert code]
- Prompt: I have a dataset of [describe dataset]. Please write R code for data exploration.
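The prompts above can be turned into small reusable templates. The sketch below uses a helper function of our own devising (`build_prompt` is not from any package) to fill the `[Insert code]` placeholder with your code snippet before sending it to the `chatGPT()` function defined earlier:

```r
# Hypothetical helper: substitute a code snippet into a prompt template.
# build_prompt() is our own illustrative function, not part of any package.
build_prompt <- function(template, code) {
  sub("[Insert code]", code, template, fixed = TRUE)
}

template <- "The following code is poorly written. Can you optimise it? [Insert code]"
snippet  <- "x <- c(); for(i in 1:10) x <- c(x, i^2)"

prompt <- build_prompt(template, snippet)
cat(prompt)
# The assembled prompt can then be passed to the chatGPT() function defined earlier:
# cat(chatGPT(prompt))
```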