Transformers Agent is a new AI tool that can automate a wide range of tasks. It can generate and edit images, video and audio, answer questions about documents, convert speech to text, and do a lot of other things.
Hugging Face, a well-known name in the open-source AI world, released Transformers Agent, which provides a natural language API on top of transformers. The API is designed to be easy to use: with a single line of code, it gives access to a variety of tools for tasks such as question answering, image generation, video generation, text to speech, text classification, and summarization.
How does Transformers Agent work?
Let's understand the two terms, Transformers and Agent.
Transformers are models used for natural language processing (NLP) tasks. Take, for example, a chatbot that helps users book flights. When a user types in a query like "I want to book a flight from New York to San Francisco on June 15th," the chatbot's transformer model breaks the input text down into a sequence of tokens, such as "book", "flight", "New York", "San Francisco", and "June 15th".
The transformer will then use self-attention to analyze each token in the sequence and determine its relevance to the overall meaning of the query. For instance, it might pay more attention to the "New York" and "San Francisco" tokens to identify the user's departure and destination cities.
Once the self-attention step is complete, the transformer will generate a response based on the input sequence. In this case, it might respond with flight options that match the user's query, such as "Here are some flights from New York to San Francisco on June 15th."
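To make the self-attention step a little more concrete, here is a minimal sketch in plain Python. It is a simplification: real transformers first map every token to learned query, key and value vectors, whereas here each token's vector plays all three roles. The 2-d embeddings are made-up numbers chosen only for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections.

    Each token vector is used as query, key and value at the same time,
    which is an assumption made to keep the sketch short.
    """
    dim = len(vectors[0])
    weights, outputs = [], []
    for query in vectors:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        row = softmax(scores)
        weights.append(row)
        # The output is the attention-weighted average of all token vectors.
        outputs.append([sum(w * v[i] for w, v in zip(row, vectors))
                        for i in range(dim)])
    return weights, outputs

# Made-up embeddings for three tokens (illustrative values only).
tokens = ["book", "New York", "San Francisco"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.9]]
attn_weights, attn_outputs = self_attention(embeddings)
```

With these toy vectors, the row of weights for "New York" puts more weight on "San Francisco" than on "book", which mirrors the idea that the two city tokens are relevant to each other.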
In layman's terms, the Agent in Transformers Agent is a computer program that uses Transformers to perform tasks. Here, the computer program is a large language model. In the flight booking example, Transformers Agent would fetch flight schedules and prices. Developers give the language model a description of the task they want done, such as finding available flights between two cities on a specific date.
Tools are functions that produce the final output for a given prompt. For example, the agent generates an image if the prompt asks it to draw a picture of something. The table below lists some of the tools that run in the backend.
Function Name | Description |
---|---|
image_generator | Generates images based on a text prompt. |
image_captioner | Generates captions for images. |
image_transformer | Transforms images such as resizing, cropping, and rotating. |
classifier | Classifies text into predefined categories. |
translator | Translates text from one language to another. |
speaker | Reads text aloud. |
summarizer | Summarizes a long piece of text into a shorter, more concise version. |
transcriber | Converts speech to text. |
text_qa | Answers questions about text. |
text_downloader | Downloads text from the internet. |
image_qa | Answers questions about images. |
video_generator | Generates videos based on a text prompt. |
document_qa | Answers questions about documents. |
image_segmenter | Segments images into their parts. |
Benefits of Transformers Agent
Some of the benefits of using the Transformers Agent API are as follows.
- Transformers Agent API is easy to use. It provides a high-level interface that hides the complexity of transformers.
- It is efficient, which means it can be used to perform natural language tasks at scale.
- It can be easily extended to use new transformer models or parameters.
- It has use cases in fields such as customer service, marketing, sales, and research.
How to run Transformers Agent
You can use my Google Colab Notebook to explore Transformers Agent. Click on the link below to access it.
To get started with the Transformers Agent API, you will need to install the required libraries: transformers, openai, accelerate, and diffusers.
!pip install transformers openai accelerate diffusers -q
import transformers
Once the transformers library is installed and loaded, check its version and make sure it is 4.29 or later.
print(transformers.__version__)
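If you would rather check the version in code than read it off the screen, a small helper like the one below can do the comparison. This is only a sketch: `version_at_least` is a hypothetical name of my own, not part of transformers, and it deliberately ignores suffixes such as `dev0` or `rc1`.

```python
import re

def version_at_least(version, minimum=(4, 29)):
    """Return True if a version string such as '4.30.0.dev0' is >= minimum."""
    numbers = []
    for part in version.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        numbers.append(int(match.group()))
    # Compare only as many components as the minimum specifies.
    current = tuple(numbers[:len(minimum)])
    # Pad with zeros so that '4' is treated as (4, 0).
    current += (0,) * (len(minimum) - len(current))
    return current >= minimum
```

In a notebook you could then call `version_at_least(transformers.__version__)` and upgrade the package if it returns False.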
First, you need to create an agent. An agent is essentially a large language model. It can be an OpenAI model, the StarCoder model, or the OpenAssistant model.
To use the OpenAI model, you will need an OpenAI API key. The OpenAI API is not free, but the cost is small and depends on the number of tokens (roughly, pieces of words) you use. The StarCoder and OpenAssistant models, on the other hand, can be loaded from the Hugging Face Hub. Using the Hugging Face Hub is free, but you will need a Hugging Face Hub API token.
OpenAI
import openai
import os
os.environ['OPENAI_API_KEY'] = "sk-xxxxxxxxxxxxx"
from transformers import OpenAiAgent
agent = OpenAiAgent(model="gpt-3.5-turbo")
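Writing the key directly into a notebook makes it easy to leak by accident when the notebook is shared. One alternative is to set the environment variable outside the notebook and only verify it in code. The helper below is a sketch of that idea; `require_api_key` is a hypothetical name, not a function from openai or transformers.

```python
import os

def require_api_key(env_var="OPENAI_API_KEY"):
    """Return the key stored in env_var, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the agent."
        )
    return key
```

Calling `require_api_key()` before creating the agent gives an early, readable error instead of a failure on the first request.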
Starcoder
from huggingface_hub import login
login("YOUR_TOKEN")
from transformers import HfAgent
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
OpenAssistant
from huggingface_hub import login
login("YOUR_TOKEN")
from transformers import HfAgent
agent = HfAgent(url_endpoint="https://api-inference.huggingface.co/models/OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5")
agent.run executes a single instruction and selects the tool for the task automatically, e.g., it selects the image generator tool to create an image.
agent.run("Draw me a picture of person sitting outside river.")
If you want to see which tool is used to generate the final result, pass the argument return_code=True.
agent.run("Draw me a picture of person sitting outside river", return_code = True)
==Explanation from the agent==
I will use the following tool: `image_generator` to generate an image according to the prompt.
==Code generated by the agent==
image = image_generator(prompt="person sitting outside river")
from transformers import load_tool
image_generator = load_tool("huggingface-tools/text-to-image")
image = image_generator(prompt="person sitting outside river")
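Since the generated code is ordinary Python text, you can inspect it before deciding to run it. The sketch below uses the standard library's ast module to list the functions a piece of code calls, without executing anything. `called_functions` is a hypothetical helper, and the `generated` string is simply copied from the output shown above.

```python
import ast

def called_functions(code):
    """Return the sorted names of functions called in a string of Python code."""
    names = set()
    for node in ast.walk(ast.parse(code)):
        # Only plain calls like f(...) are collected, not method calls like a.b(...).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            names.add(node.func.id)
    return sorted(names)

generated = '''
from transformers import load_tool
image_generator = load_tool("huggingface-tools/text-to-image")
image = image_generator(prompt="person sitting outside river")
'''
tools_used = called_functions(generated)
```

Here `ast.parse` only parses the text, so nothing is imported or downloaded while checking which tools the agent intends to call.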
The differences between .run and .chat are as follows:
- .run does not remember earlier instructions, but it performs better when a single instruction requires running multiple tools in a row.
- .chat keeps the chat history, which means it remembers earlier messages.
agent.chat("Draw me a picture of saint sitting outside river")
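The stateless versus stateful distinction can be pictured with a toy class. This is not the real agent and makes no claim about its internals; `ToyAgent` is a hypothetical stand-in that only records which messages the model would get to see on each call.

```python
class ToyAgent:
    """Hypothetical stand-in that tracks the context visible on each call."""

    def __init__(self):
        self.history = []

    def run(self, task):
        # Stateless: the context is just the current task.
        return [task]

    def chat(self, message):
        # Stateful: earlier messages are included as context.
        self.history.append(message)
        return list(self.history)

toy = ToyAgent()
first = toy.chat("Draw me a picture of a river")
second = toy.chat("Add a boat to it")   # also sees the first message
single = toy.run("Add a boat to it")    # sees only this instruction
```

This is why a follow-up such as "Add a boat to it" only makes sense with .chat: with .run there is no earlier picture for "it" to refer to.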
Using the picture= option, you can pass a previously generated image back to the agent and update it.
picture = agent.run("Generate a picture of rivers and lakes.")
updated_picture = agent.run("Transform the image in `picture` to add an island to it.", picture=picture)
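One way to think about the picture= argument is that it makes a variable with that name available to the code the agent writes. The sketch below imitates this with plain strings instead of images. `run_generated_code` is a hypothetical helper, and using exec like this is an assumption made for illustration only, not a description of how the library executes code.

```python
def run_generated_code(code, **variables):
    """Execute code with the given variables predefined; return the namespace."""
    namespace = dict(variables)
    # Toy only: this is not a safe sandbox, never exec untrusted code this way.
    exec(code, {"__builtins__": {}}, namespace)
    return namespace

# A string stands in for the image, and the code line for what an agent might write.
result = run_generated_code(
    'updated_picture = picture + " + island"',
    picture="rivers and lakes",
)
updated = result["updated_picture"]
```

The name used in the keyword argument has to match the name mentioned in the instruction, which is why the prompt refers to the image as `picture`.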
In the example below, we are converting text to speech.
audio = agent.run("Read out loud the summary of [URL]")
play_audio(audio)
Let's take another example in which we ask the agent to run multiple operations: first generate an image, then caption it, and finally convert the caption to speech.
audio = agent.run("Can you generate an image of a boat? Please read out loud the contents of the image afterwards")
play_audio(audio)
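For a request like this, the agent has to chain three tools, feeding the output of one into the next. The sketch below shows the shape of that chain with stub functions. The names match the tools table, but the bodies are placeholders that return small dictionaries and strings, not real images or audio.

```python
def image_generator(prompt):
    # Stub: a real tool would return an image object.
    return {"kind": "image", "content": prompt}

def image_captioner(image):
    # Stub: a real tool would run a captioning model on the image.
    return f"An image of {image['content']}"

def speaker(text):
    # Stub: a real tool would synthesize a waveform from the text.
    return {"kind": "audio", "transcript": text}

# Step 1: text -> image, step 2: image -> caption, step 3: caption -> audio.
image = image_generator("a boat")
caption = image_captioner(image)
audio = speaker(caption)
```

Each intermediate result is kept in its own variable, which is roughly the style of code the agent produced in the return_code example earlier.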
Difference between Transformers Agent and LangChain Agent
Both the Transformers Agent and the LangChain Agent allow for the creation of custom agents, and they both utilize Python files to represent each tool as a class. While they share similar objectives, there are a few differences to be aware of before using them.
- Stability : The Transformers Agent is still in the experimental phase and has a more limited scope and flexibility compared to the LangChain Agent.
- Tools : The Transformers Agent offers a variety of tools powered by Transformer models, enabling multimodal capabilities and specialized models for specific tasks. It can interact with over 100,000 Hugging Face models. The LangChain Agent mainly uses external APIs for its tools, but it also supports Hugging Face Tools integration.
- Code Execution : The Transformers Agent runs code execution as a fixed step after selecting tools, and it executes Python code specifically. The LangChain Agent instead offers "code-execution" as one of its tools, which gives more flexibility in defining task goals beyond just executing Python code.
- Framework : The Transformers Agent employs a prompt template that lists each tool with its description, and it provides explanations and few-shot examples so the model can choose the appropriate tool. The LangChain Agent uses the ReAct framework to choose the tool and produces similar thought processes and reasoning.
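The prompt template idea from the Framework point can be sketched in a few lines. The wording of the template below is invented for illustration and is not the actual prompt used by Transformers Agent. `TOOLS` and `build_prompt` are hypothetical names, and the descriptions are taken from the tools table earlier in this article.

```python
TOOLS = {
    "image_generator": "Generates images based on a text prompt.",
    "image_captioner": "Generates captions for images.",
    "speaker": "Reads text aloud.",
}

def build_prompt(task, tools=TOOLS):
    """Assemble a prompt that lists each tool with its description, then the task."""
    lines = ["You can use the following tools:"]
    lines += [f"- {name}: {description}" for name, description in tools.items()]
    lines += [
        "",
        f"Task: {task}",
        "Explain which tools you will use, then write Python code.",
    ]
    return "\n".join(lines)

prompt = build_prompt("Draw me a picture of a boat.")
```

Because the language model only sees names and descriptions, clear tool descriptions matter a lot for whether it picks the right tool.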
If you are looking for an efficient way to handle a broad range of natural language processing tasks, the Transformers Agent API is worth a look. What sets it apart is not just that it is easy to use, but also that it is extensible and performs well. Note that the API is currently in an experimental phase and subject to change. However, it holds the promise of greater robustness and new features in the future.