What’s an OpenAI Assistant?
Think of it as a software glue that affords you to gel together agent-like capabilities in your applications to conduct tasks expressed as instructions in natural language to an Assistant.
Able to understand instructions, it can leverage OpenAI’s SOTA models and tools to carry out tasks. With Assistants stateful API, you can create Assistants within your application, providing you access to three types of supported tools: Code Interpreter, Retrieval, and Function calling [5].
At the core it has few concepts and components that cogently interact together, to enable agent-like capabilities.
Assistants API, concepts, components, and tools
Unfortunately, OpenAI documentation falls short in explaining or illustrating these components into finer details and showing how they work together. Randy Michak of Empowerment AI does a fine job of dissecting these core components and illustrating their flow and data interactions [7]. Inspired by Michak, I mildly modified Figure 4, showing dynamic interaction and data flowing among Assistants API components.
To get started with Assistants, the OpenAI guide stipulates four simple steps to use the Assistants API to glue together these core components for coordination [8].
Step 1: Create an Assistant, to declare a custom model and provide instructions for the Assistant. This helps the Assistant to elect the appropriate supported tool to employ.
Step 2: Create a Thread, a stateful session for the Assistant to retrieve messages from and add Assistant messages to.
Step 3: Use the Thread as a conversational session to add messages for the assistants to consume.
Step 4: Run the Assistant on a newly added Thread message to trigger responses. This run is Assistant’s asynchronous runtime environment.
How does it all work together?
Let’s methodically walk through a simple example where we want to accomplish the following:
Integrate Assistants API, using Retrieval tool, to a) upload a couple of pdf documents and b) use an Assistant to query the contents of the document. Consider this as a mini Retrieval Augmented Generation (RAG) application.
Use Files objects to upload the pdf files so that the Assistant can access them.
Create and employ the Assistant, Threads, Messages, and Run objects to query the uploaded pdf documents.
Coordinate all these concrete objects to interact and interplay together as part of my application.
Step 1: Create File objects as our knowledge base
Upload your PDFs in the retrievers’ database, using a File object. The Assistants API breaks them into parts, as chunks, and saves them, as indexes and vector embeddings. When you ask a question, Retrievers find the best matches and help the Assistant give you a detailed answer, just like a big RAG retriever.
Step 2: Create an Assistant object.
To use an Assistant and conduct tasks, first, create an AI Assistant object. Supply the Assistant with a model, instructional behavior, tools to use, and file IDs to employ for its knowledge base, as parameters.