OpenAI’s Assistants API, announced at DevDay, provides managed infrastructure for building AI agents. Threads, code interpreter, and retrieval are handled for you. But like any abstraction, it has trade-offs. Here’s how to use it effectively.
Assistants API Architecture
Core Concepts
assistants_api_concepts:
  assistant:
    what: Configuration for an AI agent
    includes:
      - Model selection
      - Instructions (system prompt)
      - Tools (code_interpreter, retrieval, functions)
      - Files for retrieval
  thread:
    what: A conversation session
    includes:
      - Message history
      - State management
      - File attachments
  run:
    what: Execution of an assistant on a thread
    includes:
      - Processing the conversation
      - Tool execution
      - Response generation
  message:
    what: A single turn in the conversation
    includes:
      - Role (user/assistant)
      - Content
      - File attachments
Architecture Comparison
traditional_vs_assistants:
  traditional_approach:
    - Manage conversation history yourself
    - Build/integrate a vector database
    - Implement a code execution sandbox
    - Handle state persistence
  assistants_api:
    - Thread manages history
    - Retrieval tool handles RAG
    - Code interpreter runs Python
    - State managed by OpenAI
Implementation Patterns
Basic Assistant
import time

from openai import OpenAI

client = OpenAI()

# Create an assistant
assistant = client.beta.assistants.create(
    name="Code Helper",
    instructions="""You are a helpful coding assistant.
When asked about code, provide clear explanations and examples.
Use the code interpreter to demonstrate concepts when helpful.""",
    model="gpt-4-1106-preview",
    tools=[{"type": "code_interpreter"}],
)

# Create a thread for a conversation
thread = client.beta.threads.create()

# Add a user message
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Write a Python function to find prime numbers up to n",
)

# Run the assistant
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Poll until the run reaches a terminal state
while run.status not in ["completed", "failed", "cancelled", "expired"]:
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id,
    )

# Get the response (messages are listed newest first)
messages = client.beta.threads.messages.list(thread_id=thread.id)
for msg in messages:
    if msg.role == "assistant":
        print(msg.content[0].text.value)
        break
With Retrieval
# Upload a file for retrieval
file = client.files.create(
    file=open("documentation.pdf", "rb"),
    purpose="assistants",
)

# Create an assistant with retrieval
assistant = client.beta.assistants.create(
    name="Documentation Assistant",
    instructions="""You help users understand our product documentation.
Always cite specific sections when answering.
If information isn't in the docs, say so.""",
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}],
    file_ids=[file.id],
)

# Now the assistant can answer questions about the uploaded docs
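Querying follows the same thread-and-run pattern as the basic example. A minimal sketch (the question text is illustrative):

# Ask a question about the uploaded documentation
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="How do I configure authentication? Cite the section.",
)
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)
# Poll run.status as in the basic example, then read the newest assistant message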
With Function Calling
import json
import time

# Define functions the assistant can call
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name",
                    }
                },
                "required": ["location"],
            },
        },
    }
]

assistant = client.beta.assistants.create(
    name="Weather Assistant",
    instructions="Help users with weather information.",
    model="gpt-4-1106-preview",
    tools=tools,
)

# Placeholder -- in practice, call a real weather API here
def get_weather(location: str) -> dict:
    return {"location": location, "temperature_c": 21, "conditions": "clear"}

# Handle function calls in the run
def process_run(thread_id, run_id):
    while True:
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id,
            run_id=run_id,
        )
        if run.status == "completed":
            return run
        if run.status in ["failed", "cancelled", "expired"]:
            raise RuntimeError(f"Run {run.status}: {run.last_error}")
        if run.status == "requires_action":
            # The model is waiting on tool results before it can continue
            tool_outputs = []
            for tool_call in run.required_action.submit_tool_outputs.tool_calls:
                if tool_call.function.name == "get_weather":
                    args = json.loads(tool_call.function.arguments)
                    result = get_weather(args["location"])
                    tool_outputs.append({
                        "tool_call_id": tool_call.id,
                        "output": json.dumps(result),
                    })
            # Submit the results so the run can resume
            client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread_id,
                run_id=run_id,
                tool_outputs=tool_outputs,
            )
        time.sleep(1)
Production Considerations
Error Handling
import logging
import time

import openai
from openai import OpenAI

logger = logging.getLogger(__name__)

class AssistantService:
    def __init__(self, assistant_id: str):
        self.client = OpenAI()
        self.assistant_id = assistant_id

    def run_conversation(self, thread_id: str, message: str, timeout: int = 60):
        try:
            # Add the user message
            self.client.beta.threads.messages.create(
                thread_id=thread_id,
                role="user",
                content=message,
            )

            # Create a run
            run = self.client.beta.threads.runs.create(
                thread_id=thread_id,
                assistant_id=self.assistant_id,
            )

            # Wait with a timeout
            start = time.time()
            while run.status not in ["completed", "failed", "cancelled", "expired"]:
                if time.time() - start > timeout:
                    # Cancel the run so it stops consuming resources
                    self.client.beta.threads.runs.cancel(
                        thread_id=thread_id,
                        run_id=run.id,
                    )
                    raise TimeoutError("Run timed out")
                time.sleep(1)
                run = self.client.beta.threads.runs.retrieve(
                    thread_id=thread_id,
                    run_id=run.id,
                )

            if run.status != "completed":
                raise RuntimeError(f"Run {run.status}: {run.last_error}")

            return self._get_assistant_response(thread_id)
        except openai.APIError as e:
            logger.error(f"API error: {e}")
            raise

    def _get_assistant_response(self, thread_id: str) -> str:
        # Messages are listed newest first
        messages = self.client.beta.threads.messages.list(thread_id=thread_id)
        for msg in messages:
            if msg.role == "assistant":
                return msg.content[0].text.value
        raise RuntimeError("No assistant response found")
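A minimal usage sketch; the assistant ID and prompt are placeholders:

service = AssistantService(assistant_id="asst_...")  # your stored assistant ID
thread = service.client.beta.threads.create()
reply = service.run_conversation(thread.id, "What can you help me with?")
print(reply)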
Thread Management
class ThreadManager:
    def __init__(self, redis_client):
        self.client = OpenAI()
        self.redis = redis_client
        self.ttl = 86400 * 7  # 7 days

    def get_or_create_thread(self, user_id: str) -> str:
        key = f"thread:{user_id}"

        # Check for an existing thread
        thread_id = self.redis.get(key)
        if thread_id:
            return thread_id.decode()

        # Create a new thread and cache its ID
        thread = self.client.beta.threads.create()
        self.redis.setex(key, self.ttl, thread.id)
        return thread.id

    def reset_thread(self, user_id: str) -> str:
        key = f"thread:{user_id}"
        self.redis.delete(key)
        return self.get_or_create_thread(user_id)
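Wiring it into a chat backend is straightforward. A sketch assuming a local Redis instance and the redis-py client; the user ID is illustrative:

import redis

manager = ThreadManager(redis.Redis(host="localhost", port=6379))

# Each user maps to one thread for the TTL window
thread_id = manager.get_or_create_thread(user_id="user-123")

# A "/reset" chat command maps naturally to:
fresh_id = manager.reset_thread(user_id="user-123")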
Trade-offs
When to Use Assistants API
use_assistants_api:
  good_fit:
    - Chatbots with conversation history
    - Document Q&A applications
    - Code execution features
    - Rapid prototyping
  not_ideal:
    - Custom retrieval requirements
    - Multi-model routing
    - Fine-grained cost control
    - Data sovereignty requirements
    - Complex agent orchestration
Limitations
assistants_api_limitations:
  retrieval:
    - Limited control over chunking
    - No custom embeddings
    - File size limits
    - Can't inspect what was retrieved
  code_interpreter:
    - Execution time limits
    - Package limitations
    - No persistent state between runs
    - Limited compute resources
  general:
    - Vendor lock-in
    - Limited observability
    - Pricing per run duration
    - Beta status (API may change)
Key Takeaways
- Assistants API handles threads, retrieval, and code execution
- Significant reduction in infrastructure needs
- Good for chatbots, document Q&A, code execution features
- Trade control for convenience
- Implement proper error handling and timeouts
- Manage threads per user appropriately
- Understand limitations: retrieval control, observability, lock-in
- Consider hybrid: Assistants for simple, custom for complex
- Beta API—expect changes
Assistants API is a good default. Know when to go custom.