Nov 5, 2024 • Eric Ciarla

Getting Started with OpenAI's Predicted Outputs for Faster LLM Responses

Leveraging the full potential of Large Language Models (LLMs) often involves balancing response accuracy against latency. OpenAI’s new Predicted Outputs feature offers a way to significantly reduce response times by telling the model in advance what the output is expected to look like.

In this article, we’ll explore how to use Predicted Outputs with the GPT-4o and GPT-4o-mini models to make your AI applications super fast 🚀. We’ll also provide a practical example of transforming blog posts into SEO-optimized content, a powerful use case enabled by this feature.

What Are Predicted Outputs?

Predicted Outputs let you hand the model an anticipated version of the response, which is especially useful when most of the output is already known ahead of time. Tokens that match your prediction can be accepted much faster than they can be generated, so for tasks like rewriting text with minor modifications this can drastically reduce the time it takes to produce the final result.
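
To make the idea concrete, here is a minimal sketch in the spirit of OpenAI's own code-editing example. The User class, the rename instruction, and the model choice are just illustrative; the key piece is the prediction parameter carrying the text we expect the answer to mostly match:

from openai import OpenAI  # assumes OPENAI_API_KEY is set in your environment

client = OpenAI()

# Existing code we want lightly edited; most of it should come back unchanged
code = """
class User:
    first_name: str = ""
    last_name: str = ""
    username: str = ""
"""

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Rename the username field to email. Respond only with code."},
        {"role": "user", "content": code},
    ],
    # The prediction is simply the text we expect most of the response to match
    prediction={"type": "content", "content": code},
)

print(completion.choices[0].message.content)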

Why Use Predicted Outputs?

By supplying the model with a prediction of the output, you:

  • Reduce Latency: The model doesn’t have to generate the entire output token by token from scratch, so responses come back noticeably faster (a rough way to measure this is sketched below).
  • Enhance Efficiency: Especially useful when you can reasonably assume that large portions of the output will remain unchanged.
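
A rough way to see the latency difference for yourself is to time the same request with and without a prediction. The sketch below is only illustrative; the model name, the sample text, and the speedup you actually observe are assumptions that depend on your setup:

import time

from openai import OpenAI  # assumes OPENAI_API_KEY is set in your environment

client = OpenAI()

# A piece of text we expect to come back largely unchanged (note the typo)
text = "Predicted Outputs can signifcantly speed up edits to long documents."
messages = [{"role": "user", "content": f"Fix any typos. Only return the corrected text:\n\n{text}"}]

def timed_call(**kwargs):
    # Time a single chat completion request; extra kwargs are passed through
    start = time.perf_counter()
    client.chat.completions.create(model="gpt-4o-mini", messages=messages, **kwargs)
    return time.perf_counter() - start

baseline = timed_call()
predicted = timed_call(prediction={"type": "content", "content": text})

print(f"Without prediction: {baseline:.2f}s")
print(f"With prediction:    {predicted:.2f}s")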

Limitations to Keep in Mind

While Predicted Outputs are powerful, there are some limitations:

  • Supported only with GPT-4o and GPT-4o-mini models.
  • Certain API parameters are not supported when a prediction is provided, such as n values greater than 1, logprobs, and presence_penalty greater than 0.

How to Use Predicted Outputs

Let’s dive into how you can implement Predicted Outputs in your application. We’ll walk through an example where we optimize a blog post by adding internal links to relevant pages within the same website.

Prerequisites

Make sure you have the following installed:

pip install firecrawl-py openai python-dotenv

Step 1: Set Up Your Environment

Initialize the necessary libraries and load your API keys.

import os
import json
from firecrawl import FirecrawlApp
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables
load_dotenv()

# Retrieve API keys from environment variables
firecrawl_api_key = os.getenv("FIRECRAWL_API_KEY")
openai_api_key = os.getenv("OPENAI_API_KEY")

# Initialize the FirecrawlApp and OpenAI client
app = FirecrawlApp(api_key=firecrawl_api_key)
client = OpenAI(api_key=openai_api_key)

Step 2: Scrape the Blog Content

We’ll start by scraping the content of a blog post that we want to optimize.

# Get the blog URL (you can input your own)
blog_url = "https://www.firecrawl.dev/blog/how-to-use-openai-o1-reasoning-models-in-applications"

# Scrape the blog content in markdown format
blog_scrape_result = app.scrape_url(blog_url, params={'formats': ['markdown']})
blog_content = blog_scrape_result.get('markdown', '')

Step 3: Map the Website for Internal Links

Next, we’ll get a list of other pages on the website to which we can add internal links.

# Extract the site's base URL (scheme + domain) from the blog URL
top_level_domain = '/'.join(blog_url.split('/')[:3])

# Map the website to get all internal links
site_map = app.map_url(top_level_domain)
site_links = site_map.get('links', [])
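
One practical note: site maps can be large, and every link you pass ends up in the prompt. An optional filtering step like the one below (the cap of 100 links is an arbitrary choice, not anything the API requires) keeps the token count manageable:

# Keep only links on the same domain, drop the post we're editing,
# and cap the list so the prompt stays a reasonable size
site_links = [
    link for link in site_links
    if link.startswith(top_level_domain) and link != blog_url
][:100]

print(f"Passing {len(site_links)} candidate internal links to the model")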

Step 4: Prepare the Prompt and Prediction

We’ll create a prompt instructing the model to add internal links to the blog post and provide the original content as a prediction.

prompt = f"""
You are an AI assistant helping to improve a blog post.

Here is the original blog post content:

{blog_content}

Here is a list of other pages on the website:

{json.dumps(site_links, indent=2)}

Please revise the blog post to include internal links to some of these pages where appropriate. Make sure the internal links are relevant and enhance the content.

Only return the revised blog post in markdown format.
"""

Step 5: Use Predicted Outputs with the OpenAI API

Now, we’ll call the OpenAI API using the prediction parameter to provide the existing content.

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": prompt
        }
    ],
    # Provide the original post as the prediction; tokens that match it
    # can be accepted much faster than they can be generated
    prediction={
        "type": "content",
        "content": blog_content
    }
)
revised_blog_post = completion.choices[0].message.content
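
It's also worth checking how much of your prediction the model actually accepted. When a prediction is supplied, the usage object includes accepted and rejected prediction token counts (field names per OpenAI's documentation at the time of writing). Keep in mind that rejected prediction tokens are still billed as completion tokens, so a prediction that mostly misses can cost more while saving little time:

# Inspect how much of the prediction was reused by the model
details = completion.usage.completion_tokens_details
print(f"Accepted prediction tokens: {details.accepted_prediction_tokens}")
print(f"Rejected prediction tokens: {details.rejected_prediction_tokens}")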

Step 6: Compare the Original and Revised Content

Finally, we’ll compare the number of links in the original and revised blog posts to see the improvements.

import re

def count_links(markdown_content):
    # Count markdown links of the form [text](url)
    return len(re.findall(r'\[.*?\]\(.*?\)', markdown_content))

original_links_count = count_links(blog_content)
revised_links_count = count_links(revised_blog_post)

print(f"Number of links in the original blog post: {original_links_count}")
print(f"Number of links in the revised blog post: {revised_links_count}")

Conclusion

By utilizing Predicted Outputs, you can significantly speed up tasks where most of the output is known, such as content reformatting or minor edits. This feature is a game-changer for developers looking to optimize performance without compromising on the quality of the output.

That’s it! In this article, we’ve shown you how to get started with Predicted Outputs using OpenAI’s GPT-4o models. Whether you’re transforming content, correcting errors, or making minor adjustments, Predicted Outputs can make your AI applications faster and more efficient.

About the Author

Eric Ciarla (@ericciarla)

Eric Ciarla is the Chief Operating Officer (COO) of Firecrawl and leads marketing. He also worked on Mendable.ai, selling it to companies like Snapchat, Coinbase, and MongoDB. He previously worked as a Data Scientist at Ford and Fracta, and he co-founded SideGuide, a tool for learning to code within VS Code with 50,000 users.
