Getting Started with OpenAI's Predicted Outputs for Faster LLM Responses
Getting the most out of Large Language Models (LLMs) often means balancing response accuracy against latency. OpenAI’s new Predicted Outputs feature offers a way to significantly reduce response times by telling the model what most of the output will look like in advance.
In this article, we’ll explore how to use Predicted Outputs with the GPT-4o and GPT-4o-mini models to make your AI applications super fast 🚀. We’ll also provide a practical example of transforming blog posts into SEO-optimized content, a powerful use case enabled by this feature.
What Are Predicted Outputs?
Predicted Outputs let you provide the LLM with an anticipated output, which is especially useful when most of the response is known ahead of time. For tasks like rewriting text with minor modifications, this can drastically reduce the time it takes the model to generate the desired result.
Why Use Predicted Outputs?
By supplying the model with a prediction of the output, you:
- Reduce Latency: The model can process and generate responses faster because it doesn’t need to generate the entire output from scratch.
- Enhance Efficiency: Useful when you can reasonably assume that large portions of the output will remain unchanged.
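To make the mechanics concrete, here is a minimal sketch of what a Predicted Outputs request looks like (the model name, message, and `existing_code` sample are illustrative, not from the original article):

```python
# Sketch of a Predicted Outputs request payload.
# The text you expect the model to mostly reproduce is passed under the
# `prediction` parameter, so the model only generates the parts that change.
existing_code = 'print("Hello, world!")'

request_kwargs = {
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "user",
            "content": f"Change the greeting to say 'Hi':\n{existing_code}",
        }
    ],
    # The anticipated output: most of it should survive unchanged.
    "prediction": {"type": "content", "content": existing_code},
}

# With a configured OpenAI client, this would be sent as:
# completion = client.chat.completions.create(**request_kwargs)
```

The key point is that `prediction` is just another keyword argument alongside `model` and `messages`; the closer the prediction matches the final output, the larger the speedup.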
Limitations to Keep in Mind
While Predicted Outputs are powerful, there are some limitations:
- Supported only with GPT-4o and GPT-4o-mini models.
- Certain API parameters are not supported, such as `n` values greater than 1, `logprobs`, and `presence_penalty` greater than 0, among others.
How to Use Predicted Outputs
Let’s dive into how you can implement Predicted Outputs in your application. We’ll walk through an example where we optimize a blog post by adding internal links to relevant pages within the same website.
Prerequisites
Make sure you have the following installed:
```bash
pip install firecrawl-py openai python-dotenv
```
Step 1: Set Up Your Environment
Initialize the necessary libraries and load your API keys.
```python
import os
import json

from dotenv import load_dotenv
from firecrawl import FirecrawlApp
from openai import OpenAI

# Load environment variables
load_dotenv()

# Retrieve API keys from environment variables
firecrawl_api_key = os.getenv("FIRECRAWL_API_KEY")
openai_api_key = os.getenv("OPENAI_API_KEY")

# Initialize the FirecrawlApp and OpenAI client
app = FirecrawlApp(api_key=firecrawl_api_key)
client = OpenAI(api_key=openai_api_key)
```
Step 2: Scrape the Blog Content
We’ll start by scraping the content of a blog post that we want to optimize.
```python
# Get the blog URL (you can input your own)
blog_url = "https://www.firecrawl.dev/blog/how-to-use-openai-o1-reasoning-models-in-applications"

# Scrape the blog content in markdown format
blog_scrape_result = app.scrape_url(blog_url, params={'formats': ['markdown']})
blog_content = blog_scrape_result.get('markdown', '')
```
Step 3: Map the Website for Internal Links
Next, we’ll get a list of other pages on the website to which we can add internal links.
```python
# Extract the top-level domain
top_level_domain = '/'.join(blog_url.split('/')[:3])

# Map the website to get all internal links
site_map = app.map_url(top_level_domain)
site_links = site_map.get('links', [])
```
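The string split above works for well-formed URLs; a slightly more robust alternative (a standard-library sketch, not part of the original snippet) uses `urllib.parse`:

```python
from urllib.parse import urlparse

def site_root(url: str) -> str:
    """Return the scheme + host portion of a URL, e.g. 'https://www.firecrawl.dev'."""
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.netloc}"

blog_url = "https://www.firecrawl.dev/blog/how-to-use-openai-o1-reasoning-models-in-applications"
print(site_root(blog_url))  # https://www.firecrawl.dev
```

Unlike a raw split, `urlparse` handles edge cases such as URLs with ports or userinfo components.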
Step 4: Prepare the Prompt and Prediction
We’ll create a prompt instructing the model to add internal links to the blog post and provide the original content as a prediction.
```python
prompt = f"""
You are an AI assistant helping to improve a blog post.

Here is the original blog post content:

{blog_content}

Here is a list of other pages on the website:

{json.dumps(site_links, indent=2)}

Please revise the blog post to include internal links to some of these pages where appropriate. Make sure the internal links are relevant and enhance the content.

Only return the revised blog post in markdown format.
"""
```
Step 5: Use Predicted Outputs with the OpenAI API
Now, we’ll call the OpenAI API, passing the existing content via the `prediction` parameter.
```python
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": prompt}
    ],
    prediction={
        "type": "content",
        "content": blog_content
    }
)

revised_blog_post = completion.choices[0].message.content
```
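It can also be useful to check how much of the prediction the model actually reused: the response's `usage` field reports accepted and rejected prediction token counts. Here is a small helper (the function name is ours, and `sample_usage` is hand-written illustrative data, not real API output):

```python
def prediction_stats(usage: dict) -> dict:
    """Summarize how much of the supplied prediction the model accepted.

    `usage` is expected to look like the `completion.usage` object from the
    Chat Completions API, converted to a dict (e.g. via `.model_dump()`).
    """
    details = usage.get("completion_tokens_details") or {}
    accepted = details.get("accepted_prediction_tokens", 0)
    rejected = details.get("rejected_prediction_tokens", 0)
    total = accepted + rejected
    return {
        "accepted": accepted,
        "rejected": rejected,
        "accept_rate": accepted / total if total else 0.0,
    }

# Example with a hand-written usage dict:
sample_usage = {
    "completion_tokens_details": {
        "accepted_prediction_tokens": 400,
        "rejected_prediction_tokens": 100,
    },
}
print(prediction_stats(sample_usage))
# In the real flow: prediction_stats(completion.usage.model_dump())
```

Note that rejected prediction tokens are still billed as completion tokens, so a poorly matched prediction can cost more while saving less time.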
Step 6: Compare the Original and Revised Content
Finally, we’ll compare the number of links in the original and revised blog posts to see the improvements.
```python
import re

def count_links(markdown_content):
    return len(re.findall(r'\[.*?\]\(.*?\)', markdown_content))

original_links_count = count_links(blog_content)
revised_links_count = count_links(revised_blog_post)

print(f"Number of links in the original blog post: {original_links_count}")
print(f"Number of links in the revised blog post: {revised_links_count}")
```
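Since the whole point of Predicted Outputs is speed, it's also worth timing the request. A minimal timing helper (a sketch; the commented-out calls are placeholders for your own API requests, not measured results):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# In practice you would compare two real calls, for example:
# _, with_prediction = timed(client.chat.completions.create, ..., prediction=...)
# _, without_prediction = timed(client.chat.completions.create, ...)
```

Running the same prompt with and without the `prediction` parameter gives you a concrete before/after latency number for your own workload.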
Conclusion
By utilizing Predicted Outputs, you can significantly speed up tasks where most of the output is known, such as content reformatting or minor edits. This feature is a game-changer for developers looking to optimize performance without compromising on the quality of the output.
That’s it! In this article, we’ve shown you how to get started with Predicted Outputs using OpenAI’s GPT-4o models. Whether you’re transforming content, correcting errors, or making minor adjustments, Predicted Outputs can make your AI applications faster and more efficient.
About the Author
Eric Ciarla is the Chief Operating Officer (COO) of Firecrawl and leads marketing. He also worked on Mendable.ai, selling it to companies like Snapchat, Coinbase, and MongoDB. He previously worked at Ford and Fracta as a data scientist, and co-founded SideGuide, a tool for learning code within VS Code with 50,000 users.