How to Use Prompt Caching and Cache Control with Anthropic Models
Anthropic recently launched prompt caching and cache control in beta, allowing you to cache large-context prompts of up to 200k tokens and chat with them faster and cheaper than ever before. This is a game changer for Retrieval Augmented Generation (RAG) applications that analyze large amounts of data. Caching is currently available only for Sonnet and Haiku, with support for Opus coming soon.
To showcase the power of prompt caching, let's walk through an example of crawling a website with Firecrawl, caching the contents with Anthropic, and having an AI assistant analyze the copy to provide suggestions for improvement. See the code on GitHub.
Setup
First, make sure you have API keys for both Anthropic and Firecrawl. Store them securely in a .env file:
ANTHROPIC_API_KEY=your_anthropic_key
FIRECRAWL_API_KEY=your_firecrawl_key
Install the required Python packages:
pip install python-dotenv anthropic firecrawl requests
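With the dependencies installed, you can load the keys into Python using python-dotenv. Here is a minimal sketch; the anthropic_api_key and firecrawl_api_key variables it defines are reused in the snippets below:
import os
from dotenv import load_dotenv

# Read the keys from the .env file into the process environment
load_dotenv()

anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")
firecrawl_api_key = os.getenv("FIRECRAWL_API_KEY")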
Crawling a Website with Firecrawl
Import and initialize the Firecrawl app with your API key:
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key=firecrawl_api_key)
Crawl a website, limiting the results to 10 pages:
crawl_url = 'https://dify.ai/'
params = {
    'crawlOptions': {
        'limit': 10
    }
}
crawl_result = app.crawl_url(crawl_url, params=params)
Clean up the crawl results by removing the content field from each entry, then save the result to a file:
import json

# Drop the bulky 'content' field from each page before saving
cleaned_crawl_result = [{k: v for k, v in entry.items() if k != 'content'} for entry in crawl_result]

with open('crawl_result.txt', 'w') as file:
    file.write(json.dumps(cleaned_crawl_result, indent=4))
Caching the Crawl Data with Anthropic
Load the crawl data into a string:
with open('crawl_result.txt', 'r') as file:
    website_dump = file.read()
Set up the headers for the Anthropic API request, including the anthropic-beta header to enable prompt caching:
headers = {
    "content-type": "application/json",
    "x-api-key": anthropic_api_key,
    "anthropic-version": "2023-06-01",
    "anthropic-beta": "prompt-caching-2024-07-31"
}
Construct the API request data, adding the website_dump as an ephemeral cached text block in the system prompt:
data = {
    "model": "claude-3-5-sonnet-20240620",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are an AI assistant tasked with analyzing website content. Your goal is to provide insightful suggestions on copy, messaging, and structure.\n"
        },
        {
            "type": "text",
            "text": website_dump,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "How can I improve the copy on this website?"
        }
    ]
}
Make the API request and print the response:
import requests

response = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers=headers,
    data=json.dumps(data)
)
print(response.json())
The key parts here are:
- Including the anthropic-beta header to enable prompt caching
- Adding the large website_dump text as a cached ephemeral text block in the system messages
- Asking the assistant to analyze the cached text and provide suggestions
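You can confirm the cache is working by inspecting the usage section of the response. With the beta header set, the response should report tokens written to the cache on the first call and tokens read back on repeat calls. A quick sketch, assuming the field names from the prompt caching beta:
usage = response.json().get("usage", {})

# First call: the large system block is written to the cache
print("Cache write tokens:", usage.get("cache_creation_input_tokens"))
# Repeat calls within the cache lifetime read it back instead
print("Cache read tokens:", usage.get("cache_read_input_tokens"))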
Benefits of Prompt Caching
By caching the large website_dump text, subsequent API calls can reference that data without needing to resend it each time. This makes conversations much faster and cheaper.
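For example, a follow-up question can reuse the exact same cached system blocks so the cached prefix still matches; only the user message changes. A sketch of what that might look like (the question below is just an illustration):
# Reuse the identical cached system blocks so the cache prefix matches
follow_up = dict(data)
follow_up["messages"] = [
    {"role": "user", "content": "Which page has the weakest call to action?"}
]

response = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers=headers,
    data=json.dumps(follow_up)
)
print(response.json())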
Imagine expanding this to cache an entire knowledge base with up to 200k tokens of data. You can then have highly contextual conversations drawing from that knowledge base in a very efficient manner. The possibilities are endless!
Anthropic’s prompt caching is a powerful tool for building AI applications that can process and chat about large datasets. Give it a try and see how it can enhance your projects!
About the Author
Eric Ciarla is the Chief Operating Officer (COO) of Firecrawl, where he leads marketing. He also worked on Mendable.ai, selling it to companies like Snapchat, Coinbase, and MongoDB. He previously worked as a Data Scientist at Ford and Fracta, and co-founded SideGuide, a tool for learning code within VS Code with 50,000 users.