Build an agent that checks for website contradictions
In this quick tutorial you will learn how to use Firecrawl and Claude to scrape your website’s data and look for contradictions and inconsistencies in a few lines of code. When you are shipping fast, data is bound to get stale, with Firecrawl and LLMs you can make sure your public web data is always consistent! We will be using Opus’s huge 200k context window and Firecrawl’s parellization, making this process accurate and fast.
Setup
Install our python dependencies, including anthropic and firecrawl-py.
pip install firecrawl-py anthropic
Getting your Claude and Firecrawl API Keys
To use Claude Opus and Firecrawl, you will need to get your API keys. You can get your Anthropic API key from here and your Firecrawl API key from here.
Load website with Firecrawl
To be able to get all the data from our website page put it into an easy to read format for the LLM, we will use Firecrawl. It handles by-passing JS-blocked websites, extracting the main content, and outputting in a LLM-readable format for increased accuracy.
Here is how we will scrape a website url using Firecrawl-py
from firecrawl import FirecrawlApp
app = FirecrawlApp(api_key="YOUR-KEY")
crawl_result = app.crawl_url('mendable.ai', {'crawlerOptions': {'excludes': ['blog/*','usecases/*']}})
print(crawl_result)
With all of the web data we want scraped and in a clean format, we can move onto the next step.
Combination and Generation
Now that we have the website data, let’s pair up every page and run every combination through Opus for analysis.
from itertools import combinations
page_combinations = []
for first_page, second_page in combinations(crawl_result, 2):
combined_string = "First Page:\n" + first_page['markdown'] + "\n\nSecond Page:\n" + second_page['markdown']
page_combinations.append(combined_string)
import anthropic
client = anthropic.Anthropic(
# defaults to os.environ.get("ANTHROPIC_API_KEY")
api_key="YOUR-KEY",
)
final_output = []
for page_combination in page_combinations:
prompt = "Here are two pages from a companies website, your job is to find any contradictions or differences in opinion between the two pages, this could be caused by outdated information or other. If you find any contradictions, list them out and provide a brief explanation of why they are contradictory or differing. Make sure the explanation is specific and concise. It is okay if you don't find any contradictions, just say 'No contradictions found' and nothing else. Here are the pages: " + "\n\n".join(page_combination)
message = client.messages.create(
model="claude-3-opus-20240229",
max_tokens=1000,
temperature=0.0,
system="You are an assistant that helps find contradictions or differences in opinion between pages in a company website and knowledge base. This could be caused by outdated information in the knowledge base.",
messages=[
{"role": "user", "content": prompt}
]
)
final_output.append(message.content)
That’s about it!
You have now built an agent that looks at your website and spots any inconsistencies it might have.
If you have any questions or need help, feel free to reach out to us at Firecrawl.
About the Author
Eric Ciarla is the Chief Operating Officer (COO) of Firecrawl and leads marketing. He also worked on Mendable.ai and sold it to companies like Snapchat, Coinbase, and MongoDB. Previously worked at Ford and Fracta as a Data Scientist. Eric also co-founded SideGuide, a tool for learning code within VS Code with 50,000 users.
More articles by Eric Ciarla
How to Create an llms.txt File for Any Website
Learn how to generate an llms.txt file for any website using the llms.txt Generator and Firecrawl.
Cloudflare Error 1015: How to solve it?
Cloudflare Error 1015 is a rate limiting error that occurs when Cloudflare detects that you are exceeding the request limit set by the website owner.
Build an agent that checks for website contradictions
Using Firecrawl and Claude to scrape your website's data and look for contradictions.
Why Companies Need a Data Strategy for Generative AI
Learn why a well-defined data strategy is essential for building robust, production-ready generative AI systems, and discover practical steps for curation, maintenance, and integration.
Getting Started with OpenAI's Predicted Outputs for Faster LLM Responses
A guide to leveraging Predicted Outputs to speed up LLM tasks with GPT-4o models.
How to easily install requests with pip and python
A tutorial on installing the requests library in Python using various methods, with usage examples and troubleshooting tips
How to quickly install BeautifulSoup with Python
A guide on installing the BeautifulSoup library in Python using various methods, with usage examples and troubleshooting tips
How to Use OpenAI's o1 Reasoning Models in Your Applications
Learn how to harness OpenAI's latest o1 series models for complex reasoning tasks in your apps.