Launch Week II - Day 1: Introducing the Batch Scrape Endpoint
Welcome to Day 1 of Firecrawl’s second Launch Week! We’re kicking things off with the introduction of our latest feature: the Batch Scrape Endpoint.
Say Hello to the Batch Scrape Endpoint
The Batch Scrape endpoint is designed to help you scrape multiple URLs at once, streamlining your web scraping tasks and saving you valuable time. Whether you’re dealing with a small list of pages or hundreds of URLs, this new endpoint makes bulk data retrieval more efficient than ever.
How It Works
Similar to our existing /crawl endpoint, the Batch Scrape endpoint allows you to submit a job that processes multiple URLs in one go. You can choose between synchronous and asynchronous methods:
- Synchronous Method: Waits for the batch scrape job to complete and returns the results immediately.
- Asynchronous Method: Returns a job ID right away, allowing you to check the job status and retrieve results when it’s convenient for you.
Getting Started with Batch Scrape
Using the Batch Scrape endpoint is straightforward. Here’s how you can get started with a simple cURL command:
curl -X POST https://api.firecrawl.dev/v1/batch/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"urls": ["https://docs.firecrawl.dev", "https://docs.firecrawl.dev/sdks/overview"],
"formats": ["markdown", "html"]
}'
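If you prefer Python, the same request can be made with the requests library. This is a minimal sketch mirroring the cURL call above; the endpoint and payload come straight from that example, and YOUR_API_KEY is a placeholder for your own key:

import requests

API_KEY = "YOUR_API_KEY"  # placeholder, replace with your Firecrawl API key

# Submit a batch scrape job covering multiple URLs in a single request
response = requests.post(
    "https://api.firecrawl.dev/v1/batch/scrape",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    json={
        "urls": [
            "https://docs.firecrawl.dev",
            "https://docs.firecrawl.dev/sdks/overview",
        ],
        "formats": ["markdown", "html"],
    },
)
response.raise_for_status()
print(response.json())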
Understanding the Response
If you’re using the synchronous method, you’ll receive the results directly:
{
"status": "completed",
"total": 2,
"completed": 2,
"creditsUsed": 2,
"expiresAt": "2024-10-21T00:00:00.000Z",
"data": [
{
"markdown": "...",
"html": "...",
"metadata": {
"title": "Firecrawl Documentation",
"language": "en",
"sourceURL": "https://docs.firecrawl.dev",
"description": "Official documentation for Firecrawl.",
"statusCode": 200
}
},
{
"markdown": "...",
"html": "...",
"metadata": {
"title": "Firecrawl SDK Overview",
"language": "en",
"sourceURL": "https://docs.firecrawl.dev/sdks/overview",
"description": "Overview of Firecrawl SDKs.",
"statusCode": 200
}
}
]
}
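Each entry in the data array corresponds to one of the submitted URLs, carrying the requested formats alongside page metadata. As a quick sketch, a small helper (hypothetical, not part of any SDK) could map each source URL to its Markdown for downstream processing:

def extract_markdown(results: dict) -> dict:
    """Map each source URL to its Markdown content from a completed batch scrape response."""
    return {
        page["metadata"]["sourceURL"]: page["markdown"]
        for page in results["data"]
    }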
If you opt for the asynchronous method, you’ll get a job ID to check the status later:
{
"success": true,
"id": "abc-123-def-456",
"url": "https://api.firecrawl.dev/v1/batch/scrape/abc-123-def-456"
}
To check the job status and retrieve results, use the job ID:
curl -X GET https://api.firecrawl.dev/v1/batch/scrape/abc-123-def-456 \
-H 'Authorization: Bearer YOUR_API_KEY'
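In Python, that status check can be wrapped in a simple polling loop. This is only a sketch, assuming the job was submitted as shown earlier and that the status response includes the status, completed, and total fields from the example above; the sleep interval is arbitrary:

import time
import requests

API_KEY = "YOUR_API_KEY"  # placeholder, replace with your Firecrawl API key
JOB_ID = "abc-123-def-456"  # the id returned when the job was submitted

# Poll the batch scrape job until it reports completion
while True:
    resp = requests.get(
        f"https://api.firecrawl.dev/v1/batch/scrape/{JOB_ID}",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    resp.raise_for_status()
    results = resp.json()
    if results.get("status") == "completed":
        break
    time.sleep(2)  # arbitrary pause between checks

print(f"Scraped {results['completed']} of {results['total']} URLs")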
Why Use Batch Scrape?
- Efficiency: Process multiple URLs in a single request, reducing network overhead.
- Flexibility: Choose between synchronous and asynchronous methods based on your application’s needs.
- Customization: Specify output formats like Markdown or HTML to suit your data processing workflows.
What’s Next?
We’re just getting started with Launch Week II! The Batch Scrape endpoint is the first of several new features we’re unveiling this week to enhance your web scraping capabilities.
We’d love to hear how you plan to use the Batch Scrape endpoint in your projects. Your feedback helps us improve and tailor our services to better meet your needs.
Happy scraping, and stay tuned for Day 2 of Launch Week II tomorrow!
About the Author
Eric Ciarla is the Chief Operating Officer (COO) of Firecrawl and leads marketing. He also worked on Mendable.ai, selling it to companies like Snapchat, Coinbase, and MongoDB. He previously worked as a Data Scientist at Ford and Fracta. Eric also co-founded SideGuide, a tool for learning to code within VS Code with 50,000 users.