October 28, 2024

Eric Ciarla

Launch Week II - Day 1: Introducing the Batch Scrape Endpoint


Welcome to Day 1 of Firecrawl’s second Launch Week! We’re kicking things off with the introduction of our latest feature: the Batch Scrape Endpoint.

Say Hello to the Batch Scrape Endpoint

The Batch Scrape endpoint is designed to help you scrape multiple URLs at once, streamlining your web scraping tasks and saving you valuable time. Whether you’re dealing with a small list of pages or hundreds of URLs, this new endpoint makes bulk data retrieval more efficient than ever.

How It Works

Similar to our existing /crawl endpoint, the Batch Scrape endpoint allows you to submit a job that processes multiple URLs in one go. You can choose between synchronous and asynchronous methods:

  • Synchronous Method: Waits for the batch scrape job to complete and returns the results immediately.
  • Asynchronous Method: Returns a job ID right away, allowing you to check the job status and retrieve results when it’s convenient for you.

Getting Started with Batch Scrape

Using the Batch Scrape endpoint is straightforward. Here’s how you can get started with a simple cURL command:

curl -X POST https://api.firecrawl.dev/v1/batch/scrape \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY' \
    -d '{
      "urls": ["https://docs.firecrawl.dev", "https://docs.firecrawl.dev/sdks/overview"],
      "formats": ["markdown", "html"]
    }'
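If you prefer working in Python, here is a minimal equivalent of the cURL call above using the requests library. The endpoint and payload come straight from the example; YOUR_API_KEY is a placeholder for your Firecrawl API key:

import requests

API_KEY = "YOUR_API_KEY"  # placeholder: your Firecrawl API key

# Same endpoint and payload as the cURL example above
response = requests.post(
    "https://api.firecrawl.dev/v1/batch/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "urls": [
            "https://docs.firecrawl.dev",
            "https://docs.firecrawl.dev/sdks/overview",
        ],
        "formats": ["markdown", "html"],
    },
)
response.raise_for_status()
print(response.json())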

Understanding the Response

If you’re using the synchronous method, you’ll receive the results directly:

{
  "status": "completed",
  "total": 2,
  "completed": 2,
  "creditsUsed": 2,
  "expiresAt": "2024-10-21T00:00:00.000Z",
  "data": [
    {
      "markdown": "...",
      "html": "...",
      "metadata": {
        "title": "Firecrawl Documentation",
        "language": "en",
        "sourceURL": "https://docs.firecrawl.dev",
        "description": "Official documentation for Firecrawl.",
        "statusCode": 200
      }
    },
    {
      "markdown": "...",
      "html": "...",
      "metadata": {
        "title": "Firecrawl SDK Overview",
        "language": "en",
        "sourceURL": "https://docs.firecrawl.dev/sdks/overview",
        "description": "Overview of Firecrawl SDKs.",
        "statusCode": 200
      }
    }
  ]
}
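Each entry in data corresponds to one of the submitted URLs, in the formats you requested. As a quick sketch, assuming you have parsed the JSON above into a Python dict called result, you can iterate over the scraped pages like this:

# Assuming `result` holds the parsed JSON response shown above
for page in result["data"]:
    meta = page["metadata"]
    print(meta["sourceURL"], meta["statusCode"])
    markdown = page["markdown"]  # the page content in Markdown format
    # ... feed `markdown` into your data pipeline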

If you opt for the asynchronous method, you’ll get a job ID to check the status later:

{
  "success": true,
  "id": "abc-123-def-456",
  "url": "https://api.firecrawl.dev/v1/batch/scrape/abc-123-def-456"
}

To check the job status and retrieve results, use the job ID:

curl -X GET https://api.firecrawl.dev/v1/batch/scrape/abc-123-def-456 \
    -H 'Authorization: Bearer YOUR_API_KEY'
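In a script, you would typically poll this status URL until the job reports completed, then read the results from the same response. Here is a minimal sketch in Python; the status value and fields match the responses shown above, while the five-second polling interval is an arbitrary choice:

import time
import requests

API_KEY = "YOUR_API_KEY"  # placeholder: your Firecrawl API key
status_url = "https://api.firecrawl.dev/v1/batch/scrape/abc-123-def-456"

# Poll until the batch scrape job reports "completed"
while True:
    resp = requests.get(status_url, headers={"Authorization": f"Bearer {API_KEY}"})
    resp.raise_for_status()
    job = resp.json()
    if job.get("status") == "completed":
        break
    time.sleep(5)  # arbitrary polling interval

print(f"Scraped {job['completed']} of {job['total']} URLs")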

Why Use Batch Scrape?

  • Efficiency: Process multiple URLs in a single request, reducing network overhead.
  • Flexibility: Choose between synchronous and asynchronous methods based on your application’s needs.
  • Customization: Specify output formats like Markdown or HTML to suit your data processing workflows.

What’s Next?

We’re just getting started with Launch Week II! The Batch Scrape endpoint is the first of several new features we’re unveiling this week to enhance your web scraping capabilities.

We’d love to hear how you plan to use the Batch Scrape endpoint in your projects. Your feedback helps us improve and tailor our services to better meet your needs.

Happy scraping, and stay tuned for Day 2 of Launch Week II tomorrow!


About the Author

Eric Ciarla @ericciarla

Eric Ciarla is the Chief Operating Officer (COO) of Firecrawl, where he leads marketing. He previously worked on Mendable.ai, selling it to companies like Snapchat, Coinbase, and MongoDB, and before that was a Data Scientist at Ford and Fracta. Eric also co-founded SideGuide, a tool for learning code within VS Code, with 50,000 users.
