Changelog
We’ve significantly enhanced our data extraction capabilities with several key updates:
- Extract now returns a lot more data
- Improved infrastructure reliability
- Migrated from Cheerio to a high-performance Rust-based parser for faster and more memory-efficient parsing
- Enhanced crawl cancellation functionality for better control over running jobs
We have updated the `/extract` endpoint to be asynchronous. When you make a request to `/extract`, it will return an ID that you can use to check the status of your extract job. If you are using our SDKs, there are no changes required to your code, but please make sure to update the SDKs to the latest versions as soon as possible. For those using the API directly, we have made it backwards compatible; however, you have 10 days to update your implementation to the new asynchronous model.
For more details about the parameters, refer to the docs sent to you.
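For those calling the API directly, the new flow looks roughly like the sketch below: start the job with a POST to `/v1/extract`, then poll the returned ID. The exact paths and response fields (`id`, `status`) are assumptions for illustration; consult the docs for the precise contract.

```python
# A minimal sketch of the asynchronous /extract flow against the raw API.
# Paths and response fields are illustrative assumptions, not the exact contract.
import time
import requests

API_KEY = "fc-YOUR_API_KEY"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Kick off the extract job; the response now returns a job ID, not results.
start = requests.post(
    "https://api.firecrawl.dev/v1/extract",
    headers=HEADERS,
    json={
        "urls": ["https://firecrawl.dev"],
        "prompt": "Extract the page title and a one-line summary.",
    },
)
job_id = start.json()["id"]

# Poll until the job leaves the processing state.
while True:
    job = requests.get(
        f"https://api.firecrawl.dev/v1/extract/{job_id}", headers=HEADERS
    ).json()
    if job.get("status") != "processing":
        break
    time.sleep(2)

print(job)
```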
The search endpoint combines web search with Firecrawl’s scraping capabilities to return full page content for any query.
Include `scrapeOptions` with `formats: ["markdown"]` to get complete markdown content for each search result; otherwise it defaults to SERP results (url, title, description). More info here: v1/search docs
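As a rough illustration, a search with full-content scraping via the Python SDK might look like this. The `search` signature and response shape shown are assumptions based on the SDK's conventions; see the v1/search docs for the exact API.

```python
# A minimal sketch of /search with full-content scraping via the Python SDK.
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

# Without scrapeOptions this returns SERP results (url, title, description);
# with formats: ["markdown"] each result also includes full page markdown.
results = app.search(
    "firecrawl web scraping",
    params={"scrapeOptions": {"formats": ["markdown"]}},
)

for item in results["data"]:
    print(item["url"], len(item.get("markdown", "")))
```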
- Fixed the LLM not following the schema in the Python SDK for `/extract`
- Fixed schema JSON not being able to be sent to the `/extract` endpoint through the Node SDK
- Prompt is now optional for the `/extract` endpoint
- Our fork of MinerU is now the default for PDF parsing
Feature Enhancements
- New Features:
- Geolocation, mobile scraping, 4x faster parsing, and better webhooks (see the sketch after this list).
- Credit packs, auto-recharges and batch scraping support.
- Iframe support and query parameter differentiation for URLs.
- Similar URL deduplication.
- Enhanced map ranking and sitemap fetching.
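For the geolocation and mobile scraping features above, a request might look like the sketch below. The `location` and `mobile` parameter names and shapes are assumptions for illustration; check the /scrape docs for the exact option names.

```python
# A minimal sketch of geolocation + mobile scraping via the Python SDK.
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

result = app.scrape_url(
    "https://firecrawl.dev",
    params={
        "formats": ["markdown"],
        "location": {"country": "DE", "languages": ["de"]},  # assumed shape
        "mobile": True,  # emulate a mobile device
    },
)
print(result["markdown"][:200])
```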
Performance Improvements
- Faster crawl status filtering and improved map ranking algorithm.
- Optimized Kubernetes setup and simplified build processes.
- Improved sitemap discoverability and performance.
Bug Fixes
- Resolved issues:
- Badly formatted JSON, scrolling actions, and encoding errors.
- Crawl limits, relative URLs, and missing error handlers.
- Fixed self-hosted crawling inconsistencies and schema errors.
SDK Updates
- Added dynamic WebSocket imports with fallback support.
- Optional API keys for self-hosted instances.
- Improved error handling across SDKs.
Documentation Updates
- Improved API docs and examples.
- Updated self-hosting URLs and added Kubernetes optimizations.
- Added articles: mastering `/scrape` and `/crawl`.
Miscellaneous
- Added new Firecrawl examples
- Enhanced metadata handling for webhooks and improved sitemap fetching.
- Updated blocklist and streamlined error messages.
New Features
You can now scrape multiple URLs simultaneously with our new Batch Scrape endpoint.
- Read more about the Batch Scrape endpoint here.
- Python SDK (1.4.x) and Node SDK (1.7.x) updated with batch scrape support (see the sketch after this list).
- Added crawl cancellation support for the Python SDK (1.3.x) and Node SDK (1.6.x)
- OpenAI Voice + Firecrawl example added to the repo
- CRM lead enrichment example added to the repo
- Improved our Docker images
- Limit and timeout fixes for the self-hosted Playwright scraper
- Improved speed of all scrapes
- Fixed 500 errors that frequently occurred on some crawled websites and when servers were at capacity
- Fixed an issue where v1 crawl status wouldn’t properly return pages over 10MB
- Fixed an issue where `screenshot` would return undefined
- Pushed improvements that reduce recovery time when a scraper fails
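For the batch scrape support mentioned in the list above, usage via the Python SDK might look roughly like this. The `batch_scrape_urls` method name and response shape are assumptions based on the SDK's conventions; see the Batch Scrape docs for the exact API.

```python
# A minimal sketch of the Batch Scrape endpoint via the Python SDK (1.4.x+).
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

# Scrape several URLs in a single job instead of one request per page.
batch = app.batch_scrape_urls(
    ["https://firecrawl.dev", "https://docs.firecrawl.dev"],
    {"formats": ["markdown"]},
)

for page in batch["data"]:
    print(page["metadata"]["sourceURL"])
```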
Interact with pages before extracting data, unlocking more data from every site!
Firecrawl now allows you to perform various actions on a web page before scraping its content. This is particularly useful for interacting with dynamic content, navigating through pages, or accessing content that requires user interaction.
- Version 1.5.x of the Node SDK now supports type-safe Actions.
- Actions are now available in the REST API and Python SDK (no version bumps required!).
Here is a Python example of how to use actions to navigate to google.com, search for Firecrawl, click on the first result, and take a screenshot.
```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

# Scrape google.com: search for "firecrawl", open the first result,
# and capture a screenshot along the way.
scrape_result = app.scrape_url(
    'google.com',
    params={
        'formats': ['markdown', 'html'],
        'actions': [
            {"type": "wait", "milliseconds": 2000},
            {"type": "click", "selector": "textarea[title=\"Search\"]"},
            {"type": "wait", "milliseconds": 2000},
            {"type": "write", "text": "firecrawl"},
            {"type": "wait", "milliseconds": 2000},
            {"type": "press", "key": "ENTER"},
            {"type": "wait", "milliseconds": 3000},
            {"type": "click", "selector": "h3"},
            {"type": "wait", "milliseconds": 3000},
            {"type": "screenshot"}
        ]
    }
)
print(scrape_result)
```
For more examples, check out our API Reference.
- E2E Type Safety for LLM Extract in Node SDK version 1.5.x.
- 10x cheaper in the cloud version. From 50 to 5 credits per extract.
- Improved speed and reliability.
- Rust SDK v1 is finally here! Check it out here.
- Map smart results limit increased from 100 to 1000.
- Scrape speed improved by 200ms-600ms depending on the website.
- From now on, for every new release, we will be creating a changelog entry here.
- Lots of improvements pushed to the infra and API. For all Mid-September changes, refer to the commits here.
- Output Formats for /scrape: Choose what formats you want your output in (see the sketch after this list).
- New /map endpoint: Get most of the URLs of a webpage.
- Developer-friendly API for /crawl/id status.
- 2x Rate Limits for all plans.
- Go SDK and Rust SDK.
- Teams support.
- API Key Management in the dashboard.
- onlyMainContent now defaults to true.
- /crawl webhooks and websocket support.
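As a rough illustration of the new output formats and the /map endpoint from the list above, v1 calls via the Python SDK might look like the sketch below. The parameter and response field names are illustrative assumptions; check the v1 docs for the exact shapes.

```python
# A minimal sketch of v1 /scrape output formats and the new /map endpoint.
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

# /scrape: pick the output formats you want; onlyMainContent now defaults
# to true, so page boilerplate is stripped unless you opt out.
page = app.scrape_url(
    "https://firecrawl.dev",
    params={"formats": ["markdown", "html", "links"]},
)
print(page["markdown"][:200])  # illustrative response field

# /map: get most of the URLs of a webpage in one call.
site_map = app.map_url("https://firecrawl.dev")
print(site_map)
```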
Learn more about it here.
Start using v1 right away at https://firecrawl.dev