Building a Local Deep Research Application with Firecrawl and Streamlit
Introduction
Conducting thorough web research is a time-consuming process that often involves jumping between multiple websites, cross-referencing information, and trying to synthesize findings into a coherent understanding. Traditional search engines return lists of links rather than comprehensive answers, leaving users to do the heavy lifting of extracting and connecting information. As demonstrated in the video above, our open-source deep research application aims to solve this problem by automating the research process.
The application uses Firecrawl’s deep research endpoint to actively explore the web, following relevant links and gathering information across multiple sources. Unlike standard search tools, it doesn’t just find information; it synthesizes it into comprehensive, well-structured answers with proper source citations. Built with Streamlit and Python, the tool offers a more affordable alternative to premium AI research features such as OpenAI’s Deep Research, which requires the $200/month ChatGPT Pro subscription, while providing similar functionality for complex research tasks. In this article, we’ll walk through how to build the application step by step, from integrating the Firecrawl API to creating an intuitive user interface that displays real-time research progress.
Understanding the Deep Research Endpoint
Traditional web research presents numerous challenges, as highlighted earlier. To address these limitations, Firecrawl offers its deep research endpoint, an AI-powered research API designed for comprehensive information gathering and synthesis. Unlike traditional search engines that simply return a list of links, this endpoint actively explores the web to build complete answers to complex queries. The system works through a multi-step process: it first analyzes your query to identify key research areas, then iteratively searches and explores relevant web content, synthesizes information from multiple sources, and finally returns structured findings with proper source attribution.
What makes this endpoint particularly powerful is its ability to autonomously navigate between sources, following contextual threads to build a complete picture of a topic. The research process provides not just final answers but also detailed activity logs showing how the information was gathered, creating transparency that’s missing from typical AI responses. All research is returned with source citations, allowing users to verify information and explore topics further. The system is configurable through parameters that control research depth, time limits, and the maximum number of URLs analyzed. Implementing the deep research endpoint in your application is straightforward with the Firecrawl Python library. Here’s a simple example that demonstrates the core functionality:
from firecrawl import FirecrawlApp

# Initialize the client
firecrawl = FirecrawlApp(api_key="your-api-key")

# Define research parameters
params = {
    "maxDepth": 3,    # Number of research iterations
    "timeLimit": 180, # Time limit in seconds
    "maxUrls": 10,    # Maximum URLs to analyze
}

# Set up a callback for real-time updates
def on_activity(activity):
    print(f"[{activity['type']}] {activity['message']}")

# Run deep research
results = firecrawl.deep_research(
    query="What are the latest developments in programming memes?",
    params=params,
    on_activity=on_activity,
)

# Access research findings
print(f"Final Analysis: {results['data']['finalAnalysis']}")
print(f"Sources: {len(results['data']['sources'])} references")
[search] Generating deeper search queries for "What are the latest developments in programming memes?"
[search] Starting 3 parallel searches for "What are the latest developments in programming memes?"
[search] Searching for "popular programming memes and their origins" - Goal: To identify and analyze the most popular programming memes and trace their origins and evolution.
[search] Found 15 new relevant results across 3 parallel queries
[analyze] Analyzing findings and planning next steps
[analyze] Analyzed findings
[synthesis] Preparing final analysis
[synthesis] Research completed
Final Analysis: # Research Report: Latest Developments in Programming Memes
## Introduction
Programming memes have become a significant part of the developer culture, serving as a humorous outlet for the challenges and quirks of coding. As of 2025, the landscape of programming memes has evolved, reflecting changes in technology, programming languages, and the developer community's sentiments. This report explores the latest developments in programming memes, analyzing current trends, popular themes, and their impact on the community...
In this example, we initialize the Firecrawl client with an API key, configure research parameters to control the scope and duration of the research, and set up a callback function to receive real-time updates as the research progresses. The deep_research method initiates the research process with our query about programming memes, and the results contain both the synthesized analysis and a list of sources that were consulted. This foundation of research capabilities will be the core of our Streamlit application, which we’ll build throughout this article. Next, we’ll look at the project structure that makes the application possible.
The Application Structure
deep-research-endpoint/
├── src/
│ ├── app.py # Main Streamlit application
│ ├── firecrawl_client.py # Firecrawl API client wrapper
│ ├── ui.py # UI components
│ └── utils.py # Utility functions
├── requirements.txt # Dependencies
├── run.py # Simple launcher script
└── README.md # Documentation
The deep research application follows a logical organization pattern with a modular file structure. The main directory contains a run.py script that launches the application, a requirements.txt file that lists all necessary Python dependencies, and a README.md file with installation and usage instructions. Inside the src directory, four Python files handle different aspects of the application: app.py serves as the main entry point and orchestrates the research process, firecrawl_client.py provides a dedicated wrapper for the Firecrawl API’s deep research endpoint, ui.py contains components for the user interface, and utils.py holds utility functions for data formatting and validation.
This modular approach separates concerns and makes the code more maintainable. The app.py file initializes the Streamlit application and manages the main workflow, including processing user inputs and triggering research requests. The firecrawl_client.py module encapsulates all interaction with the Firecrawl API, isolating the complexity of making requests and handling responses. The ui.py file contains functions that generate the visual elements of the application such as the chat interface, activity display, and results formatting. Finally, utils.py provides helper functions that don’t fit into the other categories. This organization pattern is common in medium-sized applications as it balances simplicity with the benefits of code separation.
Setting Up the Environment
Before we can build our deep research application, we need to set up a proper development environment with all necessary dependencies. The application requires Python 3.7 or later and access to the Firecrawl API. To get started, first clone the GitHub repository containing the example code:
git clone https://github.com/mendableai/firecrawl-app-examples.git
cd firecrawl-app-examples/deep-research-endpoint
Once you have the code, create a virtual environment to keep the dependencies isolated from your system Python installation. This step prevents potential conflicts between different Python projects on your machine:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Next, install the required dependencies listed in the requirements.txt file:
pip install -r requirements.txt
The requirements file includes three main dependencies: streamlit for the web interface, requests for handling HTTP requests, and firecrawl-py for interacting with the Firecrawl API. The most important configuration step is obtaining a Firecrawl API key, which you’ll need to use the deep research endpoint. You can get an API key by signing up at the Firecrawl website (firecrawl.dev). Once you have your API key, you’re ready to run the application. We’ll use this key in the application’s sidebar later, so there’s no need to set up environment variables or configuration files for this example.
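For reference, the requirements file itself can be as small as the three packages named above. The repository’s copy may pin specific versions or include extras, so treat this as a minimal sketch:
# requirements.txt (minimal sketch; the repository may pin exact versions)
streamlit
requests
firecrawl-py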
Implementing the Firecrawl Client Wrapper
The FirecrawlClient wrapper serves as the foundation of our application, acting as the bridge between our Streamlit interface and Firecrawl’s powerful research capabilities. This component isolates all API-related code in one place, making the rest of our application cleaner and more maintainable. By centralizing API interactions, we create a single point of contact that can be easily updated if the Firecrawl API changes in the future, without requiring modifications to other parts of our code.
from typing import Dict, Any, Callable, Optional


class FirecrawlClient:
    """Client for interacting with the Firecrawl API."""

    def __init__(self, api_key: str):
        """Initialize the Firecrawl client with an API key.

        Args:
            api_key (str): The Firecrawl API key
        """
        from firecrawl import FirecrawlApp

        self.client = FirecrawlApp(api_key=api_key)

    def deep_research(
        self,
        query: str,
        max_depth: int = 3,
        timeout_limit: int = 120,
        max_urls: int = 20,
        on_activity: Optional[Callable[[Dict[str, Any]], None]] = None,
    ) -> Dict[str, Any]:
The deep_research method in our client wrapper will be called whenever a user submits a query through the chat interface. This method transforms the user’s question and configuration settings into an API call, then processes the response into a consistent format for display. The ability to pass an activity callback function is particularly important, as it enables the real-time progress updates that make our application interactive and engaging for users.
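The method body is omitted above. Internally it most likely just translates these arguments into the parameter names the Firecrawl library expects and delegates to the call shown at the start of this article. The compact sketch below illustrates that idea; the repository’s response handling may differ:
    def deep_research(self, query, max_depth=3, timeout_limit=120,
                      max_urls=20, on_activity=None):
        # Sketch only: map our snake_case arguments onto the camelCase
        # parameters used by the deep research endpoint.
        params = {
            "maxDepth": max_depth,
            "timeLimit": timeout_limit,
            "maxUrls": max_urls,
        }
        # Delegate to the underlying library client and return its response.
        return self.client.deep_research(
            query=query, params=params, on_activity=on_activity
        )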
Later in our application workflow, we’ll integrate this client with Streamlit’s UI components. When a user submits a research query, the app.py file will create an instance of FirecrawlClient, call its deep_research method with the appropriate parameters, and use the results to update the chat history. The client’s role in formatting research results and extracting sources will be especially important when we implement the results display, as it provides clean, structured data that can be presented to users in a readable format.
Building the Streamlit Interface
Streamlit provides a straightforward way to build web interfaces in Python. Our application’s UI consists of three main components: a sidebar for settings, a chat interface for submitting queries and viewing responses, and an activity display for real-time updates. In the ui.py file, we implement functions for each of these components. The setup_sidebar function creates input fields for the API key and research parameters using Streamlit’s form elements like st.text_input and st.slider. For example, we create a slider for controlling research depth with st.slider("Maximum Depth", min_value=1, max_value=10, value=3), which lets users adjust how deeply the research process explores a topic.
# src/ui.py
import streamlit as st
from typing import Dict, Any, Tuple
import time
import random


def setup_sidebar() -> Dict[str, Any]:
    """Set up the sidebar with input fields.

    Returns:
        Dict[str, Any]: Dictionary containing user inputs from the sidebar
    """
    st.sidebar.title("Configuration")

    api_key = st.sidebar.text_input(
        "Enter your Firecrawl API Key",
        type="password",
        help="Your Firecrawl API key for authentication. You can obtain this from the Firecrawl dashboard.",
    )

    st.sidebar.markdown("### Research Parameters")
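The snippet above stops at the parameters header. The rest of setup_sidebar adds the sliders and returns everything as a dictionary; a minimal continuation could look like the following, where the slider ranges (other than the depth slider mentioned earlier) and the dictionary keys are assumptions rather than the repository’s exact values:
    # Continuation sketch: research parameter sliders and the returned config.
    max_depth = st.sidebar.slider(
        "Maximum Depth", min_value=1, max_value=10, value=3,
        help="Number of research iterations",
    )
    time_limit = st.sidebar.slider(
        "Time Limit (seconds)", min_value=30, max_value=600, value=180,
        help="Maximum time the research may run",
    )
    max_urls = st.sidebar.slider(
        "Maximum URLs", min_value=5, max_value=50, value=20,
        help="Maximum number of URLs to analyze",
    )

    return {
        "api_key": api_key,
        "max_depth": max_depth,
        "time_limit": time_limit,
        "max_urls": max_urls,
    }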
The central chat interface relies on Streamlit’s chat elements. We use st.chat_input to create a text field where users enter research queries, and st.chat_message to display messages from both the user and assistant. To maintain conversation history across interactions, we implement session state management with st.session_state. This code pattern appears in the init_session_state function in utils.py:
def init_session_state():
    """Initialize session state variables if they don't exist."""
    if "messages" not in st.session_state:
        st.session_state.messages = []
    if "processing" not in st.session_state:
        st.session_state.processing = False
    if "activity_container" not in st.session_state:
        st.session_state.activity_container = None
    if "current_sources" not in st.session_state:
        st.session_state.current_sources = ""
The session state stores messages, research sources, and processing status, allowing the application to maintain context between user interactions without requiring a database.
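The chat elements themselves are only described in prose here, so the sketch below shows one way setup_main_ui and display_chat_history could be written with Streamlit’s chat primitives; the repository’s implementations may differ in detail:
import streamlit as st
from typing import Tuple


def setup_main_ui() -> Tuple[bool, str]:
    """Render the title and chat input; report whether a query was submitted (sketch)."""
    st.title("Deep Research Chatbot")
    query = st.chat_input("Ask a research question...")
    if query:
        # Record the user's message so it survives Streamlit reruns.
        st.session_state.messages.append({"role": "user", "content": query})
        return True, query
    return False, ""


def display_chat_history() -> None:
    """Replay every stored message using Streamlit's chat bubbles (sketch)."""
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])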
The real-time activity display is one of the most distinctive features of our interface. We create a dedicated container for activity updates using st.container(), which serves as a dynamic area where research progress appears. When an activity update arrives from the Firecrawl API, the show_activity_update function formats it according to its type and displays it in this container. For instance, search activities appear with a magnifying glass emoji, while analysis activities use a brain emoji. This visual differentiation helps users understand the current stage of the research process at a glance.
# src/ui.py
def show_activity_update(activity_data: Dict[str, Any]):
    """Display an activity update based on the activity data.

    Args:
        activity_data (Dict[str, Any]): Activity data from the API
    """
    # Handle activity displays based on Firecrawl API structure
    activity_type = activity_data.get("type", "")
    message = activity_data.get("message", "")

    # Map activity types to appropriate icons
    icon_map = {
        "search": "🔍",
        "extract": "📄",
        "analyze": "🧠",
        "reasoning": "⚖️",
        "synthesis": "✨",
        "thought": "💭",
        # Default icon for unknown types
        "default": "🔄",
    }

    # Get the appropriate icon
    icon = icon_map.get(activity_type, icon_map["default"])

    if message:
        st.markdown(f"{icon} **{message}**")
The final element of our interface is the results display, which shows both the synthesized research and source citations. We implement an expandable section for sources with st.expander("View Sources", expanded=False), allowing users to see the complete list of references without cluttering the main view.
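As a compact sketch, and assuming the answer and source list have already been formatted as markdown strings, the results display might be wrapped in a small helper like this (the function name is illustrative, not taken from the repository):
import streamlit as st


def show_results(formatted_answer: str, sources_markdown: str) -> None:
    # Sketch: render the synthesized answer as an assistant message, with
    # the citations tucked into a collapsible section beneath it.
    with st.chat_message("assistant"):
        st.markdown(formatted_answer)
        if sources_markdown:
            with st.expander("View Sources", expanded=False):
                st.markdown(sources_markdown)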
The interface also simulates typewriter-style text output using the simulate_streaming_response function, which creates the impression of the assistant actively typing out the response. This effect is worthwhile because the Firecrawl API returns the final analysis as a single block of text, which would otherwise appear in the chat all at once.
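The exact implementation isn’t shown in this article; a minimal version of simulate_streaming_response might reveal the text word by word inside a placeholder, as in the sketch below (the signature and delay value are assumptions):
import time

import streamlit as st


def simulate_streaming_response(text: str, delay: float = 0.02) -> None:
    """Reveal the response gradually to mimic live typing (sketch)."""
    placeholder = st.empty()
    shown = ""
    for word in text.split(" "):
        shown += word + " "
        placeholder.markdown(shown + "▌")  # show a cursor while "typing"
        time.sleep(delay)
    placeholder.markdown(shown.strip())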
Connecting the Components: The Application Core
The heart of our application is the app.py file, which connects all components into a cohesive system. This file contains the main() function that runs when the application starts and the perform_research() function that handles the research workflow. When a user submits a query, the application first validates the inputs using functions from utils.py, ensuring that the API key is provided and parameters are within acceptable ranges. If validation passes, the application creates an instance of FirecrawlClient and prepares a container for displaying activity updates. The workflow follows a clear sequence: receive user query, initialize research, display real-time updates, and present results.
def main():
    """Main application entry point."""
    st.set_page_config(
        page_title="Deep Research Chatbot", page_icon="🔍", layout="wide"
    )

    # Initialize session state
    init_session_state()

    # Set up sidebar and get configuration
    config = setup_sidebar()

    # Set up main UI
    is_query_submitted, query = setup_main_ui()

    # Display chat history
    display_chat_history()

    # Handle query submission
    if is_query_submitted and not st.session_state.processing:
        # Validate inputs
        errors = validate_inputs(config)
        if errors:
            for error in errors:
                show_error(error)
        else:
            perform_research(query, config)
Real-time updates are implemented through the activity callback system. In the perform_research() function, we define handle_activity_update(), which is passed to the Firecrawl client’s deep_research() method. Each time the Firecrawl API reports a new activity (such as starting a search or analyzing results), this function receives the activity data and displays it in the dedicated container.
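The call itself isn’t reproduced in this article, so the sketch below shows what the core of perform_research() plausibly looks like, assuming the configuration dictionary keys used earlier; the repository’s code may differ in detail:
import streamlit as st

from firecrawl_client import FirecrawlClient
from ui import show_activity_update


def perform_research(query: str, config: dict) -> None:
    # Sketch (config keys are assumed names, not confirmed by the source).
    client = FirecrawlClient(api_key=config["api_key"])

    # Prepare a fresh container where activity updates will be rendered.
    st.session_state.activity_container = st.container()

    def handle_activity_update(activity: dict) -> None:
        # Called by the client for every activity the API reports.
        with st.session_state.activity_container:
            show_activity_update(activity)

    results = client.deep_research(
        query,
        max_depth=config["max_depth"],
        timeout_limit=config["time_limit"],
        max_urls=config["max_urls"],
        on_activity=handle_activity_update,
    )
    # `results` is formatted and added to the chat history further below.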
This single call connects the user’s query and settings to the Firecrawl API and establishes the update pipeline.
When research completes, the application processes the results using the format_research_results() function from utils.py. This function takes the raw API response and transforms it into a readable format with proper markdown formatting. It separates the main analysis from the sources, allowing the application to display the analysis directly in the chat and place sources in an expandable section. The code extracts these components using string operations. This approach allows users to focus on the research findings while still having access to the supporting references.
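As a rough illustration of that separation, and assuming the response shape from the earlier example (data.finalAnalysis and data.sources), format_research_results() might look something like this; the repository’s string handling may differ:
from typing import Any, Dict, Tuple


def format_research_results(results: Dict[str, Any]) -> Tuple[str, str]:
    """Split the raw API response into analysis text and a markdown source list (sketch)."""
    data = results.get("data", {})
    analysis = data.get("finalAnalysis", "No analysis returned.")

    # Build a markdown bullet list of source URLs for the expandable section.
    source_lines = []
    for source in data.get("sources", []):
        url = source.get("url", "") if isinstance(source, dict) else str(source)
        if url:
            source_lines.append(f"- {url}")

    return analysis, "\n".join(source_lines)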
The application’s state management ties everything together, maintaining the conversation context between interactions. When research completes, the results are added to the message history stored in st.session_state.messages. This allows users to submit multiple queries in a single session. The application also handles errors gracefully, catching exceptions during the research process and displaying appropriate error messages to users. After each research cycle completes or fails, the application resets the processing state flag (st.session_state.processing = False), allowing users to submit new queries. This combination of state management, error handling, and process control creates a robust application that can handle complex research tasks while providing a smooth user experience.
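Put together, the tail end of perform_research() plausibly follows a pattern like the one below, continuing the earlier sketch; the exact structure in the repository may differ:
    # Continuing the perform_research() sketch: record the outcome in the
    # chat history, surface any failure to the user, and always release the
    # processing flag so the next query can be submitted.
    try:
        analysis, sources = format_research_results(results)
        st.session_state.messages.append({"role": "assistant", "content": analysis})
        st.session_state.current_sources = sources
    except Exception as exc:
        show_error(f"Research failed: {exc}")
    finally:
        st.session_state.processing = False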
Enhancing and Extending the Application
While our deep research application provides a solid foundation for AI-powered web research, several enhancements could further improve its functionality and user experience. The current implementation prioritizes simplicity and core functionality, but production applications often require additional features to address real-world needs. Adding authentication would allow the application to serve multiple users while tracking API usage per account. Persistent storage of research results would enable users to revisit previous findings without repeating the same queries, potentially using a database like SQLite for smaller deployments or PostgreSQL for larger ones.
Here are some key enhancements to consider:
- User authentication: Implement a login system to manage multiple users and track API usage
- Research history: Store previous queries and results for future reference (a small persistence sketch follows this list)
- Export functionality: Add options to export research results as PDF, Markdown, or Word documents
- Customizable research templates: Create predefined research configurations for common use cases
- Collaborative features: Enable sharing research sessions with team members
- Advanced visualization: Add charts and graphs to represent connections between sources
- Offline mode: Cache research results for viewing without internet connectivity
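To make the research-history idea concrete, a lightweight SQLite helper along these lines could persist each completed query and its analysis. This is purely an illustrative sketch and not part of the current codebase; the table and function names are invented for the example:
import sqlite3
from datetime import datetime, timezone


def save_research(db_path: str, query: str, analysis: str, sources: str) -> None:
    """Append one completed research run to a local SQLite history table (sketch)."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS research_history (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   created_at TEXT NOT NULL,
                   query TEXT NOT NULL,
                   analysis TEXT NOT NULL,
                   sources TEXT NOT NULL
               )"""
        )
        conn.execute(
            "INSERT INTO research_history (created_at, query, analysis, sources) VALUES (?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), query, analysis, sources),
        )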
Conclusion
In this article, we’ve built a complete deep research application that transforms how users gather and synthesize information from the web. By leveraging Firecrawl’s deep research endpoint, we’ve created a tool that automatically explores multiple sources, follows contextual threads, and provides comprehensive answers with proper citations. The Streamlit interface makes the research process transparent and interactive, with real-time updates keeping users informed at every step. This open-source implementation demonstrates how AI-powered research capabilities can be made accessible to everyone, providing an alternative to expensive commercial solutions.
To explore more of what Firecrawl offers, visit the Firecrawl Blog for tutorials and use cases. The Mastering Firecrawl Scrape Endpoint guide provides in-depth information on extracting structured data from websites, which can complement your research application. If you want to turn whole websites into LLM-ready text, check out the LLMs.txt documentation. By combining these tools and approaches, you can build even more sophisticated applications that transform how we interact with and learn from web content.
About the Author
Bex is a Top 10 AI writer on Medium and a Kaggle Master with over 15k followers. He loves writing detailed guides, tutorials, and notebooks on complex data science and machine learning topics.