Getting Started with Grok-2: Setup and Web Crawler Example
Grok-2, the latest language model from x.ai, brings advanced language understanding capabilities to developers, enabling the creation of intelligent applications with ease. In this tutorial, we’ll walk you through setting up Grok-2, obtaining an API key, and then building a web crawler using Firecrawl to extract structured data from any website.
Part 1: Setting Up Grok-2
Before diving into coding, we need to set up Grok-2 and get an API key.
Step 1: Sign Up for an x.ai Account
To access the Grok-2 API, you’ll need an x.ai account.
- Visit the Sign-Up Page: Go to the x.ai sign-up page.
- Register: Fill out the registration form with your email and create a password.
- Verify Your Email: Check your inbox for a verification email from x.ai and click the link to verify your account.
Step 2: Fund Your Account
To use the Grok-2 API, your account must have funds.
- Access the Cloud Console: After logging in, you’ll be directed to the x.ai Cloud Console.
- Navigate to Billing: Click on the Billing tab in the sidebar.
- Add Payment Method: Provide your payment details to add credits to your account.
Step 3: Obtain Your API Key
With your account funded, you can now generate an API key.
- Go to API Keys: Click on the API Keys tab in the Cloud Console.
- Create a New API Key: Click on Create New API Key and give it a descriptive name.
- Copy Your API Key: Make sure to copy your API key now, as it won’t be displayed again for security reasons.
Note: Keep your API key secure and do not share it publicly.
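Before moving on, it can be worth confirming the key works with a single request to the chat completions endpoint we'll use throughout this tutorial. This is a minimal sketch that assumes your key is available in the GROK_API_KEY environment variable:

import os
import requests

# One-off sanity check for a freshly created key
response = requests.post(
    "https://api.x.ai/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['GROK_API_KEY']}",  # assumes the variable is set
    },
    json={
        "messages": [{"role": "user", "content": "Say hello."}],
        "model": "grok-2",
    },
    timeout=30,
)
response.raise_for_status()  # Fails loudly if the key is invalid or the account is unfunded
print(response.json()["choices"][0]["message"]["content"])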
Part 2: Building a Web Crawler with Grok-2 and Firecrawl
Now that Grok-2 is set up, let’s build a web crawler to extract structured data from websites.
Prerequisites
- Python 3.6+
- Firecrawl Python Library
- Requests Library
- dotenv Library
Install the required packages:
pip install firecrawl-py requests python-dotenv
Step 1: Set Up Environment Variables
Create a .env file in your project directory to store your API keys securely.
GROK_API_KEY=your_grok_api_key
FIRECRAWL_API_KEY=your_firecrawl_api_key
Replace your_grok_api_key and your_firecrawl_api_key with your actual API keys.
Step 2: Initialize Your Script
Create a new Python script (e.g., web_crawler.py) and start by importing the necessary libraries and loading your environment variables.
import os
import json
import requests
from dotenv import load_dotenv
from firecrawl import FirecrawlApp
# Load environment variables from .env file
load_dotenv()
# Retrieve API keys
grok_api_key = os.getenv("GROK_API_KEY")
firecrawl_api_key = os.getenv("FIRECRAWL_API_KEY")
# Initialize FirecrawlApp
app = FirecrawlApp(api_key=firecrawl_api_key)
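If either variable is missing, the script will only fail later with a confusing authentication error, so it is worth failing fast here. This guard is a small addition of our own, not part of the original walkthrough:

# Fail fast if the .env file is missing or incomplete
if not grok_api_key or not firecrawl_api_key:
    raise RuntimeError("Set GROK_API_KEY and FIRECRAWL_API_KEY in your .env file")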
Step 3: Define the Grok-2 API Interaction Function
We need a function to interact with the Grok-2 API.
def grok_completion(prompt):
    url = "https://api.x.ai/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {grok_api_key}"
    }
    data = {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        "model": "grok-2",
        "stream": False,
        "temperature": 0
    }
    response = requests.post(url, headers=headers, json=data)
    response_data = response.json()
    return response_data['choices'][0]['message']['content']
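As a quick check that the key and endpoint are wired up correctly, you can call the function directly before building the rest of the crawler (the prompt here is just an illustrative example):

# Optional smoke test; remove once the full crawler below is in place
print(grok_completion("In one sentence, what is a web crawler?"))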
Step 4: Identify Relevant Pages on the Website
Define a function to find pages related to our objective.
def find_relevant_pages(objective, url):
    prompt = f"Based on the objective '{objective}', suggest a 1-2 word search term to locate relevant information on the website."
    search_term = grok_completion(prompt).strip()
    map_result = app.map_url(url, params={"search": search_term})
    return map_result.get("links", [])
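To get a feel for what this returns, you can try it against a test page. The URL and objective here are placeholders, and the returned links will depend entirely on the site:

# Illustrative call; actual links depend on the target site
links = find_relevant_pages("find pricing information", "https://example.com")
print(links[:3])  # e.g. a handful of candidate URLs to scrape next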
Step 5: Extract Data from the Pages
Create a function to scrape the pages and extract the required data.
def extract_data_from_pages(links, objective):
    for link in links[:3]:  # Limit to top 3 links
        scrape_result = app.scrape_url(link, params={'formats': ['markdown']})
        content = scrape_result.get('markdown', '')
        prompt = f"""Given the following content, extract the information related to the objective '{objective}' in JSON format. If not found, reply 'Objective not met'.
Content: {content}
Remember:
- Only return JSON if the objective is met.
- Do not include any extra text.
"""
        result = grok_completion(prompt).strip()
        if result != "Objective not met":
            try:
                data = json.loads(result)
                return data
            except json.JSONDecodeError:
                continue  # Try the next link if JSON parsing fails
    return None
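One practical wrinkle: language models sometimes wrap JSON answers in Markdown code fences even when instructed not to. If json.loads fails on otherwise reasonable responses, a small cleanup helper before parsing can help; this is a defensive sketch of our own, not something the Grok-2 API requires:

def strip_code_fences(text):
    # Remove a leading ```/```json fence and a trailing ``` fence, if present
    text = text.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        if text.rstrip().endswith("```"):
            text = text.rstrip()[:-3]
    return text.strip()

You would then call json.loads(strip_code_fences(result)) inside extract_data_from_pages.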
Step 6: Implement the Main Function
Combine everything into a main function.
def main():
    url = input("Enter the website URL to crawl: ")
    objective = input("Enter your data extraction objective: ")
    print("\nFinding relevant pages...")
    links = find_relevant_pages(objective, url)
    if not links:
        print("No relevant pages found.")
        return
    print("Extracting data from pages...")
    data = extract_data_from_pages(links, objective)
    if data:
        print("\nData extracted successfully:")
        print(json.dumps(data, indent=2))
    else:
        print("Could not find data matching the objective.")

if __name__ == "__main__":
    main()
Step 7: Run the Script
Save your script and run it from the command line.
python web_crawler.py
Example Interaction:
Enter the website URL to crawl: https://example.com
Enter your data extraction objective: Retrieve the list of services offered.
Finding relevant pages...
Extracting data from pages...
Data extracted successfully:
{
  "services": [
    "Web Development",
    "SEO Optimization",
    "Digital Marketing"
  ]
}
Conclusion
In this tutorial, we’ve successfully set up Grok-2, obtained an API key, and built a web crawler using Firecrawl. This powerful combination allows you to automate the process of extracting structured data from websites, making it a valuable tool for various applications.
Next Steps
- Explore More Features: Check out the Grok-2 and Firecrawl documentation to learn about additional functionalities.
- Enhance Error Handling: Improve the script with better error handling and logging; a starter sketch follows this list.
- Customize Data Extraction: Modify the extraction logic to suit different objectives or data types.
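As a starting point for the error-handling idea above, you could wrap the Grok-2 call in a simple retry loop. This is a minimal sketch using only the standard library; the attempt count and delay are arbitrary defaults to tune for your workload:

import time

def grok_completion_with_retries(prompt, attempts=3, delay=2.0):
    # Retry transient network failures and malformed responses with backoff
    for attempt in range(1, attempts + 1):
        try:
            return grok_completion(prompt)
        except (requests.RequestException, KeyError, ValueError) as exc:
            if attempt == attempts:
                raise
            print(f"Request failed ({exc}); retrying in {delay}s...")
            time.sleep(delay)
            delay *= 2  # Exponential backoff between retries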
About the Author
Nicolas Camara is the Chief Technology Officer (CTO) at Firecrawl. He previously built and scaled Mendable, one of the pioneering "chat with your documents" apps, which had major Fortune 500 customers like Snapchat, Coinbase, and MongoDB. Prior to that, Nicolas built SideGuide, the first code-learning tool inside VS Code, and grew a community of 50,000 users. Nicolas studied Computer Science and has over 10 years of experience in building software.