Building a Trend Detection System with AI in TypeScript: A Step-by-Step Guide

Jan 11, 2025 • Bex Tuychiev

Introduction

In this comprehensive guide, we'll build a social media trend detection system with TypeScript, powered by AI. You'll learn how to create a robust solution that monitors social platforms and news websites, analyzes emerging trends, and delivers them to Slack in real time.

Before we dive into the technical details and implementation steps, watch a video preview of the project:

While the video demonstrates the project detecting AI-related trends from specific sources, you have the flexibility to customize it for monitoring any topics or themes from your preferred websites and Twitter accounts. If that sounds interesting, let’s set up your development environment for running the project locally.

Note: Before starting this project, ensure you have the following prerequisites installed:

  • Node.js (version 16 or higher)
  • npm (Node Package Manager)
  • A code editor like VS Code
  • Git for version control
  • A Slack workspace with admin privileges
  • X Developer Account (for X API access)
  • Basic knowledge of TypeScript and Node.js

Project Setup

We start by cloning the project’s GitHub repository, which is maintained by Eric Ciarla, co-founder of Firecrawl:

git clone https://github.com/ericciarla/trendFinder
cd trendFinder

Next, install dependencies:

npm install

Then, configure your .env file:

cp .env.example .env
# Edit .env with your configuration

If you open .env.example, you will see that our app depends on four core services:

  1. Slack Webhook - For sending notifications about detected trends
  2. X (Twitter) API - For monitoring tweets and engagement metrics
  3. Together AI - For analyzing content and detecting trends with LLMs
  4. Firecrawl API - For scraping and monitoring web content

To run the project locally, you will need to obtain the necessary URLs and API keys from these services. Below are instructions for setting up each one and obtaining its credentials.
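
By the end of this section, your completed .env should contain four values. As a rough sketch with placeholder values (the variable names match the ones the project's code and workflow read):

X_API_BEARER_TOKEN=your-x-bearer-token
FIRECRAWL_API_KEY=your-firecrawl-api-key
TOGETHER_API_KEY=your-together-api-key
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL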

Obtaining API Tokens

X (Twitter) Bearer Token

The X API is a crucial component of our trend detection system. It allows us to monitor prominent accounts in real-time, track engagement metrics, and identify emerging topics. The Bearer Token provides secure authentication for making API requests through your own X developer account. Here are the instructions to get your token:

  1. Go to Twitter Developer Portal
  2. Create a developer account if needed
  3. Create a new project and app (free plan accounts already have an app ready)
  4. Navigate to “Keys and Tokens”
  5. Generate/copy your Bearer Token (OAuth 2.0)
  6. Add to .env file

Firecrawl API Key

Firecrawl serves as our primary web content extraction engine, offering several key advantages for trend detection:

  1. AI-Powered Content Extraction: Uses natural language understanding instead of brittle HTML selectors, ensuring reliable trend detection even when websites change.
  2. Automated Content Discovery: Automatically processes entire website sections, ideal for monitoring news sites and blogs
  3. Multiple Output Formats: Supports structured data, markdown, and plain text formats for seamless integration with Together AI
  4. Built-in Rate Limiting: Handles request management automatically, ensuring stable monitoring

Since Firecrawl is a hosted scraping engine, you will need an API key to connect to it through its TypeScript client library:

  1. Visit Firecrawl
  2. Create an account
  3. Navigate to your dashboard
  4. Generate and copy your API key
  5. Add to .env file

Together AI Token

Together AI powers the intelligence layer of our trend detection system:

  1. Natural Language Processing: Analyzes scraped content to identify emerging trends and patterns
  2. Sentiment Analysis: Evaluates public sentiment and engagement around potential trends
  3. Content Summarization: Generates concise summaries of trends for Slack notifications

To get an API token, follow these steps:

  1. Visit Together AI
  2. Sign up for an account
  3. Navigate to API settings/dashboard
  4. Generate and copy your API key
  5. Add to .env file

Setting Up Slack Webhook

Finally, you will need a Slack webhook URL to receive real-time notifications about emerging trends. When our system runs, it scrapes the provided list of sources (X accounts, websites), detects trends related to the specified topics, summarizes them, and delivers the result as a Slack message through the webhook.

To create a webhook for your account, follow these steps:

  1. Create a Slack Workspace (Log in if you already have one)

    • Visit slack.com
    • Click “Create a new workspace”
    • Follow the setup wizard to create your workspace
    • Verify your email address
  2. Create a Slack App

    • Go to api.slack.com/apps
    • Click “Create New App”
    • Choose “From scratch”
    • Name your app (e.g., “Trend Finder”)
    • Select your workspace
    • Click “Create App”
  3. Enable Incoming Webhooks

    • In your app’s settings, click “Incoming Webhooks”
    • Toggle “Activate Incoming Webhooks” to On
    • Click “Add New Webhook to Workspace”
  4. Configure the Webhook

    • Choose the channel where you want notifications to appear
    • Click “Allow”
    • You’ll see your new webhook URL in the list
    • Copy the Webhook URL (it starts with https://hooks.slack.com/services/)
  5. Add to Environment Variables

    • Open your .env file
    • Add your webhook URL:
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL
  6. Test the Webhook (Optional)
    • You can test your webhook using curl:
curl -X POST -H 'Content-type: application/json' --data '{"text":"Hello from Trend Finder!"}' YOUR_WEBHOOK_URL

First project run

Once the Node dependencies are installed and environment variables are configured, you can launch the app with a single command:

npm run start

The command runs the entire pipeline and, upon completion, sends a Slack message to your workspace with the detected trends. By default, the project is configured to watch AI trends; once we explore each component, you'll see how to change this default behavior.

System Architecture Overview

In this section, let’s break down the main components of our system.

Core Components

1. Entry Point (src/index.ts)

See the file on GitHub.

The application’s entry point is minimal and focused, setting up either a one-time execution or a scheduled cron job. It imports the main controller and can be configured to run on a schedule (currently commented out but set for 5 PM daily).

2. Cron Controller (src/controllers/cron.ts)

See the file on GitHub.

The controller orchestrates the entire workflow in a sequential process:

  1. Fetches source configurations
  2. Scrapes content from sources
  3. Generates an AI-analyzed draft
  4. Sends the results to Slack

3. Source Management (src/services/getCronSources.ts)

See the file on GitHub.

This service manages the content sources, supporting two types of inputs:

  • Websites (these will be scraped with Firecrawl)
  • Twitter/X accounts (scraped with the X Developer API)

The service checks which API keys are available and filters the source list so that only sources with valid credentials are scraped. The configuration includes multiple AI news websites and one X account. While the file lists several prominent AI news X accounts, most are commented out because the X Developer API free tier restricts scraping to one account every 15 minutes.

This is where you would add your own sources if you wish to monitor a trend other than AI.
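
For example, to monitor a different topic from your own sites, you might add entries like the following to the sources array in this file (the URLs are placeholders; the surrounding structure mirrors the snippet shown later in this guide):

const sources = [
  ...(hasFirecrawlKey
    ? [
        { identifier: "https://example.com/blog" },      // any site to scrape with Firecrawl
        { identifier: "https://another-site.com/news" }, // placeholder URL
      ]
    : []),
  // Free-tier X API: keep this to a single account
  ...(hasXApiKey ? [{ identifier: "https://x.com/your_account" }] : []),
];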

4. Content Scraping (src/services/scrapeSources.ts)

See the file on GitHub.

A robust scraping service that:

  • Handles Twitter/X API integration for social media content
  • Uses Firecrawl for web page content extraction
  • Implements strong typing and structured extraction with Zod schemas
  • Normalizes data from different sources into a consistent format

It is in this file that the topic of interest is specified as “AI”. To change it, update lines 21 and 96.

5. Draft Generation (src/services/generateDraft.ts)

See the file on GitHub.

The AI analysis component that:

  • Uses Together AI’s Llama 3.1 model
  • Processes raw content through structured prompts
  • Implements JSON schema validation
  • Formats content into readable Slack messages

This script has several parts that make it tailored for watching AI trends. You would need to change those parts as well to choose a different topic.
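
As a rough sketch (based on the prompt strings and message header you'll see when we walk through this file), changing the topic mostly comes down to editing a few strings, for example:

// Hypothetical edits for a different topic, e.g. robotics:
const header = `🚀 Robotics Trends on X for ${currentDate}\n\n`;
// ...and in the system and user prompts, replace the "AI and LLM-related"
// wording with your own topic.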

6. Notification Service (src/services/sendDraft.ts)

A straightforward service that delivers the processed content to Slack via webhooks, with proper error handling and logging.

Infrastructure

The application is built with robust infrastructure and development tools to ensure reliability and maintainability:

Docker Support

The application includes comprehensive Docker support with:

  • Multi-stage builds for optimization
  • Environment variable management
  • Docker Compose configuration for easy deployment

Configuration Management

The system uses:

  • Environment variables for sensitive configuration
  • TypeScript for type safety
  • Proper error handling throughout the pipeline

Key Features

  1. Modular Architecture: Each component is self-contained and follows single-responsibility principles.
  2. Type Safety: Comprehensive TypeScript implementation with Zod schemas for runtime validation.
  3. Error Handling: Robust error handling at each step of the pipeline.
  4. Scalability: Docker support enables easy deployment and scaling.
  5. API Integration: Supports multiple data sources with extensible architecture.
  6. AI Analysis: Leverages advanced AI models for content analysis.

Development Tools

The project uses modern development tools:

  • TypeScript for type safety
  • Nodemon for development hot-reloading
  • Docker for containerization
  • Environment variable management
  • Proper logging throughout the system

This architecture allows for easy maintenance, testing, and extension of functionality while maintaining robust error handling and type safety throughout the application pipeline.

In-depth Project Breakdown

In this section, we will analyze each component of the project in detail, breaking down the implementation steps and technical considerations for each major feature.

1. Specifying the resources to scrape

In src/services/getCronSources.ts, we start by importing dotenv:

import dotenv from "dotenv";

dotenv.config();

This allows the application to securely load configuration values from a .env file, which is a common practice for managing sensitive information like API keys and credentials.

Then, we define a new function called getCronSources:

export async function getCronSources() {...}

The function is declared async so it slots into the application's asynchronous pipeline alongside the services that make network requests; as the code below shows, its own body simply assembles and returns a list of source URLs.

In the function body, we start a parent try-catch block:

export async function getCronSources() {
  try {
    console.log("Fetching sources...");

    // Check for required API keys
    const hasXApiKey = !!process.env.X_API_BEARER_TOKEN;
    const hasFirecrawlKey = !!process.env.FIRECRAWL_API_KEY;

    ... // continued below

The code above performs important validation by checking for required API keys. It uses the double exclamation mark (!!) operator to convert the environment variables into boolean values, making it easy to verify if both the X API bearer token and Firecrawl API key are present. This validation step is crucial before attempting to make any API calls to ensure the application has proper authentication credentials.
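
As a quick aside, here is how the !! coercion behaves on its own (not project code, just an illustration):

// Double negation converts any value to a boolean
console.log(!!"some-api-key"); // true  (non-empty string is truthy)
console.log(!!undefined);      // false (missing env var is falsy)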

    // ... continuation of the above block
    // Filter sources based on available API keys
    const sources = [
      // High priority sources (Only 1 x account due to free plan rate limits)
      ...(hasFirecrawlKey ? [
        { identifier: 'https://www.firecrawl.dev/blog' },
        { identifier: 'https://openai.com/news/' },
        { identifier: 'https://www.anthropic.com/news' },
        { identifier: 'https://news.ycombinator.com/' },
        { identifier: 'https://www.reuters.com/technology/artificial-intelligence/' },
        { identifier: 'https://simonwillison.net/' },
        { identifier: 'https://buttondown.com/ainews/archive/' },
      ] : []),
      ...(hasXApiKey ? [
        { identifier: 'https://x.com/skirano' },
      ] : []),
    ];

    return sources.map(source => source.identifier);
  } catch (error) {
    console.error(error);
  }
}

The code uses the ternary operator (? :) to conditionally include sources based on available API keys. For each API key check, if the condition before the ? is true (e.g. hasFirecrawlKey is true), the array of sources after the ? is included. Otherwise, if the condition is false, an empty array after the : is used instead.

This conditional logic ensures we only try to fetch from sources where we have valid API credentials. The spread operator (...) is used to flatten these conditional arrays into a single sources array.
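
As a tiny standalone illustration of this conditional-spread pattern (not project code):

const hasA = true;
const hasB = false;
const merged = [...(hasA ? ["a1", "a2"] : []), ...(hasB ? ["b1"] : [])];
console.log(merged); // ["a1", "a2"]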

For error handling, the entire function is wrapped in a try-catch block. If any error occurs during execution, it will be caught and logged to the console via console.error(). This prevents the application from crashing if there are issues with environment variables or API calls.

2. Scraping specified resources with X and Firecrawl

Inside src/services/scrapeSources.ts, we write the functionality to scrape the resources specified in getCronSources.ts with Firecrawl and X API.

The script starts with the following imports and setup:

import FirecrawlApp from "@mendable/firecrawl-js";
import dotenv from "dotenv";
import { z } from "zod";

dotenv.config();

These imports provide essential functionality for the scraping service:

  • FirecrawlApp: A JavaScript client for interacting with the Firecrawl API to scrape web content
  • dotenv: For loading environment variables from a .env file
  • zod: A TypeScript-first schema validation library

The dotenv.config() call loads environment variables at runtime, making them accessible via process.env. This is important since we’ll need API keys and other configuration stored in environment variables.

// Initialize Firecrawl
const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

// 1. Define the schema for our expected JSON
const StorySchema = z.object({
  headline: z.string().describe("Story or post headline"),
  link: z.string().describe("A link to the post or story"),
  date_posted: z.string().describe("The date the story or post was published"),
});

const StoriesSchema = z.object({
  stories: z
    .array(StorySchema)
    .describe("A list of today's AI or LLM-related stories"),
});

The code above initializes Firecrawl with an API key and defines two Zod schemas for validating the data structure we expect to receive from our scraping operations.

The StorySchema defines the shape of individual story objects, with three required string fields:

  • headline: The title or headline of the story/post
  • link: URL linking to the full content
  • date_posted: Publication timestamp

The StoriesSchema wraps this in an array, expecting multiple story objects within a “stories” property. This schema will be used by Firecrawl’s scraping engine to format its output according to our needs.

The .describe() method calls on each field are essential - they provide semantic descriptions that Firecrawl’s AI engine uses to intelligently identify and extract the correct data from web pages. By understanding these descriptions, the AI can automatically determine the appropriate HTML elements and CSS selectors to target when scraping content.
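
Because Firecrawl keys off these descriptions, retargeting the scraper is largely a matter of rewording them (the AI/LLM wording also appears in the extraction prompt further down). A hedged sketch for a hypothetical robotics focus:

// Sketch: reworded description for a different topic
const StoriesSchema = z.object({
  stories: z
    .array(StorySchema)
    .describe("A list of today's robotics-related stories"),
});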

export async function scrapeSources(sources: string[]) {
    // ... continued below

Then, we start a function scrapeSources that takes an array of source URLs as input and will handle the scraping of content from each provided source.

const num_sources = sources.length;
console.log(`Scraping ${num_sources} sources...`);

let combinedText: { stories: any[] } = { stories: [] };

// Configure these if you want to toggle behavior
const useTwitter = true;
const useScrape = true;

// continued below ...

The code above sets up a few key variables in the body of the function:

  • num_sources tracks how many URLs we’re processing
  • combinedText initializes an object with an empty stories array to collect all scraped stories
  • Two boolean flags control which scraping methods to use:
    • useTwitter enables Twitter API integration
    • useScrape enables direct web scraping

These variables will be used throughout the rest of the scraping process to control behavior and aggregate results.

  // ... continuation of above
  for (const source of sources) {
    // --- 1) Handle x.com (Twitter) sources ---
    if (source.includes("x.com")) {
      if (useTwitter) {
        const usernameMatch = source.match(/x\.com\/([^\/]+)/);
        if (usernameMatch) {
          const username = usernameMatch[1];

          // Build the search query for tweets
          const query = `from:${username} has:media -is:retweet -is:reply`;
          const encodedQuery = encodeURIComponent(query);

          // Get tweets from the last 24 hours
          const startTime = new Date(
            Date.now() - 24 * 60 * 60 * 1000
          ).toISOString();
          const encodedStartTime = encodeURIComponent(startTime);

          // x.com API URL
          const apiUrl = `https://api.x.com/2/tweets/search/recent?query=${encodedQuery}&max_results=10&start_time=${encodedStartTime}`;

          // Fetch recent tweets from the Twitter API
          const response = await fetch(apiUrl, {
            headers: {
              Authorization: `Bearer ${process.env.X_API_BEARER_TOKEN}`,
            },
          });

          // Continued below...

Next, we make a request to the Twitter API to fetch recent tweets from a specific user. Let’s break down what’s happening:

  1. We check if the source URL contains “x.com” and if Twitter integration is enabled
  2. We extract the username from the URL using regex
  3. We construct a search query that:
    • Gets tweets from that user
    • Only includes tweets with media
    • Excludes retweets and replies
  4. We calculate a timestamp from 24 hours ago to limit results
  5. We build the API URL with the encoded query parameters
  6. Finally, we make the authenticated request using the bearer token

// ... continuation of above
if (!response.ok) {
  throw new Error(
    `Failed to fetch tweets for ${username}: ${response.statusText}`,
  );
}

After making the API request, we check if the response was successful. If not, we throw an error with details about what went wrong, including the username and the status text from the response.

          const tweets = await response.json();

          if (tweets.meta?.result_count === 0) {
            console.log(`No tweets found for username ${username}.`);
          } else if (Array.isArray(tweets.data)) {
            console.log(`Tweets found from username ${username}`);
            const stories = tweets.data.map((tweet: any) => {
              return {
                headline: tweet.text,
                link: `https://x.com/i/status/${tweet.id}`,
                date_posted: startTime,
              };
            });
            combinedText.stories.push(...stories);
          } else {
            console.error(
              "Expected tweets.data to be an array:",
              tweets.data
            );
          }
        }
      }
    }

    // Continued below...

After parsing the tweets, we map them into story objects that contain:

  • The tweet text as the headline
  • A link to the original tweet
  • The start of the 24-hour search window, used as an approximate posting date

These story objects are then pushed into combinedText.stories, which aggregates content from multiple sources.

If no tweets are found, we log a message. If there’s an unexpected response format where tweets.data isn’t an array, we log an error with the actual data received.

The code handles all edge cases gracefully while maintaining a clean data structure for downstream processing.

    // ... continuation of above
    // --- 2) Handle all other sources with Firecrawl extract ---
    else {
      if (useScrape) {
        // Firecrawl will both scrape and extract for you
        // Provide a prompt that instructs Firecrawl what to extract
        const currentDate = new Date().toLocaleDateString();
        const promptForFirecrawl = `
        Return only today's AI or LLM related story or post headlines and links in JSON format from the page content.
        They must be posted today, ${currentDate}. The format should be:
        {
          "stories": [
            {
              "headline": "headline1",
              "link": "link1",
              "date_posted": "YYYY-MM-DD"
            },
            ...
          ]
        }
        If there are no AI or LLM stories from today, return {"stories": []}.

        The source link is ${source}.
        If a story link is not absolute, prepend ${source} to make it absolute.
        Return only pure JSON in the specified format (no extra text, no markdown, no \`\`\`).
        `;
        // continued below ...

The prompt instructs Firecrawl to extract AI/LLM related stories from the current day only. It specifies the exact JSON format required for the response, with each story containing a headline, link and date posted. The prompt ensures links are absolute by having Firecrawl prepend the source URL if needed. For clean parsing, it explicitly requests pure JSON output without any formatting or extra text.

        // Use app.extract(...) directly
        const scrapeResult = await app.extract(
          [source],
          {
            prompt: promptForFirecrawl,
            schema: StoriesSchema, // The Zod schema for expected JSON
          }
        );

        if (!scrapeResult.success) {
          throw new Error(`Failed to scrape: ${scrapeResult.error}`);
        }

        // The structured data
        const todayStories = scrapeResult.data;
        console.log(`Found ${todayStories.stories.length} stories from ${source}`);
        combinedText.stories.push(...todayStories.stories);
      }
    }
  }
  // Continued below ...

The code above implements the core scraping functionality:

  1. It constructs a prompt for Firecrawl that specifies exactly what content to extract
  2. The prompt requests AI/LLM headlines from the current day only
  3. It defines the exact JSON structure expected in the response
  4. It handles relative URLs by having Firecrawl convert them to absolute
  5. The extracted data is validated against a Zod schema
  6. Valid results are accumulated into the combinedText array
  7. Error handling ensures failed scrapes don’t crash the process

  // ... continuation of above
  // Return the combined stories from all sources
  const rawStories = combinedText.stories;
  console.log(rawStories);
  return rawStories;
}
// End of script

Finally, this code returns the raw stories array containing all the scraped headlines and content from the various sources. The stories can then be processed further for trend analysis and summarization.
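
Every element of rawStories conforms to StorySchema, so if you want compile-time types for downstream processing you can derive them from the Zod schema (a small sketch, not part of the repository code):

// Derive a TypeScript type from the Zod schema defined earlier
type Story = z.infer<typeof StorySchema>;
// => { headline: string; link: string; date_posted: string }

const example: Story = {
  headline: "Example headline",                 // placeholder values
  link: "https://x.com/i/status/1234567890",
  date_posted: new Date().toISOString(),
};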

3. Synthesizing scraped contents into a summary

Inside src/services/generateDraft.ts, we write the functionality to convert the raw stories scraped in the previous script into a summary message that will later be sent with Slack.

The script starts with the following imports:

import dotenv from "dotenv";
import Together from "together-ai";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

dotenv.config();

The script imports several key dependencies:

  • dotenv: For loading environment variables from a .env file
  • Together: The Together.ai client library for making API calls
  • z from zod: A TypeScript-first schema validation library
  • zodToJsonSchema: A utility to convert Zod schemas to JSON Schema format

/**
 * Generate a post draft with trending ideas based on raw tweets.
 */
export async function generateDraft(rawStories: string) {
  console.log(`Generating a post draft with raw stories (${rawStories.length} characters)...`)

  // continued below ...

The generateDraft function takes the raw stories as input and processes them to identify key trends and generate a summary. First, it prints a log message indicating the size of the input by showing the character count.

  // ... continuation of above
  try {
    // Initialize Together client
    const together = new Together();

    // Define the schema for our response
    const DraftPostSchema = z.object({
      interestingTweetsOrStories: z.array(z.object({
        story_or_tweet_link: z.string().describe("The direct link to the tweet or story"),
        description: z.string().describe("A short sentence describing what's interesting about the tweet or story")
      }))
    }).describe("Draft post schema with interesting tweets or stories for AI developers.");

    // Convert our Zod schema to JSON Schema
    const jsonSchema = zodToJsonSchema(DraftPostSchema, {
      name: 'DraftPostSchema',
      nameStrategy: 'title'
    });

    // Create a date string if you need it in the post header
    const currentDate = new Date().toLocaleDateString('en-US', {
      timeZone: 'America/New_York',
      month: 'numeric',
      day: 'numeric',
    });

    // continued below ...

In this block, we set up the core functionality for generating the draft post. We initialize the Together AI client which will be used for making API calls. We then define a Zod schema that specifies the expected structure of our response - an array of interesting tweets/stories where each item has a link and description. This schema is converted to JSON Schema format which will help enforce the output structure. Finally, we create a formatted date string in US format (MM/DD) that can be used in the post header.

// ...continuation of above
// Use Together’s chat completion with the Llama 3.1 model
const completion = await together.chat.completions.create({
  model: "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
  messages: [
    {
      role: "system",
      content: `You are given a list of raw AI and LLM-related tweets sourced from X/Twitter.
Only respond in valid JSON that matches the provided schema (no extra keys).
`,
    },
    {
      role: "user",
      content: `Your task is to find interesting trends, launches, or interesting examples from the tweets or stories. 
For each tweet or story, provide a 'story_or_tweet_link' and a one-sentence 'description'. 
Return all relevant tweets or stories as separate objects. 
Aim to pick at least 10 tweets or stories unless there are fewer than 10 available. If there are less than 10 tweets or stories, return ALL of them. Here are the raw tweets or stories you can pick from:\n\n${rawStories}\n\n`,
    },
  ],
  // Tell Together to strictly enforce JSON output that matches our schema
  // @ts-ignore
  response_format: { type: "json_object", schema: jsonSchema },
});

// continued below ...

In this block, we make the API call to Together AI using their chat completions endpoint with the Llama 3.1 model. The system prompt instructs the model to only output valid JSON matching our schema. The user prompt provides the actual task - finding interesting trends, launches and examples from the raw tweets/stories. We request at least 10 items (or all available if less than 10) and pass in the raw content. The response_format parameter enforces strict JSON output matching our defined schema.

The completion response will contain structured JSON data that we can parse and use to generate our draft post. Each item will have a link to the original tweet/story and a concise description of what makes it noteworthy.

    // Check if we got a content payload in the first choice
    const rawJSON = completion?.choices?.[0]?.message?.content;
    if (!rawJSON) {
      console.log("No JSON output returned from Together.");
      return "No output.";
    }
    console.log(rawJSON);

    // Parse the JSON to match our schema
    const parsedResponse = JSON.parse(rawJSON);

    // Construct the final post
    const header = `🚀 AI and LLM Trends on X for ${currentDate}\n\n`;
    const draft_post = header + parsedResponse.interestingTweetsOrStories
      .map((tweetOrStory: any) => `• ${tweetOrStory.description}\n  ${tweetOrStory.story_or_tweet_link}`)
      .join('\n\n');

    return draft_post;

  } catch (error) {
    console.error("Error generating draft post", error);
    return "Error generating draft post.";
  }
}
// End of script

This code block shows the final part of our script where we handle the Together AI API response. We first check if we received valid JSON content in the response. If not, we log an error and return early.

If we have valid JSON, we parse it into a structured object matching our schema. Then we construct the final post by adding a header with the current date and mapping over the interesting tweets/stories to create bullet points. Each bullet point contains the description and link.

The script includes error handling to catch and log any issues that occur during execution. If there’s an error, it returns a generic error message rather than failing silently.

This completes the core functionality of our trend finding script. The next sections will cover setting up notifications, scheduling, and deployment.

4. Setting up a notification system with Slack

Inside src/services/sendDraft.ts, we write the functionality to send the composed final post as a Slack message through a webhook:

import axios from "axios";
import dotenv from "dotenv";
dotenv.config();

export async function sendDraft(draft_post: string) {
  try {
    const response = await axios.post(
      process.env.SLACK_WEBHOOK_URL || "",
      {
        text: draft_post,
      },
      {
        headers: {
          "Content-Type": "application/json",
        },
      },
    );

    return `Success sending draft to webhook at ${new Date().toISOString()}`;
  } catch (error) {
    console.log("error sending draft to webhook");
    console.log(error);
  }
}

This script sets up a Slack notification system by creating a sendDraft function that takes a draft post as input and sends it to a configured Slack webhook URL. The function uses axios to make a POST request to the webhook with the draft text. It includes error handling to log any issues that occur during the sending process. The webhook URL is loaded from environment variables using dotenv for security. On success, it returns a timestamp of when the draft was sent.
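
If you want to smoke-test the webhook integration on its own (assuming SLACK_WEBHOOK_URL is set in your .env), a minimal call from a hypothetical scratch file inside src/ might look like:

import { sendDraft } from "./services/sendDraft";

sendDraft("Hello from Trend Finder!").then((result) => console.log(result));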

5. Writing a script to execute the system with Cron

The cron.ts file contains the main execution logic for our trend finding system. It exports a handleCron function that orchestrates the entire workflow:

// src/controllers/cron.ts
import { scrapeSources } from "../services/scrapeSources";
import { getCronSources } from "../services/getCronSources";
import { generateDraft } from "../services/generateDraft";
import { sendDraft } from "../services/sendDraft";
export const handleCron = async (): Promise<void> => {
  try {
    const cronSources = await getCronSources();
    const rawStories = await scrapeSources(cronSources!);
    const rawStoriesString = JSON.stringify(rawStories);
    const draftPost = await generateDraft(rawStoriesString);
    const result = await sendDraft(draftPost!);
    console.log(result);
  } catch (error) {
    console.error(error);
  }
};

First, it retrieves the list of sources to scrape by calling getCronSources(). Then it scrapes those sources using scrapeSources() to get the raw story data. This raw data is stringified into JSON format.

Next, it generates a draft post from the story data by passing it to generateDraft(). Finally, it sends the draft to Slack using sendDraft() and logs the result.

The function includes error handling to catch and log any issues that occur during execution. This script ties together all the individual services we created to form a complete automated workflow.

6. Creating a project entrypoint

The src/index.ts file serves as the main entry point for our application. It imports the handleCron function from our cron controller and sets up the execution flow.

The file uses node-cron for scheduling and dotenv for environment variable management. The main function provides a simple way to run the draft generation process manually.

There’s also a commented-out cron schedule that can be uncommented to run the job automatically at 5 PM daily (0 17 * * *).

import { handleCron } from "./controllers/cron";
import cron from "node-cron";
import dotenv from "dotenv";

dotenv.config();

async function main() {
  console.log(`Starting process to generate draft...`);
  await handleCron();
}
main();

// To run the job on a schedule instead, uncomment the following lines:
//cron.schedule(`0 17 * * *`, async () => {
//  console.log(`Starting process to generate draft...`);
//  await handleCron();
//});

When you run npm run start, this script is executed.


At this point, the project is ready for local use. You can modify the topic configurations at any time to track different subjects and generate Slack summaries on demand. While running locally is useful for testing, we’ll explore an even more powerful automation option using GitHub Actions in the next section.

7. Deploying the project with GitHub Actions

Now that we have our project working locally, let’s take it to the next level by automating it with GitHub Actions. GitHub Actions is a powerful CI/CD platform that allows us to automate workflows directly from our GitHub repository. Instead of running our trend finder manually or setting up a server to host it, we can leverage GitHub’s infrastructure to run our script on a schedule, completely free for public repositories. Let’s set it up.

First, create a new file in your repository at .github/workflows/trend-finder.yml:

mkdir -p .github/workflows
touch .github/workflows/trend-finder.yml

Then, paste the following contents:

name: Run Trend Finder

on:
  schedule:
    - cron: "0 17 * * *" # Runs at 5 PM UTC daily
  workflow_dispatch: # Allows manual trigger from GitHub UI

jobs:
  find-trends:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: "18"

      - name: Install dependencies
        run: npm install

      - name: Run trend finder
        env:
          X_API_BEARER_TOKEN: ${{ secrets.X_API_BEARER_TOKEN }}
          FIRECRAWL_API_KEY: ${{ secrets.FIRECRAWL_API_KEY }}
          TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
        run: npm run start

This workflow configuration does several important things:

  1. Scheduling: The on.schedule section sets up automatic daily runs at 5 PM UTC
  2. Manual Triggers: workflow_dispatch allows you to run the workflow manually from GitHub’s UI
  3. Environment: Uses Ubuntu as the runner environment
  4. Setup: Configures Node.js and installs dependencies
  5. Secrets: Securely passes API keys and tokens from GitHub Secrets to the application

To set up the secrets in your GitHub repository:

  1. Go to your repository’s Settings
  2. Click on “Secrets and variables” → “Actions”
  3. Add each required secret:
    • X_API_BEARER_TOKEN
    • FIRECRAWL_API_KEY
    • TOGETHER_API_KEY
    • SLACK_WEBHOOK_URL
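
If you prefer the command line, the GitHub CLI can set the same secrets (assuming gh is installed and authenticated against your repository; the values are placeholders):

gh secret set X_API_BEARER_TOKEN --body "your-x-bearer-token"
gh secret set FIRECRAWL_API_KEY --body "your-firecrawl-api-key"
gh secret set TOGETHER_API_KEY --body "your-together-api-key"
gh secret set SLACK_WEBHOOK_URL --body "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"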

The workflow will now run automatically every day at 5 PM UTC, scraping your configured sources and sending trend updates to your Slack channel. You can also trigger it manually:

  1. Go to your repository’s “Actions” tab
  2. Select “Run Trend Finder” workflow
  3. Click “Run workflow”

Some key benefits of using GitHub Actions:

  • Zero Infrastructure: No need to maintain servers or worry about uptime
  • Cost Effective: Free for public repositories
  • Version Controlled: Your automation configuration lives with your code
  • Easy Monitoring: Built-in logs and status checks
  • Flexible Scheduling: Easy to modify run times or add multiple schedules

Next Steps

Now that your trend finder is fully automated, here are some ways to extend it:

  1. Custom Topics: Modify the scraping configurations to track different topics
  2. Additional Sources: Add more websites or social media accounts to monitor
  3. Enhanced Analysis: Customize the AI prompts for different types of trend analysis
  4. Multiple Channels: Set up different Slack channels for different topic categories
  5. Metrics: Add monitoring for successful runs and trend detection rates

The complete project provides a robust foundation for automated trend detection that you can build upon based on your specific needs.

Troubleshooting

If you encounter issues with the GitHub Actions workflow:

  1. Check Logs: Review the workflow run logs in the Actions tab
  2. Verify Secrets: Ensure all secrets are properly set and not expired
  3. Rate Limits: Monitor API rate limits, especially for the X API
  4. Timeout Issues: Consider breaking up large scraping jobs if runs timeout
  5. Dependencies: Keep Node.js dependencies updated to latest stable versions

For additional help, check the project’s GitHub Issues or create a new one with specific details about any problems you encounter.

Limitations of Free Tier Tools Used

While this project uses several free tier services to minimize costs, there are some limitations to be aware of:

  1. X API Rate Limits

    • Limited to scraping 1 account per 15-minute window
    • Some advanced filtering features not available
  2. GitHub Actions Minutes

    • Free for public repositories
    • 2,000 minutes/month on the free plan for private repositories
    • Additional minutes require a paid plan
  3. Together AI Free Credits

    • $1 in free credits for new accounts
    • 600 requests per minute
  4. Firecrawl API Limits

    • 500 requests/month on free plan

To work within these constraints:

  • Carefully plan scraping intervals
  • Implement caching where possible
  • Monitor usage to avoid hitting limits
  • Consider paid tiers for production use

For most personal or small team use cases, the free tiers provide sufficient capacity. However, larger scale deployments may require upgrading to paid plans for higher limits and additional features.

Conclusion

You’ve now built and deployed a fully automated trend detection system that leverages AI, web scraping, and cloud automation. This solution provides real-time insights into emerging trends across your chosen sources, delivered directly to Slack. With the foundation in place, you can easily customize and expand the system to match your specific trend monitoring needs. The combination of GitHub Actions for automation, Firecrawl for AI web scraping, Together AI for analysis, and Slack for notifications creates a powerful, maintainable solution that will help you stay ahead of relevant trends in your field.
