SaaS Pricing Feature Monitor collector facts

Publisher: bo-05 (@bo-05).

Version: 2. Last updated: 2026-07-03T05:06:50.399Z.

Run this collector on demand, as an API endpoint, or on a schedule with Firecrawl Prometheus.

Sample fields: query, region, language, seed_urls, sort_hint, output_mode, collected_at, company_rows, notes, plans, tagline, category.

Parameters: query (string, required), seed-urls (string, required), max-companies (number, required), output-mode (string, required), region (string), language (string), include-feature-matrix (boolean), include-addons (boolean), include-faq-signals (boolean), billing-preference (string), snapshot-label (string), sort-hint (string).

SaaS Pricing Feature Monitor

v2Published

Tracks official SaaS pricing pages and extracts company-level evidence plus nested plan pricing, limits, features, add-ons, FAQ signals, and snapshot metadata.

Output & API

Preview the latest data, download it, or call this collector as an API.

Author's sample data
queryemail marketing
regionglobal
languageEnglish
seed_urls
sort_hintbest match
output_modecompany_rows
collected_at2026-07-03T04:45:42.113Z
company_rows
max_companies2
snapshot_label2026-07-03T04:45:42.110Z
billing_preferenceboth
Parameters
--querystringrequiredProduct category, competitive theme, or use case to discover relevant SaaS pricing pages. e.g. "email marketing"
--seed-urlsstringrequiredComma-separated pricing page URLs, competitor homepages, or vendor domains. Use an empty string only when query is meaningful enough for discovery. e.g. "https://mailchimp.com/pricing/marketing/,https://www.constantcontact.com/pricing"
--max-companiesnumberrequiredMaximum number of unique SaaS companies to return. e.g. 2
--output-modestringrequiredUse `company_rows` for a company list with nested plans or `grouped_by_company` for an object keyed by company domain. e.g. "company_rows"
--regionstringMarket focus for discovery and extraction context. default "global"
--languagestringPreferred language for discovery and extraction context. default "English"
--include-feature-matrixbooleanWhether to extract visible plan-to-feature matrix rows when present. default true
--include-addonsbooleanWhether to extract add-ons, overage pricing, implementation, support, and onboarding pricing text when visible. default true
--include-faq-signalsbooleanWhether to extract pricing-related FAQ signals when visible. default true
--billing-preferencestringBilling cadence focus: `monthly`, `annual`, or `both`. default "both"
--snapshot-labelstringCaller-supplied snapshot label. Leave blank to use the current run timestamp. default ""
--sort-hintstringDiscovery hint such as `best match`, `most popular`, or `enterprise`. default "best match"

Marketplace

Publish this collector so others can deploy it — you keep ownership.

0 subscribers
bo-05@bo-05
0 runs in 14d · published 2d ago

Versions

Every build and self-heal appends a version. Pin one to lock runs to it.

managed by author
v2importedapprovedcurrent2d ago
v1builtrejected2d ago
How this script collects data
import Firecrawl from "@mendable/firecrawl-js";
import { parseArgs } from "node:util";

const apiKey = process.env.FIRECRAWL_API_KEY;
if (!apiKey) {
  console.error("FIRECRAWL_API_KEY is not set");
  process.exit(1);
}

const firecrawl = new Firecrawl({ apiKey });

const { values: flags } = parseArgs({
  strict: true,
  options: {
    query: { type: "string" },
    "seed-urls": { type: "string" },
    "max-companies": { type: "string" },
    "output-mode": { type: "string" },
    region: { type: "string" },
    language: { type: "string" },
    "include-feature-matrix": { type: "string" },
    "include-addons": { type: "string" },
    "include-faq-signals": { type: "string" },
    "billing-preference": { type: "string" },
    "snapshot-label": { type: "string" },
    "sort-hint": { type: "string" },
  },
});

const query = clean(flags.query);
const seedUrlsInput = clean(flags["seed-urls"]);
const maxCompaniesRaw = clean(flags["max-companies"]);
const outputMode = clean(flags["output-mode"]);
const region = clean(flags.region) || "global";
const language = clean(flags.language) || "English";
const includeFeatureMatrix = parseBooleanFlag(flags["include-feature-matrix"], true, "include-feature-matrix");
const includeAddons = parseBooleanFlag(flags["include-addons"], true, "include-addons");
const includeFaqSignals = parseBooleanFlag(flags["include-faq-signals"], true, "include-faq-signals");
const billingPreference = clean(flags["billing-preference"]) || "both";
const snapshotLabel = clean(flags["snapshot-label"]) || new Date().toISOString();
const sortHint = clean(flags["sort-hint"]) || "best match";

if (!query && !seedUrlsInput) {
  console.error("OUT_OF_SCOPE: at least one of --query or --seed-urls must be provided");
  process.exit(1);
}
if (!maxCompaniesRaw) {
  console.error("--max-companies is required");
  process.exit(1);
}
const maxCompanies = Number(maxCompaniesRaw);
if (!Number.isInteger(maxCompanies) || maxCompanies < 1) {
  console.error("OUT_OF_SCOPE: --max-companies must be a positive integer");
  process.exit(1);
}
if (outputMode !== "company_rows" && outputMode !== "grouped_by_company") {
  console.error("OUT_OF_SCOPE: --output-mode must be either company_rows or grouped_by_company");
  process.exit(1);
}
if (billingPreference !== "monthly" && billingPreference !== "annual" && billingPreference !== "both") {
  console.error("OUT_OF_SCOPE: --billing-preference must be monthly, annual, or both");
  process.exit(1);
}

const planSchema = {
  type: "object",
  additionalProperties: false,
  properties: {
    plan_name: { type: ["string", "null"] },
    plan_order: { type: ["number", "null"] },
    pricing_text_raw: { type: ["string", "null"] },
    price_amount: { type: ["number", "null"] },
    currency: { type: ["string", "null"] },
    billing_period: { type: ["string", "null"] },
    annual_discount_text: { type: ["string", "null"] },
    free_plan_available: { type: ["boolean", "null"] },
    free_trial_available: { type: ["boolean", "null"] },
    contact_sales_only: { type: ["boolean", "null"] },
    seat_limit_text: { type: ["string", "null"] },
    usage_limit_text: { type: ["string", "null"] },
    key_features: { type: "array", items: { type: "string" } },
    feature_matrix_rows: {
      type: "array",
      items: {
        type: "object",
        additionalProperties: false,
        properties: {
          feature: { type: ["string", "null"] },
          availability: { type: ["string", "null"] },
          evidence_text: { type: ["string", "null"] },
        },
        required: ["feature", "availability", "evidence_text"],
      },
    },
    addons_text: { type: ["string", "null"] },
    support_or_onboarding_text: { type: ["string", "null"] },
    faq_pricing_signals: { type: "array", items: { type: "string" } },
    confidence: { type: ["number", "null"] },
    notes: { type: ["string", "null"] },
  },
  required: [
    "plan_name",
    "plan_order",
    "pricing_text_raw",
    "price_amount",
    "currency",
    "billing_period",
    "annual_discount_text",
    "free_plan_available",
    "free_trial_available",
    "contact_sales_only",
    "seat_limit_text",
    "usage_limit_text",
    "key_features",
    "feature_matrix_rows",
    "addons_text",
    "support_or_onboarding_text",
    "faq_pricing_signals",
    "confidence",
    "notes",
  ],
};

const pageSchema = {
  type: "object",
  additionalProperties: false,
  properties: {
    company_name: { type: ["string", "null"] },
    product_name: { type: ["string", "null"] },
    category: { type: ["string", "null"] },
    tagline: { type: ["string", "null"] },
    pricing_page_url: { type: ["string", "null"] },
    secondary_source_urls: { type: "array", items: { type: "string" } },
    free_plan_available: { type: ["boolean", "null"] },
    free_trial_available: { type: ["boolean", "null"] },
    contact_sales_only: { type: ["boolean", "null"] },
    annual_discount_text: { type: ["string", "null"] },
    addons_text: { type: ["string", "null"] },
    support_or_onboarding_text: { type: ["string", "null"] },
    faq_pricing_signals: { type: "array", items: { type: "string" } },
    plans: { type: "array", items: planSchema },
    confidence: { type: ["number", "null"] },
    notes: { type: ["string", "null"] },
  },
  required: [
    "company_name",
    "product_name",
    "category",
    "tagline",
    "pricing_page_url",
    "secondary_source_urls",
    "free_plan_available",
    "free_trial_available",
    "contact_sales_only",
    "annual_discount_text",
    "addons_text",
    "support_or_onboarding_text",
    "faq_pricing_signals",
    "plans",
    "confidence",
    "notes",
  ],
};

type Candidate = {
  url: string;
  source: "seed" | "seed_map" | "search";
  sourcePageUrl: string;
};

type Plan = {
  tracking_key: string;
  snapshot_label: string;
  collected_at: string;
  company_name: string | null;
  company_domain: string;
  product_name: string | null;
  pricing_page_url: string;
  source_page_url: string;
  secondary_source_urls: string[];
  category: string | null;
  tagline: string | null;
  plan_name: string | null;
  plan_order: number | null;
  pricing_text_raw: string | null;
  price_amount: number | null;
  currency: string | null;
  billing_period: string | null;
  annual_discount_text: string | null;
  free_plan_available: boolean | null;
  free_trial_available: boolean | null;
  contact_sales_only: boolean | null;
  seat_limit_text: string | null;
  usage_limit_text: string | null;
  key_features: string[];
  feature_matrix_rows: Array<{ feature: string | null; availability: string | null; evidence_text: string | null }>;
  addons_text: string | null;
  support_or_onboarding_text: string | null;
  faq_pricing_signals: string[];
  confidence: number | null;
  notes: string | null;
};

type Company = {
  tracking_key: string;
  snapshot_label: string;
  collected_at: string;
  company_name: string | null;
  company_domain: string;
  product_name: string | null;
  pricing_page_url: string;
  source_page_url: string;
  secondary_source_urls: string[];
  category: string | null;
  tagline: string | null;
  free_plan_available: boolean | null;
  free_trial_available: boolean | null;
  contact_sales_only: boolean | null;
  annual_discount_text: string | null;
  addons_text: string | null;
  support_or_onboarding_text: string | null;
  faq_pricing_signals: string[];
  confidence: number | null;
  notes: string | null;
  plans: Plan[];
};

function clean(value: unknown): string {
  return typeof value === "string" ? value.trim() : "";
}

function parseBooleanFlag(value: unknown, defaultValue: boolean, name: string): boolean {
  const cleaned = clean(value);
  if (!cleaned) return defaultValue;
  if (cleaned === "true") return true;
  if (cleaned === "false") return false;
  console.error(`OUT_OF_SCOPE: --${name} must be true or false`);
  process.exit(1);
}

function normalizeUrl(input: string): string | null {
  const value = input.trim();
  if (!value) return null;
  const withProtocol = /^https?:\/\//i.test(value) ? value : `https://${value}`;
  try {
    const url = new URL(withProtocol);
    if (!url.hostname.includes(".")) return null;
    url.hash = "";
    return url.toString();
  } catch {
    return null;
  }
}

function hostnameFromUrl(url: string): string {
  try {
    return new URL(url).hostname.toLowerCase().replace(/^www\./, "");
  } catch {
    return "";
  }
}

function baseDomainFromUrl(url: string): string {
  const host = hostnameFromUrl(url);
  const labels = host.split(".").filter(Boolean);
  if (labels.length <= 2) return host;
  const lastTwo = labels.slice(-2).join(".");
  const lastThree = labels.slice(-3).join(".");
  const secondLevel = labels[labels.length - 2];
  if (["co", "com", "org", "net", "ac", "gov"].includes(secondLevel) && labels.length >= 3) {
    return lastThree;
  }
  return lastTwo;
}

function looksLikePricingUrl(url: string): boolean {
  try {
    const parsed = new URL(url);
    const segments = parsed.pathname
      .toLowerCase()
      .split("/")
      .map((segment) => segment.trim())
      .filter(Boolean);
    return segments.some((segment) => {
      if (["pricing", "plans", "billing", "price", "prices"].includes(segment)) return true;
      return (
        segment.startsWith("pricing-") ||
        segment.endsWith("-pricing") ||
        segment === "plans-and-pricing" ||
        segment === "pricing-and-plans"
      );
    });
  } catch {
    const lower = url.toLowerCase();
    return lower.includes("/pricing") || lower.includes("/plans") || lower.includes("/billing");
  }
}

function looksUnofficial(url: string): boolean {
  const lower = url.toLowerCase();
  const host = hostnameFromUrl(url);
  const blockedHosts = [
    "g2.com",
    "capterra.com",
    "getapp.com",
    "softwareadvice.com",
    "trustradius.com",
    "saasworthy.com",
    "alternativeto.net",
    "producthunt.com",
    "reddit.com",
    "quora.com",
    "wikipedia.org",
    "youtube.com",
    "linkedin.com",
    "x.com",
    "twitter.com",
    "facebook.com",
    "medium.com",
  ];
  if (blockedHosts.some((blocked) => host === blocked || host.endsWith(`.${blocked}`))) return true;
  if (host.startsWith("community.") || host.startsWith("forum.") || host.includes(".community.")) return true;
  return (
    lower.includes("/blog") ||
    lower.includes("/article") ||
    lower.includes("/resources/") ||
    lower.includes("/guide") ||
    lower.includes("/compare/") ||
    lower.includes("/comparison") ||
    lower.includes("/alternatives")
  );
}

function addCandidate(candidates: Candidate[], seenUrls: Set<string>, candidate: Candidate) {
  const normalized = normalizeUrl(candidate.url);
  if (!normalized || looksUnofficial(normalized) || seenUrls.has(normalized)) return;
  seenUrls.add(normalized);
  candidates.push({ ...candidate, url: normalized });
}

async function discoverCandidates(): Promise<Candidate[]> {
  const candidates: Candidate[] = [];
  const seenUrls = new Set<string>();
  const seeds = seedUrlsInput
    .split(",")
    .map((item) => normalizeUrl(item))
    .filter((item): item is string => Boolean(item));

  for (const seed of seeds) {
    if (looksLikePricingUrl(seed)) {
      addCandidate(candidates, seenUrls, { url: seed, source: "seed", sourcePageUrl: seed });
      continue;
    }

    try {
      const mapped = await firecrawl.map(seed, {
        search: "pricing",
        sitemap: "include",
        limit: 5,
        integration: "prometheus",
      });
      const links = Array.isArray(mapped?.links) ? mapped.links : [];
      let addedMapped = false;
      for (const link of links) {
        const url = typeof link === "string" ? link : link?.url;
        if (url && looksLikePricingUrl(url)) {
          addCandidate(candidates, seenUrls, { url, source: "seed_map", sourcePageUrl: seed });
          addedMapped = true;
        }
      }
      if (!addedMapped) {
        const root = new URL(seed).origin;
        addCandidate(candidates, seenUrls, { url: `${root}/pricing`, source: "seed_map", sourcePageUrl: seed });
      }
    } catch (err) {
      console.error(`Seed pricing discovery failed for ${seed}: ${err}`);
      try {
        const root = new URL(seed).origin;
        addCandidate(candidates, seenUrls, { url: `${root}/pricing`, source: "seed_map", sourcePageUrl: seed });
      } catch {
        // Already normalized above.
      }
    }
  }

  if (query && candidates.length < maxCompanies) {
    const searchQueries = [
      `${query} SaaS official pricing page ${region} ${sortHint} ${language}`,
      `${query} software pricing plans official ${region} ${language}`,
    ];
    const mappedSearchDomains = new Set<string>();
    for (const searchQuery of searchQueries) {
      if (candidates.length >= maxCompanies * 2) break;
      const result = await firecrawl.search(searchQuery, {
        limit: Math.max(5, maxCompanies * 4),
        integration: "prometheus",
      });
      const web = Array.isArray(result?.web) ? result.web : [];
      for (const item of web) {
        const url = item?.url;
        if (url && looksLikePricingUrl(url)) {
          addCandidate(candidates, seenUrls, { url, source: "search", sourcePageUrl: url });
          continue;
        }
        const normalized = url ? normalizeUrl(url) : null;
        const domain = normalized ? baseDomainFromUrl(normalized) : "";
        if (
          normalized &&
          domain &&
          !looksUnofficial(normalized) &&
          !mappedSearchDomains.has(domain) &&
          mappedSearchDomains.size < Math.max(3, maxCompanies * 2)
        ) {
          mappedSearchDomains.add(domain);
          try {
            const mapped = await firecrawl.map(normalized, {
              search: "pricing",
              sitemap: "include",
              limit: 5,
              integration: "prometheus",
            });
            const links = Array.isArray(mapped?.links) ? mapped.links : [];
            for (const link of links) {
              const mappedUrl = typeof link === "string" ? link : link?.url;
              if (mappedUrl && looksLikePricingUrl(mappedUrl)) {
                addCandidate(candidates, seenUrls, { url: mappedUrl, source: "search", sourcePageUrl: normalized });
              }
            }
          } catch (err) {
            console.error(`Search-result pricing discovery failed for ${normalized}: ${err}`);
          }
        }
      }
    }
  }

  const byDomain = new Map<string, Candidate>();
  for (const candidate of candidates) {
    const domain = baseDomainFromUrl(candidate.url);
    if (!domain) continue;
    if (!byDomain.has(domain)) byDomain.set(domain, candidate);
  }
  return Array.from(byDomain.values()).slice(0, Math.max(maxCompanies * 2, maxCompanies));
}

function extractionPrompt(url: string): string {
  return [
    "Extract only visible SaaS pricing-page information from this official vendor page.",
    `The requested category or theme is: ${query || "unspecified"}. Region focus: ${region}. Preferred language: ${language}.`,
    `Billing preference: ${billingPreference}. If monthly and annual prices are both visible, capture both when possible; otherwise capture what is visible.`,
    includeFeatureMatrix
      ? "Extract visible feature matrix rows conservatively as plan-to-feature mappings."
      : "Do not include feature matrix rows; return an empty feature_matrix_rows array.",
    includeAddons
      ? "Extract add-ons, overage pricing, implementation fees, support tiers, and onboarding pricing text when visible."
      : "Do not include add-ons; set addons_text to null unless it is required to explain a plan price.",
    includeFaqSignals
      ? "Extract FAQ-based pricing signals when visible."
      : "Do not include FAQ pricing signals; return an empty faq_pricing_signals array.",
    "Preserve exact visible pricing text in pricing_text_raw. If a plan says Contact sales or Custom, preserve that exact text and set price_amount to null.",
    "Use null for unavailable fields and never infer prices or features that are not visible.",
    `Evidence URL being scraped: ${url}`,
  ].join(" ");
}

function stringArray(value: unknown): string[] {
  return Array.isArray(value) ? value.map((item) => clean(item)).filter(Boolean) : [];
}

function matrixRows(value: unknown): Array<{ feature: string | null; availability: string | null; evidence_text: string | null }> {
  if (!includeFeatureMatrix || !Array.isArray(value)) return [];
  return value
    .map((item) => {
      const row = item && typeof item === "object" ? (item as Record<string, unknown>) : {};
      return {
        feature: clean(row.feature) || null,
        availability: clean(row.availability) || null,
        evidence_text: clean(row.evidence_text) || null,
      };
    })
    .filter((row) => row.feature || row.availability || row.evidence_text);
}

function booleanOrNull(value: unknown): boolean | null {
  return typeof value === "boolean" ? value : null;
}

function numberOrNull(value: unknown): number | null {
  return typeof value === "number" && Number.isFinite(value) ? value : null;
}

function nullableString(value: unknown): string | null {
  return clean(value) || null;
}

function normalizeCompany(raw: unknown, candidate: Candidate, collectedAt: string): Company | null {
  const data = raw && typeof raw === "object" ? (raw as Record<string, unknown>) : {};
  const domain = baseDomainFromUrl(candidate.url);
  if (!domain) return null;

  const companyName = nullableString(data.company_name);
  const productName = nullableString(data.product_name);
  const category = nullableString(data.category) || (query || null);
  const tagline = nullableString(data.tagline);
  const pricingPageUrl = normalizeUrl(clean(data.pricing_page_url)) || candidate.url;
  const secondarySourceUrls = Array.from(new Set([candidate.sourcePageUrl, ...stringArray(data.secondary_source_urls)].filter(Boolean)));
  const companyTrackingKey = `${domain}::${snapshotLabel}`;
  const companyFreePlan = booleanOrNull(data.free_plan_available);
  const companyFreeTrial = booleanOrNull(data.free_trial_available);
  const companyContactSales = booleanOrNull(data.contact_sales_only);
  const companyAnnualDiscount = nullableString(data.annual_discount_text);
  const companyAddons = includeAddons ? nullableString(data.addons_text) : null;
  const companySupport = nullableString(data.support_or_onboarding_text);
  const companyFaq = includeFaqSignals ? stringArray(data.faq_pricing_signals) : [];
  const companyConfidence = numberOrNull(data.confidence);
  const companyNotes = nullableString(data.notes);
  const rawPlans = Array.isArray(data.plans) ? data.plans : [];

  const plans = rawPlans.map((item, index) => {
    const plan = item && typeof item === "object" ? (item as Record<string, unknown>) : {};
    const planName = nullableString(plan.plan_name);
    const planOrder = numberOrNull(plan.plan_order) ?? index + 1;
    const billingPeriod = nullableString(plan.billing_period);
    const trackingKey = `${domain}::${planName || `plan_${planOrder}`}::${snapshotLabel}`;
    return {
      tracking_key: trackingKey,
      snapshot_label: snapshotLabel,
      collected_at: collectedAt,
      company_name: companyName,
      company_domain: domain,
      product_name: productName,
      pricing_page_url: pricingPageUrl,
      source_page_url: candidate.sourcePageUrl,
      secondary_source_urls: secondarySourceUrls,
      category,
      tagline,
      plan_name: planName,
      plan_order: planOrder,
      pricing_text_raw: nullableString(plan.pricing_text_raw),
      price_amount: numberOrNull(plan.price_amount),
      currency: nullableString(plan.currency),
      billing_period: billingPeriod,
      annual_discount_text: nullableString(plan.annual_discount_text) || companyAnnualDiscount,
      free_plan_available: booleanOrNull(plan.free_plan_available) ?? companyFreePlan,
      free_trial_available: booleanOrNull(plan.free_trial_available) ?? companyFreeTrial,
      contact_sales_only: booleanOrNull(plan.contact_sales_only) ?? companyContactSales,
      seat_limit_text: nullableString(plan.seat_limit_text),
      usage_limit_text: nullableString(plan.usage_limit_text),
      key_features: stringArray(plan.key_features),
      feature_matrix_rows: matrixRows(plan.feature_matrix_rows),
      addons_text: includeAddons ? nullableString(plan.addons_text) || companyAddons : null,
      support_or_onboarding_text: nullableString(plan.support_or_onboarding_text) || companySupport,
      faq_pricing_signals: includeFaqSignals ? stringArray(plan.faq_pricing_signals).concat(companyFaq) : [],
      confidence: numberOrNull(plan.confidence) ?? companyConfidence,
      notes: nullableString(plan.notes),
    };
  });

  if (plans.length === 0) {
    plans.push({
      tracking_key: `${domain}::pricing_page::${snapshotLabel}`,
      snapshot_label: snapshotLabel,
      collected_at: collectedAt,
      company_name: companyName,
      company_domain: domain,
      product_name: productName,
      pricing_page_url: pricingPageUrl,
      source_page_url: candidate.sourcePageUrl,
      secondary_source_urls: secondarySourceUrls,
      category,
      tagline,
      plan_name: null,
      plan_order: null,
      pricing_text_raw: null,
      price_amount: null,
      currency: null,
      billing_period: null,
      annual_discount_text: companyAnnualDiscount,
      free_plan_available: companyFreePlan,
      free_trial_available: companyFreeTrial,
      contact_sales_only: companyContactSales,
      seat_limit_text: null,
      usage_limit_text: null,
      key_features: [],
      feature_matrix_rows: [],
      addons_text: companyAddons,
      support_or_onboarding_text: companySupport,
      faq_pricing_signals: companyFaq,
      confidence: companyConfidence,
      notes: companyNotes || "No distinct visible plan tiers were extracted from this page.",
    });
  }

  return {
    tracking_key: companyTrackingKey,
    snapshot_label: snapshotLabel,
    collected_at: collectedAt,
    company_name: companyName,
    company_domain: domain,
    product_name: productName,
    pricing_page_url: pricingPageUrl,
    source_page_url: candidate.sourcePageUrl,
    secondary_source_urls: secondarySourceUrls,
    category,
    tagline,
    free_plan_available: companyFreePlan,
    free_trial_available: companyFreeTrial,
    contact_sales_only: companyContactSales,
    annual_discount_text: companyAnnualDiscount,
    addons_text: companyAddons,
    support_or_onboarding_text: companySupport,
    faq_pricing_signals: companyFaq,
    confidence: companyConfidence,
    notes: companyNotes,
    plans,
  };
}

async function extractCompany(candidate: Candidate, collectedAt: string): Promise<Company | null> {
  try {
    const doc = await firecrawl.scrape(candidate.url, {
      formats: [
        {
          type: "json",
          schema: pageSchema,
          prompt: extractionPrompt(candidate.url),
        },
      ],
      integration: "prometheus",
    });
    return normalizeCompany(doc?.json, candidate, collectedAt);
  } catch (err) {
    console.error(`Pricing extraction failed for ${candidate.url}: ${err}`);
    return null;
  }
}

async function main() {
  const collectedAt = new Date().toISOString();
  const candidates = await discoverCandidates();
  const companies: Company[] = [];
  const seenDomains = new Set<string>();

  for (const candidate of candidates) {
    if (companies.length >= maxCompanies) break;
    const domain = baseDomainFromUrl(candidate.url);
    if (!domain || seenDomains.has(domain)) continue;
    const company = await extractCompany(candidate, collectedAt);
    if (!company) continue;
    seenDomains.add(domain);
    companies.push(company);
  }

  if (companies.length === 0) {
    throw new Error("no official SaaS pricing pages could be extracted for the provided inputs");
  }

  const common = {
    snapshot_label: snapshotLabel,
    collected_at: collectedAt,
    query: query || null,
    seed_urls: seedUrlsInput
      ? seedUrlsInput
          .split(",")
          .map((item) => normalizeUrl(item))
          .filter(Boolean)
      : [],
    region,
    language,
    billing_preference: billingPreference,
    sort_hint: sortHint,
    max_companies: maxCompanies,
    output_mode: outputMode,
  };

  const out =
    outputMode === "grouped_by_company"
      ? {
          ...common,
          grouped_by_company: Object.fromEntries(companies.map((company) => [company.company_domain, company])),
        }
      : {
          ...common,
          company_rows: companies,
        };

  process.stdout.write(JSON.stringify(out));
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
deploy to unlock

Deploy this collector to unlock schedules, the API endpoint, and destinations.

One person builds it. Everyone keeps it fresh.