URLscan.io FETCHER
v1 · Published
Fetch 100 of the freshest URLs from the urlscan.io public API.
Author's sample data
| urls | |
|---|---|
| count | 100 |
| source | urlscan.io public scan feed |
| fetchedAt | 2026-06-11T23:21:09.752Z |
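The payload summarized above can be checked at runtime before a downstream job consumes it. The following is a minimal sketch, assuming each entry of `urls` carries the `url`, `domain`, and `date` strings that the script emits. The names `ScannedUrl`, `CollectorOutput`, and `isCollectorOutput` are hypothetical, and the sample entry uses a placeholder URL rather than real scan data.

```typescript
// Hypothetical types: the names are illustrative, the fields mirror the script's output.
interface ScannedUrl {
  url: string;
  domain: string;
  date: string;
}

interface CollectorOutput {
  source: string;
  fetchedAt: string;
  count: number;
  urls: ScannedUrl[];
}

// Runtime guard: true only when the value has the expected shape
// and `count` agrees with the length of `urls`.
function isCollectorOutput(value: unknown): value is CollectorOutput {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const urls = v.urls;
  if (!Array.isArray(urls)) return false;
  if (typeof v.source !== "string") return false;
  if (typeof v.fetchedAt !== "string") return false;
  if (typeof v.count !== "number" || v.count !== urls.length) return false;
  return urls.every((u: unknown) => {
    if (typeof u !== "object" || u === null) return false;
    const entry = u as Record<string, unknown>;
    return (
      typeof entry.url === "string" &&
      typeof entry.domain === "string" &&
      typeof entry.date === "string"
    );
  });
}

// Placeholder payload (not real scan data) in the shape shown in the table.
const sample: unknown = JSON.parse(
  '{"source":"urlscan.io public scan feed","fetchedAt":"2026-06-11T23:21:09.752Z","count":1,' +
    '"urls":[{"url":"https://example.com/","domain":"example.com","date":"2026-06-11T23:20:00.000Z"}]}',
);

console.log(isCollectorOutput(sample)); // prints true
```

Checking `count` against `urls.length` is a design choice in this sketch: it catches truncated payloads cheaply, at the cost of rejecting output from a producer that defines `count` differently.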
Publisher
1 subscriber · Every day at 1:30 AM · 0 runs in 14d · published 4h ago
Versions
managed by author · v1 · built · approved · current · 4h ago
How this script collects data
import Firecrawl from "@mendable/firecrawl-js";
import * as cheerio from "cheerio";

// The Firecrawl API key comes from the environment; exit early without it.
const apiKey = process.env.FIRECRAWL_API_KEY;
if (!apiKey) {
  console.error("FIRECRAWL_API_KEY is not set");
  process.exit(1);
}

const firecrawl = new Firecrawl({ apiKey });

// urlscan.io search endpoint: wildcard query, page size of 100.
const SEARCH_URL = "https://urlscan.io/api/v1/search/?q=*&size=100";

// Only the fields this script reads from each search result.
interface UrlscanResult {
  task?: { url?: string; domain?: string; time?: string };
  page?: { url?: string; domain?: string };
}
function parseJsonBody(rawHtml: string): { results?: UrlscanResult[] } {
  // The endpoint returns raw JSON; Firecrawl may wrap it in an HTML shell.
  const direct = rawHtml.trim();
  if (direct.startsWith("{")) {
    return JSON.parse(direct);
  }

  // Otherwise read the text of <pre> (or the whole <body>) and parse
  // everything from the first "{" onward.
  const $ = cheerio.load(rawHtml);
  const text = $("pre").text().trim() || $("body").text().trim();
  const start = text.indexOf("{");
  if (start === -1) {
    throw new Error("no JSON object found in urlscan.io search response");
  }
  return JSON.parse(text.slice(start));
}
async function main() {
  // Progress messages go to stderr so stdout carries only the JSON payload.
  console.error(`Fetching latest public scans from ${SEARCH_URL}`);
  const doc = await firecrawl.scrape(SEARCH_URL, {
    formats: ["rawHtml"],
    integration: "prometheus",
  });

  const rawHtml = doc.rawHtml;
  if (!rawHtml) {
    throw new Error("urlscan.io search response had no content");
  }

  const data = parseJsonBody(rawHtml);
  if (!Array.isArray(data.results)) {
    throw new Error("urlscan.io response is missing the 'results' array");
  }

  // Prefer the submitted task's fields, fall back to the scanned page's,
  // and drop results that have no URL at all.
  const urls = data.results
    .map((r) => ({
      url: r.task?.url ?? r.page?.url ?? "",
      domain: r.task?.domain ?? r.page?.domain ?? "",
      date: r.task?.time ?? "",
    }))
    .filter((r) => r.url !== "");

  if (urls.length === 0) {
    throw new Error("no scan results extracted from urlscan.io response");
  }
  console.error(`Extracted ${urls.length} newly scanned URLs`);

  const out = {
    source: "urlscan.io public scan feed",
    fetchedAt: new Date().toISOString(),
    count: urls.length,
    urls,
  };
  process.stdout.write(JSON.stringify(out));
}
main().catch((err) => {
  console.error(err);
  process.exit(1);
});
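The fallback branch of `parseJsonBody` hands everything from the first `{` to the end of the text to `JSON.parse`, which throws if anything other than whitespace follows the object. One possible hardening, sketched here, also trims at the last `}`. `extractJsonObject` is a hypothetical helper, not part of the original script, and it assumes the text holds a single top-level JSON object whose quotes are not HTML-entity encoded.

```typescript
// Hypothetical helper (not in the original script): parse the JSON object
// spanning the first "{" through the last "}" of the given text.
function extractJsonObject(text: string): unknown {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end < start) {
    throw new Error("no JSON object found in text");
  }
  return JSON.parse(text.slice(start, end + 1));
}

// Placeholder input: a JSON body wrapped in markup, with trailing text
// that would make JSON.parse(text.slice(start)) throw.
const wrapped =
  '<pre>{"results":[{"task":{"url":"https://example.com/"}}]}</pre>';

const parsed = extractJsonObject(wrapped) as {
  results: { task: { url: string } }[];
};
console.log(parsed.results[0].task.url); // prints https://example.com/
```

This variant does not decode entities such as `&quot;`, so it complements the text extraction step in the script rather than replacing it.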
Build prompt
URLscan.io FETCHER