MCP server that extracts clean, structured content from web pages with anti-bot bypass capabilities.

self-hosted, official, service

Package Details

Transport: stdio
Runtime: npx
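
As a rough illustration of what the stdio transport and npx runtime imply: an MCP client typically launches the server as a child process via `npx` and exchanges messages over stdin/stdout. The sketch below builds a client configuration entry in the `mcpServers` shape used by several MCP clients. The package name `example-mcp-fetch-server`, the server key `fetch`, and the dummy API key are placeholders that do not come from this page; substitute the real package name from the server's own documentation.

```python
import json
import os

# Placeholder: the actual npm package name is not given on this page.
PACKAGE_NAME = "example-mcp-fetch-server"


def build_server_entry(package_name, env_names, environ=None):
    """Return an MCP client config entry that launches the server via npx over stdio."""
    environ = os.environ if environ is None else environ
    # Forward only the variables that are actually set in the caller's environment.
    env = {name: environ[name] for name in env_names if name in environ}
    return {
        "command": "npx",
        "args": ["-y", package_name],  # -y skips npx's install confirmation prompt
        "env": env,
    }


config = {
    "mcpServers": {
        # "fetch" is an arbitrary, hypothetical key chosen for this example.
        "fetch": build_server_entry(
            PACKAGE_NAME,
            ["FIRECRAWL_API_KEY", "BRIGHTDATA_API_KEY"],
            environ={"FIRECRAWL_API_KEY": "fc-example"},  # dummy value for illustration
        )
    }
}
print(json.dumps(config, indent=2))
```

Passing secrets through the `env` map keeps them out of the command line, where they could otherwise show up in process listings.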

Environment Variables

FIRECRAWL_API_KEY
Secret

API key for Firecrawl service to bypass anti-bot measures

BRIGHTDATA_API_KEY
Secret

Bearer token for BrightData Web Unlocker service

STRATEGY_CONFIG_PATH

Path to the Markdown file containing the scraping strategy configuration

Default: /tmp/pulse-fetch/strategy.md

OPTIMIZE_FOR

Optimization strategy for scraping: cost or speed

Default: cost

MCP_RESOURCE_STORAGE

Storage backend for saved resources: memory or filesystem

Default: memory

MCP_RESOURCE_FILESYSTEM_ROOT

Directory for filesystem storage (only used when MCP_RESOURCE_STORAGE is filesystem)

Default: /tmp/pulse-fetch/resources

SKIP_HEALTH_CHECKS

Skip API authentication health checks at startup

Default: false

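The defaults listed for the optional variables can be summarized as a small resolver. This is a sketch of how a wrapper script might mirror those defaults, not the server's actual startup code. The `Settings` class and `load_settings` function are hypothetical names, and treating only the string `true` (case-insensitive) as enabling SKIP_HEALTH_CHECKS is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    strategy_config_path: str
    optimize_for: str
    resource_storage: str
    resource_filesystem_root: str
    skip_health_checks: bool


def load_settings(environ=None):
    """Resolve the optional variables, falling back to the documented defaults."""
    environ = os.environ if environ is None else environ

    optimize_for = environ.get("OPTIMIZE_FOR", "cost")
    if optimize_for not in ("cost", "speed"):
        raise ValueError(f"OPTIMIZE_FOR must be 'cost' or 'speed', got {optimize_for!r}")

    storage = environ.get("MCP_RESOURCE_STORAGE", "memory")
    if storage not in ("memory", "filesystem"):
        raise ValueError(f"MCP_RESOURCE_STORAGE must be 'memory' or 'filesystem', got {storage!r}")

    # Assumption: only "true" (any casing) turns the flag on.
    skip = environ.get("SKIP_HEALTH_CHECKS", "false").strip().lower() == "true"

    return Settings(
        strategy_config_path=environ.get("STRATEGY_CONFIG_PATH", "/tmp/pulse-fetch/strategy.md"),
        optimize_for=optimize_for,
        resource_storage=storage,
        # The root is resolved regardless, but only matters for filesystem storage.
        resource_filesystem_root=environ.get(
            "MCP_RESOURCE_FILESYSTEM_ROOT", "/tmp/pulse-fetch/resources"
        ),
        skip_health_checks=skip,
    )


print(load_settings({}))
```

Rejecting unknown values early makes a typo such as `OPTIMIZE_FOR=fast` fail loudly instead of silently falling back to a default.
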
LLM_PROVIDER

LLM provider for the extract feature: anthropic, openai, or openai-compatible

LLM_API_KEY
Secret

API key for the chosen LLM provider

LLM_API_BASE_URL

Base URL for OpenAI-compatible providers

LLM_MODEL

Specific model to use for extraction
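
The four LLM variables depend on one another, so a pre-flight check before launching the server can be useful. The sketch below is one possible check, under stated assumptions: that the extract feature is simply unavailable when LLM_PROVIDER is unset, that LLM_API_KEY is needed whenever a provider is chosen, and that LLM_API_BASE_URL is needed for `openai-compatible`. None of these rules is confirmed by this page, and `check_llm_settings` is a hypothetical helper name.

```python
import os

SUPPORTED_PROVIDERS = ("anthropic", "openai", "openai-compatible")


def check_llm_settings(environ=None):
    """Return a list of problems; an empty list means the settings look consistent."""
    environ = os.environ if environ is None else environ
    provider = environ.get("LLM_PROVIDER")
    if not provider:
        # Assumption: without a provider, extraction is just not enabled.
        return []

    problems = []
    if provider not in SUPPORTED_PROVIDERS:
        problems.append(f"LLM_PROVIDER {provider!r} is not one of {SUPPORTED_PROVIDERS}")
    if not environ.get("LLM_API_KEY"):
        # Assumption: every provider needs a key.
        problems.append("LLM_API_KEY is not set")
    if provider == "openai-compatible" and not environ.get("LLM_API_BASE_URL"):
        # Assumption: a compatible provider has no usable default endpoint.
        problems.append("LLM_API_BASE_URL is not set for an openai-compatible provider")
    return problems


for problem in check_llm_settings({"LLM_PROVIDER": "openai-compatible", "LLM_API_KEY": "k"}):
    print(problem)
```

LLM_MODEL is left unchecked here because the page describes it only as a specific model choice and does not list valid values.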