MCP server that extracts clean, structured content from web pages with anti-bot bypass capabilities.

stdio · official · service

Package Details

Transport: stdio
Runtime: npx
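
A stdio server started through npx is usually registered in an MCP client as a command plus arguments and environment variables. The sketch below builds such an entry. It rests on several assumptions: the `mcpServers` layout is a convention used by several MCP clients and may not match yours, `<package-name>` is a placeholder because this listing does not state the npm package name, and the entry name `pulse-fetch` is only inferred from the default paths listed under Environment Variables.

```typescript
// Hypothetical shape of one server entry in an MCP client's configuration.
interface StdioServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Builds an entry that launches a package through npx over stdio.
// `packageName` is a placeholder: the listing does not give the real package name.
function buildStdioEntry(
  packageName: string,
  env: Record<string, string>
): StdioServerEntry {
  // Drop empty values so unset optional variables are simply not passed on.
  const cleaned: Record<string, string> = {};
  for (const key of Object.keys(env)) {
    const value = env[key];
    if (value !== undefined && value.trim() !== "") {
      cleaned[key] = value;
    }
  }
  // "-y" tells npx to install the package without prompting.
  return { command: "npx", args: ["-y", packageName], env: cleaned };
}

const exampleEntry = buildStdioEntry("<package-name>", {
  FIRECRAWL_API_KEY: "your-firecrawl-key",
  OPTIMIZE_FOR: "cost",
  LLM_MODEL: "",
});

// Print the entry wrapped in the assumed client-side layout.
console.log(JSON.stringify({ mcpServers: { "pulse-fetch": exampleEntry } }, null, 2));
```

Keeping secrets such as FIRECRAWL_API_KEY out of version control is advisable however the entry is generated.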

Environment Variables

FIRECRAWL_API_KEY
Secret

API key for the Firecrawl service, used to bypass anti-bot measures

BRIGHTDATA_API_KEY
Secret

Bearer token for the BrightData Web Unlocker service

STRATEGY_CONFIG_PATH
Default: /tmp/pulse-fetch/strategy.md

Path to markdown file containing scraping strategy configuration

OPTIMIZE_FOR
Default: cost

Optimization strategy for scraping: cost or speed
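
A client-side wrapper might validate OPTIMIZE_FOR before launching the server. The following is a minimal sketch of that check, using the two documented values and the documented default. The function name `resolveOptimizeFor` is hypothetical, and the strict, case-sensitive matching is an assumption; the server itself may be more lenient.

```typescript
type OptimizeFor = "cost" | "speed";

// Returns the documented default ("cost") when the variable is unset or blank,
// and rejects anything other than the two documented values.
function resolveOptimizeFor(raw: string | undefined): OptimizeFor {
  if (raw === undefined || raw.trim() === "") {
    return "cost";
  }
  const value = raw.trim();
  if (value === "cost" || value === "speed") {
    return value;
  }
  throw new Error(`OPTIMIZE_FOR must be "cost" or "speed", got "${raw}"`);
}

const chosenMode = resolveOptimizeFor("speed");
const defaultMode = resolveOptimizeFor(undefined);
console.log(chosenMode, defaultMode);
```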

MCP_RESOURCE_STORAGE
Default: memory

Storage backend for saved resources: memory or filesystem

MCP_RESOURCE_FILESYSTEM_ROOT
Default: /tmp/pulse-fetch/resources

Directory for filesystem storage (used only when MCP_RESOURCE_STORAGE is filesystem)
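
The relationship between MCP_RESOURCE_STORAGE and MCP_RESOURCE_FILESYSTEM_ROOT can be illustrated with a small resolver. This is a sketch under stated assumptions, not the server's implementation: `StorageConfig` and `resolveStorage` are hypothetical names, and only the variable names, allowed values, and defaults come from the listing.

```typescript
// The root directory only exists on the filesystem variant, mirroring the
// note that the directory is unused for in-memory storage.
type StorageConfig =
  | { kind: "memory" }
  | { kind: "filesystem"; root: string };

function resolveStorage(env: Record<string, string | undefined>): StorageConfig {
  const kind = env["MCP_RESOURCE_STORAGE"] ?? "memory"; // documented default
  if (kind === "memory") {
    return { kind: "memory" };
  }
  if (kind === "filesystem") {
    const root = env["MCP_RESOURCE_FILESYSTEM_ROOT"] ?? "/tmp/pulse-fetch/resources";
    return { kind: "filesystem", root };
  }
  throw new Error(`MCP_RESOURCE_STORAGE must be "memory" or "filesystem", got "${kind}"`);
}

const memoryStorage = resolveStorage({});
const fileStorage = resolveStorage({ MCP_RESOURCE_STORAGE: "filesystem" });
console.log(memoryStorage, fileStorage);
```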

SKIP_HEALTH_CHECKS (bool)
Default: false

Skip API authentication health checks at startup
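
Environment variables are always strings, so a boolean flag such as SKIP_HEALTH_CHECKS has to be parsed. The sketch below shows one strict way to do that. Which spellings the server actually accepts is not stated in this listing, so accepting only "true" and "false" (case-insensitively) is an assumption, as is the helper name `parseBoolFlag`.

```typescript
// Parses a string flag into a boolean, falling back when it is unset or blank.
function parseBoolFlag(raw: string | undefined, fallback: boolean): boolean {
  if (raw === undefined || raw.trim() === "") {
    return fallback;
  }
  const value = raw.trim().toLowerCase();
  if (value === "true") {
    return true;
  }
  if (value === "false") {
    return false;
  }
  throw new Error(`expected "true" or "false", got "${raw}"`);
}

// The documented default for SKIP_HEALTH_CHECKS is false.
const skipHealthChecks = parseBoolFlag("true", false);
const skipByDefault = parseBoolFlag(undefined, false);
console.log(skipHealthChecks, skipByDefault);
```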

LLM_PROVIDER

LLM provider for the extract feature: anthropic, openai, or openai-compatible

LLM_API_KEY
Secret

API key for the chosen LLM provider

LLM_API_BASE_URL

Base URL for OpenAI-compatible providers

LLM_MODEL

Specific model to use for extraction
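
The four LLM_* variables work together, which a small validation sketch can make concrete. Several rules here are assumptions rather than documented behavior: treating an unset LLM_PROVIDER as "extract not configured", requiring LLM_API_KEY whenever a provider is set, and requiring LLM_API_BASE_URL for openai-compatible. The names `LlmConfig` and `resolveLlmConfig` are hypothetical.

```typescript
type LlmProvider = "anthropic" | "openai" | "openai-compatible";

const KNOWN_PROVIDERS: readonly LlmProvider[] = ["anthropic", "openai", "openai-compatible"];

interface LlmConfig {
  provider: LlmProvider;
  apiKey: string;
  baseUrl?: string;
  model?: string;
}

function resolveLlmConfig(env: Record<string, string | undefined>): LlmConfig | null {
  const rawProvider = env["LLM_PROVIDER"];
  if (!rawProvider) {
    return null; // assumption: no provider means the extract feature is not configured
  }
  const provider = KNOWN_PROVIDERS.find((p) => p === rawProvider);
  if (provider === undefined) {
    throw new Error(`unknown LLM_PROVIDER "${rawProvider}"`);
  }
  const apiKey = env["LLM_API_KEY"];
  if (!apiKey) {
    throw new Error("LLM_API_KEY is required when LLM_PROVIDER is set"); // assumption
  }
  const config: LlmConfig = { provider, apiKey };
  if (provider === "openai-compatible") {
    const baseUrl = env["LLM_API_BASE_URL"];
    if (!baseUrl) {
      throw new Error("LLM_API_BASE_URL is required for openai-compatible"); // assumption
    }
    config.baseUrl = baseUrl;
  }
  const model = env["LLM_MODEL"];
  if (model) {
    config.model = model; // optional: left unset so the provider default applies
  }
  return config;
}

const llmConfig = resolveLlmConfig({
  LLM_PROVIDER: "openai-compatible",
  LLM_API_KEY: "your-llm-key",
  LLM_API_BASE_URL: "https://llm.example.com/v1",
});
console.log(llmConfig);
```

The URL `https://llm.example.com/v1` is a placeholder, not a real endpoint.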