WebFetch

Give your agents web scraping powers. Fetch, parse, and analyze web pages.

Quick Start

quickstart.py
from connectonion import Agent, WebFetch

web = WebFetch()
agent = Agent("researcher", tools=[web])

agent.input("What does stripe.com do?")
agent.input("Get contact info from acme.com")

Low-Level Methods

Direct HTTP and parsing operations

fetch(url)

HTTP GET request, returns raw HTML

fetch.py
html = web.fetch("https://example.com")
# Returns raw HTML string
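To make "HTTP GET, returns raw HTML" concrete, here is a minimal standard-library sketch of an equivalent operation. It is not ConnectOnion's implementation: the helper name `fetch_html` and the UTF-8 fallback are assumptions made for illustration, and the 15-second default simply mirrors the default timeout mentioned under Configuration.

```python
from urllib.request import urlopen


def fetch_html(url: str, timeout: float = 15.0) -> str:
    """Hypothetical stand-in for a fetch step: GET a URL and return the body as text."""
    with urlopen(url, timeout=timeout) as response:
        # Prefer the charset declared in the response headers.
        # Falling back to UTF-8 is an assumption of this sketch.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A call such as `fetch_html("https://example.com")` would return the page markup as one string, which is the kind of input the parsing helpers in this section expect.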

strip_tags(html, max_chars=10000)

Strip HTML tags, returns body text only

strip_tags.py
html = web.fetch("https://example.com")
text = web.strip_tags(html)
# Returns clean plain text (body content only)
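For readers curious what tag stripping involves, the following is a rough standard-library sketch of the idea. It is not the library's code. The names `_TextExtractor` and `strip_tags_sketch` are hypothetical, and treating `max_chars` as a simple truncation limit is an assumption based on the parameter name.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <head>, <script>, and <style> content."""

    SKIPPED = {"head", "script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def strip_tags_sketch(html, max_chars=10000):
    """Return body text with tags removed, cut to max_chars (assumed semantics)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)[:max_chars]
```

Capping the output length keeps the text small enough to pass to a language model, which is presumably why the real method exposes a `max_chars` parameter.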

get_title(html)

Get page title from HTML
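As an illustration of title extraction, here is a small sketch built on Python's `html.parser`. The names `_TitleParser` and `get_title_sketch` are hypothetical, and returning `None` when no title exists is an assumption. The source does not say what the real method returns in that case.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Captures the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self.title = None
        self._in_title = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            # Collapse runs of whitespace and newlines into single spaces.
            self.title = " ".join("".join(self._chunks).split())


def get_title_sketch(html):
    """Return the page title, or None if the document has no <title>."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```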

get_links(html)

Extract all links from HTML
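Link extraction usually means collecting the `href` of every anchor tag. The sketch below shows one way to do that with the standard library. The `base_url` parameter, the de-duplication, and the names `_LinkParser` and `get_links_sketch` are assumptions of this example. The documented method takes only `html`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkParser(HTMLParser):
    """Collects href values from <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            absolute = urljoin(self.base_url, href)
            if absolute not in self.links:  # keep first occurrence only
                self.links.append(absolute)


def get_links_sketch(html, base_url=""):
    """Return unique link targets in document order."""
    parser = _LinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Resolving relative paths such as `/about` against the page URL makes the results directly fetchable in a follow-up request.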

get_emails(html)

Extract email addresses from HTML
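A common way to pull email addresses out of markup is a regular expression scan. The sketch below illustrates that approach. The pattern is deliberately simple and will miss unusual but valid addresses. The names `EMAIL_PATTERN` and `get_emails_sketch` are hypothetical, and nothing here is taken from the library's source.

```python
import re

# Simplified pattern: local part, "@", domain labels, and a 2+ letter TLD.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def get_emails_sketch(html):
    """Return unique, lowercased email addresses in order of first appearance."""
    seen = {}
    for match in EMAIL_PATTERN.findall(html):
        seen.setdefault(match.lower(), None)  # dict preserves insertion order
    return list(seen)
```

Scanning the raw HTML rather than the stripped text also catches addresses that appear only inside `mailto:` links.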

get_social_links(html)

Extract social media links (Twitter, LinkedIn, Facebook, etc.)
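One plausible way to find social profiles is to match link hostnames against a list of known platforms. The sketch below shows that idea. Unlike the documented method, which takes HTML, it works on a list of URLs that has already been extracted. The domain table and the name `classify_social_links` are assumptions, and the platforms the library actually recognizes may differ.

```python
from urllib.parse import urlparse

# Hypothetical mapping of hostnames to platform names.
SOCIAL_DOMAINS = {
    "twitter.com": "twitter",
    "x.com": "twitter",
    "linkedin.com": "linkedin",
    "facebook.com": "facebook",
    "instagram.com": "instagram",
    "github.com": "github",
}


def classify_social_links(links):
    """Map each recognized platform to the first matching URL."""
    found = {}
    for link in links:
        host = (urlparse(link).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        platform = SOCIAL_DOMAINS.get(host)
        if platform and platform not in found:
            found[platform] = link
    return found
```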

High-Level Methods (LLM-Powered)

AI-powered analysis of web pages

analyze_page(url)

Use LLM to understand what a page/company does

analyze_page.py
result = web.analyze_page("https://stripe.com")
# Returns: "Stripe is a payment processing platform that..."

get_contact_info(url)

Extract contact information (email, phone, address) using LLM

get_contact_info.py
info = web.get_contact_info("https://acme.com")
# Returns: {
#   "email": "contact@acme.com",
#   "phone": "+1-555-0123",
#   "address": "123 Main St, City"
# }
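This page does not describe how the LLM-powered methods work internally. One plausible shape, offered purely as an assumption, is a three-step pipeline: fetch the page, reduce it to text, then prompt a model. The sketch below takes the fetcher, the text extractor, and the model as plain callables so that it runs without network access or an API key. Every name in it is hypothetical.

```python
from typing import Callable


def analyze_page_sketch(
    url: str,
    fetch: Callable[[str], str],
    to_text: Callable[[str], str],
    llm: Callable[[str], str],
    max_chars: int = 10000,
) -> str:
    """Fetch a page, reduce it to text, and ask a model to summarize it."""
    html = fetch(url)
    # Truncate so the prompt stays within a predictable size.
    text = to_text(html)[:max_chars]
    prompt = (
        "In two sentences, describe what this page or company does.\n\n"
        f"URL: {url}\n\nPage text:\n{text}"
    )
    return llm(prompt)
```

Passing the steps in as arguments keeps the sketch testable with stub functions. A contact-extraction variant would follow the same pattern with a prompt that asks for email, phone, and address fields.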

Research Agent Example

research_agent.py
from connectonion import Agent, WebFetch, Memory

web = WebFetch()
memory = Memory()

agent = Agent(
    name="researcher",
    tools=[web, memory],
    system_prompt="""You are a web researcher. You can:
    - Fetch and analyze websites
    - Extract contact information
    - Find social media profiles
    - Remember findings for later"""
)

# Research a company
agent.input("Research stripe.com and tell me what they do")

# Find contact info
agent.input("Get contact information from acme.com")

# Build a lead list
agent.input("Find all email addresses on techstartup.io and save them to memory")

# Competitive analysis
agent.input("Compare what stripe.com and square.com offer")

API Reference

| Method | Type | Description |
| --- | --- | --- |
| fetch(url) | Low-level | HTTP GET, returns raw HTML |
| strip_tags(html) | Low-level | Remove HTML tags, return text |
| get_title(html) | Low-level | Extract page title |
| get_links(html) | Low-level | Extract all links |
| get_emails(html) | Low-level | Extract email addresses |
| get_social_links(html) | Low-level | Extract social media links |
| analyze_page(url) | LLM | AI analysis of what page/company does |
| get_contact_info(url) | LLM | AI extraction of contact info |

Configuration

config.py
# Custom timeout (default: 15 seconds)
web = WebFetch(timeout=30)

# Use with agent
agent = Agent("researcher", tools=[web])