Async-native, fully typed, built for evasion and performance.
Documentation · Getting Started · Features · Support
Pydoll automates Chromium-based browsers (Chrome, Edge) by connecting directly to the Chrome DevTools Protocol over WebSocket. No WebDriver binary, no `navigator.webdriver` flag, no compatibility issues.
It combines a high-level API for common tasks with low-level CDP access for fine-grained control over network, fingerprinting, and browser behavior. The entire codebase is async-native and fully type-checked with mypy.
Pydoll is proudly sponsored by Thordata: a residential proxy network built for serious web scraping and automation. With 190+ real residential and ISP locations, fully encrypted connections, and infrastructure optimized for high-performance workflows, Thordata is an excellent choice for scaling your Pydoll automations.
Sign up through our link to support the project and get 1GB free to get started.
Pydoll excels at behavioral evasion, but it doesn't solve captchas. That's where CapSolver comes in: an AI-powered service that handles reCAPTCHA, Cloudflare challenges, and more, integrating seamlessly with your automation workflows.
Register with our invite code and use code PYDOLL to get an extra 6% balance bonus.
- Stealth-first: Human-like mouse movement, realistic typing, and granular browser preference control for fingerprint management.
- Async and typed: Built on `asyncio` from the ground up, 100% type-checked with `mypy`. Full IDE autocompletion and static error checking.
- Network control: Intercept requests to block ads/trackers, monitor traffic for API discovery, and make authenticated HTTP requests that inherit the browser session.
- Shadow DOM and iframes: Full support for shadow roots (including closed) and cross-origin iframes. Discover, query, and interact with elements inside them using the same API.
- Ergonomic API: `tab.find()` for most cases, `tab.query()` for complex CSS/XPath selectors.
```bash
pip install pydoll-python
```

No WebDriver binaries or external dependencies required.
HAR Network Recording
Record network activity during a browser session and export as HAR 1.2. Replay recorded requests to reproduce exact API sequences.
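Since HAR 1.2 is plain JSON, a saved capture can also be inspected with nothing but the standard library. A sketch of the format's shape (field names per the HAR 1.2 spec; the sample entry is invented):

```python
import json

# Skeleton of a HAR 1.2 document: a 'log' with creator info and request entries.
har = {
    'log': {
        'version': '1.2',
        'creator': {'name': 'pydoll', 'version': '0.0'},
        'entries': [
            {
                'request': {'method': 'GET', 'url': 'https://example.com/api'},
                'response': {'status': 200},
            },
        ],
    }
}

def list_urls(har_doc: dict) -> list[str]:
    """Collect the request URLs recorded in a HAR document."""
    return [entry['request']['url'] for entry in har_doc['log']['entries']]

# Round-trip through JSON, as the document would exist on disk.
urls = list_urls(json.loads(json.dumps(har)))
```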
```python
from pydoll.browser.chromium import Chrome

async with Chrome() as browser:
    tab = await browser.start()

    async with tab.request.record() as capture:
        await tab.go_to('https://example.com')

    capture.save('flow.har')
    print(f'Captured {len(capture.entries)} requests')

    responses = await tab.request.replay('flow.har')
```

Filter by resource type:
```python
from pydoll.protocol.network.types import ResourceType

async with tab.request.record(
    resource_types=[ResourceType.FETCH, ResourceType.XHR]
) as capture:
    await tab.go_to('https://example.com')
```

Page Bundles
Save the current page and all its assets (CSS, JS, images, fonts) as a .zip bundle for offline viewing. Optionally inline everything into a single HTML file.
```python
await tab.save_bundle('page.zip')
await tab.save_bundle('page-inline.zip', inline_assets=True)
```

Shadow DOM Support
Full Shadow DOM support, including closed shadow roots. Because Pydoll operates at the CDP level (below JavaScript), the closed mode restriction doesn't apply.
```python
shadow = await element.get_shadow_root()
button = await shadow.query('.internal-btn')
await button.click()

# Discover all shadow roots on the page
shadow_roots = await tab.find_shadow_roots()
for sr in shadow_roots:
    checkbox = await sr.query('input[type="checkbox"]', raise_exc=False)
    if checkbox:
        await checkbox.click()
```

Highlights:
- Closed shadow roots work without workarounds
- `find_shadow_roots()` discovers every shadow root on the page
- `timeout` parameter for polling until shadow roots appear
- `deep=True` traverses cross-origin iframes (OOPIFs)
- Standard `find()`, `query()`, `click()` API inside shadow roots
```python
# Cloudflare Turnstile inside a cross-origin iframe
shadow_roots = await tab.find_shadow_roots(deep=True, timeout=10)
for sr in shadow_roots:
    checkbox = await sr.query('input[type="checkbox"]', raise_exc=False)
    if checkbox:
        await checkbox.click()
```

Humanized Mouse Movement
Mouse operations produce human-like cursor movement by default:
- Bezier curve paths with asymmetric control points
- Fitts's Law timing: duration scales with distance
- Minimum-jerk velocity: bell-shaped speed profile
- Physiological tremor: Gaussian noise scaled with velocity
- Overshoot correction: ~70% chance on fast movements, then corrects back
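The minimum-jerk profile is a standard model of human reaching: normalized position follows 10t³ − 15t⁴ + 6t⁵, which yields the bell-shaped speed curve, and Fitts's Law makes movement time grow logarithmically with distance over target size. A sketch of that math (illustrative only, not Pydoll's internal implementation; the `a`/`b` coefficients are made-up values):

```python
import math

def minimum_jerk(t: float) -> float:
    """Normalized position (0..1) at normalized time t (0..1)."""
    return 10 * t**3 - 15 * t**4 + 6 * t**5

def speed(t: float, dt: float = 1e-4) -> float:
    """Numerical velocity of the minimum-jerk profile (central difference)."""
    return (minimum_jerk(t + dt) - minimum_jerk(t - dt)) / (2 * dt)

def fitts_duration(distance: float, width: float,
                   a: float = 0.1, b: float = 0.15) -> float:
    """Fitts's Law: time = a + b * log2(distance / width + 1).

    a and b are hypothetical coefficients chosen for illustration.
    """
    return a + b * math.log2(distance / width + 1)

# Speed is ~0 at the endpoints and peaks mid-movement: the bell shape.
# Longer or more precise movements take longer under Fitts's Law.
```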
```python
await tab.mouse.move(500, 300)
await tab.mouse.click(500, 300)
await tab.mouse.drag(100, 200, 500, 400)

button = await tab.find(id='submit')
await button.click()

# Opt out when speed matters
await tab.mouse.click(500, 300, humanize=False)
```

```python
import asyncio

from pydoll.browser import Chrome
from pydoll.constants import Key

async def google_search(query: str):
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.go_to('https://www.google.com')

        search_box = await tab.find(tag_name='textarea', name='q')
        await search_box.insert_text(query)
        await tab.keyboard.press(Key.ENTER)

        first_result = await tab.find(
            tag_name='h3',
            text='autoscrape-labs/pydoll',
            timeout=10,
        )
        await first_result.click()

        await tab.find(id='repository-container-header', timeout=10)
        print(f"Page loaded: {await tab.title}")

asyncio.run(google_search('pydoll site:github.com'))
```

Hybrid Automation (UI + API)
Use UI automation to pass login flows (CAPTCHAs, JS challenges), then switch to tab.request for fast API calls that inherit the full browser session: cookies, headers, and all.
```python
# Log in via UI
await tab.go_to('https://my-site.com/login')
await (await tab.find(id='username')).type_text('user')
await (await tab.find(id='password')).type_text('pass123')
await (await tab.find(id='login-btn')).click()

# Make authenticated API calls using the browser session
response = await tab.request.get('https://my-site.com/api/user/profile')
user_data = response.json()
```

Network Interception and Monitoring
Monitor traffic for API discovery or intercept requests to block ads, trackers, and unnecessary resources.
```python
import asyncio

from pydoll.browser.chromium import Chrome
from pydoll.protocol.fetch.events import FetchEvent, RequestPausedEvent
from pydoll.protocol.network.types import ErrorReason

async def block_images():
    async with Chrome() as browser:
        tab = await browser.start()

        async def block_resource(event: RequestPausedEvent):
            request_id = event['params']['requestId']
            resource_type = event['params']['resourceType']
            if resource_type in ['Image', 'Stylesheet']:
                await tab.fail_request(request_id, ErrorReason.BLOCKED_BY_CLIENT)
            else:
                await tab.continue_request(request_id)

        await tab.enable_fetch_events()
        await tab.on(FetchEvent.REQUEST_PAUSED, block_resource)

        await tab.go_to('https://example.com')
        await asyncio.sleep(3)
        await tab.disable_fetch_events()

asyncio.run(block_images())
```

Browser Fingerprint Control
Granular control over browser preferences: hundreds of internal Chrome settings for building consistent fingerprints.
```python
options = ChromiumOptions()
options.browser_preferences = {
    'profile': {
        'default_content_setting_values': {
            'notifications': 2,
            'geolocation': 2,
        },
        'password_manager_enabled': False,
    },
    'intl': {
        'accept_languages': 'en-US,en',
    },
    'browser': {
        'check_default_browser': False,
    },
}
```

Concurrency, Contexts and Remote Connections
Manage multiple tabs and browser contexts (isolated sessions) concurrently. Connect to browsers running in Docker or remote servers.
```python
async def scrape_page(url, tab):
    await tab.go_to(url)
    return await tab.title

async def concurrent_scraping():
    async with Chrome() as browser:
        tab_google = await browser.start()
        tab_ddg = await browser.new_tab()

        results = await asyncio.gather(
            scrape_page('https://google.com/', tab_google),
            scrape_page('https://duckduckgo.com/', tab_ddg),
        )
        print(results)
```

Retry Decorator
The @retry decorator supports custom recovery logic between attempts (e.g., refreshing the page, rotating proxies) and exponential backoff.
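The control flow can be approximated by this simplified stand-in (not Pydoll's actual implementation): catch the listed exceptions, run the recovery hook between attempts, back off exponentially, and re-raise once attempts are exhausted.

```python
import asyncio
import functools

def retry(max_retries=3, exceptions=(Exception,), on_retry=None,
          exponential_backoff=False, base_delay=0.01):
    """Retry an async function, running an optional recovery hook between attempts."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return await func(*args, **kwargs)
                except tuple(exceptions):
                    if attempt == max_retries - 1:
                        raise  # out of attempts: propagate the last error
                    if on_retry is not None:
                        await on_retry()  # e.g. refresh the page, rotate proxies
                    delay = base_delay * (2 ** attempt if exponential_backoff else 1)
                    await asyncio.sleep(delay)
        return wrapper
    return decorator

# A flaky coroutine that fails twice, then succeeds on the third attempt.
calls = []

@retry(max_retries=3, exceptions=[ValueError], exponential_backoff=True)
async def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ValueError('transient')
    return 'ok'

result = asyncio.run(flaky())
```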
```python
from pydoll.decorators import retry
from pydoll.exceptions import ElementNotFound, NetworkError

@retry(
    max_retries=3,
    exceptions=[ElementNotFound, NetworkError],
    on_retry=my_recovery_function,  # your hook: refresh the page, rotate proxies, etc.
    exponential_backoff=True,
)
async def scrape_product(self, url: str):
    # scraping logic
    ...
```

Contributions are welcome. See CONTRIBUTING.md for guidelines.
If you find Pydoll useful, consider sponsoring the project on GitHub.
