Web Scraping with Puppeteer: Practical Patterns
Puppeteer makes scraping JavaScript-heavy sites possible. Here are patterns for selectors, waiting, anti-bot handling, and keeping it maintainable.
Puppeteer drives a real Chrome instance, which makes it the right tool for scraping JavaScript-heavy sites that plain HTTP clients cannot render. The patterns below have kept scrapers running for months without constant fixing.
Wait for the data, not for time
page.waitForTimeout(2000) is the worst kind of wait. A fixed delay is either too short (the data has not arrived yet) or too long (every page pays the full delay), and which one it is shifts as the site's load times change. Use page.waitForSelector() for elements you expect, page.waitForResponse() for API calls you depend on, and page.waitForFunction() for arbitrary conditions.
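Here is a minimal sketch of replacing a fixed delay with explicit waits. It assumes the page loads its rows from a JSON endpoint whose URL contains /api/customers and renders each row with data-testid="customer-row". Both are hypothetical placeholders. page.waitForResponse(), page.goto() and page.waitForFunction() are standard Puppeteer Page methods.

```javascript
// Sketch: wait on the signals the data depends on, never on a fixed delay.
// The endpoint fragment and row selector are hypothetical placeholders.
const API_FRAGMENT = "/api/customers";
const ROW_SELECTOR = "[data-testid='customer-row']";

async function loadCustomers(page, url) {
  // Start waiting for the API response and navigate together, so the listener
  // is attached before the request fires and either failure rejects once.
  const [response] = await Promise.all([
    page.waitForResponse(
      (res) => res.url().includes(API_FRAGMENT) && res.status() === 200,
      { timeout: 15000 }
    ),
    page.goto(url),
  ]);
  const customers = await response.json();

  // JSON arriving does not mean the DOM is updated: wait until the page has
  // rendered at least as many rows as the API returned.
  await page.waitForFunction(
    (selector, expected) => document.querySelectorAll(selector).length >= expected,
    { timeout: 15000 },
    ROW_SELECTOR,
    customers.length
  );
  return customers;
}
```

Wrapping waitForResponse() and goto() in Promise.all is the usual Puppeteer idiom. A failure in either promise surfaces once, and there is no dangling rejection after the function has already thrown.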
Use stable selectors
Avoid CSS selectors that depend on tag structure or class names that look auto-generated. Look for data attributes, ARIA roles, and visible text. If the site has none of those, scrape using XPath that targets the meaningful structure ("the third row of the table whose header is Customers") rather than positional CSS.
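The XPath approach can be sketched for "the third row of the table whose header is Customers". This sketch assumes the header is a th cell inside the table. A caption or a heading placed before the table would need a different expression. The helper names are hypothetical. The lookup calls the browser's standard document.evaluate() through page.evaluate(), so it does not depend on any Puppeteer-specific selector syntax.

```javascript
// Quote an arbitrary string as an XPath 1.0 string literal. XPath 1.0 has no
// escape character, so a value containing both quote kinds needs concat().
function xpathLiteral(text) {
  if (!text.includes("'")) return `'${text}'`;
  if (!text.includes('"')) return `"${text}"`;
  return "concat('" + text.split("'").join("', \"'\", '") + "')";
}

// XPath for the nth body row (1-based) of the table that has a <th> reading `header`.
// It anchors on visible text, so it survives class renames and extra wrapper divs.
function rowUnderHeader(header, n) {
  if (!Number.isInteger(n) || n < 1) throw new RangeError("row index must be a positive integer");
  return `//table[.//th[normalize-space()=${xpathLiteral(header)}]]//tbody/tr[${n}]`;
}

// Evaluate the XPath inside the page and return the matched node's trimmed text, or null.
async function textAt(page, xpath) {
  return page.evaluate((xp) => {
    const node = document.evaluate(
      xp, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null
    ).singleNodeValue;
    return node ? node.textContent.trim() : null;
  }, xpath);
}
```

For example, textAt(page, rowUnderHeader("Customers", 3)) reads the third data row. tr[3] counts rows inside tbody, so if the header row sits in tbody rather than thead, the data rows start at tr[2].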
Handle anti-bot defenses thoughtfully
Many sites detect headless Chrome via the navigator.webdriver flag, installed fonts, and other fingerprinting signals. The puppeteer-extra package with its stealth plugin (puppeteer-extra-plugin-stealth) masks most of these. Beyond that, slow your scrape down: humans do not click links at 50 ms intervals. And respect robots.txt and the site's terms of service.
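Here is a sketch of both halves. It assumes the npm packages puppeteer-extra and puppeteer-extra-plugin-stealth are installed, and uses the puppeteer.use(StealthPlugin()) registration shown in their README. The 2 to 6 second pause and the helper names are illustrative assumptions, not values any site publishes.

```javascript
// Launch Chrome through puppeteer-extra with the stealth plugin registered.
// Required lazily so the pacing helpers below work without these packages.
async function launchStealthBrowser() {
  const puppeteer = require("puppeteer-extra");
  const StealthPlugin = require("puppeteer-extra-plugin-stealth");
  puppeteer.use(StealthPlugin());
  return puppeteer.launch({ headless: true });
}

// Delay drawn uniformly from [minMs, maxMs]; `random` is injectable for testing.
function jitteredDelayMs(minMs, maxMs, random = Math.random) {
  return Math.round(minMs + random() * (maxMs - minMs));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Visit pages one at a time with a randomized, human-scale pause between them.
// `handle(page, url)` is a hypothetical per-page callback supplied by the caller.
async function politeCrawl(page, urls, handle, { minMs = 2000, maxMs = 6000 } = {}) {
  const results = [];
  for (const [i, url] of urls.entries()) {
    if (i > 0) await sleep(jitteredDelayMs(minMs, maxMs));
    await page.goto(url);
    results.push(await handle(page, url));
  }
  return results;
}
```

The pause here is a deliberate fixed-style sleep. It throttles requests and does not stand in for waiting on data. Visits are sequential on purpose, because running pages in parallel would defeat the pacing.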
Keep selectors in one place
When the site changes (it will), every selector that broke has to be updated. Keep them in a single SELECTORS object at the top of the scraper. The next change becomes a 5-minute fix instead of a half-hour grep.
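One way to lay this out is sketched below, with hypothetical selector values. page.waitForSelector(), page.$$eval(), page.$() and page.waitForNavigation() are real Puppeteer methods. The $$eval callback runs inside the browser, where it cannot see Node variables, so SELECTORS is passed in as an argument rather than captured from scope.

```javascript
// Every selector the scraper depends on, in one place. Values are hypothetical.
const SELECTORS = Object.freeze({
  resultsTable: "[data-testid='results-table']",
  row: "[data-testid='results-table'] tbody tr",
  name: "[data-col='name']",
  price: "[data-col='price']",
  nextPage: "a[rel='next']",
});

// Extract one record per table row; missing cells become null instead of throwing.
async function scrapeRows(page) {
  await page.waitForSelector(SELECTORS.resultsTable);
  return page.$$eval(
    SELECTORS.row,
    (rows, sel) =>
      rows.map((row) => ({
        name: row.querySelector(sel.name)?.textContent.trim() ?? null,
        price: row.querySelector(sel.price)?.textContent.trim() ?? null,
      })),
    SELECTORS
  );
}

// Follow the pagination link if present; returns false on the last page.
async function goToNextPage(page) {
  const next = await page.$(SELECTORS.nextPage);
  if (!next) return false;
  await Promise.all([page.waitForNavigation(), next.click()]);
  return true;
}
```

When the site renames a column, only the SELECTORS entry changes. Neither function needs to be touched.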
Failure handling and resumability
Long scrapes need to be resumable. After every successful page, write the result to disk (or a database). When something breaks, you continue from the last good state instead of starting over. That is the difference between a failure costing you one page and a failure costing you the whole run; without it, a long scrape can take weeks to finish successfully even once.
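Here is a minimal sketch of checkpointing to disk. It assumes a single JSON-lines file where each line records one finished URL and its result. scrapeOne is a hypothetical per-page function, for example one that drives a Puppeteer page. Everything else uses Node's built-in fs module.

```javascript
const fs = require("fs");

// Read the set of URLs already scraped from a JSON-lines checkpoint file.
function loadDone(file) {
  const done = new Set();
  if (!fs.existsSync(file)) return done;
  const text = fs.readFileSync(file, "utf8");
  // Terminate a torn last line so the next append starts on a fresh line.
  if (text.length > 0 && !text.endsWith("\n")) fs.appendFileSync(file, "\n");
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    try {
      done.add(JSON.parse(line).url);
    } catch {
      // A half-written line from a crash: skip it, so that URL is scraped again.
    }
  }
  return done;
}

// Scrape each URL not yet in the checkpoint, appending each result as soon as it succeeds.
// `scrapeOne(url)` is a hypothetical caller-supplied function returning JSON-serializable data.
async function scrapeResumable(urls, scrapeOne, file) {
  const done = loadDone(file);
  for (const url of urls) {
    if (done.has(url)) continue;
    const data = await scrapeOne(url);
    fs.appendFileSync(file, JSON.stringify({ url, data }) + "\n");
    done.add(url);
  }
  return done;
}
```

Appending one line per page means a crashed process loses at most the page in flight. A database table keyed by URL works the same way, as long as the "already done?" check and the write use the same key.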
About the author

Richard Gamora
Fullstack developer based in the Philippines, working mostly with Laravel and Vue.js, with eight years of production experience across web and mobile.