Web Scraping with Puppeteer: Practical Patterns

Puppeteer makes scraping JavaScript-heavy sites possible. Here are patterns for selectors, waiting, anti-bot handling, and keeping it maintainable.

Richard Gamora · Fullstack developer · 4 min read
Puppeteer · Web Scraping · Automation

Puppeteer drives a real Chrome instance, which makes it the right tool for scraping JavaScript-heavy sites that plain HTTP clients cannot render. The patterns below have kept scrapers running for months without constant fixing.

Wait for the data, not for time

page.waitForTimeout(2000) is the worst kind of wait. It is either too short, so the scraper reads a half-loaded page, or too long, so every page pays for the slowest case, and the right value shifts whenever the site changes. Use page.waitForSelector() for elements you expect, page.waitForResponse() for API calls you depend on, and page.waitForFunction() for arbitrary conditions.
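A minimal sketch of the three waits together, assuming a hypothetical product page: the `/api/products` endpoint fragment and the `.product-card` and `.spinner` selectors are placeholders, and the helper names are made up for the example. The predicate is kept as its own function so it can be tested without a browser.

```typescript
import type { HTTPResponse, Page } from "puppeteer";

// Hypothetical page layout, for illustration only: items arrive from an
// endpoint containing "/api/products", render as ".product-card" elements,
// and a ".spinner" element is shown until loading finishes.

// Predicate for page.waitForResponse(): the successful API call we depend on.
export function isProductsResponse(res: Pick<HTTPResponse, "url" | "ok">): boolean {
  return res.url().includes("/api/products") && res.ok();
}

export async function loadProductTitles(page: Page, url: string): Promise<string[]> {
  // Start waiting for the response in the same Promise.all as the navigation,
  // so a fast response cannot arrive before anyone is listening for it.
  await Promise.all([
    page.waitForResponse(isProductsResponse, { timeout: 30_000 }),
    page.goto(url, { waitUntil: "domcontentloaded" }),
  ]);

  // An element we expect once the data has rendered.
  await page.waitForSelector(".product-card", { visible: true });

  // An arbitrary condition evaluated inside the page: the spinner is gone.
  await page.waitForFunction(() => !document.querySelector(".spinner"));

  return page.$$eval(".product-card h2", (els) =>
    els.map((el) => el.textContent?.trim() ?? ""),
  );
}
```

Every Puppeteer wait has a timeout (30 seconds by default), so a page that never delivers fails loudly instead of hanging the scraper.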

Use stable selectors

Avoid CSS selectors that depend on tag structure (div > div:nth-child(3) > span) or on class names that look auto-generated (css-1q2w3e). Look for data attributes, ARIA roles, and visible text instead. If the site has none of those, use XPath that targets the meaningful structure ("the third row of the table whose header is Customers") rather than positional CSS.
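When XPath is the only handle, building the expression from its meaningful parts keeps it readable. A small sketch; the helper names are hypothetical, and the expression assumes the header text sits in a `<th>` cell. The usage comments show Puppeteer's `aria/` and `xpath/` query prefixes and the `::-p-text()` selector available in recent versions; the `data-testid` value is a placeholder.

```typescript
// Quote a string as an XPath 1.0 literal. XPath 1.0 has no escape
// sequences, so text containing both quote characters needs concat().
export function xpathLiteral(text: string): string {
  if (!text.includes('"')) return `"${text}"`;
  if (!text.includes("'")) return `'${text}'`;
  const parts = text.split('"').map((part) => `"${part}"`);
  return `concat(${parts.join(`, '"', `)})`;
}

// "Row N of the table whose header reads <header>": anchored on the header
// text rather than on the table's position in the page. Rows without <td>
// cells (header rows) are skipped when counting.
export function tableRowXPath(header: string, row: number): string {
  return `//table[.//th[normalize-space()=${xpathLiteral(header)}]]//tr[td][${row}]`;
}

// Usage with a Puppeteer Page:
//   const row = await page.$("xpath/" + tableRowXPath("Customers", 3));
// Preferred when the site offers them:
//   await page.$('[data-testid="customer-row"]');   // data attribute (placeholder)
//   await page.$('aria/Export[role="button"]');     // accessible name and role
//   await page.$("::-p-text(Customers)");           // visible text
```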

Handle anti-bot defenses thoughtfully

Many sites detect headless Chrome through the navigator.webdriver flag, fonts, and other signals. The puppeteer-extra package with its stealth plugin masks most of these. Beyond that, slow your scrape down (humans do not click links at 50 ms intervals), and respect robots.txt and the site's terms of service.
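A sketch of both ideas. The stealth setup is shown as comments because it needs the `puppeteer-extra` and `puppeteer-extra-plugin-stealth` packages. The pacing helper is a hypothetical name, and the 2 to 5 second range in the usage is an assumption to tune per site, not a recommendation from any library.

```typescript
// Stealth setup (requires puppeteer-extra and puppeteer-extra-plugin-stealth):
//   import puppeteer from "puppeteer-extra";
//   import StealthPlugin from "puppeteer-extra-plugin-stealth";
//   puppeteer.use(StealthPlugin());
//   const browser = await puppeteer.launch();

// Pick a random pause between minMs and maxMs so requests are neither fast
// nor evenly spaced. `random` is injectable so the function can be tested.
export function politeDelayMs(
  minMs: number,
  maxMs: number,
  random: () => number = Math.random,
): number {
  if (minMs < 0 || maxMs < minMs) {
    throw new RangeError("expected 0 <= minMs <= maxMs");
  }
  return Math.round(minMs + random() * (maxMs - minMs));
}

export function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Usage in a scrape loop (the range is an assumption; tune it per site):
//   for (const url of urls) {
//     await page.goto(url);
//     // ...extract data...
//     await sleep(politeDelayMs(2000, 5000));
//   }
```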

Keep selectors in one place

When the site changes (it will), every selector that broke has to be updated. Keep them in a single SELECTORS object at the top of the scraper. The next change becomes a 5-minute fix instead of a half-hour grep.
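A sketch of the pattern, assuming a hypothetical product listing; the selector values are placeholders. One detail matters here: the callback passed to `$$eval` runs inside the browser, so it cannot see `SELECTORS` through a closure. The object is passed in as an extra argument instead, which works because it is plain, serializable data.

```typescript
import type { Page } from "puppeteer";

// Every selector the scraper depends on lives here and nowhere else.
// Values are placeholders for a hypothetical product listing.
export const SELECTORS = {
  productCard: '[data-testid="product-card"]',
  title: "h2",
  price: '[data-testid="price"]',
  nextPage: 'a[rel="next"]',
} as const;

export async function scrapeCards(
  page: Page,
): Promise<Array<{ title: string; price: string }>> {
  await page.waitForSelector(SELECTORS.productCard);
  return page.$$eval(
    SELECTORS.productCard,
    // Runs in the browser: `sel` is the SELECTORS object passed below.
    (cards, sel) =>
      cards.map((card) => ({
        title: card.querySelector(sel.title)?.textContent?.trim() ?? "",
        price: card.querySelector(sel.price)?.textContent?.trim() ?? "",
      })),
    SELECTORS,
  );
}
```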

Failure handling and resumability

Long scrapes need to be resumable. After every successful page, write the result to disk (or a database). When something breaks, you continue from the last good state instead of starting over. This is the difference between a failure that costs you one page and a scraper that takes weeks to finish because every crash sends it back to the beginning.
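One simple way to do this, sketched with Node's built-in `fs` module: append each finished page to a JSON Lines file and, on start-up, skip every URL that already has a record. The file layout and function names are assumptions for illustration; a database table with a unique key on the URL works the same way.

```typescript
import { appendFileSync, existsSync, readFileSync } from "node:fs";

export interface ScrapeRecord {
  url: string;
  data: unknown;
}

// Append one finished page as a single JSON line. Each record starts on a
// fresh line, so a line torn by a crash mid-write is never glued to the next one.
export function saveResult(file: string, record: ScrapeRecord): void {
  appendFileSync(file, "\n" + JSON.stringify(record), "utf8");
}

// Collect URLs that already have a saved result so a restart can skip them.
export function completedUrls(file: string): Set<string> {
  const done = new Set<string>();
  if (!existsSync(file)) return done;
  for (const line of readFileSync(file, "utf8").split("\n")) {
    if (!line.trim()) continue;
    try {
      const rec = JSON.parse(line) as Partial<ScrapeRecord> | null;
      if (rec && typeof rec.url === "string") done.add(rec.url);
    } catch {
      // A torn line from a crash: ignore it, and that page gets scraped again.
    }
  }
  return done;
}

// Usage (scrapePage is a hypothetical per-page scraper):
//   const done = completedUrls("results.jsonl");
//   for (const url of urls) {
//     if (done.has(url)) continue;
//     saveResult("results.jsonl", { url, data: await scrapePage(page, url) });
//   }
```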

About the author

Richard Gamora

Fullstack developer based in the Philippines, working mostly with Laravel and Vue.js, with eight years of production experience across web and mobile.

me@richardgamora.com · Upwork