Express.js Raw HTML Injection via String Formatting

High Risk Cross-Site Scripting (XSS)
Tags: express, xss, html-injection, javascript, string-formatting, web-security

What it is

The Express.js application directly injects user-controlled content into HTML responses using string formatting or concatenation, leading to Cross-Site Scripting (XSS) vulnerabilities. This occurs when user input is embedded in HTML without proper encoding or sanitization.

// Vulnerable: Direct HTML injection with user input
app.get('/welcome', (req, res) => {
  const userName = req.query.name;
  
  // Dangerous: user input directly in HTML
  const html = `<h1>Welcome ${userName}!</h1>
    <p>Today is ${new Date()}</p>`;
  
  res.send(html);
});
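
// Secure: HTML-encode user input and send a restrictive CSP
const he = require('he'); // HTML entity encoding

app.get('/welcome', (req, res) => {
  // req.query values can be arrays or objects, so accept only non-empty strings
  const rawName = req.query.name;
  const userName = typeof rawName === 'string' && rawName ? rawName : 'Guest';

  // Encode user input so markup characters render as text
  const safeUserName = he.encode(userName);

  const html = `<h1>Welcome ${safeUserName}!</h1>
    <p>Today is ${new Date().toDateString()}</p>`;

  // Defense-in-depth: forbid all script execution on this page
  res.setHeader('Content-Security-Policy', "default-src 'self'; script-src 'none'");
  res.send(html);
});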

💡 Why This Fix Works

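The secure version passes the name through he.encode() before interpolating it, so characters such as <, >, &, and quotes are emitted as HTML entities and the browser renders them as text instead of parsing them as markup. The Content-Security-Policy header adds a second layer: with script-src 'none', the browser refuses to run any script on the page even if an encoding step is missed elsewhere.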

Why it happens

Route handlers assemble HTML with string concatenation, template literals, or printf-style formatters and interpolate values from req.query, req.body, or req.params without encoding them first. Any markup in that input (a <script> tag, an event handler such as onerror, a javascript: URL) is then parsed by the browser as part of the page. The root causes below break this down.

Root causes

String Concatenation and Template Literals with User Input

Express applications often build HTML responses with string concatenation or template literals that embed user input directly: res.send('<h1>Welcome ' + req.query.name + '</h1>') or res.send(`<div>${userInput}</div>`). Template literals make interpolation so easy that untrusted data ends up in markup with no thought given to XSS. When the input contains a <script> tag, an event handler attribute (onload, onerror), or a javascript: URL, the browser parses it as part of the page and runs the attacker's script. Even seemingly innocuous inputs such as search queries or usernames become XSS vectors when rendered without encoding.

Missing HTML Encoding of User-Controlled Data

Applications fail to HTML-encode user input before embedding it in responses, so the special characters <, >, ", ', and & are interpreted as HTML or JavaScript syntax rather than as text. No encoding library such as he or escape-html is applied on output, and no sanitizer such as DOMPurify is applied where markup is allowed. Developers assume that a client-side framework handles encoding or that input validation is sufficient protection. Some applications encode one context (the HTML body) but miss others (attributes, JavaScript, URLs). Generic HTML escaping does not help when user data lands in a <script> block, an event handler attribute, or an href that accepts javascript: URLs; each of those contexts needs its own encoding.

Direct Insertion of User Content into HTML Responses

Express endpoints embed content from req.query, req.body, and req.params directly into HTML sent with res.send(), res.end(), or a response stream. User profiles, comments, messages, and search results are rendered by concatenating database values, which may contain previously stored XSS payloads, into HTML. Error messages echo user input: res.send('Error processing: ' + userInput). Redirect pages echo user-supplied URLs: res.send(`<meta http-equiv="refresh" content="0;url=${req.query.redirect}">`). There is no separation between data and presentation: HTML is generated inline in route handlers with user input interpolated directly.

Insufficient Output Sanitization Before Rendering

Applications attempt output sanitization but use incomplete or bypassable approaches. A call like userInput.replace('<', '&lt;') replaces only the first < and ignores every other dangerous character. Blocklist filters that try to remove <script> tags are bypassed with case variations (<ScRiPt>) or with other vectors (<img src=x onerror=alert()>). Sanitization is applied inconsistently across output locations: the HTML body is sanitized but attribute values are not, which allows attribute-based XSS through event handlers or href. Encoding is not context-aware: the same routine is used for HTML body text, JavaScript strings, and URL parameters even though each context has different escaping requirements.
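
Both failure modes described above are easy to reproduce in plain Node.js. The following is a minimal sketch; naiveEscape and stripScriptTags are hypothetical helpers written only to show the weakness, not code from any library.

```javascript
// Hypothetical helper: with a string pattern, replace() swaps only the FIRST match.
function naiveEscape(input) {
  return input.replace('<', '&lt;');
}

// Hypothetical helper: a case-sensitive blocklist that strips <script> tags.
function stripScriptTags(input) {
  return input.replace(/<script>|<\/script>/g, '');
}

const payload = '<b>hi</b><img src=x onerror=alert(1)>';

// Only the first "<" is encoded; the <img> tag survives intact.
const firstPass = naiveEscape(payload);

// Mixed-case tags and non-script vectors pass through the blocklist untouched.
const mixedCase = stripScriptTags('<ScRiPt>alert(1)</ScRiPt>');
const imgVector = stripScriptTags('<img src=x onerror=alert(1)>');

console.log(firstPass);
console.log(mixedCase);
console.log(imgVector);
```

Making the blocklist case-insensitive would not close the gap, since the onerror vector never contains a script tag at all.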

Using Unsafe String Formatting with User Input

Applications pass user input to string formatting utilities (sprintf-js, string.format, printf-style formatters) and embed the result in HTML: sprintf('<p>%s</p>', userInput). General-purpose formatters perform no HTML encoding; they insert every argument verbatim. If the format string itself is user-controlled, the attacker also controls the surrounding markup, which is a format string injection. Developers choose formatting libraries for convenience without considering security, and printf-style habits carried over from C or Java bring the same unsafe pattern into JavaScript.
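
The same behavior can be seen with Node's built-in util.format, used here as a stand-in for the third-party formatters named above. This is a sketch under assumptions: escapeHtml and formatHtml are hypothetical helpers, and a maintained encoder such as escape-html or he would normally take the place of escapeHtml.

```javascript
const util = require('util');

// Untrusted value, for example taken from req.query in a route handler.
const userInput = '<img src=x onerror=alert(1)>';

// util.format substitutes %s verbatim; it has no notion of HTML.
const unsafeHtml = util.format('<p>%s</p>', userInput);

// Hypothetical minimal encoder for the HTML body context.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical wrapper: encodes every argument before formatting.
// The template must be a constant written by the developer, never user input.
function formatHtml(template, ...args) {
  return util.format(template, ...args.map(escapeHtml));
}

const safeHtml = formatHtml('<p>%s</p>', userInput);

console.log(unsafeHtml);
console.log(safeHtml);
```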

Fixes

1. Apply Context-Aware HTML Encoding to All User Input

Encode all user input with the encoding that matches its output context before embedding it in HTML. For the HTML body, use the escape-html or he library: const escaped = require('escape-html')(userInput). For HTML attributes, quote the attribute value and escape quotes and angle brackets. For JavaScript string contexts, use JavaScript encoding (JSON.stringify() or js-string-escape). For URL components, use encodeURIComponent(). Never insert user data directly into <script> tags, style attributes, or event handlers, even with encoding. Create one encoding helper per context and use them consistently across the application, for example const safe = {html: escapeHtml, attr: escapeAttr, js: escapeJs, url: encodeURIComponent}, and pick the encoder by output location.
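
One way the per-context helper object could look is sketched below in dependency-free Node.js. The encoder bodies are assumptions written for illustration; in production the HTML cases would normally delegate to escape-html or he.

```javascript
// Hypothetical encoders, one per output context.
const HTML_ENTITIES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

// JavaScript string context. JSON.stringify alone leaves "</script>" intact,
// so <, >, &, U+2028 and U+2029 are additionally written as \uXXXX escapes.
function escapeJs(value) {
  return JSON.stringify(String(value)).replace(
    /[<>&\u2028\u2029]/g,
    (ch) => '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

const safe = {
  html: escapeHtml,
  attr: escapeHtml, // sufficient only when the attribute value is quoted
  js: escapeJs,
  url: encodeURIComponent,
};

// Untrusted value that tries to break out of an attribute and open a script tag.
const name = '"><script>alert(1)</script>';

const page = [
  `<h1>Welcome ${safe.html(name)}</h1>`,
  `<input value="${safe.attr(name)}">`,
  `<a href="/search?q=${safe.url(name)}">Search again</a>`,
].join('\n');

console.log(page);
```

Keeping the encoders behind one object makes the choice of context visible at each interpolation site, which is easier to audit than scattered ad hoc replace() calls.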

2. Implement Strict Content Security Policy Headers

Deploy CSP headers with helmet so that injected markup is far less likely to execute even if an encoding step is missed: app.use(helmet.contentSecurityPolicy({directives: {'default-src': ["'self'"], 'script-src': ["'self'", "'nonce-{random}'"], 'object-src': ["'none'"], 'base-uri': ["'self'"]}})). Here {random} is a placeholder: generate a unique nonce for every request and put the same value on any inline <script> tag that must run. Set a CSP report-uri to monitor violations, and use 'strict-dynamic' in browsers that support it. Do not allow 'unsafe-inline' or 'unsafe-eval'. CSP is defense-in-depth, not a substitute for encoding. Start in report-only mode, analyze the violations, then enforce. Combine CSP with other headers: X-Content-Type-Options: nosniff and X-Frame-Options: DENY.
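
Per-request nonce generation can be sketched without helmet, using only Node's crypto module. The names createNonce, buildCsp, cspMiddleware, and the res.locals.cspNonce property are hypothetical; the only Express assumptions are the (req, res, next) middleware signature and res.locals.

```javascript
const crypto = require('crypto');

// Generate a fresh, unpredictable nonce for each response.
function createNonce() {
  return crypto.randomBytes(16).toString('base64');
}

// Build the header value from the directives listed above.
function buildCsp(nonce) {
  return [
    "default-src 'self'",
    `script-src 'self' 'nonce-${nonce}'`,
    "object-src 'none'",
    "base-uri 'self'",
  ].join('; ');
}

// Express-style middleware: one nonce per request, exposed to templates.
function cspMiddleware(req, res, next) {
  const nonce = createNonce();
  res.locals.cspNonce = nonce; // templates can emit <script nonce="...">
  res.setHeader('Content-Security-Policy', buildCsp(nonce));
  next();
}
```

In an Express application this would be registered with app.use(cspMiddleware) before the routes, and a template would read the nonce from res.locals when it needs an inline script.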

3. Use Template Engines with Automatic Escaping Enabled

Replace manual HTML construction with a template engine that escapes output automatically. Use Pug with escaped output (= rather than !=), Handlebars with {{ }} (rather than {{{ }}}), or EJS with <%= %> (rather than <%- %>). Make sure autoescaping is on: Nunjucks takes {autoescape: true}, and Pug escapes by default. Keep templates out of route handlers by storing them in .pug, .hbs, or .ejs files, and use partials and layouts for consistent structure. Review every use of unescaped output (!=, {{{ }}}, <%- %>) and allow it only for trusted, non-user-controlled content, with the justification documented in code review.
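
To make the escaped versus unescaped distinction concrete without installing an engine, the sketch below imitates it with a tagged template literal. It is an illustration of escape-by-default interpolation only, not a substitute for Pug, Handlebars, EJS, or Nunjucks; html, raw, and RawHtml are hypothetical names.

```javascript
const ENTITIES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
const escapeHtml = (value) => String(value).replace(/[&<>"']/g, (ch) => ENTITIES[ch]);

// Marker class for markup that is deliberately left unescaped,
// playing the role of !=, {{{ }}}, or <%- %> in a real engine.
class RawHtml {
  constructor(value) {
    this.value = String(value);
  }
}
const raw = (value) => new RawHtml(value);

// Tagged template: literal parts pass through, interpolations are escaped
// unless they were explicitly wrapped with raw().
function html(strings, ...values) {
  return strings.reduce((out, part, i) => {
    if (i >= values.length) return out + part;
    const v = values[i];
    return out + part + (v instanceof RawHtml ? v.value : escapeHtml(v));
  }, '');
}

const userName = '<script>alert(1)</script>';
const escapedByDefault = html`<h1>Welcome ${userName}!</h1>`;
const trustedMarkup = html`<p>${raw('<b>ok</b>')}</p>`;

console.log(escapedByDefault);
console.log(trustedMarkup);
```

As with the engines themselves, every raw() call is a place where review is needed, because it switches the protection off for that value.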

4. Sanitize User Input with Allowlist-Based Libraries

Where a feature needs rich text, use DOMPurify, sanitize-html, or xss to strip dangerous HTML while preserving safe markup: const clean = DOMPurify.sanitize(userHtml, {ALLOWED_TAGS: ['b', 'i', 'em', 'strong', 'a'], ALLOWED_ATTR: ['href']}). On the server, DOMPurify needs a DOM implementation such as jsdom. Configure a strict allowlist of permitted tags and attributes, and never rely on a blocklist of dangerous patterns. Sanitize on output (when rendering) rather than only on input (when receiving), so that data written to the database by another path is still cleaned. Sanitization is for cases that require a subset of HTML, such as formatted user comments; for plain text, prefer encoding. Test the sanitizer configuration against a comprehensive set of XSS payloads.

5. Implement Strict Input Validation and Allowlist Filtering

Validate all user input against its expected format before processing or storing it. For structured data, use schema validation (joi, yup, express-validator) to check length limits, character allowlists, and format patterns. Reject input that contains unexpected characters rather than trying to repair it. For names, allow only the characters [a-zA-Z0-9 .-]. For URLs, check that the scheme is http or https and that the domain matches an expected pattern. Define a maximum length for every input (for example, 50 characters for a username). Log validation failures to detect attack attempts. Validation reduces the attack surface but never replaces output encoding: implement both.
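
The name and URL rules above can be sketched with a regular expression and the built-in URL class. The function names and the ALLOWED_HOSTS entries are hypothetical; a schema library such as joi or express-validator would express the same checks declaratively.

```javascript
// Allowlist from the text above: letters, digits, space, dot, hyphen; 1 to 50 characters.
const NAME_PATTERN = /^[a-zA-Z0-9 .-]{1,50}$/;

function isValidName(value) {
  // Query parameters can arrive as arrays or objects, so require a string.
  return typeof value === 'string' && NAME_PATTERN.test(value);
}

// Hypothetical allowlist of redirect hosts; replace with your own domains.
const ALLOWED_HOSTS = new Set(['example.com', 'www.example.com']);

function isValidRedirect(value) {
  if (typeof value !== 'string') return false;
  let url;
  try {
    url = new URL(value); // throws on relative or malformed input
  } catch {
    return false;
  }
  const schemeOk = url.protocol === 'http:' || url.protocol === 'https:';
  return schemeOk && ALLOWED_HOSTS.has(url.hostname);
}

console.log(isValidName('Ada Lovelace'));
console.log(isValidName('<script>alert(1)</script>'));
console.log(isValidRedirect('https://example.com/home'));
console.log(isValidRedirect('javascript:alert(1)'));
```

A route handler would reject the request (for example with a 400 response) when either check fails, and would still encode the accepted value on output.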

6. Use Safe DOM Manipulation APIs Instead of innerHTML

For client-side JavaScript that renders data from API responses, use DOM APIs that treat values as text instead of innerHTML, which parses its input as HTML: textContent for text, setAttribute() for attributes, createElement() for elements. For example, element.textContent = userData is safe, while element.innerHTML = userData is not. React, Vue, and Angular escape interpolated values by default in their templates and bindings, so prefer those mechanisms over raw HTML insertion. If innerHTML must be used, sanitize the value with DOMPurify first. When the server renders the page, emit complete, properly encoded HTML rather than shipping raw user data for the client to insert into the DOM. When the client renders, keep data and presentation separate: send JSON from the API and let the client framework's safe rendering display it.
