Model Inversion & Data Extraction
Overview
Model inversion and data extraction attacks exploit the fact that machine learning models, especially large language models, can memorize portions of their training data. Attackers can craft specific prompts that cause the model to reproduce memorized sensitive information, including personal data, credentials, proprietary code, and confidential documents.
LLMs trained on internet-scraped data may have memorized passwords, API keys, email addresses, phone numbers, and other PII. Even models fine-tuned on private data can leak that information. The risk is amplified when models are exposed through public APIs without proper output filtering and monitoring.
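To make the attack concrete, here is a minimal memorization probe, written as an illustration rather than a real attack tool. It assumes the legacy OpenAI completions client used in the examples later on this page; the prefix and secret format are hypothetical. The idea is that a continuation that comes back identically across high-temperature samples was likely memorized, not guessed.

```python
# Illustrative memorization probe, assuming the legacy openai client used
# in the examples below. The prefix and secret format are hypothetical.
import openai

def probe_for_memorization(prefix, n_samples=5):
    """Sample several completions for a prefix that should only be
    completable if the model memorized the original document."""
    completions = []
    for _ in range(n_samples):
        response = openai.Completion.create(
            model="code-davinci-002",
            prompt=prefix,
            max_tokens=30,
            temperature=0.7,
        )
        completions.append(response.choices[0].text.strip())
    return completions

# If the same digits come back across high-temperature samples, the model
# has likely memorized the record rather than hallucinated it.
samples = probe_for_memorization("John Smith, SSN: 123-45-")
print(len(set(samples)) == 1)
```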
Where it occurs
Model inversion vulnerabilities occur when models trained on sensitive or unsanitized data expose memorized information due to missing filtering, rate limits, or monitoring.
Impact
Model inversion attacks can lead to extraction of personal information (names, emails, phone numbers), exposure of credentials and API keys from training data, leakage of proprietary or confidential documents, privacy violations and regulatory compliance issues (GDPR, CCPA), and intellectual property theft.
Prevention
Prevent this vulnerability by sanitizing training data, applying differential privacy, filtering outputs, limiting query rates, monitoring for extraction attempts, auditing model outputs, and enforcing strict data retention and control policies.
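As a starting point for training-data sanitization, known PII patterns can be scrubbed before any text reaches the training pipeline. The sketch below is a minimal illustration reusing the same regex patterns as the output filters shown later on this page; a production pipeline would add NER-based PII detection and dedicated secret scanners.

```python
import re

# Minimal pre-training scrubber; patterns mirror the output filters shown
# later on this page. A production pipeline would add NER-based PII
# detection and dedicated secret scanners.
SCRUB_PATTERNS = {
    "EMAIL": r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',
    "SSN": r'\b\d{3}-\d{2}-\d{4}\b',
    "OPENAI_KEY": r'sk-[a-zA-Z0-9]{48}',
}

def sanitize_document(text):
    """Replace sensitive spans with typed placeholders before training."""
    for label, pattern in SCRUB_PATTERNS.items():
        text = re.sub(pattern, f"[{label}_REMOVED]", text)
    return text

# Example: the email and SSN are removed before the text can be memorized.
print(sanitize_document("Alice Johnson (alice@email.com), SSN: 987-65-4321"))
```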
Examples
LLM leaks training data through memorization
Model outputs memorized sensitive data without filtering.
```python
import openai

def complete_code(prompt):
    # BUG: No output filtering for secrets
    response = openai.Completion.create(
        model="code-davinci-002",
        prompt=prompt,
        max_tokens=100
    )
    return response.choices[0].text
```
Models can memorize training data including secrets and PII.
```python
import openai
import re

SENSITIVE_PATTERNS = [
    r'sk-[a-zA-Z0-9]{48}',                                 # OpenAI keys
    r'\b\d{3}-\d{2}-\d{4}\b',                              # SSN
    r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'  # Email
]

def complete_code(prompt):
    response = openai.Completion.create(
        model="code-davinci-002",
        prompt=prompt,
        max_tokens=100
    )
    output = response.choices[0].text
    # Redact the whole response if any sensitive pattern matches
    for pattern in SENSITIVE_PATTERNS:
        if re.search(pattern, output):
            return "[REDACTED: Output contained sensitive data]"
    return output
```
Implement output filtering for sensitive patterns such as API keys, emails, and SSNs.
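Regex filters only catch known formats. A common complement is entropy-based detection, which flags long, high-entropy tokens such as freshly minted API keys even when no pattern matches. The sketch below is a heuristic illustration; the length and entropy thresholds are assumptions, not calibrated values.

```python
import math

def shannon_entropy(s):
    """Bits of entropy per character; random keys score noticeably
    higher than natural-language text."""
    if not s:
        return 0.0
    freq = {c: s.count(c) / len(s) for c in set(s)}
    return -sum(p * math.log2(p) for p in freq.values())

def looks_like_secret(token):
    # Illustrative thresholds: long, high-entropy tokens are flagged.
    return len(token) >= 20 and shannon_entropy(token) > 4.0

def filter_high_entropy(output):
    if any(looks_like_secret(tok) for tok in output.split()):
        return "[REDACTED: Output contained a high-entropy token]"
    return output
```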
Engineer Checklist
- Sanitize training data to remove PII and secrets before training
- Use differential privacy techniques during model training
- Filter outputs for sensitive patterns (emails, SSNs, API keys)
- Implement rate limiting on model inference requests
- Monitor for extraction attempt patterns in queries
- Use smaller, curated datasets for sensitive applications
- Employ canary tokens to detect data extraction (see the sketch after this list)
- Regularly audit model outputs for memorized content
- Implement authentication and query attribution
- Consider on-premise deployment for highly sensitive models
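For the canary-token item above, one minimal approach is to plant unique, unguessable markers in the training corpus and alert whenever any of them appears in model output. The sketch below illustrates the idea; the token format and in-memory storage are assumptions, and real deployments would load planted canaries from a secure store.

```python
import secrets

def make_canary():
    """Generate a unique, unguessable marker to seed into training data."""
    return f"CANARY-{secrets.token_hex(16)}"

# Placeholder: in practice these are generated before training, planted in
# the corpus, and persisted in a secure store for later checks.
PLANTED_CANARIES = {make_canary() for _ in range(3)}

def output_leaks_canary(output):
    """True if any planted canary appears verbatim in model output,
    which means training data is being reproduced."""
    return any(canary in output for canary in PLANTED_CANARIES)
```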
End-to-End Example
An LLM trained on company documents memorizes and leaks customer email addresses and API keys when prompted with specific queries.
```python
# Vulnerable: No output filtering
response = model.generate(user_prompt)
return response
```

```python
# Secure: Output filtering and rate limiting
import re

# Rate limiting (simplified; see the fuller sliding-window sketch below)
if user_request_count > MAX_REQUESTS_PER_HOUR:
    return "Rate limit exceeded"

response = model.generate(user_prompt)

# Filter sensitive patterns before returning the completion
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
api_key_pattern = r'[A-Za-z0-9_-]{32,}'
ssn_pattern = r'\b\d{3}-\d{2}-\d{4}\b'

if (re.search(email_pattern, response) or
        re.search(api_key_pattern, response) or
        re.search(ssn_pattern, response)):
    return "Response filtered for privacy protection."

return response
```
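The rate-limit check above is deliberately simplified. A slightly fuller version keeps a per-user sliding window of request timestamps; this is a sketch with illustrative names and limits, and in production the state would live in Redis or an API gateway rather than process memory.

```python
import time
from collections import defaultdict, deque

MAX_REQUESTS_PER_HOUR = 100
_request_log = defaultdict(deque)  # user_id -> timestamps inside the window

def allow_request(user_id, now=None):
    """Per-user sliding-window rate limit over the last hour."""
    now = time.time() if now is None else now
    window = _request_log[user_id]
    # Evict timestamps older than one hour, then check the remaining count.
    while window and now - window[0] > 3600:
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_HOUR:
        return False
    window.append(now)
    return True
```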
Discovery
Test whether the model memorized training data by querying for specific patterns, names, or information that should not appear in responses.
1. Test for verbatim training data recall
Action (llm_api): Query model with partial prompts to see if it completes with memorized data
Request: POST https://api.example.com/llm/complete
```json
{ "prompt": "John Smith, SSN: 123-45-", "max_tokens": 20 }
```
Response: Status 200
```json
{ "completion": "6789, Date of Birth: 05/12/1985, Address: 123 Oak Street, Seattle WA 98101", "note": "Model memorized PII from training data" }
```
Artifacts: training_data_memorization, pii_exposure, ssn_leak
2. Extract customer support conversation data
Action (llm_api): Probe for specific customer conversations in training data
Request: POST https://api.example.com/llm/generate
```json
{ "prompt": "Complete this customer support ticket:\n\nTicket #12845: Customer complained about" }
```
Response: Status 200
```json
{ "message": "Ticket #12845: Customer complained about unauthorized charge of $2,450 on their credit card ending in 9876. Customer name: Alice Johnson (alice.johnson@email.com). Resolution: Refunded $2,450, issued $100 credit.", "note": "Fine-tuned model memorized support tickets verbatim" }
```
Artifacts: customer_pii, payment_information, support_ticket_leak, email_addresses
3. Extract code from training data
Action (llm_api): Attempt to extract proprietary code or API keys
Request: POST https://api.example.com/llm/code-complete
```json
{ "prompt": "// Production database connection\nconst config = {" }
```
Response: Status 200
```json
{ "completion": "const config = {\n  host: 'prod-db.internal.company.com',\n  user: 'dbadmin',\n  password: 'Pr0dP@ssw0rd2024!',\n  database: 'customers',\n  stripe_key: 'sk_live_51HxYz...'\n}", "note": "Model trained on company codebase, memorized credentials" }
```
Artifacts: code_memorization, database_credentials, api_keys, proprietary_code
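The three probes above can be automated for auditing purposes. The sketch below replays a list of prefixes against the placeholder completion endpoint used in this example and flags any response matching a PII pattern; the endpoint URL and response field names follow the example API above and are not a real service.

```python
import re
import requests

PII_PATTERNS = [
    r'\b\d{3}-\d{2}-\d{4}\b',                               # SSN
    r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',  # email
]

def audit_prefixes(prefixes):
    """Replay probe prefixes and collect completions that contain PII."""
    findings = []
    for prefix in prefixes:
        resp = requests.post(
            "https://api.example.com/llm/complete",  # placeholder endpoint
            json={"prompt": prefix, "max_tokens": 20},
            timeout=10,
        )
        completion = resp.json().get("completion", "")
        if any(re.search(p, completion) for p in PII_PATTERNS):
            findings.append((prefix, completion))
    return findings
```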
Exploit steps
The attacker systematically queries the model to extract memorized training data, including PII, credentials, and proprietary information.
1. Extract customer database through model memory
Systematic PII extraction
Action (llm_api): Iterate through customer ID patterns to extract memorized records
Request: POST https://api.example.com/llm/generate
```json
{ "prompt": "Customer ID 1001:", "note": "Repeat for IDs 1001-10000" }
```
Response: Status 200
```json
{ "message": "Extracted 1,247 customer records from model memory including:\n\nCustomer 1001: John Smith, john@email.com, SSN: 123-45-6789, CC: **** 4532\nCustomer 1002: Alice Johnson, alice@email.com, SSN: 987-65-4321, CC: **** 9876\n... (1,247 total records with full PII)" }
```
Artifacts: mass_pii_extraction, customer_database, ssn_collection, payment_info
2. Extract API keys and secrets from code completions
Credential extraction via code prompts
Action (llm_api): Use code completion to extract memorized credentials
Request: POST https://api.example.com/llm/code-complete
```json
{ "prompt": "# Initialize API clients\nstripe_key = '" }
```
Response: Status 200
```json
{ "completion": "stripe_key = 'sk_live_51HxYzABCDEF...'\naws_access_key = 'AKIAIOSFODNN7EXAMPLE'\naws_secret = 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLE'\ntwilio_sid = 'ACa1b2c3...'\nsendgrid_key = 'SG.xYz789...'" }
```
Artifacts: stripe_secret_key, aws_credentials, twilio_credentials, sendgrid_api_key
3. Extract proprietary business logic
Intellectual property theft
Action (llm_api): Extract proprietary algorithms and business rules from the model
Request: POST https://api.example.com/llm/code-complete
```json
{ "prompt": "// Fraud detection algorithm\nfunction calculateFraudScore(transaction) {" }
```
Response: Status 200
```json
{ "completion": "function calculateFraudScore(transaction) {\n  // Proprietary algorithm - CONFIDENTIAL\n  const riskFactors = [\n    transaction.amount > 5000 ? 0.3 : 0,\n    transaction.country !== 'US' ? 0.25 : 0,\n    ...\n  ];\n  return weights.reduce((sum, w, i) => sum + w * riskFactors[i], 0);\n}", "note": "Extracted complete proprietary fraud detection algorithm" }
```
Artifacts: proprietary_algorithm, business_logic, intellectual_property_theft
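The enumeration attack in step 1 has a recognizable signature: many near-identical prompts differing only in a numeric ID. A minimal server-side monitor can collapse prompts into templates and alert on repetition. This is a sketch with an illustrative threshold and in-memory state; a real deployment would persist counts and feed alerts into existing abuse tooling.

```python
import re
from collections import defaultdict

_template_counts = defaultdict(int)  # (user_id, template) -> count
ENUMERATION_THRESHOLD = 50           # illustrative alert threshold

def record_prompt(user_id, prompt):
    """Return True if this user's prompts look like systematic enumeration."""
    # Normalize digit runs so "Customer ID 1001:" and "Customer ID 1002:"
    # collapse to the same template.
    template = re.sub(r'\d+', '<N>', prompt)
    _template_counts[(user_id, template)] += 1
    return _template_counts[(user_id, template)] >= ENUMERATION_THRESHOLD
```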
Specific Impact
Exposure of personal information, credentials, and proprietary data from training datasets, leading to privacy violations, security breaches, and regulatory penalties.
Fix
Implement output filtering to detect and redact sensitive information patterns. Use rate limiting to prevent systematic extraction attempts. Sanitize training data before model training. Consider differential privacy for highly sensitive applications.
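For the differential-privacy recommendation, DP-SGD is the standard mechanism: per-sample gradient clipping plus calibrated noise bounds how much any single training record can influence the model, and therefore how much of it can later be extracted. Below is a minimal sketch using Opacus with PyTorch; the model, dataset, and hyperparameters are dummies chosen for illustration, not a recipe.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Dummy model and data stand in for a real fine-tuning setup.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 128), torch.randint(0, 2, (256,))),
    batch_size=32,
)

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.1,   # more noise -> stronger privacy, lower utility
    max_grad_norm=1.0,      # per-sample gradient clipping bound
)

# Train as usual; Opacus clips and noises per-sample gradients underneath.
loss_fn = nn.CrossEntropyLoss()
for features, labels in train_loader:
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optimizer.step()

# Report the privacy budget actually spent.
epsilon = privacy_engine.get_epsilon(delta=1e-5)
print(f"trained with (epsilon={epsilon:.2f}, delta=1e-5)-DP")
```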
Detect This Vulnerability in Your Code
Sourcery automatically identifies model inversion & data extraction vulnerabilities and many other security issues in your codebase.