Section 01
Full Pipeline Architecture
Documents uploaded by advisors flow through an 8-step pipeline that ensures PII is stripped before any AI sees the content:
flowchart TD
subgraph CLIENT["Browser - Client-Side"]
A1["Advisor selects document\n(image/PDF)"]
A2["Browser upload client\n(direct-to-S3)"]
end
subgraph EDGE1["Supabase Edge Function\n(analyze-strategy)"]
B1["get_upload_url action\nGenerates pre-signed S3 URL\n300-second TTL"]
end
subgraph AWS["AWS Infrastructure"]
C1["S3 Bucket\nBlockPublicAccess\nAES-256 at rest"]
C2["Lambda Function\n(stateless)\nAWS Textract OCR\n→ Raw text extraction"]
end
subgraph EDGE2["Supabase Edge Function\n(extract-liabilities / extract-fees)"]
D1["Invoke processor\nTriggers Lambda"]
D2["Receives sanitized text\nfrom Lambda"]
D3["Gemini 2.5 Flash\nStructured extraction\n(JSON schema enforcement)"]
D4["Usage tracking\n→ Privacy Guardrail audit write"]
end
subgraph GUARDRAIL["Privacy Guardrail Layer"]
G1["Text pattern scan\n6 PII pattern types"]
G2["Object key redaction\n10+ sensitive keys"]
G3["Audit write\nDouble-sanitize before store"]
end
subgraph DB["Supabase PostgreSQL"]
E1["AI audit log\n(sanitized payloads)"]
E2["Usage metrics\n(token counts, costs)"]
end
A1 -->|"1. Request upload URL"| B1
B1 -->|"2. Return signed URL + object key"| A2
A2 -->|"3. PUT file directly to S3\n(browser to S3, no server)"| C1
C1 -->|"4. Lambda triggered"| C2
D1 -->|"5. Edge invokes Lambda"| C2
C2 -->|"6. Returns OCR text"| D2
D2 -->|"7. Text sent to Gemini"| D3
D3 -->|"8. Structured JSON returned"| CLIENT
D4 --> GUARDRAIL
GUARDRAIL --> E1
D4 --> E2
style CLIENT fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
style AWS fill:#0f172a,stroke:#f59e0b,color:#e2e8f0
style EDGE1 fill:#1e293b,stroke:#22c55e,color:#e2e8f0
style EDGE2 fill:#1e293b,stroke:#22c55e,color:#e2e8f0
style GUARDRAIL fill:#1e293b,stroke:#ef4444,color:#e2e8f0
style DB fill:#0f172a,stroke:#8b5cf6,color:#e2e8f0
Section 02
Step-by-Step Data Flow
Upload Request (Browser → Edge)
The browser requests a pre-signed URL from the analyze-strategy edge function. No file data is transmitted in this step, only the filename and MIME type.
Pre-Signed URL Generation (Edge → S3)
The edge function generates a single-use, 300-second pre-signed URL. This URL allows exactly one PUT operation to the S3 bucket, then expires.
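As a rough illustration of how a time-limited upload URL behaves, the sketch below signs an object key and an expiry timestamp with HMAC and rejects the URL once the 300-second window has passed. It is a simplified stand-in, not AWS Signature Version 4: the `SECRET` key, the `uploads.example.invalid` host, and both function names are hypothetical, and single-use enforcement is not modeled.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

TTL_SECONDS = 300  # the 300-second TTL described above
SECRET = b"demo-signing-key"  # hypothetical; real S3 signing uses AWS credentials


def _signature(object_key: str, content_type: str, expires: int) -> str:
    # Bind the HTTP method, object key, MIME type, and expiry into one signature.
    message = f"PUT\n{object_key}\n{content_type}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_upload_url(object_key: str, content_type: str, now: float) -> str:
    """Return a URL that authorizes a PUT of object_key until now + TTL."""
    expires = int(now) + TTL_SECONDS
    query = urlencode({
        "expires": expires,
        "signature": _signature(object_key, content_type, expires),
    })
    return f"https://uploads.example.invalid/{object_key}?{query}"


def verify_upload_url(url: str, content_type: str, now: float) -> bool:
    """Accept the URL only if it is unexpired and the signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    expires = int(params["expires"][0])
    if now > expires:
        return False  # TTL has elapsed
    object_key = parsed.path.lstrip("/")
    expected = _signature(object_key, content_type, expires)
    return hmac.compare_digest(expected, params["signature"][0])


url = sign_upload_url("uploads/statement.pdf", "application/pdf", now=1000.0)
print(verify_upload_url(url, "application/pdf", now=1100.0))  # inside the window
print(verify_upload_url(url, "application/pdf", now=1301.0))  # after expiry
```

Binding the MIME type into the signature means a URL issued for a PDF cannot be reused to upload a file declared as a different type.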
Direct Upload (Browser → S3)
The file goes directly from the browser to S3. It never touches the WealthLens application server, the Supabase edge function, or any intermediary. The S3 bucket has BlockPublicAccess enabled and uses AES-256 encryption at rest.
OCR Processing (Edge → Lambda → Textract)
The edge function invokes a stateless AWS Lambda, which reads the file from S3, runs AWS Textract OCR, and returns the extracted text. The Lambda keeps no state between invocations, and its execution environment is destroyed after processing.
PII Sanitization (Privacy Guardrail)
Before any text reaches Gemini AI, the Privacy Guardrail intercepts and sanitizes it at two levels: pattern-based text scanning and key-based object redaction. See Section 03 for the full redaction specification.
AI Analysis (Gemini 2.5 Flash)
The sanitized text is sent to Gemini with a strict system prompt enforcing JSON schema output. Gemini sees only financial data (balances, rates, payments, fee amounts). PII has been replaced with [REDACTED_*] tokens.
Audit Trail (Guardrail → AI Audit Log)
Every AI interaction is logged to an immutable audit log with double-sanitized payloads. The input is sanitized once before the AI call and again before the database write (defense in depth).
Results Returned to Browser
Structured JSON (liabilities or fees) is returned to the browser. The advisor sees extracted financial data in the UI. The raw document, OCR text, and AI conversation context are all discarded.
Section 03
Privacy Guardrail Deep Dive
Level 1: Pattern-Based Text Scanning
| # | PII Type | Pattern | Replacement |
|---|---|---|---|
| 1 | SSN | \b\d{3}-\d{2}-\d{4}\b | [REDACTED_SSN] |
| 2 | Email | [a-zA-Z0-9._%+-]+@....[a-zA-Z]{2,} | [REDACTED_EMAIL] |
| 3 | Phone | US formats: (555) 555-5555, etc. | [REDACTED_PHONE] |
| 4 | Credit Card | 13 to 16 digit sequences | [REDACTED_CC] |
| 5 | Date of Birth | MM/DD/YYYY and YYYY-MM-DD | [REDACTED_DOB] |
| 6 | Address | Number + words + street suffix | [REDACTED_ADDRESS] |
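A pattern scan of this kind could look roughly like the following. Only the SSN regex is taken verbatim from the table; the other five patterns are assumptions written to match the table's descriptions (the email pattern in particular is a common stand-in, since the table's version is partly elided), and `scan_text` is a hypothetical name.

```python
import re

# (label, pattern) pairs, applied in table order.
PII_PATTERNS = [
    # Verbatim from the table above.
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    # Assumed: a common email pattern, not necessarily the guardrail's own.
    ("EMAIL", re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")),
    # Assumed: US formats such as (555) 555-5555 or 555-555-5555.
    ("PHONE", re.compile(r"(?:\(\d{3}\)\s?|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b")),
    # Assumed: contiguous 13 to 16 digit runs only (no separators handled).
    ("CC", re.compile(r"\b\d{13,16}\b")),
    # Assumed: MM/DD/YYYY or YYYY-MM-DD.
    ("DOB", re.compile(r"\b\d{2}/\d{2}/\d{4}\b|\b\d{4}-\d{2}-\d{2}\b")),
    # Assumed: number, one to four words, then a short list of street suffixes.
    ("ADDRESS", re.compile(
        r"\b\d+\s+(?:[A-Za-z]+\s+){1,4}"
        r"(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr|Lane|Ln)\b"
    )),
]


def scan_text(text: str) -> str:
    """Replace each PII match with its [REDACTED_*] token."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text


sample = (
    "SSN 123-45-6789, email jane.doe@example.com, phone (555) 555-5555, "
    "born 01/02/1980, lives at 123 Main Street. Balance: $4,500.00"
)
print(scan_text(sample))
```

Financial figures such as the balance pass through untouched, which is the property the later AI analysis step relies on. Date patterns like these would also match non-birth dates, so a real implementation may need more context than a bare regex.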
Level 2: Key-Based Object Redaction
Any JSON object passed through the guardrail has its sensitive keys (10+ key names) auto-redacted regardless of value.
All values for these keys are replaced with [REDACTED]. All other string values are run through the text pattern scan. Objects and arrays are recursed into.
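The recursion described above can be sketched as follows. The key names in `SENSITIVE_KEYS` are hypothetical placeholders, not the guardrail's actual list, and `scan_text` here is a one-pattern stand-in for the full Level 1 scan.

```python
import re

# Hypothetical key names for illustration; the real list is not shown here.
SENSITIVE_KEYS = {"ssn", "date_of_birth", "account_number"}

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_text(text: str) -> str:
    """Stand-in for the Level 1 pattern scan (SSN only in this sketch)."""
    return SSN_PATTERN.sub("[REDACTED_SSN]", text)


def redact(value):
    """Redact sensitive keys, scan other strings, and recurse into containers."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return scan_text(value)
    return value  # numbers, booleans, and None pass through unchanged


payload = {
    "ssn": "123-45-6789",
    "notes": "SSN on file: 123-45-6789",
    "liabilities": [{"account_number": "9981", "balance": 1200.5}],
}
print(redact(payload))
```

Redacting by key catches values that no regex would recognize (a four-digit account suffix, for example), while the string scan catches PII that appears under an innocuous key such as `notes`.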
Level 3: Audit Log Sanitization
Even after the above two layers, the audit log write performs a third sanitization pass on the input payload before database insertion, as defense in depth.
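One way to picture the extra pass: the audit writer sanitizes whatever it receives, whether or not the caller already did. The sketch below assumes an in-memory list in place of the database table and a one-pattern `sanitize` in place of the full guardrail; all names are hypothetical.

```python
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def sanitize(text: str) -> str:
    """Stand-in for the full guardrail (SSN only in this sketch)."""
    return SSN_PATTERN.sub("[REDACTED_SSN]", text)


def write_audit(log: list, function_name: str, input_payload: str) -> dict:
    """Sanitize again at write time, without trusting the caller."""
    entry = {"function": function_name, "input_payload": sanitize(input_payload)}
    log.append(entry)
    return entry


audit_log = []

# Normal path: the payload was already sanitized before the AI call.
clean = sanitize("Borrower SSN 123-45-6789")
write_audit(audit_log, "extract-fees", clean)

# Failure path: a caller forgot to sanitize; the write-time pass still redacts.
write_audit(audit_log, "extract-fees", "Borrower SSN 123-45-6789")

print(audit_log)
```

Because the replacement tokens contain nothing the patterns match, running the pass again on clean input leaves it unchanged, so the extra pass costs little and only matters when an earlier layer was skipped.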
Section 04
AI Edge Function Integration
The Privacy Guardrail is applied across all 7 user-facing AI edge functions, not just document parsing. WealthLens runs a multi-model stack: conversational and analysis reasoning runs on Anthropic Claude (Haiku 4.5 / Sonnet 4.6), document OCR extraction runs on Google Gemini 2.5 Flash, and Guideline Gary runs on Anthropic Claude Sonnet 4.6 with forced web-search grounding. Every vendor is used under zero-retention, no-training API terms.
flowchart LR
subgraph FUNCTIONS["Edge Functions Using Guardrail"]
F1["analyze-strategy\n(WealthLens IQ)\nClaude Haiku 4.5"]
F2["strategy-architect-v3\n(AI Architect)\nClaude Sonnet 4.6"]
F3["extract-liabilities\n(Doc Parser)\nGemini 2.5 Flash"]
F4["extract-fees\n(Fee Extractor)\nGemini 2.5 Flash"]
F5["lens-guide\n(Advisor Help)\nClaude Haiku 4.5"]
F6["client-chat-v3\n(Client AI)\nClaude Haiku 4.5"]
F7["guideline-gary\n(Guideline Expert)\nClaude Sonnet 4.6\n+ forced web search"]
end
subgraph GUARDRAIL["Privacy Guardrail"]
G1["Sanitize payload\nBefore AI call"]
G2["Audit write\nAfter AI call\n(double-sanitized)"]
end
subgraph AUDIT["AI Audit Log"]
A1["Sanitized input_payload\nOutput summary\nToken usage\nUser/Session ID"]
end
F1 --> G1
F2 --> G1
F3 --> G1
F4 --> G1
F5 --> G1
F6 --> G1
F7 -->|"audit-log sanitize only"| G2
G1 --> G2
G2 --> A1
style FUNCTIONS fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
style GUARDRAIL fill:#1e293b,stroke:#ef4444,color:#e2e8f0
style AUDIT fill:#0f172a,stroke:#8b5cf6,color:#e2e8f0
| Function | Purpose | Model | Guardrail Usage |
|---|---|---|---|
| analyze-strategy | WealthLens IQ analysis, deep dives, FAQs | Claude Haiku 4.5 (Anthropic) | Payload sanitization + audit log |
| strategy-architect-v3 | Conversational strategy builder | Claude Sonnet 4.6 (Anthropic) | Payload sanitization + audit log |
| extract-liabilities | Document → liability extraction | Gemini 2.5 Flash (Google) | AWS Textract + audit log |
| extract-fees | Document → fee extraction | Gemini 2.5 Flash (Google) | AWS Textract + audit log |
| lens-guide | Advisor help assistant | Claude Haiku 4.5 (Anthropic) | Payload sanitization + audit log |
| client-chat-v3 | Client-facing AI concierge | Claude Haiku 4.5 (Anthropic) | Payload sanitization + audit log |
| guideline-gary | Agency guideline expert (FHA/VA/USDA/Fannie/Freddie), web-search grounded | Claude Sonnet 4.6 (Anthropic), forced web search and fetch | Usage tracking + audit write |
Section 05
Document Types Supported
Liability Extraction (extract-liabilities)
| Input | Output Schema |
|---|---|
| Liability statements, credit reports, 1003-style PDFs | { liabilities: [{ type, name, balance, monthlyPayment, interestRate, toBePaidOff }] } |
Supported types: CREDIT_CARD, AUTO_LOAN, STUDENT_LOAN, PERSONAL_LOAN, OTHER
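A consumer of this output might validate it before use, along the lines of the sketch below. The field names and the supported type list come from the schema above; the Python types assigned to each field are assumptions, and `validate_liabilities` is a hypothetical helper.

```python
ALLOWED_TYPES = {"CREDIT_CARD", "AUTO_LOAN", "STUDENT_LOAN", "PERSONAL_LOAN", "OTHER"}

# Field names are from the schema above; the expected types are assumed.
FIELD_TYPES = {
    "type": str,
    "name": str,
    "balance": (int, float),
    "monthlyPayment": (int, float),
    "interestRate": (int, float),
    "toBePaidOff": bool,
}


def validate_liabilities(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    items = payload.get("liabilities")
    if not isinstance(items, list):
        return ["'liabilities' must be a list"]
    errors = []
    for index, item in enumerate(items):
        for field, expected in FIELD_TYPES.items():
            value = item.get(field)
            # bool is a subclass of int in Python, so reject it for numeric fields.
            is_stray_bool = isinstance(value, bool) and expected is not bool
            if field not in item:
                errors.append(f"liabilities[{index}] is missing '{field}'")
            elif not isinstance(value, expected) or is_stray_bool:
                errors.append(f"liabilities[{index}].{field} has the wrong type")
        if item.get("type") not in ALLOWED_TYPES:
            errors.append(f"liabilities[{index}].type is not a supported type")
    return errors


good = {"liabilities": [{
    "type": "AUTO_LOAN", "name": "Car loan", "balance": 18250.0,
    "monthlyPayment": 412.0, "interestRate": 6.9, "toBePaidOff": False,
}]}
print(validate_liabilities(good))
```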
Fee Extraction (extract-fees)
| Input | Output Schema |
|---|---|
| Loan Estimates, Closing Disclosures, Good Faith Estimates, Fee Sheets | { fees: [{ label, amount, section, isApr }], credits: { lenderCredits, sellerCredits, earnestDeposit } } |
Fees are auto-categorized into: ORIGINATION, SERVICES, TITLE_GOV, PREPAIDS, ESCROWS, OTHER
Document type is auto-detected from OCR content (e.g., presence of "Loan Estimate" string).
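String-based detection of this sort can be sketched in a few lines. Only the "Loan Estimate" marker is mentioned above; the other markers, the returned labels, and the `UNKNOWN` fallback are assumptions for illustration.

```python
# (label, marker) pairs; only "loan estimate" is confirmed by the text above.
DOCUMENT_MARKERS = [
    ("LOAN_ESTIMATE", "loan estimate"),
    ("CLOSING_DISCLOSURE", "closing disclosure"),   # assumed marker
    ("GOOD_FAITH_ESTIMATE", "good faith estimate"),  # assumed marker
]


def detect_document_type(ocr_text: str) -> str:
    """Return the first label whose marker appears in the OCR text."""
    # OCR output often has irregular casing and whitespace, so normalize first.
    normalized = " ".join(ocr_text.lower().split())
    for label, marker in DOCUMENT_MARKERS:
        if marker in normalized:
            return label
    return "UNKNOWN"  # hypothetical fallback label


print(detect_document_type("LOAN   ESTIMATE\nDate Issued 2/15/2024"))
print(detect_document_type("Itemized fee worksheet"))
```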
Section 06
Data Lifecycle Summary
flowchart TD
subgraph EPHEMERAL["Ephemeral - Destroyed After Use"]
E1["Raw document file\n(in browser memory)"]
E2["S3 object\n(retained per policy)"]
E3["Lambda execution\n(stateless, destroyed)"]
E4["OCR raw text\n(in-memory only)"]
E5["LLM conversation\n(zero-retention)"]
end
subgraph PERSISTED["Persisted - Encrypted at Rest"]
P1["AI audit log\n- Sanitized input\n- Output summary\n- Token usage"]
P2["Usage metrics\n- Per-function totals\n- Cost tracking"]
P3["Client file record\n- Extracted financial fields\n- Scenarios and settings"]
end
subgraph NEVER["Never Stored"]
N1["SSN / Tax ID"]
N2["Full account numbers"]
N3["Dates of birth"]
N4["Street addresses\n(in AI context)"]
N5["Email / Phone\n(in AI context)"]
end
E1 -->|"Signed URL PUT"| E2
E2 -->|"Lambda reads"| E3
E3 -->|"Textract OCR"| E4
E4 -->|"LLM analyzes"| E5
E5 -->|"Sanitized logging"| P1
E5 -->|"Token tracking"| P2
E5 -->|"Extracted fields\napplied to UI"| P3
style EPHEMERAL fill:#1e293b,stroke:#ef4444,color:#e2e8f0
style PERSISTED fill:#0f172a,stroke:#22c55e,color:#e2e8f0
style NEVER fill:#44403c,stroke:#78716c,color:#a8a29e
Section 07
Security Stack
| Layer | Protection | Implementation |
|---|---|---|
| Upload | Pre-signed URL (300s TTL) | Single-use signed URL generation |
| Storage | AES-256, BlockPublicAccess | AWS S3 bucket |
| Processing | Stateless Lambda | Destroyed after execution |
| PII Redaction | 6 regex patterns + key-based | Privacy Guardrail layer |
| AI (Reasoning) | Zero-retention Claude Haiku 4.5 / Sonnet 4.6 | Anthropic API, no-training terms |
| AI (Doc OCR) | Zero-retention Gemini 2.5 Flash | Google API, no-training terms |
| AI (Guideline Search) | Claude Sonnet 4.6 (Anthropic) | Forced web-search grounded, no-training terms |
| Audit | Double-sanitized immutable logs | Immutable audit log |
| Transit | TLS 1.3 | All Supabase + AWS API calls |
| At Rest | AES-256 | PostgreSQL (Supabase) + S3 |
| Access Control | RLS + JWT + auth checks | Row-level security on all tables |
| Secrets | Supabase Secrets | AWS keys and API keys, never in .env |
| Injection Defense | System prompt guardrail | Injection-defense directive |
| Compliance | SOC 2 Type II | Supabase + AWS infrastructure |
Section 08
Prompt Injection Defense
All AI system prompts include a hardened injection-defense directive that prevents adversarial attempts to extract system prompts or override AI behavior through uploaded document content.