System Overview
Visage is a real-time face recognition system built on three core pillars: detection, embedding, and matching.
The Recognition Pipeline
Core Components
1. Face Detection
The first step in the pipeline is detecting faces in the video frame.
Detection Models
- OpenCV Haar Cascades (default)
- SSD (Single Shot Detector)
- MTCNN (Multi-task CNN)
- RetinaFace
Performance
- 30-60 FPS on modern CPUs
- Handles multiple faces per frame
- Automatic quality filtering
What it does: Locates faces in the image and extracts bounding box coordinates.
Output: Face region (x, y, width, height)
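A detector can return a box that extends past the frame edge, so the (x, y, width, height) output usually needs clamping before the face is cropped. The following is a minimal sketch of that step; `FaceBox`, `to_crop_bounds`, `is_usable`, and the 40-pixel minimum side are hypothetical names and values chosen for illustration, not part of Visage's API.

```python
from typing import NamedTuple


class FaceBox(NamedTuple):
    """Bounding box in pixels: top-left corner plus size."""
    x: int
    y: int
    width: int
    height: int


def to_crop_bounds(box: FaceBox, frame_width: int, frame_height: int) -> tuple[int, int, int, int]:
    """Convert (x, y, width, height) to (left, top, right, bottom), clamped to the frame."""
    left = max(0, box.x)
    top = max(0, box.y)
    right = min(frame_width, box.x + box.width)
    bottom = min(frame_height, box.y + box.height)
    return left, top, right, bottom


def is_usable(box: FaceBox, min_side: int = 40) -> bool:
    """Hypothetical quality filter: reject boxes smaller than min_side pixels."""
    return box.width >= min_side and box.height >= min_side


# A box that spills over the right and top edges of a 640x480 frame.
box = FaceBox(x=600, y=-10, width=80, height=100)
print(to_crop_bounds(box, frame_width=640, frame_height=480))  # (600, 0, 640, 90)
```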
2. Face Embeddings
Once a face is detected, we generate a mathematical representation called an embedding.
An embedding is a fixed-length vector (typically 128 or 512 dimensions) that captures the distinguishing features of a face. Similar faces have embeddings that are close together in vector space.
Models Available:
- Facenet (128-dim) - Fast, accurate, default choice
- VGG-Face (2622-dim) - Very accurate, slower
- ArcFace (512-dim) - State-of-the-art accuracy
- OpenFace (128-dim) - Lightweight alternative
Embedding Properties:
- L2 normalized (unit length)
- Robust to changes in lighting and pose (within limits)
- Consistent across different images of the same person
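L2 normalization divides a vector by its Euclidean length so that the result has length 1. The sketch below shows the operation in plain Python; `l2_normalize` is an illustrative helper, and the embedding models listed above may already normalize their output.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vector]


raw = [3.0, 4.0]           # length 5
unit = l2_normalize(raw)
print(unit)                # [0.6, 0.8]
```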
3. Similarity Matching
Embeddings are compared using cosine similarity to find matches in the memory bank.
Cosine Similarity Formula:
similarity(A, B) = (A · B) / (||A|| × ||B||)
Since embeddings are L2-normalized, this simplifies to:
similarity(A, B) = A · B (dot product)
Similarity Range: -1.0 to 1.0 mathematically. In practice, scores run from about 0.0 (unrelated faces) to 1.0 (identical).
Default Threshold: 0.6
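The two formulas above can be checked against each other with a short sketch. It assumes a score at or above the threshold counts as a match (the page does not say whether the comparison is inclusive), and `dot`, `cosine_similarity`, and `is_match` are illustrative names.

```python
import math

DEFAULT_THRESHOLD = 0.6  # default threshold stated on this page


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """General form: (A . B) / (||A|| * ||B||)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def is_match(a: list[float], b: list[float], threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Assumes a and b are already L2-normalized, so the dot product is the cosine."""
    return dot(a, b) >= threshold


a = [0.6, 0.8]   # unit length
b = [0.8, 0.6]   # unit length
print(abs(cosine_similarity(a, b) - dot(a, b)) < 1e-9)  # True: both forms agree
print(is_match(a, b))                                   # True: score is about 0.96
```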
4. Memory Bank
A local SQLite database stores person metadata and embeddings.
Database Schema:
CREATE TABLE persons (
    person_id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    relationship TEXT,
    metadata TEXT,
    created_at TIMESTAMP
);
CREATE TABLE embeddings (
    embedding_id TEXT PRIMARY KEY,
    person_id TEXT NOT NULL,
    embedding BLOB NOT NULL,
    embedding_dim INTEGER,
    created_at TIMESTAMP,
    FOREIGN KEY (person_id) REFERENCES persons(person_id)
);
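The schema does not say how the `embedding` BLOB is encoded. The sketch below assumes little-endian float32 values, which is an assumption and not Visage's confirmed storage format, and shows one way to write and read a row with Python's built-in `sqlite3` and `struct` modules. `pack_embedding` and `unpack_embedding` are hypothetical helpers.

```python
import sqlite3
import struct
import uuid
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE persons (
    person_id TEXT PRIMARY KEY, name TEXT NOT NULL, relationship TEXT,
    metadata TEXT, created_at TIMESTAMP
);
CREATE TABLE embeddings (
    embedding_id TEXT PRIMARY KEY, person_id TEXT NOT NULL,
    embedding BLOB NOT NULL, embedding_dim INTEGER, created_at TIMESTAMP,
    FOREIGN KEY (person_id) REFERENCES persons(person_id)
);
"""


def pack_embedding(vector: list[float]) -> bytes:
    """Serialize as little-endian float32 (assumed format)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob: bytes, dim: int) -> list[float]:
    return list(struct.unpack(f"<{dim}f", blob))


conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.executescript(SCHEMA)

person_id = str(uuid.uuid4())
now = datetime.now(timezone.utc).isoformat()
vector = [0.5, -0.25, 0.125]  # toy 3-dim embedding, exactly representable in float32

conn.execute(
    "INSERT INTO persons (person_id, name, relationship, metadata, created_at) "
    "VALUES (?, ?, ?, ?, ?)",
    (person_id, "Alice", "friend", "{}", now),
)
conn.execute(
    "INSERT INTO embeddings (embedding_id, person_id, embedding, embedding_dim, created_at) "
    "VALUES (?, ?, ?, ?, ?)",
    (str(uuid.uuid4()), person_id, pack_embedding(vector), len(vector), now),
)
conn.commit()

blob, dim = conn.execute(
    "SELECT embedding, embedding_dim FROM embeddings WHERE person_id = ?", (person_id,)
).fetchone()
restored = unpack_embedding(blob, dim)
print(restored == vector)  # True
```

Storing `embedding_dim` next to the BLOB lets the reader decode the bytes without knowing which model produced them.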
Data Flow
Registration Flow
- Capture - User captures a photo via webcam
- Detect - System detects face in image
- Embed - Generate embedding vector
- Store - Save person info and embedding to database
Identification Flow
- Stream - Frontend sends video frames via WebSocket
- Process - Backend detects face and generates embedding
- Match - Compare against all stored embeddings
- Return - Send match result back to frontend
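The Match step can be sketched as a linear scan over the stored embeddings that keeps the highest score and applies the threshold at the end. This is an illustration under stated assumptions, not Visage's actual matching code: `identify` is a hypothetical function, the memory bank is modeled as an in-memory list of (person_id, embedding) pairs, and all vectors are assumed to be L2-normalized.

```python
from typing import Optional


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def identify(
    query: list[float],
    memory_bank: list[tuple[str, list[float]]],
    threshold: float = 0.6,
) -> Optional[tuple[str, float]]:
    """Return (person_id, score) for the best match at or above threshold, else None."""
    best_id: Optional[str] = None
    best_score = float("-inf")
    for person_id, stored in memory_bank:
        score = dot(query, stored)  # equals cosine similarity for unit vectors
        if score > best_score:
            best_id, best_score = person_id, score
    if best_id is None or best_score < threshold:
        return None
    return best_id, best_score


# A person may have several stored embeddings; the best one decides the match.
bank = [
    ("alice", [0.6, 0.8]),
    ("alice", [0.8, 0.6]),
    ("bob", [-0.8, 0.6]),
]
print(identify([0.6, 0.8], bank))
print(identify([1.0, 0.0], [("bob", [0.0, 1.0])]))  # None: score 0.0 is below 0.6
```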
Performance Characteristics
Latency Breakdown
| Stage | Time (CPU) | Time (GPU) |
|---|---|---|
| Face Detection | 20-50ms | 10-20ms |
| Embedding Generation | 100-200ms | 20-40ms |
| Similarity Search | 1-5ms | 1-5ms |
| Total | 121-255ms | 31-65ms |
The first inference is slower because the models are loaded on first use. Subsequent inferences are much faster.
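The per-stage ranges can be summed to get the per-frame total, and the total converts directly to an upper bound on frames per second. The sketch below does that arithmetic with the stage timings from the table; `total_range` and `fps_bounds` are illustrative helpers.

```python
# (min_ms, max_ms) per stage: detection, embedding, similarity search
STAGES_MS = {
    "cpu": [(20, 50), (100, 200), (1, 5)],
    "gpu": [(10, 20), (20, 40), (1, 5)],
}


def total_range(stages: list[tuple[int, int]]) -> tuple[int, int]:
    """Sum the best-case and worst-case times across all stages."""
    return sum(lo for lo, _ in stages), sum(hi for _, hi in stages)


def fps_bounds(total_ms: tuple[int, int]) -> tuple[float, float]:
    """Convert a (best, worst) per-frame time in ms to (slowest, fastest) FPS."""
    best, worst = total_ms
    return 1000 / worst, 1000 / best


for device, stages in STAGES_MS.items():
    total = total_range(stages)
    slow, fast = fps_bounds(total)
    print(f"{device}: {total[0]}-{total[1]} ms per frame, at most {slow:.1f}-{fast:.1f} FPS")
```

These figures are upper bounds on the pipeline alone. They leave out frame transfer over the WebSocket and image decoding, so the end-to-end identification rate can be expected to be lower.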
Scalability
- Memory Bank Size: Tested up to 1,000 persons
- Embeddings per Person: Recommended 1-5
- Concurrent Users: Single-user system (webcam limitation)
- Frame Rate: 2-5 FPS identification rate
Accuracy Considerations
Factors Affecting Accuracy
Good Lighting ☀️
Even, front-facing illumination improves accuracy significantly
Face Angle 📐
Works best with frontal faces (±30° rotation)
Image Quality 🖼️
Higher resolution and focus improve embedding quality
Multiple Samples 📚
Registering 3-5 photos per person reduces false negatives
Threshold Tuning
The match threshold balances false positives vs false negatives:
- Lower threshold (0.4-0.5): More matches, higher false positive rate
- Medium threshold (0.6-0.7): Balanced (default)
- Higher threshold (0.8+): Fewer matches, higher false negative rate
Use the calibration script to find the optimal threshold for your use case:
python scripts/calibrate_threshold.py
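The trade-off can be made concrete by scoring labeled pairs and counting errors at each candidate threshold. The sketch below is not the contents of `scripts/calibrate_threshold.py`, which this page does not show. It is a minimal illustration with made-up scores, and `error_rates` and `pick_threshold` are hypothetical names. It picks the threshold that minimizes the sum of the false positive and false negative rates, which is one simple criterion among several.

```python
def error_rates(scored_pairs: list[tuple[float, bool]], threshold: float) -> tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) at a threshold.

    scored_pairs holds (similarity, same_person) tuples.
    """
    impostors = [s for s, same in scored_pairs if not same]
    genuines = [s for s, same in scored_pairs if same]
    false_pos = sum(1 for s in impostors if s >= threshold)
    false_neg = sum(1 for s in genuines if s < threshold)
    fpr = false_pos / len(impostors) if impostors else 0.0
    fnr = false_neg / len(genuines) if genuines else 0.0
    return fpr, fnr


def pick_threshold(scored_pairs: list[tuple[float, bool]], candidates: list[float]) -> float:
    """Choose the candidate with the lowest FPR + FNR."""
    return min(candidates, key=lambda t: sum(error_rates(scored_pairs, t)))


# Synthetic scores for illustration only.
pairs = [
    (0.85, True), (0.72, True), (0.62, True),     # same person
    (0.58, False), (0.40, False), (0.30, False),  # different people
]
candidates = [0.4, 0.5, 0.6, 0.7, 0.8]

for t in candidates:
    fpr, fnr = error_rates(pairs, t)
    print(f"threshold {t}: FPR {fpr:.2f}, FNR {fnr:.2f}")
print(pick_threshold(pairs, candidates))  # 0.6
```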
Privacy & Security
All data is stored locally. No cloud uploads. No external API calls.
- Data Location: SQLite database in data/visage.db
- Encryption: Not encrypted by default (consider encrypting database at rest)
- Access Control: No built-in authentication (runs on localhost)
- Data Retention: Manual deletion required
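Because deletion is manual and the schema declares no `ON DELETE CASCADE`, removing a person means deleting their embeddings as well as their `persons` row. The sketch below shows one way to do that in a single transaction. `delete_person` is a hypothetical helper, and the example runs against an in-memory database with a trimmed copy of the schema, not the real `data/visage.db`.

```python
import sqlite3


def delete_person(conn: sqlite3.Connection, person_id: str) -> int:
    """Delete a person and all of their embeddings; return the number of embeddings removed."""
    with conn:  # commits on success, rolls back if an exception is raised
        removed = conn.execute(
            "DELETE FROM embeddings WHERE person_id = ?", (person_id,)
        ).rowcount
        conn.execute("DELETE FROM persons WHERE person_id = ?", (person_id,))
    return removed


# Trimmed schema and sample rows for the example only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (person_id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE embeddings (
    embedding_id TEXT PRIMARY KEY,
    person_id TEXT NOT NULL,
    embedding BLOB NOT NULL,
    FOREIGN KEY (person_id) REFERENCES persons(person_id)
);
INSERT INTO persons VALUES ('p1', 'Alice');
INSERT INTO embeddings VALUES ('e1', 'p1', x'00');
INSERT INTO embeddings VALUES ('e2', 'p1', x'01');
""")

removed = delete_person(conn, "p1")
print(removed)  # 2
```

Deleting the embeddings first keeps the operation valid even when foreign key enforcement is switched on for the connection.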
See Ethics Overview for more details.
Next Steps
- Face Detection - Deep dive into detection algorithms
- Embeddings - Learn how embeddings work