System Overview

Visage is a real-time face recognition system built on three core pillars: detection, embedding, and matching.

The Recognition Pipeline

Core Components

1. Face Detection

The first step in the pipeline is detecting faces in the video frame.

Detection Models

OpenCV Haar Cascades (default)
SSD (Single Shot Detector)
MTCNN (Multi-task CNN)
RetinaFace

Performance

30-60 FPS on modern CPUs
Handles multiple faces per frame
Automatic quality filtering

What it does: Locates faces in the image and extracts bounding box coordinates.

Output: Face region (x, y, width, height)
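The page mentions automatic quality filtering but does not say how it works. The sketch below shows one plausible post-processing step on the detector's (x, y, width, height) output. It clamps each box to the frame and drops faces below a minimum size. The names `clamp_box`, `filter_faces`, and the `min_size` cutoff are hypothetical, not part of Visage.

```python
from typing import List, Tuple

# (x, y, width, height), matching the detector output described above
Box = Tuple[int, int, int, int]


def clamp_box(box: Box, frame_w: int, frame_h: int) -> Box:
    """Clip a bounding box so it lies entirely inside the frame."""
    x, y, w, h = box
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(frame_w, x + w), min(frame_h, y + h)
    return (x0, y0, max(0, x1 - x0), max(0, y1 - y0))


def filter_faces(
    boxes: List[Box], frame_w: int, frame_h: int, min_size: int = 40
) -> List[Box]:
    """Clamp each box to the frame and drop faces too small to embed well.

    min_size is a hypothetical cutoff, not a documented Visage setting.
    """
    kept = []
    for box in boxes:
        clamped = clamp_box(box, frame_w, frame_h)
        if clamped[2] >= min_size and clamped[3] >= min_size:
            kept.append(clamped)
    return kept


# A box hanging off the left edge is clipped; a 20x20 box is discarded.
faces = filter_faces([(-10, 20, 100, 120), (300, 300, 20, 20)], 640, 480)
```

With an OpenCV/NumPy image array, the crop passed to the embedding step would then be `frame[y:y + h, x:x + w]`, because rows index the y axis.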

2. Face Embeddings

Once a face is detected, we generate a mathematical representation called an embedding.

What is an embedding?

A fixed-length vector (typically 128 or 512 dimensions) that captures the unique features of a face. Similar faces have embeddings that are close together in vector space.

Models Available:

Facenet (128-dim) - Fast, accurate, default choice
VGG-Face (2622-dim) - Very accurate, slower
ArcFace (512-dim) - State-of-the-art accuracy
OpenFace (128-dim) - Lightweight alternative

Embedding Properties:

L2-normalized (unit length)
Largely invariant to lighting and pose (within limits)
Consistent across different images of the same person
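The snippet below is a minimal plain-Python sketch of L2 normalization. Several of the models listed above may already return unit-length vectors. Whether Visage normalizes explicitly is an assumption here, and `l2_normalize` is an illustrative helper.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector so its L2 norm (Euclidean length) is 1."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm < eps:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vec]


unit = l2_normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```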

3. Similarity Matching

Embeddings are compared using cosine similarity to find matches in the memory bank.

Cosine Similarity Formula:

similarity(A, B) = (A · B) / (||A|| × ||B||)

Since embeddings are L2-normalized (||A|| = ||B|| = 1), the denominator equals 1 and this simplifies to:

similarity(A, B) = A · B  (dot product)
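As a quick check of the simplification, the sketch computes both forms for two unit-length toy vectors. The full formula and the plain dot product agree up to floating-point rounding. The vectors are illustrative, not real embeddings.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Full formula: (A . B) / (||A|| x ||B||)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Both toy vectors have length 1, so the denominator is (approximately) 1.
a = [0.6, 0.8]
b = [0.8, 0.6]
full = cosine_similarity(a, b)
shortcut = dot(a, b)
```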

Similarity Range: -1.0 to 1.0 (1.0 means the embeddings point in the same direction; unrelated faces typically score much lower)

Default Threshold: 0.6
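A minimal sketch of threshold-based matching follows. It scans every stored embedding, keeps the highest dot-product score, and accepts it only if it reaches the threshold. The in-memory `bank` layout (person_id mapped to a list of embeddings) and the `best_match` name are hypothetical. Treating a score equal to the threshold as a match is also an assumption.

```python
from typing import Dict, List, Optional, Sequence, Tuple

DEFAULT_THRESHOLD = 0.6  # default match threshold stated on this page


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def best_match(
    query: Sequence[float],
    bank: Dict[str, List[Sequence[float]]],
    threshold: float = DEFAULT_THRESHOLD,
) -> Optional[Tuple[str, float]]:
    """Return (person_id, score) of the most similar stored embedding,
    or None when no score reaches the threshold.

    Assumes the query and all stored embeddings are L2-normalized,
    so the dot product equals cosine similarity.
    """
    best_id, best_score = None, float("-inf")
    for person_id, embeddings in bank.items():
        for emb in embeddings:
            score = dot(query, emb)
            if score > best_score:
                best_id, best_score = person_id, score
    if best_id is None or best_score < threshold:
        return None
    return best_id, best_score


bank = {"alice": [[1.0, 0.0]], "bob": [[0.0, 1.0]]}
result = best_match([0.8, 0.6], bank)  # ("alice", 0.8)
```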

4. Memory Bank

A local SQLite database stores person metadata and embeddings.

Database Schema:

CREATE TABLE persons (
    person_id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    relationship TEXT,
    metadata TEXT,
    created_at TIMESTAMP
);

CREATE TABLE embeddings (
    embedding_id TEXT PRIMARY KEY,
    person_id TEXT NOT NULL,
    embedding BLOB NOT NULL,
    embedding_dim INTEGER,
    created_at TIMESTAMP,
    FOREIGN KEY (person_id) REFERENCES persons(person_id)
);
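The schema stores each embedding as a BLOB but does not specify the byte encoding. The sketch below assumes packed 32-bit floats via Python's `array` module and uses the schema above with the standard `sqlite3` module. The helper names `to_blob`, `from_blob`, and `add_embedding` are hypothetical.

```python
import sqlite3
import uuid
from array import array
from typing import List, Sequence

SCHEMA = """
CREATE TABLE IF NOT EXISTS persons (
    person_id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    relationship TEXT,
    metadata TEXT,
    created_at TIMESTAMP
);
CREATE TABLE IF NOT EXISTS embeddings (
    embedding_id TEXT PRIMARY KEY,
    person_id TEXT NOT NULL,
    embedding BLOB NOT NULL,
    embedding_dim INTEGER,
    created_at TIMESTAMP,
    FOREIGN KEY (person_id) REFERENCES persons(person_id)
);
"""


def to_blob(vec: Sequence[float]) -> bytes:
    # Assumed encoding: C floats (32-bit on common platforms), native byte order.
    return array("f", vec).tobytes()


def from_blob(blob: bytes) -> List[float]:
    arr = array("f")
    arr.frombytes(blob)
    return arr.tolist()


def add_embedding(conn: sqlite3.Connection, person_id: str, vec: Sequence[float]) -> str:
    emb_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO embeddings (embedding_id, person_id, embedding, embedding_dim, created_at) "
        "VALUES (?, ?, ?, ?, CURRENT_TIMESTAMP)",
        (emb_id, person_id, to_blob(vec), len(vec)),
    )
    return emb_id


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO persons (person_id, name, created_at) VALUES (?, ?, CURRENT_TIMESTAMP)",
    ("p1", "Alice"),
)
add_embedding(conn, "p1", [0.5, 0.25, -1.0])
row = conn.execute(
    "SELECT embedding, embedding_dim FROM embeddings WHERE person_id = ?", ("p1",)
).fetchone()
restored = from_blob(row[0])
```

Storing float32 halves the size compared with float64. Values that are not exactly representable in float32 come back slightly rounded, which is negligible for cosine similarity.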

Data Flow

Registration Flow

Capture - User captures a photo via webcam
Detect - System detects face in image
Embed - Generate embedding vector
Store - Save person info and embedding to database
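The four registration steps can be sketched as one function. In this sketch, detection and embedding are passed in as callables because the page does not name Visage's internal functions. `register_person`, `init_db`, and the fake detector/embedder are hypothetical. Requiring exactly one face in a registration photo is also an assumption.

```python
import sqlite3
import uuid
from array import array
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS persons (
            person_id TEXT PRIMARY KEY, name TEXT NOT NULL,
            relationship TEXT, metadata TEXT, created_at TIMESTAMP);
        CREATE TABLE IF NOT EXISTS embeddings (
            embedding_id TEXT PRIMARY KEY, person_id TEXT NOT NULL,
            embedding BLOB NOT NULL, embedding_dim INTEGER, created_at TIMESTAMP,
            FOREIGN KEY (person_id) REFERENCES persons(person_id));
    """)


def register_person(
    conn: sqlite3.Connection,
    name: str,
    image: object,
    detect: Callable[[object], List[Box]],
    embed: Callable[[object, Box], Sequence[float]],
    relationship: Optional[str] = None,
) -> str:
    # Detect: this sketch insists on exactly one face in the photo.
    boxes = detect(image)
    if len(boxes) != 1:
        raise ValueError(f"expected exactly one face, found {len(boxes)}")
    # Embed: turn the face region into a vector.
    vec = list(embed(image, boxes[0]))
    # Store: both rows in one transaction (committed on success, rolled back on error).
    person_id = str(uuid.uuid4())
    with conn:
        conn.execute(
            "INSERT INTO persons (person_id, name, relationship, created_at) "
            "VALUES (?, ?, ?, CURRENT_TIMESTAMP)",
            (person_id, name, relationship),
        )
        conn.execute(
            "INSERT INTO embeddings (embedding_id, person_id, embedding, embedding_dim, created_at) "
            "VALUES (?, ?, ?, ?, CURRENT_TIMESTAMP)",
            (str(uuid.uuid4()), person_id, array("f", vec).tobytes(), len(vec)),
        )
    return person_id


def fake_detect(image: object) -> List[Box]:
    return [(10, 10, 120, 120)]


def fake_embed(image: object, box: Box) -> List[float]:
    return [1.0, 0.0, 0.0]


conn = sqlite3.connect(":memory:")
init_db(conn)
alice_id = register_person(conn, "Alice", image=None, detect=fake_detect, embed=fake_embed)
```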

Identification Flow

Stream - Frontend sends video frames via WebSocket
Process - Backend detects face and generates embedding
Match - Compare against all stored embeddings
Return - Send match result back to frontend

Performance Characteristics

Latency Breakdown

Stage	Time (CPU)	Time (GPU)
Face Detection	20-50ms	10-20ms
Embedding Generation	100-200ms	20-40ms
Similarity Search	1-5ms	1-5ms
Total	121-255ms	31-65ms
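The totals follow from summing the per-stage ranges. Because the stages run one after another for each frame, 1000 / total latency gives an upper bound on frames processed per second. The sketch copies the stage numbers from the table. The dictionary layout is only for illustration.

```python
# Per-stage latency ranges in milliseconds, copied from the table above.
STAGES_MS = {
    "cpu": {"detection": (20, 50), "embedding": (100, 200), "search": (1, 5)},
    "gpu": {"detection": (10, 20), "embedding": (20, 40), "search": (1, 5)},
}


def total_ms(stages):
    """Sum the low ends and the high ends of each stage range."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


for device, stages in STAGES_MS.items():
    low, high = total_ms(stages)
    # A strictly sequential pipeline cannot exceed 1000 / latency frames per second.
    print(f"{device}: {low}-{high} ms per frame, at most {1000 / high:.1f}-{1000 / low:.1f} FPS")
# cpu: 121-255 ms per frame, at most 3.9-8.3 FPS
# gpu: 31-65 ms per frame, at most 15.4-32.3 FPS
```

The CPU ceiling of roughly 4-8 FPS sits above the 2-5 FPS identification rate quoted under Scalability. The gap presumably absorbs costs the table does not list, such as WebSocket transfer and frame decoding.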

Tip: The first inference is slower because the models are loaded on first use. Subsequent inferences are much faster.
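One common way to keep this one-time cost off the first real frame is to cache loaded models and warm them up at startup. The sketch uses `functools.lru_cache`. `get_model`, `warm_up`, and the placeholder model object are hypothetical stand-ins, not Visage's loader.

```python
import functools

LOAD_COUNT = 0  # instrumentation for this example only


@functools.lru_cache(maxsize=None)
def get_model(name: str) -> dict:
    """Load a model once per process; later calls return the cached object.

    The body is a hypothetical stand-in for an expensive load from disk.
    """
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"name": name}  # placeholder for a real model object


def warm_up(names=("Facenet",)) -> None:
    # Call at server startup so the first identification request is not slowed down.
    for name in names:
        get_model(name)


warm_up()
model = get_model("Facenet")  # served from the cache, no second load
```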

Scalability

Memory Bank Size: Tested up to 1,000 persons
Embeddings per Person: Recommended 1-5
Concurrent Users: Single-user system (webcam limitation)
Identification Rate: 2-5 FPS

Accuracy Considerations

Factors Affecting Accuracy

Good Lighting ☀️

Even, front-facing illumination improves accuracy significantly

Face Angle 📐

Works best with near-frontal faces (within ±30° of frontal)

Image Quality 🖼️

Higher resolution and focus improve embedding quality

Multiple Samples 📚

Registering 3-5 photos per person reduces false negatives
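The page does not say how Visage combines several samples of one person. One simple option is to average the unit-length embeddings and renormalize, which yields a single "profile" vector per person. Another common option is to keep every sample and take the highest score among them at match time. The `centroid` helper below is a hypothetical illustration of the first approach.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def centroid(embeddings: List[Sequence[float]]) -> List[float]:
    """Average several unit-length embeddings of one person, then renormalize."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return l2_normalize(mean)


profile = centroid([[1.0, 0.0], [0.0, 1.0]])  # roughly [0.7071, 0.7071]
```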

Threshold Tuning

The match threshold balances false positives against false negatives:

Lower threshold (0.4-0.5): More matches, higher false positive rate
Medium threshold (0.6-0.7): Balanced (default)
Higher threshold (0.8+): Fewer matches, higher false negative rate

Use the calibration script to find the optimal threshold for your use case:

python scripts/calibrate_threshold.py
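The page does not describe what the calibration script does internally. The following is only a generic sketch of threshold calibration. Given similarity scores for same-person ("genuine") and different-person ("impostor") pairs, it measures both error rates at each candidate threshold and picks the one with the lowest sum. Minimizing FPR + FNR is just one possible criterion. The scores are synthetic, chosen for illustration.

```python
from typing import List, Sequence, Tuple


def error_rates(genuine: List[float], impostor: List[float], threshold: float) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate); score >= threshold counts as a match."""
    fnr = sum(s < threshold for s in genuine) / len(genuine)
    fpr = sum(s >= threshold for s in impostor) / len(impostor)
    return fpr, fnr


def sweep(genuine: List[float], impostor: List[float], thresholds: Sequence[float]) -> float:
    """Pick the candidate threshold with the smallest FPR + FNR (first one wins ties)."""
    best_t, best_total = None, float("inf")
    for t in thresholds:
        fpr, fnr = error_rates(genuine, impostor, t)
        if fpr + fnr < best_total:
            best_t, best_total = t, fpr + fnr
    return best_t


# Synthetic scores for illustration only.
genuine = [0.82, 0.75, 0.68, 0.63, 0.91]
impostor = [0.30, 0.45, 0.52, 0.55, 0.20]
chosen = sweep(genuine, impostor, [0.4, 0.5, 0.6, 0.7, 0.8])  # 0.6
```

Lowering the threshold shifts errors from false negatives to false positives, which is the trade-off the list above describes.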

Privacy & Security

Privacy First

All data is stored locally. No cloud uploads. No external API calls.

Data Location: SQLite database in data/visage.db
Encryption: Not encrypted by default (consider encrypting database at rest)
Access Control: No built-in authentication (runs on localhost)
Data Retention: Manual deletion required
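Manual deletion against the schema above must remove a person's embeddings before the person row. The schema has no `ON DELETE CASCADE`, so deleting the person first fails once SQLite foreign-key enforcement is on. A minimal sketch with the standard `sqlite3` module follows. `delete_person` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE persons (person_id TEXT PRIMARY KEY, name TEXT NOT NULL,
    relationship TEXT, metadata TEXT, created_at TIMESTAMP);
CREATE TABLE embeddings (embedding_id TEXT PRIMARY KEY, person_id TEXT NOT NULL,
    embedding BLOB NOT NULL, embedding_dim INTEGER, created_at TIMESTAMP,
    FOREIGN KEY (person_id) REFERENCES persons(person_id));
""")


def delete_person(conn: sqlite3.Connection, person_id: str) -> int:
    """Remove a person and all of their embeddings; return how many embeddings were deleted."""
    with conn:  # one transaction: both deletes succeed or neither does
        cur = conn.execute("DELETE FROM embeddings WHERE person_id = ?", (person_id,))
        removed = cur.rowcount
        conn.execute("DELETE FROM persons WHERE person_id = ?", (person_id,))
    return removed


with conn:
    conn.execute("INSERT INTO persons (person_id, name) VALUES ('p1', 'Alice')")
    conn.executemany(
        "INSERT INTO embeddings (embedding_id, person_id, embedding) VALUES (?, 'p1', ?)",
        [("e1", b"\x00"), ("e2", b"\x00")],
    )
removed = delete_person(conn, "p1")
```

Deleted rows can linger in free pages of the database file. Enabling `PRAGMA secure_delete` or running `VACUUM` afterwards reduces what remains on disk.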

See Ethics Overview for more details.

Next Steps

Face Detection - Deep dive into detection algorithms
Embeddings - Learn how embeddings work
