
System Overview

Visage is a real-time face recognition system built on three core pillars: detection, embedding, and matching.

The Recognition Pipeline

Core Components

1. Face Detection

The first step in the pipeline is detecting faces in the video frame.

Detection Models

  • OpenCV Haar Cascades (default)
  • SSD (Single Shot Detector)
  • MTCNN (Multi-task CNN)
  • RetinaFace

Performance

  • 30-60 FPS on modern CPUs
  • Handles multiple faces per frame
  • Automatic quality filtering

What it does: Locates faces in the image and extracts bounding box coordinates.

Output: Face region (x, y, width, height)
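Later stages work on the face region, not the whole frame. The sketch below crops that region. It assumes the frame is a row-major grid of pixels (a list of rows), and it assumes boxes can extend past the frame edge, which some detectors may report for faces near the border. `clamp_box` and `crop_face` are hypothetical helpers, not part of Visage's API.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), as output by the detector


def clamp_box(box: Box, frame_width: int, frame_height: int) -> Box:
    """Clip a bounding box so it lies entirely inside the frame."""
    x, y, w, h = box
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(frame_width, x + w), min(frame_height, y + h)
    # A zero width or height means the box was entirely outside the frame.
    return (x0, y0, max(0, x1 - x0), max(0, y1 - y0))


def crop_face(frame: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Return the pixels inside `box`; `frame` is a list of rows (hypothetical layout)."""
    height = len(frame)
    width = len(frame[0]) if height else 0
    x, y, w, h = clamp_box(box, width, height)
    return [list(row[x:x + w]) for row in frame[y:y + h]]


# A 4x4 toy "frame" whose pixel values encode their (row, column) position.
frame = [[10 * r + c for c in range(4)] for r in range(4)]
print(crop_face(frame, (1, 1, 2, 2)))   # [[11, 12], [21, 22]]
print(crop_face(frame, (-1, 2, 3, 5)))  # [[20, 21], [30, 31]] (clipped to the frame)
```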

2. Face Embeddings

Once a face is detected, we generate a mathematical representation called an embedding.

What is an embedding?

A fixed-length vector (typically 128 or 512 dimensions) that captures the unique features of a face. Similar faces have embeddings that are close together in vector space.

Models Available:

  • Facenet (128-dim) - Fast, accurate, default choice
  • VGG-Face (2622-dim) - Very accurate, slower
  • ArcFace (512-dim) - State-of-the-art accuracy
  • OpenFace (128-dim) - Lightweight alternative

Embedding Properties:

  • L2 normalized (unit length)
  • Robust to moderate changes in lighting and pose (not fully invariant)
  • Consistent across different images of the same person
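L2 normalization means rescaling a vector to length 1. Below is a minimal pure-Python version. Visage may instead normalize inside its model wrapper or with NumPy, and the helper name `l2_normalize` is made up for illustration.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale `vector` to unit Euclidean length (L2 norm of 1)."""
    norm = math.sqrt(sum(v * v for v in vector))
    # An all-zero vector has no direction, so it cannot be normalized.
    if norm < eps:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vector]


raw = [3.0, 4.0]          # toy 2-dim "embedding"; real ones have 128 or more dimensions
unit = l2_normalize(raw)
print(unit)               # [0.6, 0.8]
print(math.hypot(*unit))  # unit length, up to floating-point rounding
```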

3. Similarity Matching

Embeddings are compared using cosine similarity to find matches in the memory bank.

Cosine Similarity Formula:

similarity(A, B) = (A · B) / (||A|| × ||B||)

Since embeddings are L2-normalized, this simplifies to:

similarity(A, B) = A · B  (dot product)
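In code, the two formulas agree. The sketch below computes the general cosine formula, then checks that a plain dot product over the pre-normalized vectors gives the same score. It uses plain Python lists for clarity, and the function names are illustrative.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """General formula: (A . B) / (||A|| * ||B||)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]
print(cosine_similarity(a, b))  # 0.96

# After L2 normalization (both vectors have length 5 here), the dot product alone suffices.
a_unit = [x / 5.0 for x in a]
b_unit = [x / 5.0 for x in b]
print(dot(a_unit, b_unit))      # approximately 0.96
```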

Similarity Range: -1.0 to 1.0, where 1.0 means identical embeddings and higher scores mean more similar faces

Default Threshold: 0.6
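The search described above compares a query against the memory bank and applies the threshold. Below is a minimal sketch of that step. It assumes an in-memory bank that maps each `person_id` to that person's L2-normalized embeddings. It also assumes each person is scored by their best-matching sample; the source does not say how Visage combines multiple samples, and averaging them is another common choice. The name `best_match` is hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

DEFAULT_THRESHOLD = 0.6  # the documented default match threshold


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    # Embeddings are assumed L2-normalized, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def best_match(
    query: Sequence[float],
    bank: Dict[str, List[Sequence[float]]],
    threshold: float = DEFAULT_THRESHOLD,
) -> Tuple[Optional[str], float]:
    """Return (person_id, score) for the closest person, or (None, score) below threshold."""
    best_id: Optional[str] = None
    best_score = -1.0  # cosine similarity is never below -1
    for person_id, embeddings in bank.items():
        for emb in embeddings:  # a person's score is their best-matching sample
            score = dot(query, emb)
            if score > best_score:
                best_id, best_score = person_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score


bank = {
    "alice": [[1.0, 0.0], [0.8, 0.6]],
    "bob": [[0.0, 1.0]],
}
print(best_match([1.0, 0.0], bank))   # ('alice', 1.0)
print(best_match([-1.0, 0.0], bank))  # (None, 0.0): best score is below 0.6
```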

4. Memory Bank

A local SQLite database stores person metadata and embeddings.

Database Schema:

CREATE TABLE persons (
    person_id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    relationship TEXT,
    metadata TEXT,
    created_at TIMESTAMP
);

CREATE TABLE embeddings (
    embedding_id TEXT PRIMARY KEY,
    person_id TEXT NOT NULL,
    embedding BLOB NOT NULL,
    embedding_dim INTEGER,
    created_at TIMESTAMP,
    FOREIGN KEY (person_id) REFERENCES persons(person_id)
);
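The schema stores each embedding as a BLOB but does not say how the vector is encoded. The sketch below assumes packed float32 values (4 bytes per dimension) and uses Python's built-in `sqlite3` and `array` modules; `to_blob` and `from_blob` are hypothetical helpers. The sketch also shows a real SQLite behavior worth knowing: `FOREIGN KEY` clauses are ignored unless `PRAGMA foreign_keys = ON` is set on each connection.

```python
import sqlite3
from array import array
from typing import List, Sequence

SCHEMA = """
CREATE TABLE persons (
    person_id TEXT PRIMARY KEY, name TEXT NOT NULL, relationship TEXT,
    metadata TEXT, created_at TIMESTAMP
);
CREATE TABLE embeddings (
    embedding_id TEXT PRIMARY KEY, person_id TEXT NOT NULL, embedding BLOB NOT NULL,
    embedding_dim INTEGER, created_at TIMESTAMP,
    FOREIGN KEY (person_id) REFERENCES persons(person_id)
);
"""


def to_blob(vector: Sequence[float]) -> bytes:
    # Assumption: vectors are stored as packed float32 values (4 bytes each).
    return array("f", vector).tobytes()


def from_blob(blob: bytes) -> List[float]:
    values = array("f")
    values.frombytes(blob)
    return values.tolist()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript(SCHEMA)

vec = [0.6, 0.8]
conn.execute(
    "INSERT INTO persons (person_id, name, created_at) VALUES (?, ?, CURRENT_TIMESTAMP)",
    ("p1", "Alice"),
)
conn.execute(
    "INSERT INTO embeddings (embedding_id, person_id, embedding, embedding_dim, created_at) "
    "VALUES (?, ?, ?, ?, CURRENT_TIMESTAMP)",
    ("e1", "p1", to_blob(vec), len(vec)),
)
conn.commit()

blob, dim = conn.execute("SELECT embedding, embedding_dim FROM embeddings").fetchone()
restored = from_blob(blob)
print(dim, len(blob))                   # 2 8
print([round(v, 6) for v in restored])  # [0.6, 0.8] (float32 loses some precision)

# With foreign keys enforced, an embedding for an unknown person is rejected.
rejected = False
try:
    conn.execute(
        "INSERT INTO embeddings (embedding_id, person_id, embedding) VALUES (?, ?, ?)",
        ("e2", "no-such-person", to_blob(vec)),
    )
except sqlite3.IntegrityError:
    rejected = True
print("orphan embedding rejected:", rejected)  # True
```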

Data Flow

Registration Flow

  1. Capture - User captures a photo via webcam
  2. Detect - System detects face in image
  3. Embed - Generate embedding vector
  4. Store - Save person info and embedding to database
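The registration steps can be written as one function that runs detection, embedding, and storage in sequence. This is a minimal sketch, not Visage's actual code. The function `register_person`, the injected `detect`/`embed`/`store` callables, and the use of random UUIDs for `person_id` and `embedding_id` are all assumptions. The Capture step is assumed to happen upstream and produce `image`.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


@dataclass
class Registration:
    person_id: str
    embedding_id: str
    embedding_dim: int


def register_person(
    image: Any,
    name: str,
    detect: Callable[[Any], Optional[Box]],
    embed: Callable[[Any, Box], Sequence[float]],
    store: Callable[[str, str, str, Sequence[float]], None],
) -> Registration:
    """Detect -> Embed -> Store for one captured photo."""
    box = detect(image)
    if box is None:
        # Refuse to register a photo with no usable face.
        raise ValueError("no face detected; retake the photo")
    embedding = embed(image, box)
    person_id = str(uuid.uuid4())     # hypothetical ID scheme
    embedding_id = str(uuid.uuid4())
    store(person_id, embedding_id, name, embedding)
    return Registration(person_id, embedding_id, len(embedding))


# Stand-in callables; a real system would plug in the detector, model, and database.
saved: List[tuple] = []
result = register_person(
    image="photo.jpg",
    name="Alice",
    detect=lambda img: (10, 20, 100, 100),
    embed=lambda img, box: [0.6, 0.8],
    store=lambda pid, eid, name, emb: saved.append((pid, eid, name, list(emb))),
)
print(result.embedding_dim, saved[0][2])  # 2 Alice
```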

Identification Flow

  1. Stream - Frontend sends video frames via WebSocket
  2. Process - Backend detects face and generates embedding
  3. Match - Compare against all stored embeddings
  4. Return - Send match result back to frontend
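The frames arrive over a WebSocket, but the per-frame work does not depend on the transport. Below is a minimal sketch of a backend handler for the Process, Match, and Return steps. Detection, embedding, and matching are passed in as callables; the matcher would apply the 0.6 threshold from Similarity Matching. The JSON reply shape and the name `handle_frame` are assumptions, since the source does not document the message format.

```python
import json
from typing import Any, Callable, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def handle_frame(
    frame: Any,
    detect: Callable[[Any], Optional[Box]],
    embed: Callable[[Any, Box], Sequence[float]],
    match: Callable[[Sequence[float]], Tuple[Optional[str], float]],
) -> str:
    """Process (detect + embed), Match, then Return a JSON reply for one frame."""
    box = detect(frame)
    if box is None:
        # Report that no face was found in this frame.
        return json.dumps({"face": None, "person_id": None, "score": None})
    person_id, score = match(embed(frame, box))
    # person_id is None when the best score falls below the match threshold.
    return json.dumps({"face": list(box), "person_id": person_id, "score": round(score, 3)})


reply = handle_frame(
    frame=b"frame-bytes",
    detect=lambda f: (5, 5, 50, 50),
    embed=lambda f, box: [1.0, 0.0],
    match=lambda emb: ("alice", 0.91),
)
print(reply)  # {"face": [5, 5, 50, 50], "person_id": "alice", "score": 0.91}
```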

Performance Characteristics

Latency Breakdown

| Stage | Time (CPU) | Time (GPU) |
|---|---|---|
| Face Detection | 20-50 ms | 10-20 ms |
| Embedding Generation | 100-200 ms | 20-40 ms |
| Similarity Search | 1-5 ms | 1-5 ms |
| Total | 121-255 ms | 31-65 ms |
Tip: The first inference is slower because the models are loaded on first use. Subsequent inferences are much faster.
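To measure this breakdown on your own hardware, time each stage separately and discard the first call, which includes model loading. Below is a minimal sketch using `time.perf_counter`. `time_stages` is a hypothetical helper, and the sleeping lambdas are stand-ins for the real detector, embedding model, and search.

```python
import time
from typing import Callable, Dict


def time_stages(
    stages: Dict[str, Callable[[], object]], warmup: int = 1, runs: int = 5
) -> Dict[str, float]:
    """Return the mean wall-clock latency in milliseconds for each stage.

    Warm-up calls run first and are excluded from the measurement.
    """
    results: Dict[str, float] = {}
    for name, stage in stages.items():
        for _ in range(warmup):
            stage()
        start = time.perf_counter()
        for _ in range(runs):
            stage()
        results[name] = (time.perf_counter() - start) * 1000.0 / runs
    return results


timings = time_stages({
    "detection": lambda: time.sleep(0.02),  # stand-in stages
    "embedding": lambda: time.sleep(0.01),
})
for name, ms in timings.items():
    print(f"{name}: {ms:.1f} ms")
print(f"total: {sum(timings.values()):.1f} ms")
```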

Scalability

  • Memory Bank Size: Tested up to 1,000 persons
  • Embeddings per Person: Recommended 1-5
  • Concurrent Users: Single-user system (webcam limitation)
  • Identification Rate: 2-5 frames processed per second
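Matching compares each query against every stored embedding, so both the work per query and the storage grow with persons × embeddings per person. Below is a back-of-the-envelope sketch for the upper end of the tested range. It assumes float32 storage (4 bytes per value), which the source does not specify, and it counts raw vector bytes only. `bank_size_bytes` is a hypothetical helper.

```python
def bank_size_bytes(persons: int, per_person: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw storage for all embedding vectors (excludes SQLite row and index overhead)."""
    return persons * per_person * dim * bytes_per_value


persons, per_person = 1_000, 5  # tested maximum, upper end of the recommendation
for model, dim in [("Facenet", 128), ("ArcFace", 512), ("VGG-Face", 2622)]:
    size = bank_size_bytes(persons, per_person, dim)
    print(f"{model}: {persons * per_person} comparisons per query, {size / 1_000_000:.2f} MB of vectors")
# Facenet: 5000 comparisons per query, 2.56 MB of vectors
# ArcFace: 5000 comparisons per query, 10.24 MB of vectors
# VGG-Face: 5000 comparisons per query, 52.44 MB of vectors
```

Even with VGG-Face's 2622 dimensions, the raw vectors stay in the tens of megabytes at this scale.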

Accuracy Considerations

Factors Affecting Accuracy

Good Lighting ☀️

Even, front-facing illumination improves accuracy significantly

Face Angle 📐

Works best with near-frontal faces (within ±30° of frontal)

Image Quality 🖼️

Higher resolution and focus improve embedding quality

Multiple Samples 📚

Registering 3-5 photos per person reduces false negatives

Threshold Tuning

The match threshold trades off false positives against false negatives:

  • Lower threshold (0.4-0.5): More matches, higher false positive rate
  • Medium threshold (0.6-0.7): Balanced (default)
  • Higher threshold (0.8+): Fewer matches, higher false negative rate

Use the calibration script to find the optimal threshold for your use case:

python scripts/calibrate_threshold.py
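The source does not describe what `scripts/calibrate_threshold.py` does internally. One common approach is sketched here as an illustration, not as that script's logic. Collect similarity scores for known same-person (genuine) and different-person (impostor) pairs. Measure both error rates at each candidate threshold, and keep the threshold with the lowest combined error. The scores below are made-up toy values, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def error_rates(
    genuine: Sequence[float], impostor: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate); a pair matches if score >= threshold."""
    false_neg = sum(1 for s in genuine if s < threshold)    # same person, rejected
    false_pos = sum(1 for s in impostor if s >= threshold)  # different people, accepted
    return false_pos / len(impostor), false_neg / len(genuine)


def pick_threshold(
    genuine: Sequence[float], impostor: Sequence[float], candidates: Sequence[float]
) -> float:
    """Choose the candidate minimizing FPR + FNR (equal weight on both errors)."""
    return min(candidates, key=lambda t: sum(error_rates(genuine, impostor, t)))


genuine = [0.91, 0.84, 0.77, 0.69, 0.63]   # toy same-person scores
impostor = [0.15, 0.28, 0.44, 0.57, 0.52]  # toy different-person scores
candidates = [round(0.4 + 0.05 * i, 2) for i in range(10)]  # 0.4, 0.45, ..., 0.85
best = pick_threshold(genuine, impostor, candidates)
print(best, error_rates(genuine, impostor, best))  # 0.6 (0.0, 0.0)
```

Weighting the two error rates equally is itself a choice. An access-control setting might penalize false positives more heavily, which pushes the chosen threshold higher.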

Privacy & Security

Privacy First

All data is stored locally. No cloud uploads. No external API calls.

  • Data Location: SQLite database in data/visage.db
  • Encryption: Not encrypted by default; consider encrypting the database at rest
  • Access Control: No built-in authentication (runs on localhost)
  • Data Retention: No automatic expiry; records must be deleted manually
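Records are deleted by hand, so it helps to remove a person's embeddings together with the person row. Below is a minimal sketch with Python's `sqlite3`, using the table and column names from the schema above, trimmed to the columns the example needs. The helper `delete_person` is hypothetical. The documented schema has no `ON DELETE CASCADE`, so embeddings are deleted first, which keeps the foreign key satisfied when it is enforced.

```python
import sqlite3


def delete_person(conn: sqlite3.Connection, person_id: str) -> int:
    """Remove a person and all their embeddings in one transaction; return rows deleted."""
    with conn:  # commits on success, rolls back if either DELETE fails
        emb = conn.execute("DELETE FROM embeddings WHERE person_id = ?", (person_id,)).rowcount
        per = conn.execute("DELETE FROM persons WHERE person_id = ?", (person_id,)).rowcount
    return emb + per


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE persons (person_id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE embeddings (
    embedding_id TEXT PRIMARY KEY,
    person_id TEXT NOT NULL REFERENCES persons(person_id),
    embedding BLOB NOT NULL
);
INSERT INTO persons VALUES ('p1', 'Alice');
INSERT INTO embeddings VALUES ('e1', 'p1', x'0000'), ('e2', 'p1', x'0000');
""")
deleted = delete_person(conn, "p1")
remaining = conn.execute("SELECT COUNT(*) FROM embeddings").fetchone()[0]
print(deleted, remaining)  # 3 0
```

SQLite can leave deleted content in free pages of the database file. Running `VACUUM` afterwards rebuilds the file without those pages. Alternatively, `PRAGMA secure_delete` makes SQLite overwrite deleted content.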

See Ethics Overview for more details.

Next Steps