SecureGate Docs

Face Recognition

ArcFace glintr100 — 512-dimensional face embeddings, cosine similarity matching, per-tenant vector index

Recognition Model

SecureGate uses ArcFace glintr100 from the InsightFace antelopev2 model pack for face recognition. This model produces 512-dimensional feature vectors (embeddings) from aligned 112x112 face crops.

| Property | Value |
|---|---|
| Model | glintr100 |
| Pack | antelopev2 |
| Architecture | ResNet-100 with ArcFace loss |
| Format | ONNX |
| Input | 112x112 RGB aligned face crop |
| Output | 512-d float32 embedding (L2-normalized) |
| License | MIT (InsightFace) |
| Inference | ONNX Runtime with CUDA EP |

How It Works

Embedding Extraction

  1. The ingest service detects a face and aligns it to 112x112.
  2. The aligned crop is sent to the embed service.
  3. The embed service runs ArcFace glintr100 to produce a 512-dimensional embedding.
  4. The embedding is L2-normalized (unit length on the 512-d hypersphere).
  5. The normalized embedding is encrypted with the tenant's CEK and stored in the tenant's sqlite-vec database.
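Step 4 above can be sketched with NumPy. The model call itself is elided, so `raw` here is just a stand-in for glintr100's output, not a real inference result:

```python
import numpy as np

def l2_normalize(raw: np.ndarray) -> np.ndarray:
    """Project a raw 512-d embedding onto the unit hypersphere."""
    norm = np.linalg.norm(raw)
    if norm == 0.0:
        raise ValueError("zero embedding cannot be normalized")
    return raw / norm

# Stand-in for the glintr100 output on one aligned 112x112 crop.
raw = np.random.default_rng(0).standard_normal(512).astype(np.float32)
unit = l2_normalize(raw)
assert unit.shape == (512,)
assert abs(float(np.linalg.norm(unit)) - 1.0) < 1e-5
```

Normalizing up front is what makes the later dot-product comparison equivalent to cosine similarity.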

Similarity Matching

Two face embeddings are compared using cosine similarity:

similarity = dot(embedding_a, embedding_b)

Because embeddings are L2-normalized, the dot product equals the cosine similarity. Values range from -1 (opposite) to 1 (identical).

| Similarity | Interpretation |
|---|---|
| > 0.7 | Very likely same person |
| 0.5 - 0.7 | Possible match (review recommended) |
| < 0.5 | Different people |

The default match threshold is 0.6 and can be adjusted per request via the threshold parameter on /v1/search.
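Combining the dot-product comparison with the default threshold, a match decision can be sketched as follows (MATCH_THRESHOLD and is_match are illustrative names, not part of the API):

```python
import numpy as np

MATCH_THRESHOLD = 0.6  # default on /v1/search; overridable per request

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # For L2-normalized embeddings, the dot product IS the cosine similarity.
    return float(np.dot(a, b))

def is_match(a: np.ndarray, b: np.ndarray, threshold: float = MATCH_THRESHOLD) -> bool:
    return cosine_similarity(a, b) >= threshold

# Two toy unit-length "embeddings" for demonstration.
rng = np.random.default_rng(1)
a = rng.standard_normal(512).astype(np.float32)
a /= np.linalg.norm(a)

assert is_match(a, a)        # identical embedding: similarity ~1.0
assert not is_match(a, -a)   # opposite direction: similarity ~-1.0
```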

MicroBatcher

The embed service uses a MicroBatcher to maximize GPU throughput. Instead of processing one face at a time, incoming embedding requests are accumulated into a batch and processed together.

Request 1 (face crop) --+
Request 2 (face crop) --+--> MicroBatcher --> GPU batch inference --> results
Request 3 (face crop) --+

| Parameter | Default | Description |
|---|---|---|
| BATCH_SIZE | 32 | Maximum faces per GPU batch |
| BATCH_TIMEOUT_MS | 50 | Maximum wait time before flushing a partial batch |

This amortizes GPU kernel launch overhead and achieves near-linear throughput scaling with batch size.
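The batching pattern can be sketched with asyncio. This is a minimal illustration, not the service's implementation: the lambda stands in for GPU batch inference, where the real embed service would invoke glintr100 via ONNX Runtime.

```python
import asyncio

BATCH_SIZE = 32        # maximum faces per GPU batch
BATCH_TIMEOUT_MS = 50  # flush a partial batch after this long

class MicroBatcher:
    """Accumulate requests and run them through one batched call."""

    def __init__(self, batch_fn, batch_size=BATCH_SIZE, timeout_ms=BATCH_TIMEOUT_MS):
        self.batch_fn = batch_fn
        self.batch_size = batch_size
        self.timeout = timeout_ms / 1000
        self.queue: asyncio.Queue = asyncio.Queue()
        self._worker = asyncio.ensure_future(self._run())

    async def submit(self, item):
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut  # resolved when the item's batch is processed

    async def _run(self):
        loop = asyncio.get_running_loop()
        while True:
            item, fut = await self.queue.get()
            batch, futures = [item], [fut]
            deadline = loop.time() + self.timeout
            # Fill the batch until it is full or the timeout expires.
            while len(batch) < self.batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    item, fut = await asyncio.wait_for(self.queue.get(), remaining)
                except asyncio.TimeoutError:
                    break
                batch.append(item)
                futures.append(fut)
            for fut, result in zip(futures, self.batch_fn(batch)):
                fut.set_result(result)

async def main():
    # Toy batch function: "embeds" each request by doubling it.
    batcher = MicroBatcher(lambda xs: [x * 2 for x in xs])
    results = await asyncio.gather(*(batcher.submit(i) for i in range(5)))
    batcher._worker.cancel()
    return results

print(asyncio.run(main()))  # [0, 2, 4, 6, 8]
```

The five concurrent requests above arrive within the 50 ms window, so they are flushed as a single partial batch of 5 rather than five separate GPU calls.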

Per-Tenant Vector Index

sqlite-vec (Primary)

Each tenant has its own SQLite database with the sqlite-vec extension for vector similarity search:

/data/tenants/{tenant_id}/embeddings.db

sqlite-vec provides exact nearest-neighbor and approximate nearest-neighbor (ANN) search on 512-d float32 vectors. For tenants with fewer than 1 million faces, exact search runs in under 1ms.
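Over L2-normalized vectors, exact nearest-neighbor search reduces to a dot-product scan. The NumPy sketch below shows that operation conceptually; it is not the sqlite-vec API:

```python
import numpy as np

def exact_search(index: np.ndarray, query: np.ndarray, k: int = 5):
    """Exact top-k by cosine similarity over L2-normalized rows of `index`."""
    sims = index @ query            # one dot product per stored face
    top = np.argsort(-sims)[:k]     # highest similarity first
    return top, sims[top]

# Toy index of 10,000 unit-length 512-d embeddings.
rng = np.random.default_rng(7)
index = rng.standard_normal((10_000, 512)).astype(np.float32)
index /= np.linalg.norm(index, axis=1, keepdims=True)

query = index[1234]                 # query with a face that is in the index
ids, sims = exact_search(index, query, k=3)
assert ids[0] == 1234 and sims[0] > 0.999
```

At these sizes a single matrix-vector product is fast enough that exact search stays well under a millisecond, which is why ANN is only needed for the hot tier.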

FAISS (Hot Cache)

For high-throughput matching (e.g., live camera streams with many concurrent detections), a FAISS index is loaded into GPU memory from the tenant's sqlite-vec database. The FAISS index is:

  • Loaded when the tenant becomes active (first request after inactivity).
  • Refreshed when new embeddings are added.
  • Evicted after a configurable inactivity timeout.

This two-tier approach keeps the persistent store simple (one SQLite file per tenant) while providing sub-millisecond search for active tenants.
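The load/refresh/evict lifecycle above can be sketched as an inactivity-evicting cache. HotIndexCache, load_index, and the injectable clock are illustrative names; a real implementation would build a FAISS index from the tenant's sqlite-vec database rather than return a string.

```python
import time

class HotIndexCache:
    """Hot tier for per-tenant indexes: load on first use, evict when idle."""

    def __init__(self, load_index, inactivity_timeout_s=300.0, clock=time.monotonic):
        self.load_index = load_index
        self.timeout = inactivity_timeout_s
        self.clock = clock
        self._entries = {}  # tenant_id -> [index, last_used]

    def get(self, tenant_id):
        entry = self._entries.get(tenant_id)
        if entry is None:  # first request after inactivity: load the index
            entry = [self.load_index(tenant_id), self.clock()]
            self._entries[tenant_id] = entry
        entry[1] = self.clock()  # any hit refreshes the inactivity timer
        return entry[0]

    def invalidate(self, tenant_id):
        # Called when new embeddings are added, so the next get() reloads.
        self._entries.pop(tenant_id, None)

    def evict_idle(self):
        now = self.clock()
        idle = [t for t, (_, last) in self._entries.items() if now - last > self.timeout]
        for t in idle:
            del self._entries[t]

# Usage with a fake clock so eviction can be demonstrated without waiting:
t = [0.0]
cache = HotIndexCache(lambda tid: f"faiss:{tid}", inactivity_timeout_s=300, clock=lambda: t[0])
assert cache.get("tenant-a") == "faiss:tenant-a"
t[0] = 301.0
cache.evict_idle()
assert "tenant-a" not in cache._entries
```

Injecting the clock keeps the eviction policy deterministic under test while defaulting to monotonic time in production.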

Embedding Encryption

All face embeddings are encrypted at rest and in transit:

  1. At rest: Each embedding is encrypted with the tenant's Customer Encryption Key (CEK) before writing to sqlite-vec.
  2. On read: KMS derives a session DEK from the CEK, decrypts the embedding in memory, performs the search, and discards the DEK.
  3. In transit: All inter-service communication uses mTLS.

This ensures that even if a tenant's database file is exfiltrated, the embeddings are unreadable without the tenant's CEK (which is held in KMS, wrapped by the tenant's TEK, which is wrapped by the org's OEK in HSM).

Performance

On an NVIDIA GH200 GPU with MicroBatcher (batch size 32):

| Metric | Value |
|---|---|
| Embedding extraction (single) | ~5ms |
| Embedding extraction (batch of 32) | ~15ms |
| Search latency (sqlite-vec, 100K faces) | ~0.8ms |
| Search latency (FAISS GPU, 1M faces) | ~0.3ms |
