Face Recognition
ArcFace glintr100 — 512-dimensional face embeddings, cosine similarity matching, per-tenant vector index
Recognition Model
SecureGate uses ArcFace glintr100 from the InsightFace antelopev2 model pack for face recognition. This model produces 512-dimensional feature vectors (embeddings) from aligned 112x112 face crops.
| Property | Value |
|---|---|
| Model | glintr100 |
| Pack | antelopev2 |
| Architecture | ResNet-100 with ArcFace loss |
| Format | ONNX |
| Input | 112x112 RGB aligned face crop |
| Output | 512-d float32 embedding (L2-normalized) |
| License | MIT (InsightFace) |
| Inference | ONNX Runtime with CUDA EP |
How It Works
Embedding Extraction
- The ingest service detects a face and aligns it to 112x112.
- The aligned crop is sent to the embed service.
- The embed service runs ArcFace glintr100 to produce a 512-dimensional embedding.
- The embedding is L2-normalized (unit length on the 512-d hypersphere).
- The normalized embedding is encrypted with the tenant's CEK and stored in the tenant's sqlite-vec database.
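The normalization step above can be sketched in a few lines of NumPy. This is an illustration only; `l2_normalize` is a hypothetical name, not a function from the SecureGate codebase:

```python
import numpy as np

def l2_normalize(embedding: np.ndarray) -> np.ndarray:
    """Scale a raw 512-d embedding to unit length on the hypersphere."""
    norm = np.linalg.norm(embedding)
    if norm == 0.0:
        raise ValueError("zero embedding cannot be normalized")
    return embedding / norm

# Random stand-in for a raw glintr100 model output.
raw = np.random.default_rng(0).standard_normal(512).astype(np.float32)
unit = l2_normalize(raw)
assert np.isclose(np.linalg.norm(unit), 1.0)
```

Normalizing at write time is what makes the later dot-product comparison equal to cosine similarity.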
Similarity Matching
Two face embeddings are compared using cosine similarity:
```
similarity = dot(embedding_a, embedding_b)
```

Because embeddings are L2-normalized, the dot product equals the cosine similarity. Values range from -1 (opposite) to 1 (identical).
| Similarity | Interpretation |
|---|---|
| > 0.7 | Very likely same person |
| 0.5 - 0.7 | Possible match (review recommended) |
| < 0.5 | Different people |
The default match threshold is 0.6 and can be adjusted per-request via the threshold parameter on /v1/search.
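The comparison and threshold check can be sketched as follows (assumed helper names; the 0.6 default mirrors the /v1/search default described above):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """For L2-normalized embeddings, the dot product is the cosine."""
    return float(np.dot(a, b))

def is_match(a: np.ndarray, b: np.ndarray, threshold: float = 0.6) -> bool:
    return cosine_similarity(a, b) >= threshold

rng = np.random.default_rng(0)
a = rng.standard_normal(512)
a /= np.linalg.norm(a)
# A slightly perturbed copy of the same embedding scores near 1.
near = a + 0.01 * rng.standard_normal(512)
near /= np.linalg.norm(near)
print(is_match(a, near))
```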
MicroBatcher
The embed service uses a MicroBatcher to maximize GPU throughput. Instead of processing one face at a time, incoming embedding requests are accumulated into a batch and processed together.
```
Request 1 (face crop) --+
Request 2 (face crop) --+--> MicroBatcher --> GPU batch inference --> results
Request 3 (face crop) --+
```

| Parameter | Default | Description |
|---|---|---|
| BATCH_SIZE | 32 | Maximum faces per GPU batch |
| BATCH_TIMEOUT_MS | 50 | Maximum wait time before flushing partial batch |
This amortizes GPU kernel launch overhead and achieves near-linear throughput scaling with batch size.
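The batching policy can be sketched in pure Python. This is a minimal single-threaded model of the size/timeout logic only; the real embed service presumably wraps this around an async GPU queue, and all names here are illustrative:

```python
import time
from typing import Any, Callable, List, Optional

class MicroBatcher:
    """Accumulate requests; flush when the batch is full or too old."""

    def __init__(self, infer_batch: Callable[[List[Any]], List[Any]],
                 batch_size: int = 32, batch_timeout_ms: float = 50.0):
        self.infer_batch = infer_batch            # runs one GPU batch
        self.batch_size = batch_size              # BATCH_SIZE
        self.batch_timeout_ms = batch_timeout_ms  # BATCH_TIMEOUT_MS
        self._pending: List[Any] = []
        self._first_arrival: Optional[float] = None

    def submit(self, face_crop: Any) -> List[Any]:
        """Queue one crop; return flushed results (empty while batching)."""
        if self._first_arrival is None:
            self._first_arrival = time.monotonic()
        self._pending.append(face_crop)
        if len(self._pending) >= self.batch_size or self._expired():
            return self.flush()
        return []

    def _expired(self) -> bool:
        age_ms = (time.monotonic() - self._first_arrival) * 1000.0
        return age_ms >= self.batch_timeout_ms

    def flush(self) -> List[Any]:
        """Run inference on whatever is pending (possibly a partial batch)."""
        batch, self._pending = self._pending, []
        self._first_arrival = None
        return self.infer_batch(batch) if batch else []
```

A full batch flushes immediately on the call that fills it; a partial batch flushes once the oldest queued request exceeds the timeout.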
Per-Tenant Vector Index
sqlite-vec (Primary)
Each tenant has its own SQLite database with the sqlite-vec extension for vector similarity search:
```
/data/tenants/{tenant_id}/embeddings.db
```

sqlite-vec provides exact nearest-neighbor and approximate nearest-neighbor (ANN) search on 512-d float32 vectors. For tenants with fewer than 1 million faces, exact search runs in under 1ms.
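The computation exact search performs is equivalent to a brute-force top-k scan over the stored vectors, sketched here in NumPy (this illustrates the math, not the sqlite-vec extension API):

```python
import numpy as np

def exact_search(index: np.ndarray, query: np.ndarray, k: int = 5):
    """Brute-force top-k cosine search over L2-normalized 512-d vectors."""
    sims = index @ query           # dot product == cosine for unit vectors
    top = np.argsort(-sims)[:k]    # indices of the k highest similarities
    return top, sims[top]

rng = np.random.default_rng(0)
index = rng.standard_normal((10_000, 512)).astype(np.float32)
index /= np.linalg.norm(index, axis=1, keepdims=True)
ids, scores = exact_search(index, index[42], k=3)  # query with a stored face
```

A single matrix-vector product over even a million rows is small enough that sub-millisecond exact search is plausible on modern hardware, which is why ANN is only needed at larger scales.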
FAISS (Hot Cache)
For high-throughput matching (e.g., live camera streams with many concurrent detections), a FAISS index is loaded into GPU memory from the tenant's sqlite-vec database. The FAISS index is:
- Loaded when the tenant becomes active (first request after inactivity).
- Refreshed when new embeddings are added.
- Evicted after a configurable inactivity timeout.
This two-tier approach keeps the persistent store simple (one SQLite file per tenant) while providing sub-millisecond search for active tenants.
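The load/refresh/evict lifecycle described above can be sketched as a small cache; building the actual FAISS index from the tenant's sqlite-vec database is elided behind a callback, and all names and the timeout default are illustrative:

```python
import time
from typing import Any, Callable, Dict

class TenantHotCache:
    """Load-on-first-use, evict-on-inactivity cache for per-tenant indexes."""

    def __init__(self, load_index: Callable[[str], Any],
                 inactivity_timeout_s: float = 300.0):
        self.load_index = load_index    # builds the hot index for a tenant
        self.inactivity_timeout_s = inactivity_timeout_s
        self._cache: Dict[str, Any] = {}
        self._last_used: Dict[str, float] = {}

    def get(self, tenant_id: str) -> Any:
        """Return the tenant's hot index, loading it on first request."""
        self.evict_idle()
        if tenant_id not in self._cache:
            self._cache[tenant_id] = self.load_index(tenant_id)
        self._last_used[tenant_id] = time.monotonic()
        return self._cache[tenant_id]

    def invalidate(self, tenant_id: str) -> None:
        """Drop the cached index so new embeddings force a refresh."""
        self._cache.pop(tenant_id, None)
        self._last_used.pop(tenant_id, None)

    def evict_idle(self) -> None:
        """Evict every tenant idle longer than the inactivity timeout."""
        now = time.monotonic()
        for tid in [t for t, ts in self._last_used.items()
                    if now - ts >= self.inactivity_timeout_s]:
            self.invalidate(tid)
```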
Embedding Encryption
All face embeddings are encrypted at rest and in transit:
- At rest: Each embedding is encrypted with the tenant's Customer Encryption Key (CEK) before writing to sqlite-vec.
- On read: KMS derives a session DEK from the CEK, decrypts the embedding in memory, performs the search, and discards the DEK.
- In transit: All inter-service communication uses mTLS.
This ensures that even if a tenant's database file is exfiltrated, the embeddings are unreadable without the tenant's CEK (which is held in KMS, wrapped by the tenant's TEK, which is wrapped by the org's OEK in HSM).
Performance
On an NVIDIA GH200 GPU with MicroBatcher (batch size 32):
| Metric | Value |
|---|---|
| Embedding extraction (single) | ~5ms |
| Embedding extraction (batch of 32) | ~15ms |
| Search latency (sqlite-vec, 100K faces) | ~0.8ms |
| Search latency (FAISS GPU, 1M faces) | ~0.3ms |
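The batching benefit can be made concrete with a quick throughput calculation from the latency figures in the table:

```python
# Throughput implied by the latency figures above.
single_latency_s = 0.005          # ~5 ms per face, one at a time
batch_latency_s = 0.015           # ~15 ms for a batch of 32
batch_size = 32

single_throughput = 1 / single_latency_s          # 200 faces/s
batch_throughput = batch_size / batch_latency_s   # ~2133 faces/s

print(f"single:  {single_throughput:.0f} faces/s")
print(f"batched: {batch_throughput:.0f} faces/s "
      f"({batch_throughput / single_throughput:.1f}x)")
```

Processing 32 faces takes only about 3x as long as processing one, so batching yields roughly a tenfold throughput gain.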