Face Recognition
ArcFace glintr100 — 512-dimensional face embeddings, cosine similarity matching, per-tenant vector index
Recognition Model
SecureGate uses ArcFace glintr100 from the InsightFace antelopev2 model pack for face recognition. This model produces 512-dimensional feature vectors (embeddings) from aligned 112x112 face crops.
| Property | Value |
|---|---|
| Model | glintr100 |
| Pack | antelopev2 |
| Architecture | ResNet-100 with ArcFace loss |
| Format | ONNX |
| Input | 112x112 RGB aligned face crop |
| Output | 512-d float32 embedding (L2-normalized) |
| License | MIT (InsightFace) |
| Inference | ONNX Runtime with CUDA EP |
How It Works
Embedding Extraction
- The ingest service detects a face and aligns it to 112x112.
- The aligned crop is sent to the embed service.
- The embed service runs ArcFace glintr100 to produce a 512-dimensional embedding.
- The embedding is L2-normalized (unit length on the 512-d hypersphere).
- The normalized embedding is encrypted with the tenant's CEK and stored in the tenant's sqlite-vec database.
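The normalization step above can be sketched in a few lines of NumPy. This is an illustration only; `l2_normalize` is a hypothetical name, not a function from the SecureGate codebase:

```python
import numpy as np

def l2_normalize(embedding: np.ndarray) -> np.ndarray:
    """Scale a raw 512-d embedding to unit length on the hypersphere."""
    norm = np.linalg.norm(embedding)
    if norm == 0.0:
        raise ValueError("zero embedding cannot be normalized")
    return embedding / norm

# Random stand-in for a raw glintr100 model output.
raw = np.random.default_rng(0).standard_normal(512).astype(np.float32)
unit = l2_normalize(raw)
assert np.isclose(np.linalg.norm(unit), 1.0)
```

Normalizing at write time is what makes the later dot-product comparison equal to cosine similarity.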
Similarity Matching
Two face embeddings are compared using cosine similarity:
```
similarity = dot(embedding_a, embedding_b)
```

Because embeddings are L2-normalized, the dot product equals the cosine similarity. Values range from -1 (opposite) to 1 (identical).
| Similarity | Interpretation |
|---|---|
| > 0.7 | Very likely same person |
| 0.5 - 0.7 | Possible match (review recommended) |
| < 0.5 | Different people |
The default match threshold is 0.6 and can be adjusted per-request via the threshold parameter on /v1/search.
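The comparison and threshold check can be sketched as follows (assumed helper names; the 0.6 default mirrors the /v1/search default described above):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """For L2-normalized embeddings, the dot product is the cosine."""
    return float(np.dot(a, b))

def is_match(a: np.ndarray, b: np.ndarray, threshold: float = 0.6) -> bool:
    return cosine_similarity(a, b) >= threshold

rng = np.random.default_rng(0)
a = rng.standard_normal(512)
a /= np.linalg.norm(a)
# A slightly perturbed copy of the same embedding scores near 1.
near = a + 0.01 * rng.standard_normal(512)
near /= np.linalg.norm(near)
print(is_match(a, near))
```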
MicroBatcher
The embed service uses a MicroBatcher to maximize GPU throughput. Instead of processing one face at a time, incoming embedding requests are accumulated into a batch and processed together.
```
Request 1 (face crop) --+
Request 2 (face crop) --+--> MicroBatcher --> GPU batch inference --> results
Request 3 (face crop) --+
```

| Parameter | Default | Description |
|---|---|---|
| BATCH_SIZE | 32 | Maximum faces per GPU batch |
| BATCH_TIMEOUT_MS | 50 | Maximum wait time before flushing partial batch |
This amortizes GPU kernel launch overhead and achieves near-linear throughput scaling with batch size.
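The batching policy can be sketched in pure Python. This is a minimal single-threaded model of the size/timeout logic only; the real embed service presumably wraps this around an async GPU queue, and all names here are illustrative:

```python
import time
from typing import Any, Callable, List, Optional

class MicroBatcher:
    """Accumulate requests; flush when the batch is full or too old."""

    def __init__(self, infer_batch: Callable[[List[Any]], List[Any]],
                 batch_size: int = 32, batch_timeout_ms: float = 50.0):
        self.infer_batch = infer_batch            # runs one GPU batch
        self.batch_size = batch_size              # BATCH_SIZE
        self.batch_timeout_ms = batch_timeout_ms  # BATCH_TIMEOUT_MS
        self._pending: List[Any] = []
        self._first_arrival: Optional[float] = None

    def submit(self, face_crop: Any) -> List[Any]:
        """Queue one crop; return flushed results (empty while batching)."""
        if self._first_arrival is None:
            self._first_arrival = time.monotonic()
        self._pending.append(face_crop)
        if len(self._pending) >= self.batch_size or self._expired():
            return self.flush()
        return []

    def _expired(self) -> bool:
        age_ms = (time.monotonic() - self._first_arrival) * 1000.0
        return age_ms >= self.batch_timeout_ms

    def flush(self) -> List[Any]:
        """Run inference on whatever is pending (possibly a partial batch)."""
        batch, self._pending = self._pending, []
        self._first_arrival = None
        return self.infer_batch(batch) if batch else []
```

A full batch flushes immediately on the call that fills it; a partial batch flushes once the oldest queued request exceeds the timeout.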
Per-Tenant Vector Index
sqlite-vec (Primary)
Each tenant has its own SQLite database with the sqlite-vec extension for vector similarity search:
```
/data/tenants/{tenant_id}/embeddings.db
```

sqlite-vec provides exact nearest-neighbor and approximate nearest-neighbor (ANN) search on 512-d float32 vectors. For tenants with fewer than 1 million faces, exact search runs in under 1ms.
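The computation exact search performs is equivalent to a brute-force top-k scan over the stored vectors, sketched here in NumPy (this illustrates the math, not the sqlite-vec extension API):

```python
import numpy as np

def exact_search(index: np.ndarray, query: np.ndarray, k: int = 5):
    """Brute-force top-k cosine search over L2-normalized 512-d vectors."""
    sims = index @ query           # dot product == cosine for unit vectors
    top = np.argsort(-sims)[:k]    # indices of the k highest similarities
    return top, sims[top]

rng = np.random.default_rng(0)
index = rng.standard_normal((10_000, 512)).astype(np.float32)
index /= np.linalg.norm(index, axis=1, keepdims=True)
ids, scores = exact_search(index, index[42], k=3)  # query with a stored face
```

A single matrix-vector product over even a million rows is small enough that sub-millisecond exact search is plausible on modern hardware, which is why ANN is only needed at larger scales.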
FAISS (Hot Cache)
For high-throughput matching (e.g., live camera streams with many concurrent detections), a FAISS index is loaded into GPU memory from the tenant's sqlite-vec database. The FAISS index is:
- Loaded when the tenant becomes active (first request after inactivity).
- Refreshed when new embeddings are added.
- Evicted after a configurable inactivity timeout.
This two-tier approach keeps the persistent store simple (one SQLite file per tenant) while providing sub-millisecond search for active tenants.
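The load/refresh/evict lifecycle described above can be sketched as a small cache; building the actual FAISS index from the tenant's sqlite-vec database is elided behind a callback, and all names and the timeout default are illustrative:

```python
import time
from typing import Any, Callable, Dict

class TenantHotCache:
    """Load-on-first-use, evict-on-inactivity cache for per-tenant indexes."""

    def __init__(self, load_index: Callable[[str], Any],
                 inactivity_timeout_s: float = 300.0):
        self.load_index = load_index    # builds the hot index for a tenant
        self.inactivity_timeout_s = inactivity_timeout_s
        self._cache: Dict[str, Any] = {}
        self._last_used: Dict[str, float] = {}

    def get(self, tenant_id: str) -> Any:
        """Return the tenant's hot index, loading it on first request."""
        self.evict_idle()
        if tenant_id not in self._cache:
            self._cache[tenant_id] = self.load_index(tenant_id)
        self._last_used[tenant_id] = time.monotonic()
        return self._cache[tenant_id]

    def invalidate(self, tenant_id: str) -> None:
        """Drop the cached index so new embeddings force a refresh."""
        self._cache.pop(tenant_id, None)
        self._last_used.pop(tenant_id, None)

    def evict_idle(self) -> None:
        """Evict every tenant idle longer than the inactivity timeout."""
        now = time.monotonic()
        for tid in [t for t, ts in self._last_used.items()
                    if now - ts >= self.inactivity_timeout_s]:
            self.invalidate(tid)
```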
Embedding Encryption
All face embeddings are encrypted at rest and in transit:
- At rest: Each embedding is encrypted with the tenant's Customer Encryption Key (CEK) before writing to sqlite-vec.
- On read: KMS derives a session DEK from the CEK, decrypts the embedding in memory, performs the search, and discards the DEK.
- In transit: All inter-service communication uses mTLS.
This ensures that even if a tenant's database file is exfiltrated, the embeddings are unreadable without the tenant's CEK (which is held in KMS, wrapped by the tenant's TEK, which is wrapped by the org's OEK in HSM).
Performance
On an NVIDIA GH200 GPU with MicroBatcher (batch size 32):
| Metric | Value |
|---|---|
| Embedding extraction (single) | ~5ms |
| Embedding extraction (batch of 32) | ~15ms |
| Search latency (sqlite-vec, 100K faces) | ~0.8ms |
| Search latency (FAISS GPU, 1M faces) | ~0.3ms |
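The batching benefit can be made concrete with a quick throughput calculation from the latency figures in the table:

```python
# Throughput implied by the latency figures above.
single_latency_s = 0.005          # ~5 ms per face, one at a time
batch_latency_s = 0.015           # ~15 ms for a batch of 32
batch_size = 32

single_throughput = 1 / single_latency_s          # 200 faces/s
batch_throughput = batch_size / batch_latency_s   # ~2133 faces/s

print(f"single:  {single_throughput:.0f} faces/s")
print(f"batched: {batch_throughput:.0f} faces/s "
      f"({batch_throughput / single_throughput:.1f}x)")
```

Processing 32 faces takes only about 3x as long as processing one, so batching yields roughly a tenfold throughput gain.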