Face Detection
InsightFace antelopev2 SCRFD detector — real-time face detection, quality filtering, and alignment pipeline
Detection Model
SecureGate uses InsightFace antelopev2 with the scrfd_10g_bnkps detector for face detection. This model is part of the antelopev2 model pack from the InsightFace project (MIT license).
| Property | Value |
|---|---|
| Model | scrfd_10g_bnkps |
| Pack | antelopev2 |
| Format | ONNX |
| Input | Any resolution RGB image |
| Output | Bounding boxes, confidence scores, 5-point landmarks |
| License | MIT (InsightFace) |
| Inference | ONNX Runtime with CUDA EP |
SCRFD (Sample and Computation Redistribution for Efficient Face Detection) is a high-performance single-stage face detector. The 10g_bnkps variant has a compute budget of roughly 10 GFLOPs and predicts bounding boxes, confidence scores, and five facial keypoints for each detected face.
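As a rough sketch of how the detector could be loaded through the InsightFace Python package: this assumes `insightface` and a GPU build of ONNX Runtime are installed and that the antelopev2 pack is already available locally. The helper names `load_detector` and `summarize_faces` and the 640x640 detector input size are illustrative assumptions, not part of SecureGate.

```python
def load_detector(pack="antelopev2", det_thresh=0.5, det_size=(640, 640)):
    """Load only the detection module of an InsightFace model pack."""
    # Imported lazily so the rest of this module works without insightface.
    from insightface.app import FaceAnalysis

    app = FaceAnalysis(
        name=pack,
        allowed_modules=["detection"],
        # Prefer the CUDA execution provider, fall back to CPU if unavailable.
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    # ctx_id=0 selects the first GPU; det_size is the detector's input size.
    app.prepare(ctx_id=0, det_thresh=det_thresh, det_size=det_size)
    return app


def summarize_faces(faces):
    """Convert detector results into plain Python dicts."""
    return [
        {
            "bbox": [float(v) for v in face.bbox],  # x1, y1, x2, y2
            "score": float(face.det_score),         # detection confidence
            "landmarks": [(float(x), float(y)) for x, y in face.kps],
        }
        for face in faces
    ]
```

With a loaded detector, `summarize_faces(load_detector().get(image))` would yield one dict per detected face, where `image` is a NumPy image array.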
Detection Pipeline
Input Image
|
v
SCRFD Detector (scrfd_10g_bnkps)
|
+-- Bounding boxes (x1, y1, x2, y2)
+-- Confidence scores
+-- 5-point landmarks (left eye, right eye, nose, left mouth, right mouth)
|
v
Quality Filter
|
+-- Blur score (Laplacian variance, reject > 0.5)
+-- Yaw angle (reject > 30 degrees)
+-- Pitch angle (reject > 30 degrees)
+-- Face size (reject < 80px)
|
v
Alignment (ArcFace standard)
|
+-- 2D similarity transform using 5 landmarks
+-- Warp to 112x112 canonical face
|
v
Face Crop -> Embed Service or Storage
Quality Filtering
Not every detected face is usable for recognition. The quality filter ensures only high-quality faces enter the embedding pipeline.
| Filter | Metric | Pass condition | Reason |
|---|---|---|---|
| Blur | Blur score (Laplacian variance) | < 0.5 | Blurry faces produce unreliable embeddings |
| Yaw | 3D head pose | < 30 deg | Extreme left/right turn distorts features |
| Pitch | 3D head pose | < 30 deg | Extreme up/down tilt distorts features |
| Face Size | Bounding box width | >= 80px | Small faces lack detail for recognition |
Faces that fail quality filtering are returned in the API response with embedding_stored: false and a rejection_reason field, but are not passed to the embedding service.
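The filter logic can be sketched as a small pure function. This is a minimal illustration under stated assumptions: the blur score, yaw, and pitch are computed elsewhere, the checks run in the order shown, a value exactly at a threshold passes, and the reason strings (`"blur"`, `"yaw"`, `"pitch"`, `"face_size"`) are hypothetical rather than the service's actual values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityThresholds:
    max_blur: float = 0.5    # maximum acceptable blur score
    max_angle: float = 30.0  # maximum absolute yaw/pitch in degrees
    min_size: float = 80.0   # minimum bounding box width in pixels


def rejection_reason(blur, yaw, pitch, bbox, thresholds=QualityThresholds()):
    """Return None if the face passes, otherwise a short reason string."""
    width = bbox[2] - bbox[0]  # bbox is (x1, y1, x2, y2)
    if blur > thresholds.max_blur:
        return "blur"
    if abs(yaw) > thresholds.max_angle:
        return "yaw"
    if abs(pitch) > thresholds.max_angle:
        return "pitch"
    if width < thresholds.min_size:
        return "face_size"
    return None


def rejected_fields(reason):
    """Response fields for a face that failed the quality filter."""
    return {"embedding_stored": False, "rejection_reason": reason}
```

For example, a sharp frontal face that is only 50 pixels wide would be reported with `rejected_fields("face_size")` and skipped by the embedding step.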
Alignment
After quality filtering, accepted faces are aligned to a canonical 112x112 coordinate frame using the standard ArcFace alignment procedure:
- Compute a 2D similarity transformation (rotation, scale, translation) from the 5 detected landmarks to the ArcFace reference landmarks.
- Apply the affine warp to produce a 112x112 RGB image.
- The aligned crop is the input to the ArcFace recognition model (glintr100).
This alignment step is critical — it normalizes head pose and scale so that the same person produces similar embeddings regardless of camera angle or distance.
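To make the transform concrete, here is a dependency-free sketch of the least-squares similarity fit. It is not the service's implementation, and production code would more likely rely on OpenCV or scikit-image. The reference coordinates are the 112x112 ArcFace landmark template used by InsightFace; the function names are illustrative.

```python
# ArcFace reference landmarks for a 112x112 crop, in the order
# left eye, right eye, nose, left mouth corner, right mouth corner.
ARCFACE_REF = [
    (38.2946, 51.6963),
    (73.5318, 51.5014),
    (56.0252, 71.7366),
    (41.5493, 92.3655),
    (70.7299, 92.2041),
]


def estimate_similarity(src, dst=ARCFACE_REF):
    """Least-squares 2D similarity transform mapping src points onto dst.

    Returns a 2x3 matrix [[a, -b, tx], [b, a, ty]].
    """
    # Treat points as complex numbers: rotation plus scale is one multiply.
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    num = sum((zi - z_mean).conjugate() * (wi - w_mean) for zi, wi in zip(z, w))
    den = sum(abs(zi - z_mean) ** 2 for zi in z)
    s = num / den            # complex rotation-and-scale factor
    t = w_mean - s * z_mean  # complex translation
    return [[s.real, -s.imag, t.real], [s.imag, s.real, t.imag]]


def apply_transform(m, point):
    """Map a single (x, y) point through a 2x3 affine matrix."""
    x, y = point
    return (
        m[0][0] * x + m[0][1] * y + m[0][2],
        m[1][0] * x + m[1][1] * y + m[1][2],
    )
```

The returned matrix has the 2x3 forward-transform layout that `cv2.warpAffine` accepts (as a float NumPy array) when warping the source image to a 112x112 output.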
Fallback Chain
If the antelopev2 model pack is unavailable at startup, the ingest service falls back to buffalo_l. This is a larger InsightFace model pack with comparable detection accuracy but higher latency. The fallback is automatic and logged.
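One way such a fallback could be structured is sketched below. The `load_pack` callable is a hypothetical loader supplied by the caller (for instance, a wrapper around the InsightFace model loader), and the logger name and messages are assumptions rather than the service's actual log output.

```python
import logging

logger = logging.getLogger("ingest.detector")


def load_with_fallback(load_pack, packs=("antelopev2", "buffalo_l")):
    """Try each model pack in order and return (pack_name, model)."""
    last_error = None
    for name in packs:
        try:
            model = load_pack(name)
        except Exception as exc:  # missing files, corrupt ONNX graph, etc.
            logger.warning("Model pack %s unavailable: %s", name, exc)
            last_error = exc
            continue
        if name != packs[0]:
            logger.warning("Using fallback model pack %s", name)
        return name, model
    raise RuntimeError(f"No model pack could be loaded from {packs}") from last_error
```

Passing the loader in as an argument keeps the fallback order testable without downloading any models.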
Performance
On an NVIDIA GH200 GPU:
| Metric | Value |
|---|---|
| Detection latency (single face) | ~3ms |
| Detection latency (10 faces) | ~8ms |
| Max throughput | ~300 frames/sec at 720p |
| Concurrent streams | ~150 at 5 FPS |
Configuration
Detection parameters are set via environment variables on the ingest service:
| Variable | Default | Description |
|---|---|---|
| DET_THRESH | 0.5 | Minimum detection confidence |
| QUALITY_BLUR_THRESH | 0.5 | Maximum acceptable blur score |
| QUALITY_ANGLE_THRESH | 30 | Maximum yaw/pitch in degrees |
| QUALITY_MIN_SIZE | 80 | Minimum face bounding box width in pixels |
| MODEL_PACK | antelopev2 | InsightFace model pack name |
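A minimal sketch of reading these variables with the documented defaults follows. The `DetectionConfig` class and `load_config` function are illustrative names, not the ingest service's actual code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionConfig:
    det_thresh: float    # DET_THRESH
    blur_thresh: float   # QUALITY_BLUR_THRESH
    angle_thresh: float  # QUALITY_ANGLE_THRESH, degrees
    min_size: int        # QUALITY_MIN_SIZE, pixels
    model_pack: str      # MODEL_PACK


def load_config(env=None):
    """Build a DetectionConfig from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    return DetectionConfig(
        det_thresh=float(env.get("DET_THRESH", "0.5")),
        blur_thresh=float(env.get("QUALITY_BLUR_THRESH", "0.5")),
        angle_thresh=float(env.get("QUALITY_ANGLE_THRESH", "30")),
        min_size=int(env.get("QUALITY_MIN_SIZE", "80")),
        model_pack=env.get("MODEL_PACK", "antelopev2"),
    )
```

Calling `load_config({"DET_THRESH": "0.6"})` overrides only the detection confidence and leaves the other values at their defaults.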