SecureGate Docs

Face Detection

InsightFace antelopev2 SCRFD detector — real-time face detection, quality filtering, and alignment pipeline

Detection Model

SecureGate uses InsightFace antelopev2 with the scrfd_10g_bnkps detector for face detection. This model is part of the antelopev2 model pack from the InsightFace project (MIT license).

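| Property  | Value                                                  |
|-----------|--------------------------------------------------------|
| Model     | scrfd_10g_bnkps                                        |
| Pack      | antelopev2                                             |
| Format    | ONNX                                                   |
| Input     | RGB image, any resolution                              |
| Output    | Bounding boxes, confidence scores, 5-point landmarks   |
| License   | MIT (InsightFace)                                      |
| Inference | ONNX Runtime with CUDA EP                              |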

SCRFD (Sample and Computation Redistribution for Efficient Face Detection) is a high-performance single-stage face detector. In the variant name, 10g denotes a compute budget of roughly 10 GFLOPs per inference, and kps indicates that the model predicts 5-point keypoints in addition to bounding boxes and confidence scores.
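As a rough illustration of the detector output described above, the sketch below models one detection as a bounding box, a confidence score, and five landmarks, then applies a minimum-confidence cut. The `Detection` class and `filter_by_confidence` function are hypothetical names introduced for this example; they are not InsightFace or SecureGate types. The 0.5 default mirrors the documented DET_THRESH default.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Detection:
    """One detected face (hypothetical container, not an InsightFace type)."""
    bbox: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    score: float                             # detector confidence
    landmarks: List[Point]                   # left eye, right eye, nose, left mouth, right mouth

    @property
    def width(self) -> float:
        """Bounding box width in pixels."""
        return self.bbox[2] - self.bbox[0]


def filter_by_confidence(dets: List[Detection], det_thresh: float = 0.5) -> List[Detection]:
    """Keep detections whose confidence is at least det_thresh."""
    return [d for d in dets if d.score >= det_thresh]


# Illustrative values only; these are not real detector outputs.
landmarks = [(40.0, 60.0), (80.0, 60.0), (60.0, 85.0), (45.0, 110.0), (75.0, 110.0)]
dets = [
    Detection((10.0, 20.0, 110.0, 140.0), 0.92, landmarks),
    Detection((300.0, 50.0, 340.0, 95.0), 0.31, landmarks),
]
kept = filter_by_confidence(dets)
```

Here only the first detection survives the cut, and its `width` property gives the value that the face-size check later compares against the minimum size.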

Detection Pipeline

Input Image
    |
    v
SCRFD Detector (scrfd_10g_bnkps)
    |
    +-- Bounding boxes (x1, y1, x2, y2)
    +-- Confidence scores
    +-- 5-point landmarks (left eye, right eye, nose, left mouth, right mouth)
    |
    v
Quality Filter
    |
    +-- Blur score (from Laplacian variance, reject > 0.5)
    +-- Yaw angle (reject |yaw| > 30 degrees)
    +-- Pitch angle (reject |pitch| > 30 degrees)
    +-- Face size (reject < 80px)
    |
    v
Alignment (ArcFace standard)
    |
    +-- 2D similarity transform using 5 landmarks
    +-- Warp to 112x112 canonical face
    |
    v
Face Crop -> Embed Service or Storage
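The blur metric named in the pipeline is Laplacian variance. The sketch below computes it in plain Python on a grayscale image stored as nested lists, using the standard 4-neighbour Laplacian. Note that raw Laplacian variance is higher for sharper images, while the pipeline rejects scores above 0.5, so the service presumably maps the variance to a normalized blur score. That mapping is not described here, and this example does not attempt to reproduce it.

```python
from typing import List


def laplacian_variance(gray: List[List[float]]) -> float:
    """Variance of the 4-neighbour Laplacian response over interior pixels."""
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4.0 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


# A uniform patch has no edges; a checkerboard is maximally "sharp".
flat = [[128.0] * 8 for _ in range(8)]
checker = [[255.0 if (x + y) % 2 == 0 else 0.0 for x in range(8)] for y in range(8)]

flat_var = laplacian_variance(flat)
checker_var = laplacian_variance(checker)
```

In production code the same quantity is usually obtained with an image library rather than Python loops; the loop form is used here only to keep the example dependency-free.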

Quality Filtering

Not every detected face is usable for recognition. The quality filter ensures only high-quality faces enter the embedding pipeline.

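| Filter    | Metric             | Pass condition | Reason                                     |
|-----------|--------------------|----------------|--------------------------------------------|
| Blur      | Laplacian variance | < 0.5          | Blurry faces produce unreliable embeddings |
| Yaw       | 3D head pose       | < 30 deg       | Extreme left/right turn distorts features  |
| Pitch     | 3D head pose       | < 30 deg       | Extreme up/down tilt distorts features     |
| Face Size | Bounding box width | >= 80 px       | Small faces lack detail for recognition    |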

Faces that fail quality filtering are returned in the API response with embedding_stored: false and a rejection_reason field, but are not passed to the embedding service.
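A minimal sketch of the quality gate, assuming the blur score, yaw, pitch, and face width have already been computed upstream. The function name `check_quality`, the reason strings, and the helper `to_response_fields` are hypothetical. The thresholds are the documented defaults, and yaw and pitch are compared by absolute value on the assumption that the limit applies in both directions.

```python
from typing import Dict, Optional, Tuple


def check_quality(
    blur: float,
    yaw_deg: float,
    pitch_deg: float,
    face_width_px: float,
    blur_thresh: float = 0.5,
    angle_thresh: float = 30.0,
    min_size: float = 80.0,
) -> Tuple[bool, Optional[str]]:
    """Return (passed, rejection_reason). Reason strings are illustrative."""
    if blur > blur_thresh:            # higher score means blurrier
        return False, "blur"
    if abs(yaw_deg) > angle_thresh:   # left or right turn
        return False, "yaw"
    if abs(pitch_deg) > angle_thresh:  # up or down tilt
        return False, "pitch"
    if face_width_px < min_size:
        return False, "face_size"
    return True, None


def to_response_fields(passed: bool, reason: Optional[str]) -> Dict[str, object]:
    """Shape the result like the documented API fields.

    Assumption: a face that passes is stored, so embedding_stored is True.
    """
    return {"embedding_stored": passed, "rejection_reason": reason}


good = check_quality(blur=0.2, yaw_deg=10.0, pitch_deg=-5.0, face_width_px=120.0)
blurry = check_quality(blur=0.7, yaw_deg=0.0, pitch_deg=0.0, face_width_px=120.0)
turned = check_quality(blur=0.2, yaw_deg=-45.0, pitch_deg=0.0, face_width_px=120.0)
small = check_quality(blur=0.2, yaw_deg=0.0, pitch_deg=0.0, face_width_px=64.0)
```

The checks run in a fixed order and stop at the first failure, so each rejected face carries exactly one reason. A real service might instead collect every failing check.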

Alignment

After quality filtering, accepted faces are aligned to a canonical 112x112 coordinate frame using the standard ArcFace alignment procedure:

  1. Compute a 2D similarity transformation (rotation, scale, translation) from the 5 detected landmarks to the ArcFace reference landmarks.
  2. Apply the affine warp to produce a 112x112 RGB image.
  3. The aligned crop is the input to the ArcFace recognition model (glintr100).

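This alignment step is critical: it normalizes in-plane rotation, scale, and position, so the same person produces similar embeddings across different camera distances and head tilts. A 2D transform cannot correct large out-of-plane rotation, which is why the yaw and pitch filters run before alignment.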
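Step 1 above can be sketched without any dependencies by treating each 2D point as a complex number: a similarity transform without reflection is then `w = a*z + b`, and the least-squares solution has a closed form. The reference coordinates below are the 112x112 landmark positions commonly used with ArcFace in the InsightFace codebase; treat them as an assumption about what SecureGate uses. `estimate_similarity` and `apply_transform` are illustrative names, and the pixel warp itself (typically done with an image library such as OpenCV) is omitted.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

# Assumed ArcFace reference landmarks for a 112x112 crop.
ARCFACE_REF: List[Point] = [
    (38.2946, 51.6963),  # left eye
    (73.5318, 51.5014),  # right eye
    (56.0252, 71.7366),  # nose
    (41.5493, 92.3655),  # left mouth corner
    (70.7299, 92.2041),  # right mouth corner
]


def estimate_similarity(src: List[Point], dst: List[Point]) -> List[List[float]]:
    """Least-squares similarity (rotation, uniform scale, translation) src -> dst.

    Returns the equivalent 2x3 affine matrix. Reflection is not modelled.
    """
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    zm = sum(z) / len(z)
    wm = sum(w) / len(w)
    num = sum((zi - zm).conjugate() * (wi - wm) for zi, wi in zip(z, w))
    den = sum(abs(zi - zm) ** 2 for zi in z)
    if den == 0:
        raise ValueError("source landmarks are degenerate")
    a = num / den          # rotation and scale
    b = wm - a * zm        # translation
    return [[a.real, -a.imag, b.real],
            [a.imag, a.real, b.imag]]


def apply_transform(m: List[List[float]], p: Point) -> Point:
    """Apply a 2x3 affine matrix to one point."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Simulate a detected face: reference landmarks scaled 2x, rotated 10 degrees, shifted.
theta = math.radians(10.0)
c, s = 2.0 * math.cos(theta), 2.0 * math.sin(theta)
detected = [(c * x - s * y + 100.0, s * x + c * y + 50.0) for x, y in ARCFACE_REF]

M = estimate_similarity(detected, ARCFACE_REF)
aligned = [apply_transform(M, p) for p in detected]
```

Because the simulated landmarks are an exact similarity of the reference set, the estimated matrix maps them back onto the reference positions. With real detections the fit is approximate and the residual reflects landmark noise and out-of-plane pose.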

Fallback Chain

If the antelopev2 model pack is unavailable at startup, the ingest service falls back to buffalo_l. This is a larger InsightFace model pack with comparable detection accuracy but higher latency. The fallback is automatic and logged.
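The fallback behaviour can be sketched as a loop over candidate pack names that logs each failure and returns the first pack that loads. `load_with_fallback`, the logger name, and the `loader` callable are hypothetical; in a real service the loader would wrap whatever call constructs the InsightFace models.

```python
import logging
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
logger = logging.getLogger("ingest.models")  # hypothetical logger name


def load_with_fallback(
    loader: Callable[[str], T],
    packs: Sequence[str] = ("antelopev2", "buffalo_l"),
) -> Tuple[str, T]:
    """Try each model pack in order and return (pack_name, loaded_model)."""
    last_error: Optional[Exception] = None
    for index, name in enumerate(packs):
        try:
            model = loader(name)
        except Exception as exc:  # any load failure triggers the next candidate
            last_error = exc
            logger.warning("model pack %s unavailable: %s", name, exc)
            continue
        if index > 0:
            logger.warning("using fallback model pack %s", name)
        return name, model
    raise RuntimeError(f"no model pack could be loaded from {list(packs)}") from last_error


def fake_loader(name: str) -> dict:
    """Stand-in loader for the example: pretends antelopev2 is missing."""
    if name == "antelopev2":
        raise FileNotFoundError("antelopev2 not found")
    return {"pack": name}


active_pack, model = load_with_fallback(fake_loader)
```

Returning the pack name alongside the model lets the caller record which pack is active, which is useful when latency differs between the primary and the fallback.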

Performance

On an NVIDIA GH200 GPU:

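| Metric                          | Value                      |
|---------------------------------|----------------------------|
| Detection latency (single face) | ~3 ms                      |
| Detection latency (10 faces)    | ~8 ms                      |
| Max throughput                  | ~300 frames/sec at 720p    |
| Concurrent streams              | ~150 at 5 FPS              |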

Configuration

Detection parameters are set via environment variables on the ingest service:

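| Variable             | Default    | Description                                |
|----------------------|------------|--------------------------------------------|
| DET_THRESH           | 0.5        | Minimum detection confidence               |
| QUALITY_BLUR_THRESH  | 0.5        | Maximum acceptable blur score              |
| QUALITY_ANGLE_THRESH | 30         | Maximum yaw/pitch in degrees               |
| QUALITY_MIN_SIZE     | 80         | Minimum face bounding box width in pixels  |
| MODEL_PACK           | antelopev2 | InsightFace model pack name                |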
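One way to read these variables with their documented defaults is sketched below. The `DetectionConfig` class and `load_config` function are hypothetical names for this example; only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DetectionConfig:
    """Detection settings with the documented defaults."""
    det_thresh: float = 0.5
    quality_blur_thresh: float = 0.5
    quality_angle_thresh: float = 30.0
    quality_min_size: int = 80
    model_pack: str = "antelopev2"


def load_config(env: Mapping[str, str] = os.environ) -> DetectionConfig:
    """Build a config from environment variables, falling back to defaults."""
    return DetectionConfig(
        det_thresh=float(env.get("DET_THRESH", "0.5")),
        quality_blur_thresh=float(env.get("QUALITY_BLUR_THRESH", "0.5")),
        quality_angle_thresh=float(env.get("QUALITY_ANGLE_THRESH", "30")),
        quality_min_size=int(env.get("QUALITY_MIN_SIZE", "80")),
        model_pack=env.get("MODEL_PACK", "antelopev2"),
    )


# Passing a plain dict instead of os.environ keeps the example deterministic.
defaults = load_config({})
stricter = load_config({"DET_THRESH": "0.7", "QUALITY_MIN_SIZE": "120"})
```

Accepting the environment as a mapping argument makes the parsing easy to test, and a malformed value such as `DET_THRESH=high` fails fast with a `ValueError` at startup rather than during detection.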
