SecureGate Docs

InsightFace antelopev2 SCRFD detector — real-time face detection, quality filtering, and alignment pipeline

Detection Model

SecureGate uses InsightFace antelopev2 with the scrfd_10g_bnkps detector for face detection. This model is part of the antelopev2 model pack from the InsightFace project (MIT license).

Property	Value
Model	scrfd_10g_bnkps
Pack	antelopev2
Format	ONNX
Input	Any resolution RGB image
Output	Bounding boxes, confidence scores, 5-point landmarks
License	MIT (InsightFace)
Inference	ONNX Runtime with the CUDA execution provider (EP)

SCRFD (Sample and Computation Redistribution for Efficient Face Detection) is a high-performance single-stage face detector. The 10g_bnkps variant has a compute budget of roughly 10 GFLOPs at VGA (640x480) input resolution and predicts bounding boxes, confidence scores, and 5-point keypoints (the "kps" in the model name).
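As a minimal sketch of how the detector might be invoked, the following uses the `FaceAnalysis` wrapper from the `insightface` Python package, restricted to the detection module. The helper names `load_detector` and `detect_faces` and the 640x640 detection size are illustrative assumptions, not SecureGate's actual code.

```python
from typing import Any, Dict, List


def load_detector(pack: str = "antelopev2", det_thresh: float = 0.5) -> Any:
    """Load only the detection model of an InsightFace pack (hypothetical helper)."""
    # Imported lazily so this module can be imported without insightface installed.
    from insightface.app import FaceAnalysis

    app = FaceAnalysis(
        name=pack,
        allowed_modules=["detection"],
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    # ctx_id=0 selects the first GPU; det_size=(640, 640) is an assumed value.
    app.prepare(ctx_id=0, det_thresh=det_thresh, det_size=(640, 640))
    return app


def detect_faces(app: Any, image: Any) -> List[Dict[str, Any]]:
    """Run detection and convert the results to plain Python types."""
    results = []
    for face in app.get(image):
        results.append({
            "bbox": [float(v) for v in face.bbox],  # x1, y1, x2, y2
            "score": float(face.det_score),         # detection confidence
            "landmarks": [[float(x), float(y)] for x, y in face.kps],  # 5 points
        })
    return results
```

InsightFace's own examples pass images loaded with `cv2.imread`, which are in BGR channel order, so an RGB input may need converting before it reaches `app.get`.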

Detection Pipeline

Input Image
    |
    v
SCRFD Detector (scrfd_10g_bnkps)
    |
    +-- Bounding boxes (x1, y1, x2, y2)
    +-- Confidence scores
    +-- 5-point landmarks (left eye, right eye, nose, left mouth, right mouth)
    |
    v
Quality Filter
    |
    +-- Blur score (from Laplacian variance, reject > 0.5)
    +-- Yaw angle (reject |yaw| > 30 degrees)
    +-- Pitch angle (reject |pitch| > 30 degrees)
    +-- Face size (reject bounding box width < 80 px)
    |
    v
Alignment (ArcFace standard)
    |
    +-- 2D similarity transform using 5 landmarks
    +-- Warp to 112x112 canonical face
    |
    v
Face Crop -> Embed Service or Storage

Quality Filtering

Not every detected face is usable for recognition. The quality filter ensures only high-quality faces enter the embedding pipeline.

Filter	Metric	Pass condition	Reason
Blur	Blur score (from Laplacian variance)	<= 0.5	Blurry faces produce unreliable embeddings
Yaw	3D head pose	<= 30 deg (absolute)	Extreme left/right turn distorts features
Pitch	3D head pose	<= 30 deg (absolute)	Extreme up/down tilt distorts features
Face Size	Bounding box width	>= 80 px	Small faces lack detail for recognition
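To illustrate the blur metric, here is a minimal pure-Python sketch of Laplacian variance on a grayscale image given as a list of rows. This page does not specify how SecureGate maps the raw variance onto the 0 to 1 blur score compared against the threshold, so the sketch computes the raw variance only. The function name is hypothetical.

```python
from typing import Sequence


def laplacian_variance(gray: Sequence[Sequence[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    Sharp images have strong edges and therefore a high variance;
    blurry images give values close to zero.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 pixels")

    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]] applied at (x, y).
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)

    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)
```

With OpenCV, the usual idiom for the same idea is `cv2.Laplacian(gray, cv2.CV_64F).var()`, which handles image borders differently from this sketch.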

Faces that fail quality filtering are returned in the API response with embedding_stored: false and a rejection_reason field, but are not passed to the embedding service.
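The filter logic can be sketched as follows, assuming the blur score and head pose angles have already been computed for each face. The function names, the reason strings, and the order of the checks are assumptions for illustration. Only the `embedding_stored` and `rejection_reason` field names come from this page.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class QualityThresholds:
    max_blur: float = 0.5    # maximum acceptable blur score
    max_angle: float = 30.0  # maximum absolute yaw/pitch in degrees
    min_size: float = 80.0   # minimum bounding box width in pixels


def rejection_reason(
    bbox: Tuple[float, float, float, float],
    blur_score: float,
    yaw_deg: float,
    pitch_deg: float,
    thresholds: QualityThresholds = QualityThresholds(),
) -> Optional[str]:
    """Return a (hypothetical) reason string, or None if the face passes."""
    x1, _, x2, _ = bbox
    if x2 - x1 < thresholds.min_size:
        return "face_too_small"
    if blur_score > thresholds.max_blur:
        return "blur"
    if abs(yaw_deg) > thresholds.max_angle:
        return "yaw"
    if abs(pitch_deg) > thresholds.max_angle:
        return "pitch"
    return None


def api_entry(bbox, blur_score, yaw_deg, pitch_deg) -> dict:
    """Build the quality-related part of an API response entry for one face."""
    reason = rejection_reason(bbox, blur_score, yaw_deg, pitch_deg)
    return {"embedding_stored": reason is None, "rejection_reason": reason}
```

For example, a 120 px wide face with a blur score of 0.2 and a yaw of 45 degrees would be reported with `embedding_stored` set to false and the reason `"yaw"`.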

Alignment

After quality filtering, accepted faces are aligned to a canonical 112x112 coordinate frame using the standard ArcFace alignment procedure:

1. Compute a 2D similarity transformation (rotation, scale, translation) from the 5 detected landmarks to the ArcFace reference landmarks.
2. Apply the affine warp to produce a 112x112 RGB image.
3. The aligned crop is the input to the ArcFace recognition model (glintr100).

This alignment step is critical: it normalizes in-plane rotation, scale, and position, so the same person produces similar embeddings across moderate changes in camera angle and distance.
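The transform estimation can be sketched in pure Python by treating each 2D point as a complex number, which gives a closed-form least-squares fit of rotation, uniform scale, and translation. The reference coordinates below are the ones commonly used for 112x112 ArcFace crops; verify them against the InsightFace version you deploy. The function names are hypothetical, and this is not necessarily how SecureGate implements the step.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Commonly used ArcFace reference landmarks for a 112x112 crop
# (left eye, right eye, nose, left mouth corner, right mouth corner).
ARCFACE_REF: List[Point] = [
    (38.2946, 51.6963),
    (73.5318, 51.5014),
    (56.0252, 71.7366),
    (41.5493, 92.3655),
    (70.7299, 92.2041),
]


def estimate_similarity(src: Sequence[Point],
                        dst: Sequence[Point] = ARCFACE_REF) -> List[List[float]]:
    """Least-squares similarity transform mapping src onto dst, as a 2x3 matrix."""
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matching point pairs")
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    # Model: q = a * p + b, where complex a encodes rotation and uniform scale.
    num = sum((pi - p_mean).conjugate() * (qi - q_mean) for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    if den == 0:
        raise ValueError("source points are all identical")
    a = num / den
    b = q_mean - a * p_mean
    return [[a.real, -a.imag, b.real],
            [a.imag, a.real, b.imag]]


def apply_affine(matrix: Sequence[Sequence[float]], point: Point) -> Point:
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])
```

The resulting 2x3 matrix has the layout expected by `cv2.warpAffine`, so with OpenCV the crop could be produced with `cv2.warpAffine(image, np.array(matrix), (112, 112))`.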

Fallback Chain

If the antelopev2 model pack is unavailable at startup, the ingest service falls back to buffalo_l, another InsightFace model pack. Its detector is also a 10 GFLOPs SCRFD model, so detection accuracy is comparable. The fallback is automatic, and the ingest service logs it when it happens.
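A fallback chain of this kind can be sketched as a loop over candidate packs. The `loader` callable, the logger name, and the function name are assumptions made for illustration. The real ingest service may detect an unavailable pack differently.

```python
import logging
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
logger = logging.getLogger("ingest.models")  # hypothetical logger name


def load_with_fallback(
    loader: Callable[[str], T],
    packs: Sequence[str] = ("antelopev2", "buffalo_l"),
) -> Tuple[str, T]:
    """Try each model pack in order and return (pack_name, loaded_model)."""
    last_error: Optional[Exception] = None
    for index, pack in enumerate(packs):
        try:
            model = loader(pack)
        except Exception as exc:  # a real service would catch narrower errors
            logger.warning("model pack %s unavailable: %s", pack, exc)
            last_error = exc
            continue
        if index > 0:
            logger.warning("using fallback model pack %s", pack)
        return pack, model
    raise RuntimeError("no model pack could be loaded") from last_error
```

Returning the pack name alongside the model lets the caller record which pack is actually in use.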

Performance

On an NVIDIA GH200 GPU:

Metric	Value
Detection latency (single face)	~3 ms
Detection latency (10 faces)	~8 ms
Max throughput	~300 frames/sec at 720p
Concurrent streams	~150 at 5 FPS

Configuration

Detection parameters are set via environment variables on the ingest service:

Variable	Default	Description
`DET_THRESH`	`0.5`	Minimum detection confidence
`QUALITY_BLUR_THRESH`	`0.5`	Maximum acceptable blur score
`QUALITY_ANGLE_THRESH`	`30`	Maximum yaw/pitch in degrees
`QUALITY_MIN_SIZE`	`80`	Minimum face bounding box width in pixels
`MODEL_PACK`	`antelopev2`	InsightFace model pack name
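As an illustration of how these variables might be read, the sketch below parses them into a typed config object with the documented defaults. The `DetectionConfig` class and `load_config` function are hypothetical names, and the choice of `float` versus `int` for each field is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DetectionConfig:
    det_thresh: float = 0.5            # DET_THRESH
    quality_blur_thresh: float = 0.5   # QUALITY_BLUR_THRESH
    quality_angle_thresh: float = 30.0 # QUALITY_ANGLE_THRESH (degrees)
    quality_min_size: int = 80         # QUALITY_MIN_SIZE (pixels)
    model_pack: str = "antelopev2"     # MODEL_PACK


def load_config(env: Mapping[str, str] = os.environ) -> DetectionConfig:
    """Read detection settings from the environment, using documented defaults."""
    return DetectionConfig(
        det_thresh=float(env.get("DET_THRESH", "0.5")),
        quality_blur_thresh=float(env.get("QUALITY_BLUR_THRESH", "0.5")),
        quality_angle_thresh=float(env.get("QUALITY_ANGLE_THRESH", "30")),
        quality_min_size=int(env.get("QUALITY_MIN_SIZE", "80")),
        model_pack=env.get("MODEL_PACK", "antelopev2"),
    )
```

Passing the environment in as a mapping keeps the function easy to test. For example, `load_config({"DET_THRESH": "0.6"})` overrides only the detection threshold.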
