Research
A modular research framework for multi-model soccer video analysis. Combines RF-DETR detection, SAM2 segmentation, and SigLIP zero-shot identification into a configurable pipeline.
Fig. 1: Raw video input (left) compared with pipeline output showing detection, segmentation, and identification (right).
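The pipeline's configurability can be pictured with a minimal sketch. Each stage is a function that reads and extends a shared per-frame context, and a list of enabled stage names decides what runs and in which order. The `Pipeline` class, the stage names, and the context keys are hypothetical, and the placeholder stages only stand in for the real RF-DETR, SAM2, and SigLIP models.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# A stage receives the per-frame context and returns it with new keys added.
Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class Pipeline:
    """Hypothetical container that runs the enabled stages in order on a frame."""
    stages: Dict[str, Stage]
    enabled: List[str] = field(default_factory=list)

    def run(self, frame: Any) -> Dict[str, Any]:
        context: Dict[str, Any] = {"frame": frame}
        for name in self.enabled:
            if name not in self.stages:
                raise KeyError(f"unknown stage: {name}")
            context = self.stages[name](context)
        return context


# Placeholder stages standing in for the detection, segmentation and ID models.
def detect(ctx: Dict[str, Any]) -> Dict[str, Any]:
    ctx["boxes"] = [(10, 20, 50, 120)]  # dummy (x1, y1, x2, y2) box
    return ctx


def segment(ctx: Dict[str, Any]) -> Dict[str, Any]:
    ctx["masks"] = [f"mask for {box}" for box in ctx["boxes"]]  # needs "boxes"
    return ctx


def identify(ctx: Dict[str, Any]) -> Dict[str, Any]:
    ctx["labels"] = ["player"] * len(ctx["masks"])  # needs "masks"
    return ctx


pipeline = Pipeline(
    stages={"detect": detect, "segment": segment, "identify": identify},
    enabled=["detect", "segment", "identify"],
)
result = pipeline.run(frame="frame_0001")
print(result["labels"])  # ['player']
```

Disabling a stage means dropping its name from `enabled`. A later stage that depends on its output (here, `segment` reads `boxes`) would then fail, so a real configuration layer would also need to validate stage dependencies.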
Architecture
Object Detection
ResNet50 backbone with a Transformer decoder for real-time detection of players, the ball, and referees. Implemented in 577 lines of PyTorch, with configurable confidence thresholds.
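Confidence thresholding is typically a post-processing step applied to the detector's scored outputs. The sketch below assumes one threshold per class; the `Detection` type, `filter_detections`, and the threshold values are illustrative and are not taken from the project's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping, Tuple


@dataclass
class Detection:
    label: str                               # "player", "ball" or "referee"
    score: float                             # confidence in [0, 1]
    box: Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels


# Hypothetical per-class thresholds (illustrative values only).
DEFAULT_THRESHOLDS: Dict[str, float] = {"player": 0.5, "ball": 0.3, "referee": 0.5}


def filter_detections(
    detections: List[Detection],
    thresholds: Mapping[str, float] = DEFAULT_THRESHOLDS,
    default: float = 0.5,
) -> List[Detection]:
    """Keep detections whose score meets the threshold for their class.

    Classes missing from `thresholds` fall back to `default`.
    """
    return [d for d in detections if d.score >= thresholds.get(d.label, default)]


detections = [
    Detection("player", 0.91, (100.0, 50.0, 140.0, 160.0)),
    Detection("ball", 0.35, (300.0, 200.0, 310.0, 210.0)),
    Detection("referee", 0.42, (500.0, 60.0, 540.0, 170.0)),
]
kept = filter_detections(detections)
print([d.label for d in kept])  # ['player', 'ball']
```

Keeping thresholds per class lets a small, hard-to-detect object such as the ball use a more permissive cut-off without admitting more false-positive players.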
Video Segmentation
Frame-by-frame video segmentation with temporal consistency and occlusion handling. A custom SAM2Tracker maintains persistent object identities across frames.
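SAM2 propagates masks through time with its own memory mechanism. The sketch below swaps in a much simpler technique, greedy mask-IoU matching, purely to show how identities can be carried from one frame's masks to the next. `IoUTracker`, `box_mask`, and the 0.3 threshold are hypothetical and do not reflect the SAM2Tracker interface.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Pixel = Tuple[int, int]
Mask = FrozenSet[Pixel]  # set of (row, col) pixels belonging to one object


def box_mask(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """Hypothetical helper: a rectangular mask covering rows r0..r1-1, cols c0..c1-1."""
    return frozenset((r, c) for r in range(r0, r1) for c in range(c0, c1))


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


class IoUTracker:
    """Hypothetical tracker: carries IDs forward by greedy mask-IoU matching."""

    def __init__(self, iou_threshold: float = 0.3):
        self.iou_threshold = iou_threshold
        self.tracks: Dict[int, Mask] = {}  # track ID -> most recent mask
        self._next_id = 0

    def update(self, masks: List[Mask]) -> List[int]:
        """Return one track ID per input mask, reusing IDs where masks overlap."""
        ids: List[int] = []
        claimed: Set[int] = set()  # each track may match at most one mask per frame
        for mask in masks:
            best_id, best_iou = None, self.iou_threshold
            for track_id, prev in self.tracks.items():
                if track_id in claimed:
                    continue
                iou = mask_iou(mask, prev)
                if iou >= best_iou:
                    best_id, best_iou = track_id, iou
            if best_id is None:  # no sufficient overlap: start a new identity
                best_id = self._next_id
                self._next_id += 1
            claimed.add(best_id)
            self.tracks[best_id] = mask
            ids.append(best_id)
        return ids


tracker = IoUTracker()
print(tracker.update([box_mask(0, 0, 10, 10), box_mask(20, 20, 30, 30)]))  # [0, 1]
print(tracker.update([box_mask(21, 21, 31, 31), box_mask(1, 1, 11, 11)]))  # [1, 0]
print(tracker.update([box_mask(50, 50, 60, 60)]))                         # [2]
```

Because matching is greedy, the result can depend on mask order when several objects overlap heavily; an optimal assignment such as `scipy.optimize.linear_sum_assignment` over an IoU cost matrix is the usual remedy. Tracks are never pruned in this sketch.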
Zero-Shot Identification
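Vision-language model that identifies players without pre-training on team rosters. A VisionTransformer image encoder and a TextTransformer text encoder map player crops and text descriptions into a shared embedding space, where they are matched by similarity.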
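One common way to use such a model zero-shot, assumed here rather than taken from the project, is to score each player crop against a set of text prompts. SigLIP scores every image-text pair independently with a sigmoid over a scaled and shifted cosine similarity, instead of a softmax across all prompts. In this sketch the embeddings are hand-made placeholders for encoder outputs, and `logit_scale` and `logit_bias` are placeholder values (the real model learns them).

```python
import math
from typing import Dict, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def siglip_style_scores(
    image_emb: Sequence[float],
    text_embs: Dict[str, Sequence[float]],
    logit_scale: float = 10.0,   # placeholder; learned in the real model
    logit_bias: float = -5.0,    # placeholder; learned in the real model
) -> Dict[str, float]:
    """Score one image embedding against each prompt with an independent sigmoid."""
    img = normalize(image_emb)
    scores: Dict[str, float] = {}
    for prompt, emb in text_embs.items():
        txt = normalize(emb)
        cos = sum(a * b for a, b in zip(img, txt))   # cosine similarity
        logit = logit_scale * cos + logit_bias
        scores[prompt] = 1.0 / (1.0 + math.exp(-logit))
    return scores


# Hypothetical prompt embeddings and a hypothetical crop embedding.
text_embs = {
    "a player in a red jersey": [1.0, 0.0, 0.0],
    "a player in a blue jersey": [0.0, 1.0, 0.0],
    "a referee in a black kit": [0.0, 0.0, 1.0],
}
crop_emb = [0.9, 0.1, 0.1]

scores = siglip_style_scores(crop_emb, text_embs)
best = max(scores, key=scores.get)
print(best)  # a player in a red jersey
```

Since each score is an independent probability, several prompts can score high (or none), which makes it easy to reject crops that match no description by applying a minimum score.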
Framework