Research

Soccer Vision Research

A modular research framework for multi-model soccer video analysis. It combines RF-DETR detection, SAM2 segmentation, and SigLIP zero-shot identification in a single configurable pipeline.

RF-DETR · SAM2 · SigLIP · PyTorch

Fig. 1. Raw video input (left) compared with pipeline output showing detection, segmentation, and identification (right).

Architecture

Three-Model Pipeline

RF-DETR

Object Detection

A ResNet50 backbone feeds a Transformer decoder that detects players, the ball, and referees in real time. The PyTorch implementation is 577 lines long and exposes configurable confidence thresholds.
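To make the idea of configurable confidence thresholds concrete, here is a minimal sketch of post-detection filtering. The `Detection` class, the `filter_detections` function, and the per-class threshold values are all hypothetical and chosen for illustration; they are not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str    # "player", "ball", or "referee"
    score: float  # detector confidence in [0, 1]
    box: tuple    # (x1, y1, x2, y2) in pixels


# Hypothetical per-class thresholds (assumption, not project values).
# The ball is small and often blurred, so it gets a lower cut-off here.
DEFAULT_THRESHOLDS = {"player": 0.5, "ball": 0.3, "referee": 0.5}


def filter_detections(detections, thresholds=DEFAULT_THRESHOLDS, fallback=0.5):
    """Keep detections whose score meets the threshold for their class."""
    return [
        d for d in detections
        if d.score >= thresholds.get(d.label, fallback)
    ]


raw = [
    Detection("player", 0.91, (100, 50, 140, 150)),
    Detection("ball", 0.35, (300, 200, 310, 210)),
    Detection("referee", 0.42, (500, 60, 540, 160)),
]
kept = filter_detections(raw)
```

With these assumed values the referee detection is dropped because 0.42 is below its 0.5 threshold, while the lower ball threshold lets the 0.35 ball detection through.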

SAM2

Video Segmentation

Segments the video frame by frame while keeping masks temporally consistent and handling occlusions. A custom SAM2Tracker maintains persistent identities across frames.
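One common way to keep identities persistent across frames is to match each new mask to the previous mask it overlaps most. The sketch below illustrates that idea with greedy IoU matching. It is not the project's SAM2Tracker: the class name `SimpleMaskTracker`, its parameters, and the representation of masks as sets of pixel coordinates are assumptions made to keep the example dependency-free.

```python
def mask_iou(a, b):
    """Intersection over union of two masks given as sets of (x, y) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


class SimpleMaskTracker:
    """Hypothetical greedy IoU tracker (illustration only)."""

    def __init__(self, iou_threshold=0.3, max_missed=5):
        self.iou_threshold = iou_threshold
        self.max_missed = max_missed  # frames a track survives while unseen
        self.tracks = {}              # track id -> (last mask, missed frames)
        self._next_id = 1

    def update(self, masks):
        """Assign a track id to each mask of the current frame."""
        assigned = {}
        unmatched = set(self.tracks)
        for mask in masks:
            best_id, best_iou = None, self.iou_threshold
            for tid in unmatched:
                iou = mask_iou(mask, self.tracks[tid][0])
                if iou >= best_iou:
                    best_id, best_iou = tid, iou
            if best_id is None:
                # No sufficiently overlapping track: start a new identity.
                best_id = self._next_id
                self._next_id += 1
            else:
                unmatched.discard(best_id)
            self.tracks[best_id] = (mask, 0)
            assigned[best_id] = mask
        # Unseen tracks keep their last mask for a while (simple occlusion handling).
        for tid in unmatched:
            mask, missed = self.tracks[tid]
            if missed + 1 > self.max_missed:
                del self.tracks[tid]
            else:
                self.tracks[tid] = (mask, missed + 1)
        return assigned


def block(x, y, w, h):
    """Rectangular toy mask with top-left corner (x, y)."""
    return frozenset((i, j) for i in range(x, x + w) for j in range(y, y + h))


tracker = SimpleMaskTracker()
frame1 = tracker.update([block(0, 0, 4, 4), block(10, 10, 4, 4)])
frame2 = tracker.update([block(1, 0, 4, 4)])    # second object is occluded
frame3 = tracker.update([block(10, 11, 4, 4)])  # second object reappears
```

In this toy sequence the second object disappears for one frame and gets its original id back when it reappears. Matching is greedy in mask order, so it is not guaranteed to find the globally best assignment in crowded scenes.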

SigLIP

Zero-Shot Identification

A vision-language model identifies players without any pre-training on team rosters. A VisionTransformer image encoder and a TextTransformer text encoder map player crops and text prompts into a shared embedding space, where they are matched semantically.
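Zero-shot identification of this kind usually comes down to comparing one image embedding against several text-prompt embeddings. The sketch below shows that matching step with hand-written toy vectors standing in for encoder outputs. The prompts, the vectors, and the `scale` and `bias` values are assumptions for illustration; in SigLIP-style models the pairwise score is a sigmoid of a scaled similarity plus a bias, with both values learned during training.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def identify(image_emb, text_embs, scale=10.0, bias=-5.0):
    """Score an image embedding against each prompt and return the best match.

    scale and bias are placeholder values (assumption); a trained model
    would supply its own learned parameters.
    """
    scores = {
        prompt: sigmoid(scale * cosine(image_emb, emb) + bias)
        for prompt, emb in text_embs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy embeddings standing in for text-encoder outputs (assumption).
prompts = {
    "a player in a red jersey": [1.0, 0.0, 0.1],
    "a player in a blue jersey": [0.0, 1.0, 0.1],
    "a referee in black": [0.0, 0.0, 1.0],
}
# Toy embedding standing in for the image-encoder output of one player crop.
crop_embedding = [0.9, 0.1, 0.2]

best_prompt, all_scores = identify(crop_embedding, prompts)
```

Because each prompt is scored independently with a sigmoid, the scores do not have to sum to one, and a crop that matches no prompt can receive low scores for all of them.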

Framework

What Makes It Unique

Modular pipeline — swap models via YAML config presets (balanced, real-time, high-accuracy)
ResultFuser — intelligent multi-model output combination with adaptive fusion strategies
5 demo applications — complete system, single model, real-time, benchmark, and GUI
19 Python modules with comprehensive documentation and test suite
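
As a rough illustration of how preset-driven configuration and result fusion could fit together, the sketch below uses a plain dictionary in place of a parsed YAML file. The preset keys, their values, and the `load_preset` and `fuse` functions are hypothetical; they do not reproduce the project's actual config schema or its ResultFuser.

```python
# Hypothetical presets, shaped like what a parsed YAML file might contain.
# Keys and values are assumptions, not the project's real configuration.
PRESETS = {
    "real-time": {
        "detector_conf": 0.6,
        "use_segmentation": False,
        "use_identification": False,
    },
    "balanced": {
        "detector_conf": 0.5,
        "use_segmentation": True,
        "use_identification": False,
    },
    "high-accuracy": {
        "detector_conf": 0.3,
        "use_segmentation": True,
        "use_identification": True,
    },
}


def load_preset(name, overrides=None):
    """Return a copy of a preset with optional per-run overrides applied."""
    if name not in PRESETS:
        raise KeyError(f"unknown preset: {name!r}")
    config = dict(PRESETS[name])  # copy so the preset itself is not mutated
    config.update(overrides or {})
    return config


def fuse(detection, config, track_id=None, identity=None):
    """Merge per-object outputs from the enabled stages into one record.

    identity is an optional (name, score) pair from the identification stage.
    """
    fused = dict(detection)
    if config["use_segmentation"] and track_id is not None:
        fused["track_id"] = track_id
    if config["use_identification"] and identity is not None:
        fused["identity"], fused["identity_score"] = identity
    return fused


cfg = load_preset("balanced")
detection = {"label": "player", "score": 0.91, "box": (100, 50, 140, 150)}
fused = fuse(detection, cfg, track_id=7, identity=("red jersey", 0.88))
```

Under the assumed "balanced" preset the fused record carries the track id but omits the identity, because identification is switched off. Passing `{"use_identification": True}` as an override would include it without editing the preset.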