Soccer Vision Research.

Modular research framework that orchestrates four pretrained vision models (RF-DETR, SAM2, SigLIP, ResNet) into one configurable soccer-analysis pipeline with selectable fusion strategies.

Role: Researcher
When: 2024
Stack: Python, PyTorch, RF-DETR, SAM2
Scale: 4 models orchestrated

RF-DETR · SAM2 · SigLIP

4 models orchestrated

7 fusion strategies

3 execution modes

YAML config-driven

The problem

Combining several vision models for soccer analysis usually means hard-wiring one fixed pipeline. The goal here was a modular framework where detection, segmentation, identification, and classification models are swappable and configurable, so model combinations can be A/B tested without rewriting code.

What it does

An orchestration layer (ModelPipeline) that runs RF-DETR detection, SAM2 segmentation and tracking, SigLIP zero-shot identification, and ResNet jersey classification.
Three execution modes (sequential, parallel, adaptive) plus a result-fusion layer with seven selectable strategies.
YAML-driven, schema-validated config per model, with a model registry and manager so presets are configuration, not code.
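
A minimal sketch of how a registry, config-selected stages, and a fusion step could fit together. Everything here is hypothetical: the names `REGISTRY`, `register`, `run_pipeline`, and `fuse_max_confidence`, the stub stages, and the config layout are illustrative assumptions, not the framework's actual API. The stubs return fixed values in place of real model inference, a plain dict stands in for a parsed YAML file, the stages run independently rather than feeding each other, and only the sequential and parallel modes are shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical stage signature: takes a frame, returns a result dict.
Stage = Callable[[dict], dict]

# Hypothetical registry mapping a config name to a stage callable.
REGISTRY: Dict[str, Stage] = {}


def register(name: str) -> Callable[[Stage], Stage]:
    """Decorator that adds a stage to the registry under `name`."""
    def wrap(fn: Stage) -> Stage:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("detector_stub")
def detector_stub(frame: dict) -> dict:
    # Stand-in for a detection model; returns a fixed result.
    return {"label": "player", "confidence": 0.9}


@register("classifier_stub")
def classifier_stub(frame: dict) -> dict:
    # Stand-in for a classification model; returns a fixed result.
    return {"label": "goalkeeper", "confidence": 0.6}


def run_pipeline(config: dict, frame: dict) -> List[dict]:
    """Look up the configured stages and run them in the configured mode."""
    stages = [REGISTRY[name] for name in config["stages"]]
    mode = config.get("mode", "sequential")
    if mode == "sequential":
        return [stage(frame) for stage in stages]
    if mode == "parallel":
        # pool.map preserves the order of `stages` in its results.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(lambda stage: stage(frame), stages))
    raise ValueError(f"unknown mode: {mode}")


def fuse_max_confidence(results: List[dict]) -> dict:
    """One possible fusion strategy: keep the most confident result."""
    return max(results, key=lambda r: r["confidence"])


# A dict standing in for a parsed YAML preset (assumed layout).
config = {"stages": ["detector_stub", "classifier_stub"], "mode": "parallel"}
results = run_pipeline(config, frame={"id": 0})
fused = fuse_max_confidence(results)
```

Under these assumptions, changing the `stages` list or the `mode` value in the config is enough to run a different combination, which is the property the bullets above describe.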

Impact

Swap models or presets through config alone, enabling rapid A/B testing of model combinations.
SigLIP text-image matching identifies players and teams zero-shot, without training on a specific roster.
A documented framework with per-model demos and a config system built for research reproducibility.
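
The zero-shot matching idea can be illustrated with a toy scoring sketch. SigLIP scores each image-text pair independently with a sigmoid rather than a softmax across prompts; the sketch below imitates that on hand-written three-dimensional vectors. The embeddings, the prompt texts, the `scale` and `bias` values, and the function names are all assumptions for illustration. In a real pipeline the vectors would come from the model's image and text encoders, and the scale and bias would be learned parameters.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def pair_score(image_vec: List[float], text_vec: List[float],
               scale: float = 10.0, bias: float = -5.0) -> float:
    """Sigmoid over a scaled, shifted similarity (placeholder scale and bias)."""
    logit = scale * cosine(image_vec, text_vec) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def best_prompt(image_vec: List[float],
                prompts: Dict[str, List[float]]) -> Tuple[str, Dict[str, float]]:
    """Score every prompt against the image and return the top one."""
    scores = {text: pair_score(image_vec, vec) for text, vec in prompts.items()}
    return max(scores, key=scores.get), scores


# Toy text embeddings for candidate descriptions (made-up values).
prompts = {
    "a player in a red jersey": [1.0, 0.0, 0.0],
    "a player in a blue jersey": [0.0, 1.0, 0.0],
    "a referee": [0.0, 0.0, 1.0],
}

# Toy image embedding for one player crop (made-up values).
crop_embedding = [0.9, 0.1, 0.0]

label, scores = best_prompt(crop_embedding, prompts)
```

Because the candidate labels are only text prompts, covering a new team or roster in this scheme means editing the prompt list rather than retraining a classifier.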