Project Updates
- 2026-06-11: Release. Repository reorganized and the full version released.
- 2026-03-23: The next update is planned after the CVPR 2026 conference.
- 2025-11-19: Updated sample release for review.
Key Features
Cross-Modal Diffusion Fusion
A cross-guided denoising mechanism in which RGB and thermal features guide each other during the diffusion process, harmonizing the multi-modal inputs.
Diffusion Refiner
A plug-and-play module that refines the unified feature representation through iterative denoising steps, making features more distinctive.
Hierarchical Tracker
Estimates confidence adaptively at multiple levels, improving tracking robustness in occluded or otherwise challenging scenes.
End-to-End & Real-time
Unifies object detection, state estimation, and data association without complex post-processing, enabling online tracking with high temporal coherence.
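To make the idea of multi-level confidence handling in the Hierarchical Tracker more concrete, here is a minimal, illustrative sketch of a two-tier association step in the style of ByteTrack. It is not the DM^3T implementation: the function names (`iou`, `greedy_match`, `hierarchical_associate`), the thresholds (`high`, `low`, `iou_thresh`), the box format `(x1, y1, x2, y2)`, and the greedy IoU matching are all assumptions chosen for illustration. Tracks are first matched against high-confidence detections, and only the tracks still unmatched are offered the low-confidence ones, which is one common way to recover partially occluded objects.

```python
# Illustrative sketch only: a ByteTrack-style two-tier association.
# All names and thresholds here are hypothetical, not taken from DM^3T.

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, indexed_dets, iou_thresh):
    """Pair track ids with detection indices, best IoU first.

    tracks: dict mapping track id -> box
    indexed_dets: list of (detection index, detection dict) pairs
    """
    candidates = sorted(
        ((iou(box, det["box"]), tid, di)
         for tid, box in tracks.items()
         for di, det in indexed_dets),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for score, tid, di in candidates:
        if score < iou_thresh:
            break  # remaining candidates overlap even less
        if tid in used_tracks or di in used_dets:
            continue
        matches[tid] = di
        used_tracks.add(tid)
        used_dets.add(di)
    return matches


def hierarchical_associate(tracks, detections,
                           high=0.6, low=0.1, iou_thresh=0.3):
    """Match tracks to detections in two confidence tiers.

    Returns a dict mapping track id -> index into `detections`.
    Detections scoring below `low` are ignored entirely.
    """
    indexed = list(enumerate(detections))
    high_tier = [(i, d) for i, d in indexed if d["score"] >= high]
    low_tier = [(i, d) for i, d in indexed if low <= d["score"] < high]

    # Level 1: confident detections get first pick of the tracks.
    matches = greedy_match(tracks, high_tier, iou_thresh)

    # Level 2: only still-unmatched tracks may claim weak detections.
    leftover = {t: b for t, b in tracks.items() if t not in matches}
    matches.update(greedy_match(leftover, low_tier, iou_thresh))
    return matches
```

In this sketch a track whose object is partly occluded, and therefore detected with a low score, can still be continued in the second level instead of being dropped, while very weak detections never create or extend a track.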
VTMOT Benchmark Results
Model Zoo
Pretrained Weights
Download the pretrained models required to reproduce our VTMOT baseline and to test tracking.
BaiduYun (Pwd: q8i4)
Citation
@InProceedings{Li_2026_CVPR,
author = {Li, Weiran and Liu, Yeqiang and Wei, Yijie and Han, Mina and Guo, Qiannan and Li, Zhenbo},
title = {DM{\textasciicircum}3T: Harmonizing Modalities via Diffusion for Multi-Object Tracking},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
month = {June},
year = {2026},
pages = {8398-8407}
}