Project Updates
- 2025-11-19: New version, sample release for review.
- 2025-08-05: Initial sample release for internal testing.
Key Features
Cross-Modal Diffusion Fusion
A novel cross-guided denoising mechanism in which RGB and thermal features provide mutual guidance during the diffusion process, harmonizing the multi-modal inputs (see the sketch after this feature list).
Diffusion Refiner
A plug-and-play module designed to enhance and refine unified feature representations through iterative denoising steps, boosting feature distinctiveness.
Hierarchical Tracker
Adaptively estimates confidence at multiple levels, improving tracking robustness in occluded or cluttered scenes.
End-to-End & Real-time
Unifies object detection, state estimation, and data association without complex post-processing, enabling online tracking with high temporal coherence.
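To make the cross-guidance idea concrete, here is a minimal PyTorch sketch of how iterative, mutually guided denoising over RGB and thermal features could be wired up. It is not the released DM3T code: the class names (CrossGuidedDenoiser, CrossModalDiffusionFusion), the attention-based noise predictor, the fixed number of steps, and the additive fusion at the end are all illustrative assumptions.

# Conceptual sketch only, NOT the released DM3T implementation.
# All module and parameter names below are illustrative assumptions.
import torch
import torch.nn as nn

class CrossGuidedDenoiser(nn.Module):
    """One denoising step: predicts the noise in x conditioned on a guide feature."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 2), nn.GELU(), nn.Linear(dim * 2, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x_noisy, guide):
        # Queries come from the noisy modality, keys/values from the guiding modality.
        attn_out, _ = self.cross_attn(x_noisy, guide, guide)
        h = self.norm(x_noisy + attn_out)
        return self.mlp(h)  # estimated noise residual

class CrossModalDiffusionFusion(nn.Module):
    """Iteratively denoises RGB and thermal tokens, each guided by the other."""
    def __init__(self, dim, num_steps=4):
        super().__init__()
        self.rgb_denoiser = CrossGuidedDenoiser(dim)
        self.thermal_denoiser = CrossGuidedDenoiser(dim)
        self.num_steps = num_steps

    def forward(self, rgb_tokens, thermal_tokens):
        # Lightly perturb the features as a stand-in for the forward diffusion step.
        x_rgb = rgb_tokens + 0.1 * torch.randn_like(rgb_tokens)
        x_thm = thermal_tokens + 0.1 * torch.randn_like(thermal_tokens)
        for _ in range(self.num_steps):
            # Mutual guidance: each modality is denoised conditioned on the other.
            eps_rgb = self.rgb_denoiser(x_rgb, x_thm)
            eps_thm = self.thermal_denoiser(x_thm, x_rgb)
            x_rgb = x_rgb - eps_rgb
            x_thm = x_thm - eps_thm
        # Fuse the harmonized features (simple sum here; the paper's fusion may differ).
        return x_rgb + x_thm

if __name__ == "__main__":
    fusion = CrossModalDiffusionFusion(dim=64)
    rgb = torch.randn(2, 100, 64)      # (batch, tokens, channels)
    thermal = torch.randn(2, 100, 64)
    print(fusion(rgb, thermal).shape)  # torch.Size([2, 100, 64])

The same iterative loop also hints at the role of the Diffusion Refiner described above: repeated denoising passes progressively refine the unified representation.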
Architecture Overview
The proposed DM3T framework. It consists of Cross-Modal Diffusion Fusion for harmonizing modalities and a Hierarchical Tracker for robust association.
Benchmark Results (VTMOT)
Evaluation metrics on the VTMOT test split. HOTA and IDF1 are the primary metrics for multi-object tracking performance.
Sample C-MDF Preview
Getting Started
Environment Setup
# Create conda environment
conda create -n dm3t python=3.8
conda activate dm3t
# Install dependencies
pip install -r requirements.txt
Dataset Preparation
Organize the VTMOT dataset structure as follows:
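The exact layout is defined by the dataset loader in this repository; the tree below is only an illustrative sketch, and every folder name in it (data/vtmot, visible, infrared, annotations, and so on) is an assumption rather than a confirmed requirement.

data/
└── vtmot/
    ├── annotations/            # converted annotation files (see the next step) -- assumed name
    ├── train/
    │   └── <sequence_name>/
    │       ├── visible/        # RGB frames (assumed folder name)
    │       └── infrared/       # thermal frames (assumed folder name)
    └── test/
        └── <sequence_name>/
            ├── visible/
            └── infrared/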
Then convert annotations:
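The converter shipped with the repository is not shown here; the command below is only a placeholder with a hypothetical script name and arguments (DLA-34/CenterNet-style codebases typically expect COCO-format JSON annotations).

# Hypothetical script name and arguments; substitute the converter provided in this repository.
python tools/convert_vtmot_to_coco.py --data_dir data/vtmot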
Training
python main.py tracking \
--exp_id exp_v1 \
--dataset vtmot \
--arch dla_34
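If the training CLI follows CenterTrack-style options (an assumption, not confirmed for this codebase), selecting GPUs, setting the batch size, or initializing from a pretrained checkpoint would look roughly like this:

# Assumes CenterTrack-style flags (--gpus, --batch_size, --load_model) and a
# hypothetical checkpoint path; verify against the options defined in this repository.
python main.py tracking \
--exp_id exp_v1 \
--dataset vtmot \
--arch dla_34 \
--gpus 0,1 \
--batch_size 8 \
--load_model models/pretrained.pth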
Evaluation
python eval.py \
--BENCHMARK VTMOT \
--SPLIT_TO_EVAL test
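The uppercase flags suggest a TrackEval-style configuration. Assuming eval.py exposes TrackEval's metric selection option (an assumption, not confirmed here), the run could be limited to the primary metrics like this:

# Assumes a TrackEval-style --METRICS option; check eval.py's argument parser first.
# TrackEval's "Identity" metric family reports IDF1.
python eval.py \
--BENCHMARK VTMOT \
--SPLIT_TO_EVAL test \
--METRICS HOTA Identity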