M4FT Logo

Mamba, Migratory, Mobile
and Multiple Fish Tracking

Weiran Li, Yeqiang Liu, Wenxu Wang, Zhenbo Li*
China Agricultural University

Abstract

Tracking is a core technique for analyzing complex fish behaviors, such as schooling and predator avoidance. However, this task presents unique and severe challenges compared to generic object tracking of rigid targets like pedestrians or vehicles. Fish exhibit extreme non-rigid deformation and erratic motion, while underwater environments are characterized by poor illumination and low visibility. These issues, compounded by the need for lightweight, real-time deployment in high-density scenarios, often lead to catastrophic target loss and identity switching in conventional trackers. To tackle these specific challenges, we propose M4FT, a lightweight and robust online multiple fish tracking framework. To overcome the limitation of CNNs in capturing large deformations due to local receptive fields, and the high latency of Transformers, we design M4Net as the detection backbone. By pioneering the Vision Mamba architecture in this domain, M4Net leverages selective state-space modeling to achieve global contextual modeling comparable to Transformers but with linear complexity. It efficiently captures the flexible morphology of fish, all while maintaining a lightweight footprint. Furthermore, to counteract adverse underwater conditions, we integrate an optional UIE module that adaptively enhances imagery, synergistically improving detection robustness without relying on computationally expensive appearance-based re-identification. Experimental validation on the challenging BrackishMOT benchmark shows that M4FT sets a new state-of-the-art, achieving the highest HOTA of 29.2 while incurring only ~10% of the computational cost of mainstream models.

Updates

  • 07.Jan.26 Revised version is complete, and the project homepage is now online.
  • 27.Feb.25 We have released the public repo with related resources.
M4FT Pipeline
The architecture of M4FT, featuring M4Net and the optional UIE module.

Key Contributions

Lightweight & Efficient

M4FT is a lightweight online baseline designed for low-light underwater scenes. It eliminates dependency on complex appearance features, enabling efficient online tracking.

~10% computation cost of mainstream models

M4Net Architecture

A specialized lightweight detection network specifically designed for fish. It embeds a selective scan module to support global detection while maintaining a compact architecture.

Optional UIE Module

An optional module designed to boost tracking performance across various low-visibility underwater conditions and reduce overall training costs by bypassing appearance-based Re-ID.

SOTA Performance

Experimental validation on the BrackishMOT benchmark shows that M4FT outperforms other advanced methods, achieving the highest HOTA of 29.2.

Comparisons on BrackishMOT-M4FT

| Method | Params (M) ↓ | GFLOPs ↓ | HOTA ↑ | MOTA ↑ | IDF1 ↑ | DetA ↑ | AssA ↑ | IDs ↓ |
|---|---|---|---|---|---|---|---|---|
| SORT | 25.28 | 207.35 | 22.6 | 25.4 | 30.9 | 20.1 | 25.6 | 164 |
| ByteTrack | 25.28 | 207.35 | 28.4 | 37.8 | 42.7 | 27.0 | 30.0 | 129 |
| OC-SORT | 25.28 | 207.35 | 24.0 | 26.1 | 32.0 | 20.7 | 28.2 | 138 |
| HybridSORT | 25.28 | 207.35 | 12.0 | 7.4 | 13.0 | 6.0 | 24.1 | 86 |
| M4FT (Ours) | 10.60 | 79.62 | 29.2 | 42.8 | 43.1 | 35.6 | 24.2 | 204 |
* The table above shows a subset of metrics. Params and GFLOPs are efficiency metrics (lower is better). HOTA is the primary tracking metric (higher is better). More results are given in the main manuscript.
Visual Results
Visual comparison of tracking results.

Parameter Sensitivity Analysis

To evaluate the robustness of our tracking framework and provide justification for our choice of hyperparameters, we conducted a sensitivity analysis on the key thresholds that govern the tracking process. We analyze three parameters: the IoU matching threshold (β), the high-confidence detection threshold (γ), and a post-processing evaluation threshold (α).

Phase 1: IoU Matching Threshold (β)

Figure: (a) HOTA (↑), (b) MOTA (↑), and (c) IDF1 (↑) as functions of β.

We first analyze the impact of the IoU threshold (β) used for associating detections with tracklets. A higher β imposes a stricter spatial constraint for a match to be considered valid. The results show that performance is stable across a range of values, with a peak near β = 0.9.
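To make the role of β concrete, here is a minimal sketch of IoU-gated association. The function names, box format, and exhaustive pairing loop are illustrative assumptions, not the actual M4FT implementation (which would typically feed these scores into an assignment solver).

```python
def iou(box_a, box_b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def valid_matches(tracks, detections, beta=0.9):
    """Keep only track-detection pairs whose IoU clears the beta threshold.

    A higher beta imposes a stricter spatial constraint: fewer pairs
    survive the gate, so only near-overlapping boxes can be associated.
    """
    pairs = []
    for ti, t in enumerate(tracks):
        for di, d in enumerate(detections):
            score = iou(t, d)
            if score >= beta:
                pairs.append((ti, di, score))
    return pairs
```

With β = 0.9, a track box only matches a detection that overlaps it almost entirely, which is why performance peaks there for slow inter-frame displacement but would degrade if fish moved farther between frames.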

Phase 2: High-Confidence Detection Threshold (γ)

Figure: (a) HOTA (↑), (b) MOTA (↑), and (c) IDF1 (↑) as functions of γ.

Next, we investigate the high-confidence detection threshold (γ), which corresponds to τ_high in our association logic. This threshold determines which detections are considered reliable enough for the first matching stage. Results show that performance peaks around γ = 0.6. Setting the threshold too high (e.g., >0.8) causes a sharp decline, as too many valid lower-confidence detections (e.g., from occluded fish) are prematurely discarded.
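The confidence split governed by γ can be sketched as follows. The `(box, score)` tuple layout and the low-confidence floor `tau_low` are assumptions for illustration, following the ByteTrack-style two-stage association that the τ_high naming suggests; the actual M4FT logic may differ in detail.

```python
def split_by_confidence(detections, gamma=0.6, tau_low=0.1):
    """Split detections into high- and low-confidence sets.

    High-confidence detections (score >= gamma) enter the first
    matching stage; the rest, down to a floor tau_low, are kept for
    a second, more permissive association pass. Each detection is a
    (box, score) tuple.
    """
    high = [d for d in detections if d[1] >= gamma]
    low = [d for d in detections if tau_low <= d[1] < gamma]
    return high, low
```

Raising γ shrinks the first-stage pool, which explains the sharp decline above 0.8: occluded fish with scores of, say, 0.5 are demoted out of the reliable set entirely.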

Post-Processing Evaluation Threshold (α)

Figure: (a) MOTA (↑) and IDF1 (↑), (b) IDs (↓) and Frag (↓), as functions of α.

Finally, we examine the effect of an evaluation-only confidence threshold (α), which is a post-processing filter. While HOTA is unaffected by this parameter, other metrics are sensitive to it. We observe stable performance around α = 0.5.
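Since α acts only on finalized outputs, it can be sketched as a simple filter over tracking results. The result-tuple layout below is an assumption for illustration; because the filter runs after tracking, it cannot alter the association itself, only which boxes are scored.

```python
def filter_output(track_results, alpha=0.5):
    """Evaluation-only post-filter: drop finalized track boxes whose
    confidence falls below alpha.

    track_results: list of (frame, track_id, box, score) tuples.
    """
    return [r for r in track_results if r[3] >= alpha]
```

Usage: applying `filter_output(results, alpha=0.5)` before writing the evaluation file prunes low-confidence boxes, which shifts detection-count-sensitive metrics such as MOTA while leaving the underlying tracklets unchanged.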

Conclusion: This analysis demonstrates that our method is not overly sensitive to the precise choice of hyperparameters. For all our main experiments, we use a fixed set of default values (β=0.9, γ=0.6, α=0.5) without per-sequence tuning.

Installation

# Step 1: Clone repo
cd {Repo_ROOT}

# Step 2: Install dependencies
# Python 3.10 & PyTorch 2.0.0 recommended
conda env create -f requirements.yaml
conda activate M4FT

# Step 3 (Optional): Data Generation
# Clone CycleGAN repo and follow implementation
# BrackishMOT-M4FT includes generated data

Experiments

# Train
python3 tools/train.py -f exps/example/mot/M4FT_exps.py --fp16 -o

# Test
# Recommended: --track_thresh 0.4
python3 tools/track.py \
  -f exps/example/mot/M4FT_exps.py \
  -c ../pretrained/M4Net.pth.tar \
  --track_thresh 0.4 \
  --fuse

Downloads

BrackishMOT-M4FT Dataset

The dataset has been released on GitHub.

GitHub | MFT_DATASETS

Pretrained Models

Available on Google Drive and Baidu Yun.

Contact