SEATrack: Simple, Efficient, and Adaptive Multimodal Tracker
Abstract
SEATrack presents a novel multimodal tracking approach that improves the balance between performance and efficiency through cross-modal alignment and adaptive fusion mechanisms.
Parameter-efficient fine-tuning (PEFT) in multimodal tracking reveals a concerning trend: recent performance gains are often achieved at the cost of inflated parameter budgets, which fundamentally erodes PEFT's efficiency promise. In this work, we introduce SEATrack, a Simple, Efficient, and Adaptive two-stream multimodal tracker that tackles this performance-efficiency dilemma from two complementary perspectives. We first prioritize cross-modal alignment of matching responses, an underexplored yet pivotal factor that we argue is essential for breaking the trade-off. Specifically, we observe that modality-specific biases in existing two-stream methods generate conflicting matching attention maps, thereby hindering effective joint representation learning. To mitigate this, we propose AMG-LoRA, which integrates Low-Rank Adaptation (LoRA) for domain adaptation with Adaptive Mutual Guidance (AMG) to dynamically refine and align attention maps across modalities. We then depart from conventional local fusion approaches by introducing a Hierarchical Mixture of Experts (HMoE) that enables efficient global relation modeling, balancing expressiveness and computational efficiency in cross-modal fusion. Equipped with these innovations, SEATrack makes notable progress over state-of-the-art methods in balancing performance with efficiency across RGB-T, RGB-D, and RGB-E tracking tasks. Code is available at https://github.com/AutoLab-SAI-SJTU/SEATrack.
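For readers unfamiliar with the LoRA component named in the abstract, the following is a minimal, dependency-free sketch of generic Low-Rank Adaptation on a single linear layer. It is not taken from the SEATrack code and does not include the AMG part; the class name `LoRALinear`, the helper `matmul`, and the default `rank` and `alpha` values are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration of generic LoRA, not the SEATrack implementation.
    """

    def __init__(self, d_in, d_out, rank=4, alpha=8.0, seed=0):
        rng = random.Random(seed)
        # Frozen "pretrained" weight (random here, purely for illustration).
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors: A is small and random, B starts at zero,
        # so the adapted layer initially matches the frozen layer.
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def effective_weight(self):
        """Return W + scale * (B @ A), shape (d_out, d_in)."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.W, delta)]

    def forward(self, x):
        """Apply the adapted layer to x, a list of token vectors of length d_in."""
        w_t = [list(col) for col in zip(*self.effective_weight())]  # (d_in, d_out)
        return matmul(x, w_t)

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])
```

With `d_in = d_out = 64` and `rank = 4`, this layer trains 512 parameters (A and B) while the 4096 entries of W stay frozen, which is the kind of parameter saving that PEFT methods aim for.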
Community
Motivated by the spatio-temporal consistency of multimodal inputs, SEATrack adopts an align-before-fusion strategy: it first aligns the spatial perception of different modality branches, and then performs information fusion.
If you find this idea insightful, we would greatly appreciate your support and upvote!
SEATrack is open-sourced at https://github.com/AutoLab-SAI-SJTU/SEATrack.
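The align-before-fusion ordering described in the comment above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the fixed scalar `gate`, the convex blending of attention maps, and the attention-weighted pooling used for fusion are stand-ins, not the paper's AMG or HMoE modules, whose actual form should be taken from the released code.

```python
import math


def softmax(scores):
    """Turn raw matching scores into an attention distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def disagreement(attn_a, attn_b):
    """L1 distance between two attention maps over the same tokens."""
    return sum(abs(a - b) for a, b in zip(attn_a, attn_b))


def align_attention(attn_rgb, attn_aux, gate=0.25):
    """Step 1 (align): pull each branch's map toward the other's.

    Hypothetical stand-in for mutual guidance. Each branch keeps (1 - gate)
    of its own map and takes `gate` from the other branch, so both outputs
    remain valid distributions.
    """
    rgb = [(1 - gate) * a + gate * b for a, b in zip(attn_rgb, attn_aux)]
    aux = [(1 - gate) * b + gate * a for a, b in zip(attn_rgb, attn_aux)]
    return rgb, aux


def fuse(feat_rgb, feat_aux, attn_rgb, attn_aux):
    """Step 2 (fuse): average the attention-pooled features of both branches."""
    dim = len(feat_rgb[0])
    fused = [0.0] * dim
    for f_r, f_a, w_r, w_a in zip(feat_rgb, feat_aux, attn_rgb, attn_aux):
        for k in range(dim):
            fused[k] += 0.5 * (w_r * f_r[k] + w_a * f_a[k])
    return fused


# Two branches that initially attend to different search-region tokens.
attn_rgb = softmax([2.0, 0.5, 0.1])
attn_aux = softmax([0.1, 0.5, 2.0])
aligned_rgb, aligned_aux = align_attention(attn_rgb, attn_aux)

feat_rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
feat_aux = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
fused = fuse(feat_rgb, feat_aux, aligned_rgb, aligned_aux)
```

In this toy setting, alignment shrinks the disagreement between the two maps before any features are mixed, so the fusion step combines branches that already agree more closely on where the target is.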
Get this paper in your agent:
hf papers read 2604.12502
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash