arxiv:2604.12502

SEATrack: Simple, Efficient, and Adaptive Multimodal Tracker

Published on Apr 14

Abstract

SEATrack presents a novel multimodal tracking approach that improves performance-efficiency balance through cross-modal alignment and adaptive fusion mechanisms.

AI-generated summary

Parameter-efficient fine-tuning (PEFT) in multimodal tracking reveals a concerning trend: recent performance gains are often achieved at the cost of inflated parameter budgets, which fundamentally erodes PEFT's efficiency promise. In this work, we introduce SEATrack, a Simple, Efficient, and Adaptive two-stream multimodal tracker that tackles this performance-efficiency dilemma from two complementary perspectives. We first prioritize cross-modal alignment of matching responses, an underexplored yet pivotal factor that we argue is essential for breaking the trade-off. Specifically, we observe that modality-specific biases in existing two-stream methods generate conflicting matching attention maps, thereby hindering effective joint representation learning. To mitigate this, we propose AMG-LoRA, which seamlessly integrates Low-Rank Adaptation (LoRA) for domain adaptation with Adaptive Mutual Guidance (AMG) to dynamically refine and align attention maps across modalities. We then depart from conventional local fusion approaches by introducing a Hierarchical Mixture of Experts (HMoE) that enables efficient global relation modeling, effectively balancing expressiveness and computational efficiency in cross-modal fusion. Equipped with these innovations, SEATrack achieves notable progress over state-of-the-art methods in balancing performance with efficiency across RGB-T, RGB-D, and RGB-E tracking tasks. Code is available at https://github.com/AutoLab-SAI-SJTU/SEATrack.
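
For readers unfamiliar with the LoRA component mentioned in the abstract, the following is a minimal sketch of a generic low-rank update on a single linear layer, written in plain Python. It is not the authors' AMG-LoRA: the Adaptive Mutual Guidance part is not shown, and the function names (`matvec`, `lora_forward`, `lora_param_count`), the toy matrices, and the 768-dimensional, rank-8 example are illustrative assumptions rather than values taken from the paper.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, w, a, b, alpha):
    """Generic LoRA forward pass for one input vector.

    w: frozen weight, shape d_out x d_in
    a: trainable down-projection, shape rank x d_in
    b: trainable up-projection, shape d_out x rank
    Computes w @ x + (alpha / rank) * b @ (a @ x).
    """
    rank = len(a)
    scale = alpha / rank
    base = matvec(w, x)                 # output of the frozen layer
    update = matvec(b, matvec(a, x))    # low-rank correction
    return [y + scale * u for y, u in zip(base, update)]


def lora_param_count(d_in, d_out, rank):
    """Number of trainable parameters added by one LoRA adapter."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # Toy layer with d_in = 3, d_out = 2, rank = 1 (hypothetical values).
    w = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]
    a = [[0.5, -0.5, 1.0]]
    x = [1.0, 2.0, 3.0]

    # With b initialised to zero, the adapted layer matches the frozen one.
    print(lora_forward(x, w, a, [[0.0], [0.0]], alpha=1.0))

    # Trainable parameters of the adapter vs. the full weight matrix
    # for an assumed 768 x 768 projection at rank 8.
    print(lora_param_count(768, 768, 8), 768 * 768)
```

The parameter count is where the efficiency argument comes from: the adapter grows linearly in the layer width, while the full weight matrix grows quadratically.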

Community

Paper author

Motivated by the spatio-temporal consistency of multimodal inputs, SEATrack adopts an align-before-fusion strategy: it first aligns the spatial perception of different modality branches, and then performs information fusion.

If you find this idea insightful, we would greatly appreciate your support and upvote! 🌿🚀

SEATrack is open-sourced at https://github.com/AutoLab-SAI-SJTU/SEATrack.
