LycheeMem Reranker v1

A LoRA adapter on Qwen3-Reranker-0.6B for long-term memory dialog retrieval. Built for LycheeMem.

Evaluation

MAP (mean average precision), higher is better. Queries strictly held out from training.

Model	LongMemEval-S (373 q, held-out)	MSC-MemFuse-MC10 (27 q, held-out)	HotpotQA distractor (7,405 q, OOD)
LycheeMem Reranker v1	0.9185	0.7457	0.7063
BGE-Reranker-v2-m3 (560M)	0.8647	0.5503	0.8002
Δ (LycheeMem − BGE)	+5.4 pp	+19.5 pp	−9.4 pp

Full metrics:

Benchmark	hit@10	R@5	R@10	MAP	NDCG@10
LongMemEval-S held-out	1.000	0.964	0.988	0.919	0.940
MSC-MemFuse-MC10 held-out	1.000	0.799	0.896	0.746	0.786
HotpotQA distractor (OOD)	0.987	0.793	0.890	0.706	0.769
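
For reference, the metrics in the tables above can be sketched with their standard definitions. This is a minimal illustration, not the project's evaluation script: it assumes binary relevance labels per candidate and that every relevant item appears somewhere in the reranked list. The function names are hypothetical.

```python
import math

def average_precision(ranked_rel: list[int]) -> float:
    """AP for one query; ranked_rel[i] is 1 if the item at rank i+1 is relevant."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(ranked_rel, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(per_query: list[list[int]]) -> float:
    """MAP: mean of per-query AP."""
    return sum(average_precision(r) for r in per_query) / len(per_query)

def hit_at_k(ranked_rel: list[int], k: int) -> float:
    """1.0 if any relevant item is in the top k, else 0.0."""
    return 1.0 if any(ranked_rel[:k]) else 0.0

def recall_at_k(ranked_rel: list[int], k: int) -> float:
    """Fraction of the relevant items that land in the top k."""
    total = sum(ranked_rel)
    return sum(ranked_rel[:k]) / total if total else 0.0

def ndcg_at_k(ranked_rel: list[int], k: int) -> float:
    """NDCG@k with binary gains and a log2(rank + 1) discount."""
    def dcg(rels: list[int]) -> float:
        return sum(r / math.log2(i + 1) for i, r in enumerate(rels, start=1))
    ideal = dcg(sorted(ranked_rel, reverse=True)[:k])
    return dcg(ranked_rel[:k]) / ideal if ideal else 0.0
```

For a ranking with relevant items at ranks 1 and 3, `average_precision([1, 0, 1, 0, 0])` is (1/1 + 2/3) / 2 ≈ 0.833.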

Use this model for reranking dialog memory. For general-purpose retrieval, BGE-Reranker-v2-m3 is the stronger choice (0.8002 vs. 0.7063 MAP on HotpotQA distractor).

Training

Source	Queries	Pairs
LongMemEval-S (cleaned-overlap)	127	6,018
MSC-MemFuse-MC10 (answer-turn)	299	14,950
Total	426	20,968

90/10 query-stratified split: 18,947 train / 2,021 held-out.
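
One plausible reading of a query-level split is that all pairs belonging to a query land on the same side, so no held-out query is seen during training. The sketch below illustrates that reading only; the `split_by_query` helper, the `(query_id, document, label)` tuple layout, and the seed are assumptions, not the project's actual procedure.

```python
import random
from collections import defaultdict

def split_by_query(pairs, held_out_frac=0.1, seed=0):
    """Split (query_id, document, label) pairs so each query is entirely
    in train or entirely in held-out. Returns (train, held_out)."""
    by_query = defaultdict(list)
    for pair in pairs:
        by_query[pair[0]].append(pair)

    # Shuffle query IDs deterministically, then carve off the held-out share.
    query_ids = sorted(by_query)
    random.Random(seed).shuffle(query_ids)
    n_held = max(1, round(len(query_ids) * held_out_frac))

    held_out = [p for q in query_ids[:n_held] for p in by_query[q]]
    train = [p for q in query_ids[n_held:] for p in by_query[q]]
    return train, held_out
```

Because the split is made over queries rather than pairs, the pair-level ratio only approximates 90/10 when queries have different numbers of candidates.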

Labels are six-level graded relevance scores ({0.0, 0.2, 0.4, 0.6, 0.8, 1.0}), distilled from DeepSeek V4 Pro on the mid-tier candidates retrieved by the upstream retriever. Training uses LoRA (r=16) with a BCE-with-logits loss against these continuous targets, for 3 epochs.
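
To make the objective concrete, here is a minimal pure-Python sketch of binary cross-entropy on a logit with a soft target in [0, 1]. It is illustrative only (the training code is not shown here) and the helper name is hypothetical. In PyTorch the same quantity is computed by `torch.nn.functional.binary_cross_entropy_with_logits`.

```python
import math

def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable BCE for one logit x and soft target t in [0, 1]:
    max(x, 0) - x * t + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# With a soft target the loss is minimized where sigmoid(logit) == target,
# e.g. for target 0.6 the optimum is logit = ln(0.6 / 0.4).
best = math.log(0.6 / 0.4)
losses = {x: bce_with_logits(x, 0.6) for x in (0.0, best, 1.0)}
```

This is why a model trained this way can emit graded scores: the sigmoid of its output is pulled toward the distilled label rather than toward 0 or 1.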

Usage

import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification

BASE = "Qwen/Qwen3-Reranker-0.6B"
ADAPTER = "fuhao23/reranker_v1"

tok = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

base = AutoModelForSequenceClassification.from_pretrained(
    BASE, num_labels=1, torch_dtype=torch.bfloat16, trust_remote_code=True
)
base.config.pad_token_id = tok.pad_token_id
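# Fall back to CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = PeftModel.from_pretrained(base, ADAPTER).eval().to(device)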

INSTRUCT = "Given a user query, retrieve memory snippets that answer the query"

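def score(query: str, candidates: list[str], max_len: int = 512) -> list[float]:
    """Return one raw relevance logit per candidate (higher = more relevant)."""
    # Format each (query, candidate) pair with the instruction template.
    texts = [
        f"<Instruct>: {INSTRUCT}\n<Query>: {query}\n<Document>: {c}"
        for c in candidates
    ]
    # Tokenize as one padded batch; inputs longer than max_len are truncated.
    enc = tok(texts, padding=True, truncation=True, max_length=max_len,
              return_tensors="pt").to(model.device)
    with torch.inference_mode():
        # num_labels=1 yields logits of shape (batch, 1); squeeze to (batch,).
        logits = model(**enc).logits.squeeze(-1).float().cpu().tolist()
    return logits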

score returns raw logits. Apply torch.sigmoid to map them to scores in (0, 1); sigmoid is monotonic, so the ranking order is unchanged.
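
A short sketch of turning the logits into a ranked list follows. The `rank_candidates` helper and its `top_k` parameter are hypothetical additions for illustration; it sorts on the raw logits and applies a numerically stable sigmoid only for display.

```python
import math

def sigmoid(z: float) -> float:
    """Stable logistic function; avoids overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def rank_candidates(candidates, logits, top_k=None):
    """Return (candidate, sigmoid score) pairs, best first.
    Sorting uses the raw logits so saturated scores do not create ties."""
    order = sorted(zip(candidates, logits), key=lambda p: p[1], reverse=True)
    ranked = [(c, sigmoid(z)) for c, z in order]
    return ranked if top_k is None else ranked[:top_k]
```

With the function above, a call would look like `rank_candidates(cands, score(query, cands), top_k=5)`.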

Limitations

Specialized for dialog memory; trails BGE-Reranker-v2-m3 on out-of-domain retrieval.
English-only training distribution.
MSC-MemFuse-MC10 held-out is small (27 queries); LongMemEval-S held-out (373) is the primary in-domain reference.
Continuous labels are LLM-distilled (DeepSeek V4 Pro), not human-annotated.
Only retrieval-stage metrics are reported; end-to-end answer accuracy with this reranker integrated is not reported here.

Citation

@misc{lycheemem_reranker_v1,
  title  = {LycheeMem Reranker v1: A Domain-Specialized Reranker for Long-Term Memory Dialog Retrieval},
  author = {LycheeMem Project},
  year   = {2026},
  url    = {https://huggingface.co/fuhao23/reranker_v1}
}

License

Apache 2.0.
