Timer-S1 Quantized 4-bit

This repository contains an unofficial 4-bit BitsAndBytes quantized checkpoint derived from bytedance-research/Timer-S1.

Timer-S1 is a time series foundation model for zero-shot forecasting. The original model card describes Timer-S1 as a decoder-only Mixture-of-Experts Transformer with 8.3B total parameters, 0.75B activated parameters per token, and a context length of 11,520. For details about the original model, architecture, training data, benchmark results, and intended use, refer to the upstream model card and the Timer-S1 technical report.

This upload preserves the upstream Timer-S1 remote-code implementation files and Apache-2.0 license metadata, but stores the model weights as a pre-quantized 4-bit checkpoint for lower-memory inference.

Source and Provenance

Base model: bytedance-research/Timer-S1
Quantization: BitsAndBytes 4-bit quantization
Status: unofficial derivative checkpoint

No new training or benchmark claims are made for this quantized checkpoint. Numerical outputs may differ slightly from the original bfloat16 checkpoint because the weights are quantized.

Quantization Details

The checkpoint configuration records the following quantization settings:

{
  "load_in_4bit": true,
  "load_in_8bit": false,
  "quant_method": "bitsandbytes",
  "bnb_4bit_quant_type": "fp4",
  "bnb_4bit_quant_storage": "uint8",
  "bnb_4bit_compute_dtype": "bfloat16",
  "bnb_4bit_use_double_quant": false
}

The checkpoint's model config sets use_cache to true. To reduce memory usage during generation, set model.config.use_cache = False after loading the model. Disabling the KV cache typically trades generation speed for lower memory use.
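As a rough guide to what the settings above mean for weight memory, the following back-of-the-envelope sketch compares bfloat16 storage with 4-bit storage. It is an estimate, not a measurement of this checkpoint. It assumes that all 8.3B parameters are quantized (in practice some layers, such as embeddings and norms, usually stay in higher precision), that one 32-bit scale is stored per block of 64 weights (an assumed block size, not recorded in the config above), and it ignores activations and the KV cache.

```python
def weight_bytes(num_params: float, bits_per_param: float) -> float:
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits_per_param / 8


TOTAL_PARAMS = 8.3e9  # total parameter count from the upstream model card
BLOCK_SIZE = 64       # ASSUMPTION: weights per quantization block
SCALE_BITS = 32       # ASSUMPTION: one float32 scale per block (no double quantization)

# bfloat16 stores 16 bits per weight.
bf16_bytes = weight_bytes(TOTAL_PARAMS, 16)

# 4-bit storage plus the per-block scale amortized over the block.
fp4_bytes = weight_bytes(TOTAL_PARAMS, 4 + SCALE_BITS / BLOCK_SIZE)

print(f"bfloat16 weights: {bf16_bytes / 1e9:.1f} GB")  # 16.6 GB
print(f"4-bit weights:    {fp4_bytes / 1e9:.1f} GB")   # 4.7 GB
```

Under these assumptions the weights shrink by roughly a factor of 3.5. The real footprint depends on which layers are actually quantized.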

Quickstart

Install the expected runtime dependencies:

pip install torch accelerate bitsandbytes "transformers~=4.57.1"
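To confirm the environment after installing, a small check like the one below may help. It is a minimal sketch using only the standard library. The helper names (parse_release, satisfies_compatible_release, installed_version) are hypothetical, and the version comparison only approximates the "~=4.57.1" specifier: it conservatively rejects pre-release or otherwise non-numeric version strings.

```python
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Return the leading numeric components, e.g. '4.57.3' -> (4, 57, 3)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev0' or '1rc1'
        parts.append(int(piece))
    return tuple(parts)


def satisfies_compatible_release(version: str, minimum: str = "4.57.1") -> bool:
    """Approximate '~=4.57.1': same major.minor, patch at least the minimum."""
    got, want = parse_release(version), parse_release(minimum)
    return len(got) >= 3 and got[:2] == want[:2] and got[:3] >= want[:3]


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("torch", "accelerate", "bitsandbytes", "transformers"):
    print(name, installed_version(name))

transformers_version = installed_version("transformers")
if transformers_version is not None:
    print("transformers compatible:", satisfies_compatible_release(transformers_version))
```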

Load the model with Hugging Face Transformers:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "geetu040/Timer-S1-quantized-4bit",
    trust_remote_code=True,
    device_map="auto",
)

# Optional: reduce generation memory usage by disabling the KV cache.
# This can be useful on smaller GPUs or for longer lookback windows.
model.config.use_cache = False

batch_size, lookback_length = 1, 2880
seqs = torch.randn(batch_size, lookback_length).to(model.device)

forecast_length = 256
output = model.generate(seqs, max_new_tokens=forecast_length, revin=True)

# Timer-S1 generates forecasts at quantile levels:
# [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(output.shape)  # batch_size x quantile_num(9) x forecast_length
print(output[0][4])  # median forecast for the first sample
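The quantile axis can also be used to read off a prediction interval around the median. The sketch below maps quantile levels to indices along that axis, assuming the ordering listed in the comments above (0.1 through 0.9 at indices 0 through 8). The helper names quantile_index and central_interval are hypothetical and are not part of the Timer-S1 code.

```python
# Quantile levels in the order the Quickstart comments list them.
QUANTILE_LEVELS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def quantile_index(level: float) -> int:
    """Return the position of a quantile level along the quantile axis."""
    for index, candidate in enumerate(QUANTILE_LEVELS):
        if abs(candidate - level) < 1e-9:  # tolerate floating-point rounding
            return index
    raise ValueError(f"no forecast is produced at quantile level {level}")


def central_interval(coverage: float) -> tuple:
    """Return (lower_index, upper_index) for a central interval, e.g. 0.8 -> (0.1, 0.9)."""
    tail = (1.0 - coverage) / 2.0
    return quantile_index(tail), quantile_index(1.0 - tail)


lower, upper = central_interval(0.8)
print(lower, upper)  # 0 8

# With the `output` tensor from the Quickstart
# (batch_size x 9 x forecast_length), the bands would be:
#   lower_band = output[:, lower]
#   median     = output[:, quantile_index(0.5)]
#   upper_band = output[:, upper]
```

Only coverages whose tails land on one of the nine levels (0.2, 0.4, 0.6, 0.8) are supported by this sketch. Other values raise ValueError rather than interpolating.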

Specification

Architecture: decoder-only Transformer with MoE
Context length: up to 11,520
Patch length: 16
Quantiles: 0.1 through 0.9
Hidden size: 1024
Attention heads: 16
Experts: 32 total, 2 selected per token
Hidden layers: 24
Weight format: model.safetensors
Quantization: BitsAndBytes 4-bit FP4
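The context and patch lengths above imply a token budget that can be worked out directly. The sketch below assumes that the context length is counted in time points and that the series is split into non-overlapping patches of 16 points, one token per patch. Neither detail is stated on this card, so treat the numbers as illustrative. The function names are hypothetical.

```python
PATCH_LENGTH = 16      # from the specification above
MAX_CONTEXT = 11_520   # from the specification above, ASSUMED to be in time points


def num_patches(num_points: int, patch_length: int = PATCH_LENGTH) -> int:
    """Number of patches needed to cover num_points (ceiling division)."""
    return -(-num_points // patch_length)


def fits_context(lookback_length: int, max_context: int = MAX_CONTEXT) -> bool:
    """Whether a lookback window stays within the maximum context length."""
    return 0 < lookback_length <= max_context


print(num_patches(MAX_CONTEXT))  # 720 patches at full context
print(num_patches(2880))         # 180 patches for the Quickstart lookback
print(num_patches(256))          # 16 patches for the Quickstart forecast
print(fits_context(2880))        # True
```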

License

The upstream Timer-S1 model card lists the model under the Apache-2.0 License. This repository preserves that license metadata.

Citation

If you use this quantized checkpoint, cite the original Timer-S1 paper:

@article{liu2026timer,
  title={Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling},
  author={Liu, Yong and Su, Xingjian and Wang, Shiyu and Zhang, Haoran and Liu, Haixuan and Wang, Yuxuan and Ye, Zhou and Xiang, Yang and Wang, Jianmin and Long, Mingsheng},
  journal={arXiv preprint arXiv:2603.04791},
  year={2026}
}
