Timer-S1 Quantized 4-bit
This repository contains an unofficial 4-bit BitsAndBytes quantized checkpoint derived from bytedance-research/Timer-S1.
Timer-S1 is a time series foundation model for zero-shot forecasting. The original model card describes Timer-S1 as a decoder-only Mixture-of-Experts Transformer with 8.3B total parameters, 0.75B activated parameters per token, and a context length of 11,520. For details about the original model, architecture, training data, benchmark results, and intended use, refer to the upstream model card and the Timer-S1 technical report.
This upload preserves the upstream Timer-S1 remote-code implementation files and Apache-2.0 license metadata, but stores the model weights as a 4-bit quantized checkpoint for lower-memory inference.
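To give a rough sense of the savings, the sketch below estimates weight storage from the 8.3B total parameter count. It assumes that every parameter is stored at the given bit width, and that bitsandbytes uses its default block size of 64 with one fp32 absmax scale per block (this checkpoint disables double quantization). Both are simplifications. Layers left unquantized keep their higher-precision size, and all 32 experts must stay resident even though only 0.75B parameters are activated per token.

```python
# Rough weight-memory estimate for Timer-S1 (illustrative only).
# Assumption: every one of the 8.3B parameters is stored at the given bit width;
# in practice some layers (e.g. norms, embeddings) may stay in higher precision.
TOTAL_PARAMS = 8.3e9   # total parameters, from the upstream model card
BF16_BITS = 16
FP4_BITS = 4
# Assumption: bitsandbytes' default 4-bit block size of 64 with one fp32 absmax
# scale per block; double quantization is disabled for this checkpoint.
BLOCKSIZE = 64
ABSMAX_BITS = 32


def weight_gb(num_params: float, bits_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


bf16_gb = weight_gb(TOTAL_PARAMS, BF16_BITS)
fp4_bits = FP4_BITS + ABSMAX_BITS / BLOCKSIZE  # 4 + 0.5 bits per parameter
fp4_gb = weight_gb(TOTAL_PARAMS, fp4_bits)

print(f"bf16 weights: ~{bf16_gb:.1f} GB")
print(f"4-bit weights incl. absmax scales: ~{fp4_gb:.2f} GB")
print(f"reduction: ~{bf16_gb / fp4_gb:.1f}x")
```

Under these assumptions the estimate is about 16.6 GB for bfloat16 weights versus about 4.7 GB for the 4-bit weights. Actual memory use will be higher once activations and any unquantized layers are counted.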
Source and Provenance
- Base model: bytedance-research/Timer-S1
- Quantization: BitsAndBytes 4-bit quantization
- Status: unofficial derivative checkpoint
No new training or benchmark claims are made for this quantized checkpoint. Numerical outputs may differ slightly from the original bfloat16 checkpoint because the weights are quantized.
Quantization Details
The checkpoint configuration records the following quantization settings:
{
"load_in_4bit": true,
"load_in_8bit": false,
"quant_method": "bitsandbytes",
"bnb_4bit_quant_type": "fp4",
"bnb_4bit_quant_storage": "uint8",
"bnb_4bit_compute_dtype": "bfloat16",
"bnb_4bit_use_double_quant": false
}
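The checkpoint config also keeps use_cache set to true, so generation uses the KV cache by default. To reduce memory usage during generation, set model.config.use_cache = False after loading the model; this typically trades lower memory for slower generation, because past attention states are recomputed instead of cached.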
Quickstart
Install the expected runtime dependencies:
pip install torch accelerate bitsandbytes "transformers~=4.57.1"
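The transformers pin uses the compatible-release operator, so ~=4.57.1 accepts any 4.57.x release at or above 4.57.1 but not 4.58. pip enforces this at install time; in an existing environment, a small standard-library check like the one below can confirm the installed version. It is a simplified sketch: the hypothetical helpers release_tuple and compatible_release ignore pre-release, dev, and post-release suffixes, which full PEP 440 handling (for example via the packaging library) would take into account.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v: str) -> tuple[int, ...]:
    """Leading numeric release segment, e.g. "4.57.1.dev0" -> (4, 57, 1).
    Simplification: any pre/dev/post-release suffix is ignored."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unparseable version: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def compatible_release(installed: str, pinned: str) -> bool:
    """True if `installed` satisfies ~=pinned: at least `pinned`, and sharing
    every release component of `pinned` except the last one."""
    have, want = release_tuple(installed), release_tuple(pinned)
    prefix = want[:-1]
    return have[: len(prefix)] == prefix and have >= want


try:
    installed = version("transformers")
    status = "ok" if compatible_release(installed, "4.57.1") else "outside ~=4.57.1"
    print("transformers", installed, status)
except PackageNotFoundError:
    print("transformers is not installed")
```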
Load the model with Hugging Face Transformers:
import torch
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
"geetu040/Timer-S1-quantized-4bit",
trust_remote_code=True,
device_map="auto",
)
# Optional: reduce generation memory usage by disabling the KV cache.
# This can be useful on smaller GPUs or for longer lookback windows.
model.config.use_cache = False
batch_size, lookback_length = 1, 2880
seqs = torch.randn(batch_size, lookback_length).to(model.device)
forecast_length = 256
output = model.generate(seqs, max_new_tokens=forecast_length, revin=True)
# Timer-S1 generates forecasts at quantile levels:
# [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(output.shape) # batch_size x quantile_num(9) x forecast_length
print(output[0][4]) # median forecast for the first sample
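The nine quantile levels make it straightforward to derive a point forecast and a prediction interval. The sketch below takes the 0.5 quantile as the median and the 0.1 and 0.9 quantiles as the bounds of a nominal 80% central interval. It assumes the (batch, 9, forecast_length) layout and ascending quantile order described in the Quickstart comments; the helper summarize_forecast is hypothetical, and synthetic data stands in for real model output.

```python
import numpy as np

# Quantile levels in output order, as listed in the Quickstart comments.
QUANTILES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def summarize_forecast(forecast: np.ndarray) -> dict[str, np.ndarray]:
    """Hypothetical helper: split a (batch, 9, horizon) quantile forecast into
    a median and the bounds of a nominal 80% central interval."""
    if forecast.ndim != 3 or forecast.shape[1] != len(QUANTILES):
        raise ValueError(
            f"expected shape (batch, {len(QUANTILES)}, horizon), got {forecast.shape}"
        )
    lower = forecast[:, QUANTILES.index(0.1), :]
    upper = forecast[:, QUANTILES.index(0.9), :]
    return {
        "median": forecast[:, QUANTILES.index(0.5), :],
        "lower_80": lower,
        "upper_80": upper,
        "width_80": upper - lower,  # interval width per sample and time step
    }


# Synthetic stand-in for model output. With a real forecast, convert first:
#   forecast = output.float().cpu().numpy()  # NumPy has no bfloat16 dtype
rng = np.random.default_rng(0)
base = rng.normal(size=(2, 1, 256))
offsets = np.linspace(-1.0, 1.0, len(QUANTILES)).reshape(1, -1, 1)
forecast = base + offsets  # shape (2, 9, 256), ascending across quantiles

bands = summarize_forecast(forecast)
print(bands["median"].shape)  # (2, 256)
```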
Specification
- Architecture: decoder-only Transformer with MoE
- Context length: up to 11,520
- Patch length: 16
- Quantiles: 0.1 through 0.9 in steps of 0.1 (9 levels)
- Hidden size: 1024
- Attention heads: 16
- Experts: 32 total, 2 selected per token
- Hidden layers: 24
- Weight format: model.safetensors
- Quantization: BitsAndBytes 4-bit FP4
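Because inputs are split into patches of 16 time steps, sequence lengths translate directly into token counts. The sketch below works through that arithmetic for the Quickstart settings. It assumes that the 11,520 context length is counted in time steps, that each patch becomes one token, and that a partial final patch still occupies a full token; the per-head dimension assumes the standard hidden size / attention heads layout. The helper num_patches is hypothetical.

```python
import math

# Values taken from the Specification list.
PATCH_LEN = 16
MAX_CONTEXT = 11_520  # assumed to be counted in time steps
HIDDEN_SIZE = 1024
NUM_HEADS = 16


def num_patches(num_steps: int, patch_len: int = PATCH_LEN) -> int:
    """Patches (tokens) covering num_steps time steps; a partial final patch
    is assumed to occupy a full token."""
    return math.ceil(num_steps / patch_len)


lookback, horizon = 2880, 256  # the Quickstart settings
assert lookback <= MAX_CONTEXT, "lookback exceeds the context length"

print("max context tokens:", num_patches(MAX_CONTEXT))  # 720
print("lookback tokens:", num_patches(lookback))        # 180
print("patches in horizon:", num_patches(horizon))      # 16
print("per-head dim:", HIDDEN_SIZE // NUM_HEADS)        # 64
```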
License
The upstream Timer-S1 model card lists the model under the Apache-2.0 License. This repository preserves that license metadata.
Citation
If you use this quantized checkpoint, cite the original Timer-S1 paper:
@article{liu2026timer,
title={Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling},
author={Liu, Yong and Su, Xingjian and Wang, Shiyu and Zhang, Haoran and Liu, Haixuan and Wang, Yuxuan and Ye, Zhou and Xiang, Yang and Wang, Jianmin and Long, Mingsheng},
journal={arXiv preprint arXiv:2603.04791},
year={2026}
}