Stockmark-2-100B-Instruct

Model description

Stockmark-2-100B-Instruct is a large language model with roughly 100 billion parameters (96B), trained from scratch with a particular focus on Japanese. It was pre-trained on approximately 2.0 trillion tokens: 60% English, 30% Japanese, and 10% code. After pre-training, the model was post-trained with supervised fine-tuning (SFT) and direct preference optimization (DPO) on synthetic Japanese data to improve its ability to follow instructions. Compared with the previous version (Stockmark-2-100B-Instruct-beta), this release improves instruction-following ability and adds long-context support (32k).

This project was supported by GENIAC.

Features

  • Model Type: Causal Language Model
  • Number of Parameters: 96B
  • Number of Layers: 86
  • Number of Attention Heads (GQA): 72 for Q and 8 for KV
  • Context Length: 32k
  • Supported Languages: Japanese and English
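To make the figures above more concrete, the following is a rough back-of-the-envelope sketch of the memory they imply. The parameter count, layer count, and head counts come from the list; `HEAD_DIM = 128` is a hypothetical value (the per-head dimension is not stated here), "32k" is assumed to mean 32,768 tokens, and 2 bytes per value assumes bfloat16. Treat the KV-cache number as illustrative only.

```python
# Values taken from the feature list above.
NUM_PARAMS = 96e9
NUM_LAYERS = 86
NUM_Q_HEADS = 72
NUM_KV_HEADS = 8

# ASSUMPTIONS (not stated in the model card):
CONTEXT_LEN = 32 * 1024   # "32k" read as 32,768 tokens
HEAD_DIM = 128            # hypothetical per-head dimension
BYTES_PER_VALUE = 2       # bfloat16


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_VALUE) -> float:
    """Memory needed to hold the weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


def kv_cache_gb(num_layers: int, num_kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_value: int = BYTES_PER_VALUE) -> float:
    """KV-cache size for one sequence: keys and values (factor 2) for every layer."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value / 1e9


weights_gb = weight_memory_gb(NUM_PARAMS)
q_heads_per_kv_head = NUM_Q_HEADS // NUM_KV_HEADS

# With GQA only the 8 KV heads are cached; full multi-head attention would cache all 72.
gqa_cache_gb = kv_cache_gb(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, CONTEXT_LEN)
mha_cache_gb = kv_cache_gb(NUM_LAYERS, NUM_Q_HEADS, HEAD_DIM, CONTEXT_LEN)

print(f"weights (bf16): {weights_gb:.0f} GB")
print(f"query heads per KV head: {q_heads_per_kv_head}")
print(f"KV cache at full context, GQA: {gqa_cache_gb:.2f} GB")
print(f"KV cache at full context, MHA: {mha_cache_gb:.2f} GB")
```

Under these assumptions the weights alone need about 192 GB, which is why the model has to be spread across several accelerators, and grouped-query attention shrinks the per-sequence KV cache by the 72/8 = 9x ratio of query heads to KV heads.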

Model performance

Japanese MT-bench

Model                           Average  coding  extraction  humanities  math  reasoning  roleplay  stem
Stockmark-2-100B-Instruct       7.87     7.07    8.35        8.73        7.57  5.45       8.65      8.33
Stockmark-2-100B-Instruct-beta  7.71     6.73    8.23        8.63        7.01  5.85       8.54      8.07

How to use

transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "stockmark/Stockmark-2-100B-Instruct"

# Load the tokenizer, then shard the bfloat16 weights across the available devices.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype=torch.bfloat16
)

# Prompt: "What is natural language processing?"
instruction = "自然言語処理とは？"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": instruction}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    tokens = model.generate(
        input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
    )

# tokens[0] holds the prompt tokens followed by the generated continuation.
output = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(output)
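For decoder-only models, `generate` returns the prompt tokens followed by the new tokens, so the decoded string in the example above also contains the user's question. If only the model's answer is wanted, one option is to drop the prompt prefix before decoding. The helper below is a minimal sketch of that idea; `strip_prompt` is a hypothetical name, and the token ids are toy values standing in for real tokenizer output.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def strip_prompt(generated: Sequence[T], prompt_len: int) -> Sequence[T]:
    """Return only the items that follow the first `prompt_len` prompt tokens."""
    if prompt_len < 0 or prompt_len > len(generated):
        raise ValueError("prompt_len must be between 0 and len(generated)")
    return generated[prompt_len:]


# Toy token ids (illustrative values only, not real tokenizer output).
prompt_ids = [101, 7, 42]
generated_ids = prompt_ids + [9, 15, 2]

new_ids = strip_prompt(generated_ids, len(prompt_ids))
print(new_ids)
```

With the real objects from the example, the equivalent slice would be `tokens[0][input_ids.shape[-1]:]`, passed to `tokenizer.decode` in place of `tokens[0]`.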

vLLM

from vllm import LLM, SamplingParams

llm = LLM(
    model="stockmark/Stockmark-2-100B-Instruct",
    tensor_parallel_size=4,
    dtype="bfloat16"
)

sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.95,
    max_tokens=512
)

# Prompt: "What is natural language processing?"
conversation = [{"role": "user", "content": "自然言語処理とは？"}]

outputs = llm.chat(conversation, sampling_params=sampling_params)

for output in outputs:
    generated_text = output.outputs[0].text
    print(generated_text)
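The example above uses `tensor_parallel_size=4`. As a rough way to reason about other GPU counts, the sketch below lists the sizes that split both the 72 query heads and the 8 KV heads evenly, and the share of the bfloat16 weights each GPU would hold. The even-split rule is a conservative simplification and not necessarily vLLM's exact validation logic; `even_tp_sizes`, `per_gpu_weights_gb`, and the `max_gpus=8` cap are hypothetical choices for illustration. The estimate ignores KV cache, activations, and runtime overhead.

```python
NUM_Q_HEADS = 72       # from the feature list
NUM_KV_HEADS = 8       # from the feature list
NUM_PARAMS = 96e9      # from the feature list
BYTES_PER_PARAM = 2    # bfloat16, as in the example above


def even_tp_sizes(num_q_heads: int, num_kv_heads: int, max_gpus: int = 8) -> list[int]:
    """Tensor-parallel sizes that divide both head counts evenly (simplified rule)."""
    return [
        tp for tp in range(1, max_gpus + 1)
        if num_q_heads % tp == 0 and num_kv_heads % tp == 0
    ]


def per_gpu_weights_gb(tp_size: int) -> float:
    """Weights held by each GPU, in GB, if they are sharded evenly across tp_size GPUs."""
    return NUM_PARAMS * BYTES_PER_PARAM / tp_size / 1e9


tp_sizes = even_tp_sizes(NUM_Q_HEADS, NUM_KV_HEADS)
for tp in tp_sizes:
    print(f"tensor_parallel_size={tp}: "
          f"{NUM_Q_HEADS // tp} Q heads, {NUM_KV_HEADS // tp} KV heads, "
          f"~{per_gpu_weights_gb(tp):.0f} GB of weights per GPU")
```

By this estimate, four GPUs each hold about 48 GB of weights, so each device needs headroom beyond that for the KV cache and activations.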

Libraries used for training

License

MIT

Developed by

Stockmark Inc.
