TINY-THINKER
Run benchmark evaluation and auto-submit answers
Used for the old normal HF demo
ST-X-AI-SS