ginigen-ai
PRO
39 followers · 125 following
AI & ML interests
None yet
Recent Activity
Replied to their post about 11 hours ago:
Does your LLM know when it's about to be wrong? Most leaderboards measure accuracy. We measure metacognition: whether a model catches its own errors. Benchmark + leaderboard + adapters, all open.

The surprise: even a K-AI #1 model (JGOS-31B-Citizen) is the strongest on multiple-choice traps (trap_rate 0.005, ~2 misses in 400) yet blind to its own free-form mistakes (self-confidence AUROC = 0.5, pure random). A tiny base-frozen adapter recovers that signal.

Two independent axes (never compared across a row):
① trap_rate: does it fall for tempting trap options? (lower = stronger)
② adapter gain Δ: how much a lightweight adapter catches errors the model itself misses. (higher = more adapter value)

What's open:
- 300+100 trap problems (each with a hidden trap + TICOS type)
- 24-model leaderboard
- 11 per-model adapters: adapters, NOT fine-tunes (base stays frozen; the adapter just reads the hidden state → P(wrong))

Submit any HF model → auto-scored daily at 09:00 KST and added to the board.

Leaderboard: https://huggingface.co/spaces/ginigen-ai/Metacognition-Leaderboard-Space
Benchmark: https://huggingface.co/datasets/ginigen-ai/Metacognition-Bench
Adapters: https://huggingface.co/collections/FINAL-Bench/metacognition-adapters-6a42c032e6beb803dd032961
Article: https://huggingface.co/blog/ginigen-ai/metacognition

Benchmark by ginigen-ai · Adapters by FINAL-Bench (Darwin/Chimera platform + AETHER metacognition tech).
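To make the two axes in the post concrete, here is a minimal sketch of how a trap rate and an error-detection AUROC could be computed. Everything in it is an assumption for illustration: the function names `trap_rate` and `auroc`, the toy data, and in particular the reading of adapter gain Δ as "adapter AUROC minus self-confidence AUROC". The post does not spell out the benchmark's actual scoring code, so treat this as one plausible interpretation, not the leaderboard's implementation.

```python
def trap_rate(chosen, trap_options):
    """Fraction of multiple-choice items where the model picked the trap option."""
    if len(chosen) != len(trap_options) or not chosen:
        raise ValueError("inputs must be non-empty and the same length")
    hits = sum(c == t for c, t in zip(chosen, trap_options))
    return hits / len(chosen)


def auroc(p_wrong, is_wrong):
    """Probability that a random wrong answer gets a higher P(wrong) score
    than a random correct answer; ties count as half a win."""
    pos = [s for s, y in zip(p_wrong, is_wrong) if y == 1]
    neg = [s for s, y in zip(p_wrong, is_wrong) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one wrong and one correct answer")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy multiple-choice run: 400 items, the trap is always option "B",
# and the model falls for it twice (made-up data).
traps = ["B"] * 400
chosen = ["A"] * 398 + ["B"] * 2
print("trap_rate:", trap_rate(chosen, traps))

# Toy free-form run: 1 = the answer was wrong, 0 = it was right.
is_wrong = [1, 0, 1, 0, 0, 1]
# A model that reports the same confidence everywhere carries no signal.
flat_self_score = [0.1] * len(is_wrong)
# Hypothetical adapter outputs, standing in for a probe on the hidden state.
adapter_score = [0.8, 0.1, 0.7, 0.3, 0.2, 0.9]

self_auroc = auroc(flat_self_score, is_wrong)
adapter_auroc = auroc(adapter_score, is_wrong)
# ASSUMPTION: gain is taken as the AUROC difference; the post does not define it.
delta = adapter_auroc - self_auroc
print("self AUROC:", self_auroc, "adapter AUROC:", adapter_auroc, "delta:", delta)
```

In this toy setup a constant self-confidence score ties on every pair, which is what an AUROC of 0.5 ("pure random") means, while the adapter scores rank every wrong answer above every correct one.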
Liked a model about 20 hours ago: FINAL-Bench/metacog-adapter-JGOS-31B-Citizen
Updated a Space about 20 hours ago: ginigen-ai/Metacognition-Leaderboard-Space
Organizations
None yet
ginigen-ai's datasets (2)
ginigen-ai/Metacognition-Bench • Viewer • Updated 3 days ago • 300 • 115 • 21
ginigen-ai/smol-worldcup • Viewer • Updated Mar 10 • 125 • 341 • 47