Abstract
A benchmarking framework is introduced to study data-manifold geometry by extending dSprites and COIL-20 datasets with additional transformation dimensions and dense sampling, enabling accurate estimation of curvature, reach, and volume for theoretical analysis and validation.
A significant gap exists between theory and practice in deep learning. Generalization and approximation error bounds are often derived for simplified models or are too loose to be informative. Many rely on the manifold hypothesis and on geometric regularity such as intrinsic dimension, curvature, and reach. Progress requires insight into data-manifold geometry and suitable benchmarks, yet existing options are polarized: analytic manifolds with known geometry but limited applicability, or real-world datasets where geometry is only coarsely estimable. We introduce a benchmarking framework for studying data geometry. We repurpose and extend dSprites and COIL-20 with additional transformation dimensions and dense, axis-aligned sampling, and pair them with finite-difference estimators that recover curvature, reach, and volume at near-ground-truth accuracy in a regime where general-purpose estimators are unreliable or difficult to deploy. The framework is intended as a controlled testbed, useful as a calibration environment for geometric estimators and a sandbox for probing theoretical assumptions. To illustrate its use, we present two application studies, namely assessing the scaling behavior of the bounds of Genovese et al. and Fefferman et al., and tracking the layer-wise geometry of a β-VAE, highlighting the behavior of current bounds and the value of controlled benchmarks for guiding and validating future theory. A reference implementation is available at https://github.com/koulakis/manifold-microscope.
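The finite-difference idea is easiest to see in the simplest case. What follows is a minimal sketch, not the reference implementation. For curvature and length, it assumes a closed one-parameter curve (for example, a sprite under a full rotation) sampled on a uniform periodic grid with spacing `h`. For volume, it assumes a d-parameter manifold sampled on a periodic, axis-aligned grid. The helper names `curve_curvature_and_length` and `grid_volume` are hypothetical. Curvature uses the extrinsic formula for curves in R^D. Volume uses the area formula, the sum over grid cells of sqrt(det(JᵀJ)) h^d.

```python
import numpy as np


def curve_curvature_and_length(points: np.ndarray, h: float):
    """Curvature and length of a closed curve sampled at theta_i = i * h (hypothetical helper).

    points: array of shape (n, D); row i is the embedded sample f(theta_i), e.g. a flattened image.
    h: parameter-space grid spacing; the grid is assumed to cover exactly one period.
    """
    nxt, prv = np.roll(points, -1, axis=0), np.roll(points, 1, axis=0)
    d1 = (nxt - prv) / (2 * h)            # central difference for f'
    d2 = (nxt - 2 * points + prv) / h**2  # central difference for f''
    speed_sq = np.sum(d1 * d1, axis=1)
    # Curvature of a curve in R^D: sqrt(|f'|^2 |f''|^2 - (f' . f'')^2) / |f'|^3
    num = speed_sq * np.sum(d2 * d2, axis=1) - np.sum(d1 * d2, axis=1) ** 2
    curvature = np.sqrt(np.maximum(num, 0.0)) / speed_sq**1.5
    length = float(np.sum(np.sqrt(speed_sq)) * h)  # Riemann sum of |f'| over one period
    return curvature, length


def grid_volume(points: np.ndarray, h: float) -> float:
    """Volume of a d-dimensional manifold sampled on a periodic, axis-aligned grid (hypothetical helper).

    points: array of shape (n_1, ..., n_d, D); h: common grid spacing on every parameter axis.
    """
    d = points.ndim - 1
    # Jacobian columns: central differences along each parameter axis.
    jac = np.stack(
        [(np.roll(points, -1, axis=k) - np.roll(points, 1, axis=k)) / (2 * h) for k in range(d)],
        axis=-1,
    )  # shape (n_1, ..., n_d, D, d)
    gram = np.einsum("...ia,...ib->...ab", jac, jac)  # metric tensor J^T J per grid point
    return float(np.sum(np.sqrt(np.linalg.det(gram))) * h**d)


# Circle of radius 2 in R^2: analytic curvature 0.5, length 4*pi.
n = 2000
h = 2 * np.pi / n
theta = np.arange(n) * h
circle = 2.0 * np.stack([np.cos(theta), np.sin(theta)], axis=1)
kappa, length = curve_curvature_and_length(circle, h)
print(kappa.mean(), length)  # approximately 0.5 and 12.566

# Flat torus in R^4: analytic area (2*pi)^2.
m = 200
hb = 2 * np.pi / m
a, b = np.meshgrid(np.arange(m) * hb, np.arange(m) * hb, indexing="ij")
torus = np.stack([np.cos(a), np.sin(a), np.cos(b), np.sin(b)], axis=-1)
print(grid_volume(torus, hb))  # close to (2*pi)**2
```

Periodic central differences (`np.roll`) rely on the grid covering a full period. Open parameter ranges, such as scale, would need one-sided differences at the boundary.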
Community
We introduce Manifold Microscope, a controlled benchmark for studying data-manifold geometry with finite-difference estimates of curvature, reach, and volume on grid-sampled image manifolds.
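Of the three quantities, reach depends most on global structure, because two distant parts of a manifold can come close in ambient space. The following minimal sketch assumes samples with known or finite-difference tangent bases. It evaluates Federer's characterization, reach = inf over p ≠ q of |q - p|² / (2 · dist(q - p, T_pM)), over all sample pairs. This is one standard formulation and may not match the paper's own estimator. `federer_reach` and `curve_tangents` are hypothetical helpers. The example uses two concentric circles of radii 2 and 3. Curvature alone would suggest a reach of 2, but the gap between the circles limits it to 0.5.

```python
import numpy as np


def curve_tangents(points: np.ndarray) -> np.ndarray:
    """Unit tangents of a closed, uniformly sampled curve via central differences (hypothetical helper)."""
    d1 = np.roll(points, -1, axis=0) - np.roll(points, 1, axis=0)
    t = d1 / np.linalg.norm(d1, axis=1, keepdims=True)
    return t[:, :, None]  # shape (n, D, 1): a one-column orthonormal basis per sample


def federer_reach(points: np.ndarray, tangents: np.ndarray) -> float:
    """Reach estimate via Federer's formula over all sample pairs (hypothetical helper).

    points: (n, D) samples; tangents: (n, D, d) orthonormal tangent bases at each sample.
    """
    best = np.inf
    for i in range(len(points)):
        diff = points - points[i]                # q - p for every sample q
        basis = tangents[i]                      # (D, d)
        normal = diff - (diff @ basis) @ basis.T  # component of q - p orthogonal to T_p M
        dist_normal = np.linalg.norm(normal, axis=1)
        sq = np.sum(diff * diff, axis=1)
        mask = dist_normal > 1e-12  # skip q = p, where the ratio is undefined
        if np.any(mask):
            best = min(best, float(np.min(sq[mask] / (2 * dist_normal[mask]))))
    return best


n = 300
theta = np.arange(n) * 2 * np.pi / n
inner = 2.0 * np.stack([np.cos(theta), np.sin(theta)], axis=1)
outer = 3.0 * np.stack([np.cos(theta), np.sin(theta)], axis=1)
points = np.concatenate([inner, outer])
tangents = np.concatenate([curve_tangents(inner), curve_tangents(outer)])
reach = federer_reach(points, tangents)
print(reach)  # about 0.5: limited by the gap between the circles, not by curvature
```

The all-pairs loop costs O(n²) distance evaluations. That is fine for a few thousand samples but would call for a neighbor search structure on larger grids.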
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Representation Gap: Explaining the Unreasonable Effectiveness of Neural Networks from a Geometric Perspective (2026)
- Consistent Geometric Deep Learning via Hilbert Bundles and Cellular Sheaves (2026)
- Bounding Global and Local Compression Error of Signal Parameterizations (2026)
- Riemannian Metric Matching for Scalable Geometric Modeling of Distributions (2026)
- Provably Learning Diffusion Models under the Manifold Hypothesis: Collapse and Refine (2026)
- Generalization in Nonlinear Least Squares via Learned Feature Geometry (2026)
- Beyond Gaussian Bottlenecks: Topologically Aligned Encoding of Vision-Transformer Feature Spaces (2026)
My goal is to put time into GPT, via text, using phase parameters. Here are some PDFs I made: https://gpt2pdfsite.vercel.app/
Hi @jhegedus,
Thank you for your interest in our paper and for sharing your related work.
The comment is quite long, and unfortunately Hugging Face does not currently collapse long comments with a “see more...” button. This means the full text takes up most of the visible discussion space.
Would you mind shortening it to around 3–5 lines and adding a link to your work for readers who want more details?
If it stays in the current form, I would unfortunately need to remove it to keep the thread readable for others.
Best,
Marios
Neat paper. It feels like we’ve been stuck between toy math examples and messy, uninterpretable real-world data for a long time, so having a middle-ground testbed to calibrate geometric estimators sounds pretty useful.
I'm curious, how well do you think the curvature and reach measurements from these synthetic, axis-aligned datasets translate to the more chaotic structure of natural image manifolds?
I made a podcast on it with ResearchPod; it makes it easy to get the key concepts on the go:
https://researchpod.app/episode/f0ee5781-7e5b-49cc-b5d4-6f2379ecd740