Open to Collab

Mohammed Hamdy

mmhamdy

https://surfingmanifolds.substack.com/

AI & ML interests

AI4Sci | NLP | Reinforcement Learning

Recent Activity

replied to their post 3 days ago

posted an update 3 days ago

posted an update 5 months ago

The new DeepSeek Engram paper is super fun! It also integrates mHC, and I suspect they're releasing all these papers to keep the V4 report at a reasonable length 😄 Here's a nice short summary from Gemini.

replied to their post 3 days ago

The MultiModel paper: https://arxiv.org/abs/1706.05137
The story behind Attention Is All You Need: https://surfingmanifolds.substack.com/p/pandemonium-the-transformers-story

posted an update 3 days ago

Post

Things rarely go as we expect!

In 2017, Google released the Transformer architecture. While it was clear the model was promising, absolutely no one (including its authors) anticipated the pervasive global revolution it would create!

The authors actually viewed the Transformer as just a stepping stone for a much more ambitious project: The MultiModel.

Their ultimate goal, back in 2017, was to build a single deep learning architecture capable of jointly learning many diverse tasks across entirely different domains: One Model To Learn Them All.

In fact, the MultiModel paper was published in the exact same month as Attention Is All You Need!

But history had other plans. The building block eclipsed the grand design!

So, have you heard about the MultiModel before? 😀

1 reply

posted an update 5 months ago

Post

3179

upvoted an article 6 months ago

Article

Continuous batching from first principles

ror, ArthurZ, mcpotato • Nov 25, 2025 • 397

reacted to Kseniase's post with ❤️ 7 months ago

Post

6498

12 Types of JEPA

Since Yann LeCun together with Randall Balestriero released a new paper on JEPA (Joint-Embedding Predictive Architecture), laying out its theory and introducing an efficient practical version called LeJEPA, we figured you might need even more JEPA. Here are 7 recent JEPA variants plus 5 iconic ones:

1. LeJEPA → LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics (2511.08544)
Explains a full theory for JEPAs, defining the “ideal” JEPA embedding as an isotropic Gaussian, and proposes the SIGReg objective to push JEPA toward this ideal, resulting in practical LeJEPA

2. JEPA-T → JEPA-T: Joint-Embedding Predictive Architecture with Text Fusion for Image Generation (2510.00974)
A text-to-image model that tokenizes images and captions with a joint predictive Transformer, enhances fusion with cross-attention and text embeddings before training loss, and generates images by iteratively denoising visual tokens conditioned on text

3. Text-JEPA → Speaking in Words, Thinking in Logic: A Dual-Process Framework in QA Systems (2507.20491)
Converts natural language into first-order logic, with a Z3 solver handling reasoning, enabling efficient, explainable QA with far lower compute than large LLMs

4. N-JEPA (Noise-based JEPA) → Improving Joint Embedding Predictive Architecture with Diffusion Noise (2507.15216)
Connects self-supervised learning with diffusion-style noise by using noise-based masking and multi-level schedules, especially improving visual classification

5. SparseJEPA → SparseJEPA: Sparse Representation Learning of Joint Embedding Predictive Architectures (2504.16140)
Adds sparse representation learning to make embeddings more interpretable and efficient. It groups latent variables by shared semantic structure using a sparsity penalty while preserving accuracy

6. TS-JEPA (Time Series JEPA) → Joint Embeddings Go Temporal (2509.25449)
Adapts JEPA to time-series by learning latent self-supervised representations and predicting future latents for robustness to noise and confounders

Read further below ↓
If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

1 reply

liked a Space 7 months ago

Unlocking On-Policy Distillation for Any Model Family

📝

108

Visualize on-policy distillation for any model family

upvoted 2 articles 7 months ago

Article

Promoter-GPT: Writing DNA Instructions with Language Models

hugging-science • Oct 22, 2025 • 25

Article

Scaling Test-Time Compute to Achieve Gold Medal at IOI 2025 with Open-Weight Models

nvidia • Oct 20, 2025 • 19

liked a dataset 8 months ago

transferable-samplers/many-peptides-md

Updated Dec 15, 2025 • 26.9k • 8

published an article 8 months ago

Article

The Next Frontier: Large Language Models In Biology

mmhamdy • Oct 12, 2025 • 5

liked 3 Spaces 8 months ago

Science Release Heatmap

🔥

Explore AI4Science contributions by organizations and tags

Maintain the unmaintainable

📚

Explore the complex relationships between 400+ machine learning models

Transformers Timeline

🤗

Interactive timeline to explore the 🤗Transformers models

published a Space 8 months ago

BioLLM Story

🌖

Create and render documents with code and text using Quarto

reacted to AdinaY's post with 🔥 8 months ago

Post

3544

BAAI has released ROME🔥 evaluating 30+ large reasoning models on text & visual reasoning

FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions (2509.17177)

✨Tests visual reasoning, not just recognition
✨Covers capability × alignment × safety × efficiency
✨More transparent & reliable (less data contamination)
✨Helps make real-world deployment choices