We present MolRex, a reinforcement learning framework that combines Group Relative Policy Optimization (GRPO) with chain-of-thought fine-tuning of large language models (LLMs) to improve molecular structures through guided reasoning. MolRex trains models to propose chemically valid structural edits along with interpretable rationales, optimizing responses based on a composite reward signal that includes synthesizability, drug-likeness, human-aligned molecular preferences, and format validity. While additional metrics such as reasoning brevity are implemented for future integration, current training prioritizes chemically meaningful and syntactically robust outputs. By leveraging relative comparisons between candidate generations instead of absolute value estimation, MolRex facilitates stable training and avoids the complexity of critic networks. Experimental results show that MolRex enhances molecular properties while offering transparent rationales, making it a promising step toward interpretable, reasoning-augmented molecular design.
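The two ideas in the description, a composite reward and relative comparison within a group of candidates, can be illustrated with a minimal sketch. It is not the MolRex training code: the component names mirror the reward terms listed above, but the weights, the assumption that each score lies in [0, 1], the example scores, and the helper names `composite_reward` and `group_relative_advantages` are all hypothetical. The advantage is computed by standardizing rewards within one group, which is the usual GRPO formulation and requires no critic network.

```python
from statistics import mean, pstdev

# Hypothetical weights: the model card does not say how the reward terms are combined.
WEIGHTS = {
    "synthesizability": 0.3,
    "drug_likeness": 0.3,
    "preference": 0.3,
    "format": 0.1,
}


def composite_reward(scores: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of per-component scores (each assumed to lie in [0, 1])."""
    return sum(weights[name] * scores[name] for name in weights)


def group_relative_advantages(rewards: list, eps: float = 1e-6) -> list:
    """Standardize rewards within one group of completions for the same prompt.

    Each completion is scored relative to its siblings, so no learned
    value function (critic) is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up scores for three candidate edits sampled for a single prompt.
group = [
    {"synthesizability": 0.8, "drug_likeness": 0.6, "preference": 0.7, "format": 1.0},
    {"synthesizability": 0.4, "drug_likeness": 0.5, "preference": 0.3, "format": 1.0},
    {"synthesizability": 0.9, "drug_likeness": 0.7, "preference": 0.8, "format": 0.0},
]
rewards = [composite_reward(s) for s in group]
advantages = group_relative_advantages(rewards)
```

Candidates scoring above the group mean receive positive advantages and are reinforced; those below the mean are penalized. If every candidate in a group gets the same reward, all advantages are zero and that group contributes no gradient signal.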

Uploaded model

Developed by: Xilabs
License: apache-2.0
Finetuned from model: unsloth/phi-4-bnb-4bit

This model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Model tree for Xilabs/MolREX

Base model: microsoft/phi-4
Quantized: unsloth/phi-4-bnb-4bit
Finetuned: this model