
Experiment 1 – Latin Square 1: CCT5 & COME on MCMD

This repository contains the artifacts for Latin Square 1 of Experiment 1, which reproduces the original experiment by Wu et al. (2025) on the MCMD dataset using two DNN-based commit message generation baselines, CCT5 and COME.


Models

CCT5

CCT5 is a code-change-oriented pre-trained model built on top of the T5 architecture, initialized from CodeT5 weights. It is further specialized through pre-training on CodeChangeNet, a commit-diff dataset containing roughly 40GB of diff and commit message pairs (~1.5M pairs). It was released at ESEC/FSE 2023.

  • Base: T5-base → CodeT5 → CCT5
  • Pre-training data: CodeChangeNet (40GB, 1.5M diff/commit pairs)
  • For MCMD: reused released checkpoint fine-tuned on MCMD by original authors

COME

COME (Commit Message Generation with Modification Embedding) is a hybrid DNN approach that combines:

  • A fine-tuned CodeT5 component for natural language generation
  • Modification embedding to represent code changes as numerical vectors
  • An SVM-based decision algorithm to select between generated and retrieved candidate messages

COME does not perform additional large-scale pre-training on top of CodeT5. It was released at ISSTA 2023.

  • For MCMD: reused language-specific checkpoints released by original COME authors (one per language)

Dataset

MCMD – Multilingual Commit Message Dataset

| Property | Details |
|---|---|
| Languages | Java, C++, C#, Python, JavaScript |
| Repositories | Top 100 most-starred GitHub repos per language (500 total) |
| Total commits | ~1,094,115 |
| Date range | Up to January 1st, 2022 |
| Split | 80% train / 10% validation / 10% test |
| Authors | Liu et al. (2020) |

Repository Structure

Each run folder corresponds to a programming language evaluated in this Latin Square:

experiment1_ls1/
├── run_java/
│   ├── checkpoint/          # CCT5 and COME checkpoints fine-tuned on MCMD (Java)
│   ├── predictions/         # Generated commit messages on MCMD Java test set
│   └── metrics/             # BLEU, METEOR, ROUGE-L, CIDEr scores
├── run_cpp/
│   ├── checkpoint/
│   ├── predictions/
│   └── metrics/
├── run_csharp/
│   ├── checkpoint/
│   ├── predictions/
│   └── metrics/
├── run_python/
│   ├── checkpoint/
│   ├── predictions/
│   └── metrics/
└── run_javascript/
    ├── checkpoint/
    ├── predictions/
    └── metrics/
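
The layout above can be enumerated programmatically. The following is a minimal sketch, assuming the repository has been downloaded to a local directory named experiment1_ls1. The helper name `run_dirs` is hypothetical and is not part of this repository.

```python
from pathlib import Path

# Language suffixes as they appear in the run folder names above.
LANGUAGES = ["java", "cpp", "csharp", "python", "javascript"]
SUBFOLDERS = ["checkpoint", "predictions", "metrics"]


def run_dirs(root):
    """Map each language to its three artifact folders (hypothetical helper)."""
    root = Path(root)
    return {
        lang: {sub: root / f"run_{lang}" / sub for sub in SUBFOLDERS}
        for lang in LANGUAGES
    }


if __name__ == "__main__":
    for lang, dirs in run_dirs("experiment1_ls1").items():
        for sub, path in dirs.items():
            # exists() only reports whether the folder is present locally.
            print(f"{lang:<10} {sub:<11} {path} (found: {path.exists()})")
```

The mapping only builds paths; it does not assume anything about the file names stored inside each folder.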

checkpoint/

Contains the CCT5 and COME checkpoint files for the corresponding language. These checkpoints were fine-tuned on the MCMD training set and are reused from the original authors' repositories.

predictions/

Contains the generated commit messages produced by each model on the MCMD test set for the corresponding language, stored as .txt files with one prediction per line aligned to the reference messages.
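
Because predictions are stored one per line and aligned to the references, a loader should check that both files have the same number of lines before any scoring. The sketch below illustrates this. The function names and the file paths passed to them are hypothetical, since the exact file names inside predictions/ are not listed here.

```python
def read_lines(path):
    """Read a text file into a list with one entry per line.

    Empty lines are kept, because an empty prediction still occupies
    the line that belongs to its reference.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_aligned(pred_path, ref_path):
    """Return (prediction, reference) pairs, failing loudly on misalignment."""
    preds = read_lines(pred_path)
    refs = read_lines(ref_path)
    if len(preds) != len(refs):
        raise ValueError(f"{len(preds)} predictions vs {len(refs)} references")
    return list(zip(preds, refs))
```

Raising an error on a length mismatch is a deliberate choice: silently truncating to the shorter file would shift every later pair and distort the scores.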

metrics/

Contains the computed evaluation metric scores for each model-language combination. Metrics are calculated by comparing predictions against the reference messages in the MCMD test set.


Evaluation Metrics

| Metric | Full name | What it measures |
|---|---|---|
| BLEU | Bilingual Evaluation Understudy | n-gram precision between generated and reference messages |
| METEOR | Metric for Evaluation of Translation with Explicit Ordering | unigram precision and recall combined, with stemming and synonym matching |
| ROUGE-L | Recall-Oriented Understudy for Gisting Evaluation (LCS variant) | longest common subsequence overlap |
| CIDEr | Consensus-based Image Description Evaluation | TF-IDF-weighted n-gram similarity against reference messages |
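
To make two of these definitions concrete, the sketch below computes clipped n-gram precision (the building block of BLEU) and an LCS-based F1 score in the spirit of ROUGE-L. It is a simplified illustration on whitespace tokens, not the evaluation script used to produce the scores in this repository: full BLEU also applies a brevity penalty and combines several n-gram orders, and ROUGE-L implementations often weight recall more heavily than precision.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference, with clipped counts."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_f1(candidate, reference):
    """Balanced F1 over LCS precision and recall (a simplification of ROUGE-L)."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    cand = "fix null pointer in parser".split()
    ref = "fix null pointer exception in the parser".split()
    print(clipped_precision(cand, ref, 1))  # 1.0: every candidate word is in the reference
    print(clipped_precision(cand, ref, 2))  # 0.5: two of four candidate bigrams match
    print(round(lcs_f1(cand, ref), 4))      # 0.8333: LCS of 5 tokens, lengths 5 and 7
```

The example messages are invented for illustration and do not come from MCMD.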

Reported Results (Original Paper – Wu et al., 2025)

| Language | Model | BLEU | METEOR | ROUGE-L | CIDEr |
|---|---|---|---|---|---|
| Java | CCT5 | 17.19 | 14.95 | 26.08 | 1.06 |
| Java | COME | 27.17 | 23.36 | 34.59 | 1.90 |
| C++ | CCT5 | 15.65 | 14.11 | 24.15 | 0.90 |
| C++ | COME | 27.29 | 23.29 | 33.33 | 1.91 |
| C# | CCT5 | 12.06 | 11.05 | 18.92 | 0.61 |
| C# | COME | 20.80 | 17.72 | 27.01 | 1.25 |
| Python | CCT5 | 15.12 | 13.70 | 23.79 | 0.85 |
| Python | COME | 23.17 | 19.99 | 30.48 | 1.50 |
| JavaScript | CCT5 | 19.76 | 17.51 | 28.73 | 1.33 |
| JavaScript | COME | 26.91 | 23.02 | 34.44 | 1.92 |
| Average | CCT5 | 15.96 | 14.26 | 24.33 | 0.95 |
| Average | COME | 25.07 | 21.48 | 31.97 | 1.70 |
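
The Average rows are unweighted means over the five languages. The short check below recomputes them from the per-language values in the table above. The names `REPORTED` and `macro_average` are illustrative and do not refer to any file in this repository.

```python
# Per-language scores copied from the table above: (BLEU, METEOR, ROUGE-L, CIDEr).
REPORTED = {
    "CCT5": {
        "Java": (17.19, 14.95, 26.08, 1.06),
        "C++": (15.65, 14.11, 24.15, 0.90),
        "C#": (12.06, 11.05, 18.92, 0.61),
        "Python": (15.12, 13.70, 23.79, 0.85),
        "JavaScript": (19.76, 17.51, 28.73, 1.33),
    },
    "COME": {
        "Java": (27.17, 23.36, 34.59, 1.90),
        "C++": (27.29, 23.29, 33.33, 1.91),
        "C#": (20.80, 17.72, 27.01, 1.25),
        "Python": (23.17, 19.99, 30.48, 1.50),
        "JavaScript": (26.91, 23.02, 34.44, 1.92),
    },
}


def macro_average(scores):
    """Unweighted mean of each metric column across languages, to two decimals."""
    rows = list(scores.values())
    return tuple(round(sum(column) / len(rows), 2) for column in zip(*rows))


if __name__ == "__main__":
    for model, scores in REPORTED.items():
        bleu, meteor, rouge_l, cider = macro_average(scores)
        print(f"{model}: BLEU={bleu:.2f} METEOR={meteor:.2f} ROUGE-L={rouge_l:.2f} CIDEr={cider:.2f}")
```

The same function can be applied to the scores stored under metrics/ to compare a reproduction run against the reported averages, once those scores are loaded into the same per-language shape.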

References

  • Wu et al. (2025). An Empirical Study on Commit Message Generation with Large Language Models via In-Context Learning. arXiv:2502.18904.
  • Lin et al. (2023). CCT5: A Code-Change-Oriented Pre-Trained Model. ESEC/FSE 2023.
  • He et al. (2023). COME: Commit Message Generation with Modification Embedding. ISSTA 2023.
  • Liu et al. (2020). MCMD dataset.