Abhisek Behera PRO

Abhisek987

Recent Activity

updated a dataset 3 days ago

Abhisek987/depdoctor-dataset

replied to their post 2 days ago

Good question. It is for training and evaluating models that fix Python code after a dependency upgrade breaks it. If you have ever bumped numpy or pandas and watched old code stop working, this is the data to teach a model to make those fixes automatically. Each example pairs the broken code with the fix and a short note on the API change that caused it. There is a live demo in the collection if you want to see it fix real code.

posted an update 3 days ago

Every Python developer has hit this: you upgrade numpy or pandas, and code that worked yesterday breaks today.

I built an open dataset for exactly this problem. DepDoctor is 6,204 examples of Python code broken by a dependency upgrade, each paired with the fix and a short note on the API change that caused it. It is a mixture of real cases mined from public GitHub commits and synthetic cases generated from a database of known breaking changes.
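To make the "broken code, fix, and note" pairing concrete, here is a minimal sketch of what one example could look like. The class and field names (`RepairExample`, `broken_code`, `fixed_code`, `change_note`, `source`) are hypothetical and are not taken from the dataset's actual schema; the record itself is an illustration built on a well-known pandas 2.0 change, not a row copied from DepDoctor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepairExample:
    """Hypothetical record layout; the real dataset columns may differ."""
    broken_code: str   # code that worked before the upgrade
    fixed_code: str    # code that works after the upgrade
    change_note: str   # short description of the API change
    source: str        # assumed label: "real" or "synthetic"

    def needs_change(self) -> bool:
        # A "leave it alone" example would have identical broken and fixed code.
        return self.broken_code.strip() != self.fixed_code.strip()


# Illustrative record: DataFrame.append was removed in pandas 2.0.
example = RepairExample(
    broken_code="df = df.append(new_rows, ignore_index=True)",
    fixed_code="df = pd.concat([df, new_rows], ignore_index=True)",
    change_note="DataFrame.append was removed in pandas 2.0; use pandas.concat.",
    source="synthetic",
)

# Illustrative "leave it alone" record: nothing in this line is affected.
untouched = RepairExample(
    broken_code="total = df['price'].sum()",
    fixed_code="total = df['price'].sum()",
    change_note="No breaking change applies to this code.",
    source="synthetic",
)

print(example.needs_change(), untouched.needs_change())
```

Under this assumed layout, a "leave it alone" case is simply a record whose fix equals its input, which keeps restraint examples in the same format as real repairs.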

A few things I tried to get right:
- 935 "leave it alone" examples, to teach a model restraint, not just what to change.
- Honest evaluation: a fine-tuned Qwen2.5-Coder-7B gets 62% of fixes fully correct. I report that, not just the 97% text-similarity score that hides the truth.
- The main failure mode, over-editing, is measured and explained rather than buried.
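The gap between the two numbers above is easy to reproduce on a toy case. The sketch below is not the evaluation code behind the reported figures; it assumes exact match after trimming whitespace as the "fully correct" check and uses `difflib.SequenceMatcher` as a stand-in text-similarity score. The function names and the sample model output are hypothetical.

```python
import difflib


def exact_match(predicted: str, reference: str) -> bool:
    """True only if the prediction equals the reference (outer whitespace ignored)."""
    return predicted.strip() == reference.strip()


def text_similarity(predicted: str, reference: str) -> float:
    """Character-level similarity ratio in [0, 1] from difflib."""
    return difflib.SequenceMatcher(None, predicted, reference).ratio()


def changed_lines(before: str, after: str) -> int:
    """Count lines removed from `before` plus lines added in `after`."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))


# np.float was removed in NumPy 1.24; the builtin float is the replacement.
broken = (
    "import numpy as np\n"
    "x = np.array([1, 2, 3], dtype=np.float)\n"
    "print(x.sum())\n"
)
reference = (
    "import numpy as np\n"
    "x = np.array([1, 2, 3], dtype=float)\n"
    "print(x.sum())\n"
)
# Hypothetical model output: the right fix plus an edit nobody asked for.
predicted = (
    "import numpy as np\n"
    "x = np.array([1, 2, 3], dtype=float)\n"
    "print(float(x.sum()))\n"
)

print("exact match:", exact_match(predicted, reference))
print("similarity:", round(text_similarity(predicted, reference), 3))
print("lines changed by reference fix:", changed_lines(broken, reference))
print("lines changed by model output:", changed_lines(broken, predicted))
```

The over-edited output fails the exact-match check while still scoring high on similarity, and it touches more lines than the reference fix does. A line-count comparison like `changed_lines` is one simple way to flag over-editing, though it is only an assumption about how such a measure could be built.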

Dataset, fine-tuned model, and a live demo are all open in one place:
https://huggingface.co/collections/Abhisek987/depdoctor

Feedback welcome, especially from anyone working on code repair or API migration.
