These leaderboards are used to track progress in Code Repair.
Use these libraries to find Code Repair models and implementations.
No subtasks available.
This paper introduces CodeXGLUE, a benchmark dataset to foster machine learning research on program understanding and generation; it includes 10 tasks across 14 datasets and a platform for model evaluation and comparison.
This paper investigates the ability of large language models (LLMs) to suggest functionally correct, performance-improving code edits, and hypothesizes that LLMs can suggest such edits in ways that would be impractical for static analysis alone.
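A minimal sketch of how such a suggestion could be vetted before acceptance, assuming the candidate edit comes from an LLM (here it is hard-coded for illustration); the names `original`, `candidate`, and `vet_edit` are hypothetical, not from the paper:

```python
import timeit

def original(n):
    # Quadratic-time reference: sum of all pairwise products i * j.
    return sum(i * j for i in range(n) for j in range(n))

def candidate(n):
    # Hypothetical LLM-suggested edit: same value via closed form,
    # since sum_{i,j} i*j = (sum_i i)^2.
    s = n * (n - 1) // 2
    return s * s

def vet_edit(ref, new, test_inputs):
    """Accept the edit only if it is functionally correct and faster."""
    for x in test_inputs:
        if ref(x) != new(x):
            return False, "functional mismatch"
    t_ref = timeit.timeit(lambda: ref(300), number=5)
    t_new = timeit.timeit(lambda: new(300), number=5)
    return t_new < t_ref, f"speedup ~{t_ref / t_new:.0f}x"

print(vet_edit(original, candidate, test_inputs=[0, 1, 2, 10, 50]))
```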
The authors' models, OctoCoder and OctoGeeX, achieve the best performance across HumanEvalPack among all permissive models, demonstrating CommitPack's benefits in generalizing to a wider set of languages and natural coding tasks.
This work proposes a new training approach, Break-It-Fix-It (BIFI), which uses a critic to check the fixer's output on real bad inputs, adds the good (fixed) outputs to the training data, and trains a breaker to generate realistic bad code from good code.
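A toy sketch of the BIFI data loop, with a rule-based fixer and breaker standing in for the paper's learned models and Python's compiler acting as the critic; all names and rules here are illustrative assumptions:

```python
def critic(code: str) -> bool:
    """Stand-in critic: accept the code only if it parses."""
    try:
        compile(code, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False

def fixer(code: str) -> str:
    """Toy fixer: add the missing ':' after def/if/for/while headers."""
    fixed = []
    for line in code.splitlines():
        s = line.rstrip()
        if s.split(" ")[0] in {"def", "if", "for", "while"} and not s.endswith(":"):
            s += ":"
        fixed.append(s)
    return "\n".join(fixed)

def breaker(code: str) -> str:
    """Toy breaker: drop trailing ':' to produce realistic bad code."""
    return "\n".join(line.rstrip(":") for line in code.splitlines())

real_bad = ["def add(a, b)\n    return a + b", "for i in range(3)\n    print(i)"]
paired_data = []              # (bad, good) pairs to retrain the fixer on
for bad in real_bad:
    good = fixer(bad)
    if critic(good):          # keep only outputs the critic verifies
        paired_data.append((bad, good))
        paired_data.append((breaker(good), good))   # breaker round trip

print(len(paired_data), "verified training pairs")
```

The critic filter is the key design choice: only outputs it verifies enter the training set, so the fixer never learns from its own unverified mistakes.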
This work proposes a notion of monitors that use static analysis in the background to guide the decoding of language models of code, and shows that monitor-guided decoding not only improves an LM's ability to generate identifiers that match the ground truth, but also improves compilation rates and agreement with the ground truth.
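A minimal sketch of the masking idea behind monitor-guided decoding, assuming a fixed score table in place of a real LM and Python introspection in place of a static-analysis monitor; the helper names are hypothetical:

```python
import math

def monitor_valid_members(obj) -> set:
    """Static-analysis stand-in: identifiers that may legally follow 'obj.'."""
    return {name for name in dir(obj) if not name.startswith("_")}

def guided_argmax(lm_scores: dict, allowed: set) -> str:
    """Mask out tokens the monitor rejects, then pick the best survivor."""
    masked = {tok: (s if tok in allowed else -math.inf) for tok, s in lm_scores.items()}
    return max(masked, key=masked.get)

# Completing "some_list." — the unguided LM prefers a hallucinated member.
lm_scores = {"push": 2.1, "append": 1.7, "add": 1.3}
allowed = monitor_valid_members([])          # valid members of a Python list
print(guided_argmax(lm_scores, allowed))     # -> 'append', not 'push'
```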
INTERVENOR is a system designed to emulate the interactive code repair process observed in humans, encompassing both code diagnosis and code repair; it can accurately identify syntax and assertion errors and provide precise instructions for repairing the code.
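A toy sketch of an interactive diagnose-and-repair loop in this spirit, with a canned list of candidates standing in for LLM calls; the role names and helpers are hypothetical, not the paper's implementation:

```python
def diagnose(code, test):
    """'Teacher' role: run code plus test, return an error report or None."""
    env = {}
    try:
        exec(code, env)
        exec(test, env)
        return None
    except Exception as e:                      # syntax or assertion error
        return f"{type(e).__name__}: {e}"

def repair_loop(candidates, test, max_rounds=3):
    """'Learner' role: try a candidate, feed the diagnosis into the next round."""
    feedback = None
    for round_id, code in zip(range(max_rounds), candidates):
        report = diagnose(code, test)
        if report is None:
            return code, f"fixed in round {round_id + 1}"
        feedback = report                       # would be sent back to the LLM
    return None, f"unfixed, last error: {feedback}"

test = "assert square(4) == 16"
candidates = [
    "def square(x)\n    return x * x",          # syntax error
    "def square(x):\n    return x + x",         # assertion error
    "def square(x):\n    return x * x",         # correct
]
print(repair_loop(candidates, test))
```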
MACER is a novel technique for accelerated error repair based on a modular segregation of the repair process into repair identification and repair application; it outperforms existing methods by 20% at suggesting fixes for popular errors while remaining competitive or better on other errors.
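A toy sketch of that two-stage split, with hand-written detectors and edit templates standing in for MACER's learned repair-identification and repair-application components; the class names and rules are illustrative assumptions:

```python
REPAIR_CLASSES = {
    "missing_colon": (
        lambda line: line.lstrip().startswith(("if", "for", "def", "while"))
                     and not line.rstrip().endswith(":"),
        lambda line: line.rstrip() + ":",
    ),
    "unbalanced_paren": (
        lambda line: line.count("(") > line.count(")"),
        lambda line: line + ")",
    ),
}

def identify(line: str):
    """Stage 1 (repair identification): pick the first class whose detector fires."""
    for name, (detect, _) in REPAIR_CLASSES.items():
        if detect(line):
            return name
    return None

def apply_repair(line: str, repair_class: str) -> str:
    """Stage 2 (repair application): apply that class's edit template."""
    _, transform = REPAIR_CLASSES[repair_class]
    return transform(line)

broken = "for i in range(10)"
cls = identify(broken)                 # -> 'missing_colon'
print(cls, "->", apply_repair(broken, cls))
```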