3260 papers • 126 benchmarks • 313 datasets
Code translation is the process of converting code written in one programming language to another programming language while maintaining the same functionality. This process is also known as code conversion, source-to-source translation, or transpilation. Code translation is often performed when developers want to take advantage of new programming languages, improve code performance, or maintain legacy systems. Some common examples include translating code from Python to Java, or from JavaScript to TypeScript.
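At its core, any source-to-source translator follows the same pipeline: parse the source into a syntax tree, transform the tree, and regenerate code. As a minimal sketch (using Python's standard `ast` module, with a toy single-language rewrite standing in for a real cross-language translator), a pass that rewrites the power operator into a `pow()` call looks like this:

```python
# Toy source-to-source pass: parse -> transform -> regenerate.
# A real transpiler would emit a different target language; here the
# "translation" is a single rewrite within Python, for illustration only.
import ast

class PowToCall(ast.NodeTransformer):
    """Rewrite `a ** b` into `pow(a, b)`."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform children first
        if isinstance(node.op, ast.Pow):
            return ast.Call(
                func=ast.Name(id="pow", ctx=ast.Load()),
                args=[node.left, node.right],
                keywords=[],
            )
        return node

def transpile(src: str) -> str:
    """Parse source, apply the rewrite pass, and regenerate code."""
    tree = ast.parse(src)
    tree = ast.fix_missing_locations(PowToCall().visit(tree))
    return ast.unparse(tree)  # requires Python 3.9+

print(transpile("y = x ** 2 + 1"))  # y = pow(x, 2) + 1
```

The functionality-preserving property that defines code translation is visible here: the rewritten program computes the same values as the original, only its surface syntax changes.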
A fully unsupervised neural transcompiler that relies exclusively on monolingual source code, requires no expertise in the source or target languages, and can easily be generalized to other programming languages is proposed.
Comprehensive experiments show that CodeT5 significantly outperforms prior methods on understanding tasks such as code defect detection and clone detection, and generation tasks across various directions including PL-NL, NL-PL, and PL-PL.
This paper introduces CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation that includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison.
It is proved that composed fine-tuning significantly reduces the complexity of the predictor, thereby improving generalization on prediction problems with structured outputs subject to output validity constraints, e.g. pseudocode-to-code translation.
A new pre-training objective, DOBF, is introduced that leverages the structural aspect of programming languages and pre-trains a model to recover the original version of obfuscated source code and shows that models pre-trained with DOBF significantly outperform existing approaches on multiple downstream tasks.
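The deobfuscation objective can be pictured with a small sketch. This is an assumed, simplified version of the obfuscation step only: every identifier is replaced with a uniform placeholder (real DOBF preserves language keywords and builtins and handles functions and classes), and the model's pre-training task is then to recover the original names from the obfuscated code.

```python
# Toy sketch of a DOBF-style obfuscation step (assumption: rename every
# Name node, including builtins, which the real method would keep intact).
import ast

def obfuscate(src: str):
    """Return (obfuscated_source, mapping from original to placeholder names)."""
    tree = ast.parse(src)
    mapping = {}

    class Rename(ast.NodeTransformer):
        def visit_Name(self, node):
            if node.id not in mapping:
                mapping[node.id] = f"VAR_{len(mapping)}"
            node.id = mapping[node.id]
            return node

    tree = Rename().visit(tree)
    return ast.unparse(tree), mapping

code, names = obfuscate("a = b + b")
print(code)   # VAR_0 = VAR_1 + VAR_1
print(names)  # {'a': 'VAR_0', 'b': 'VAR_1'}
```

Training a model to invert this mapping forces it to infer identifier semantics from code structure alone, which is the structural signal the DOBF objective exploits.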
This work introduces a new automatic evaluation metric, dubbed CodeBLEU, which retains BLEU's strength in n-gram matching and further injects code syntax via abstract syntax trees (AST) and code semantics via data flow, achieving better correlation with programmer-assigned scores than BLEU and accuracy.
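The structure of such a metric is a weighted combination of component scores. The sketch below illustrates only that weighting idea: the n-gram function is a toy precision, the AST and data-flow scores are passed in as plain numbers, and the weights are illustrative, not the paper's.

```python
# Illustrative skeleton of a CodeBLEU-style combined metric.
# The component matchers here are toy stand-ins, not the actual
# CodeBLEU implementations.
from collections import Counter

def ngram_precision(ref_tokens, hyp_tokens, n=2):
    """Toy n-gram precision of hypothesis tokens against a reference."""
    ref = Counter(tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1))
    hyp = Counter(tuple(hyp_tokens[i:i + n]) for i in range(len(hyp_tokens) - n + 1))
    if not hyp:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return overlap / sum(hyp.values())

def combined_score(ngram, ast_match, dataflow_match, weights=(0.5, 0.25, 0.25)):
    """Weighted sum of n-gram, syntax-match, and data-flow-match scores
    (weights are illustrative, not CodeBLEU's published values)."""
    a, b, c = weights
    return a * ngram + b * ast_match + c * dataflow_match

score = combined_score(ngram_precision(["x", "=", "1"], ["x", "=", "1"]), 1.0, 1.0)
print(score)  # 1.0
```

The point of the extra components is that two programs can share few n-grams yet have identical ASTs and data flow, so a purely surface-level metric under-rewards correct translations.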
The paper highlights the importance of the lexical substitution component in current natural-language-to-code systems, using a state-of-the-art architecture that relies on a BERT encoder and a grammar-based decoder, for which a formalization is provided.
This work presents new benchmarks for evaluating code generation models, MBXP, Multilingual HumanEval, and MathQA-X, and reports findings on the generalization ability of language models to out-of-domain languages, the advantages of multilingual models over monolingual ones, and the ability of few-shot prompting to teach a model new languages.
This work proposes CodeAttack, a simple yet effective black-box attack model that uses code structure to generate effective, efficient, and imperceptible adversarial code samples, demonstrating the vulnerability of state-of-the-art PL models to code-specific adversarial attacks.
A state-of-the-art translation model for generating Bash commands from the corresponding English text is described, and a new NL2CMD dataset is introduced that is automatically generated, involves minimal human intervention, and is over six times larger than prior datasets.