We release a CodeT5-base fine-tuned checkpoint ([Salesforce/codet5-base-multi-sum](https://huggingface.co/Salesforce/codet5-base-multi-sum)) for multilingual code summarization. Below is how to use this model:
```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained('Salesforce/codet5-base-multi-sum')
model = T5ForConditionalGeneration.from_pretrained('Salesforce/codet5-base-multi-sum')

# the function to summarize, passed as a plain string
text = """def svg_to_image(string, size=None):
if isinstance(string, unicode):
    string = string.encode('utf-8')
    renderer = QtSvg.QSvgRenderer(QtCore.QByteArray(string))
if not renderer.isValid():
    raise ValueError('Invalid SVG data.')
if size is None:
    size = renderer.defaultSize()
    image = QtGui.QImage(size.width(), size.height(), QtGui.QImage.Format_ARGB32)
    painter = QtGui.QPainter(image)
    renderer.render(painter)
return image"""

input_ids = tokenizer(text, return_tensors="pt").input_ids
generated_ids = model.generate(input_ids, max_length=20)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
# this prints: "Convert a SVG string to a QImage."
```
It significantly outperforms previous methods on code summarization in the [CodeXGLUE benchmark](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Text/code-to-text) (results reported in smoothed BLEU-4):
| Model | Ruby | JavaScript | Go | Python | Java | PHP | Overall |
| :---- | :--: | :--------: | :-: | :----: | :--: | :-: | :-----: |
We add a [model card](https://github.com/salesforce/CodeT5/blob/main/CodeT5_model_card.pdf) for CodeT5! Please reach out if you have any questions about it.
CodeT5 is now on [Hugging Face](https://huggingface.co/)!
You can simply load the model ([CodeT5-small](https://huggingface.co/Salesforce/codet5-small) and [CodeT5-base](https://huggingface.co/Salesforce/codet5-base)) and run inference:
```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained('Salesforce/codet5-base')
model = T5ForConditionalGeneration.from_pretrained('Salesforce/codet5-base')

# <extra_id_0> is the sentinel token marking the masked span for the model to fill in
text = "def greet(user): print(f'hello <extra_id_0>!')"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# simply generate a single sequence
generated_ids = model.generate(input_ids, max_length=8)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
# this prints "{user.username}"
```
This repo provides the code for reproducing the experiments in [CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation](https://arxiv.org/pdf/2109.00859.pdf).
CodeT5 is a new pre-trained encoder-decoder model for programming languages, trained on **8.35M** functions across 8 programming languages (Python, Java, JavaScript, PHP, Ruby, Go, C, and C#).
In total, it achieves state-of-the-art results on **14 sub-tasks** in the code intelligence benchmark [CodeXGLUE](https://github.com/microsoft/CodeXGLUE).
The code currently includes two pre-trained checkpoints ([CodeT5-small](https://huggingface.co/Salesforce/codet5-small) and [CodeT5-base](https://huggingface.co/Salesforce/codet5-base)) and scripts to fine-tune them on 4 generation tasks (code summarization, code generation, translation, and refinement) plus 2 understanding tasks (code defect detection and clone detection) in CodeXGLUE.
At Salesforce, we build an [AI coding assistant demo](https://github.com/salesforce/CodeT5/raw/main/codet5.gif) using CodeT5 as a VS Code plugin to provide three capabilities for Apex developers:

- **Text-to-code generation**: generate code based on the natural language description.
- **Code autocompletion**: complete the whole function given the target function name.
- **Code summarization**: generate the summary of a function in natural language.
The code is released under the BSD-3 License (see `LICENSE.txt` for details), but we also ask that users respect the following:
This software should not be used to promote or profit from:
- violence, hate, and division,
- environmental destruction,
- abuse of human rights, or
- the destruction of people's physical and mental health.
We encourage users of this software to tell us about their applications by emailing codeT5@salesforce.com, and to use [appropriate](https://arxiv.org/abs/1810.03993) [documentation](https://www.partnershiponai.org/about-ml/) when developing high-stakes applications of this model.
Go to the `sh` folder and set `WORKDIR` in `exp_with_args.sh` to your downloaded CodeT5 repository path.
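For example (the path below is just a placeholder; substitute the location of your own checkout):

```bash
# inside sh/exp_with_args.sh -- point WORKDIR at your local CodeT5 clone
WORKDIR="/path/to/your/CodeT5"
```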
You can use `run_exp.py` to run a broad set of experiments by simply passing the `model_tag`, `task`, and `sub_task` arguments.
In total, we support four models (i.e., ['roberta', 'codebert', 'codet5_small', 'codet5_base']) and six tasks (i.e., ['summarize', 'concode', 'translate', 'refine', 'defect', 'clone']).
For each task, we use the `sub_task` argument to specify which specific dataset to fine-tune on.
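For example, putting the three arguments together, fine-tuning CodeT5-base on the Python subset of the code summarization task would look like this (a minimal sketch using the model tags and task names listed above):

```bash
# model_tag picks the checkpoint; task/sub_task pick the dataset
python run_exp.py --model_tag codet5_base --task summarize --sub_task python
```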
You can also revise the suggested arguments [here](https://github.com/salesforce/CodeT5/blob/4f8818aea1bf170f019381671087e4c4f9608005/sh/run_exp.py#L14) and refer to the argument flags in [configs.py](https://github.com/salesforce/CodeT5/blob/main/configs.py) for the full set of available options.