
CodeT5 and CodeT5+

Official research release from Salesforce Research for the CodeT5 and CodeT5+ models, which cover a wide range of code understanding and generation tasks. These open code LLMs were introduced in the following papers:

Title: CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation

Authors: Yue Wang, Weishi Wang, Shafiq Joty, Steven C.H. Hoi

Title: CodeT5+: Open Code Large Language Models for Code Understanding and Generation

Authors: Yue Wang*, Hung Le*, Akhilesh Deepak Gotmare, Nghi D.Q. Bui, Junnan Li, Steven C.H. Hoi (* indicates equal contribution)

In practice, CodeT5 and CodeT5+ models can be deployed as an AI-powered coding assistant to boost the productivity of software developers. At Salesforce, we built an AI coding assistant demo that uses CodeT5 as a VS Code plugin to provide three capabilities for Apex developers:

  • Text-to-code generation: generate code from a natural language description.
  • Code autocompletion: complete a whole function given its target name (see the sketch after this list).
  • Code summarization: generate a natural language summary of a function.
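
To make these capabilities concrete, below is a minimal sketch of prompting a CodeT5 checkpoint for span completion with the Hugging Face transformers library. It assumes the publicly released Salesforce/codet5-base checkpoint (see the Sep 2021 note below); the prompt, generation length, and expected output are illustrative only.

```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

# Load the pre-trained CodeT5-base checkpoint from the Hugging Face Hub.
tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

# Mark the span to be completed with the sentinel token <extra_id_0>.
text = "def greet(user): print(f'hello <extra_id_0>!')"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Generate a short completion for the masked span.
generated_ids = model.generate(input_ids, max_length=8)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
# e.g. prints something like "{user.username}"
```

Fine-tuned checkpoints for the downstream tasks above follow the same generate-and-decode pattern.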

CodeT5 demo

What's New: 🎉

May 2023

CodeT5+ paper and models released! (paper, code)
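
As a minimal sketch of trying a CodeT5+ model, the example below assumes a Salesforce/codet5p-220m checkpoint name on the Hugging Face Hub; the checkpoint name, prompt, and generation settings are assumptions for illustration, and larger CodeT5+ variants may require different loading code.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Assumed checkpoint name for a small CodeT5+ model (T5-style encoder-decoder).
checkpoint = "Salesforce/codet5p-220m"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

# Ask the model to fill in the function body via sentinel-style span prediction.
inputs = tokenizer("def print_hello_world():<extra_id_0>", return_tensors="pt").input_ids
outputs = model.generate(inputs, max_length=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```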

July 2022

We release two large-sized CodeT5 checkpoints at Hugging Face: Salesforce/codet5-large and Salesforce/codet5-large-ntp-py, which are introduced in the CodeRL paper.

Oct 2021

We release fine-tuned checkpoints for all the downstream tasks covered in the paper. In addition, we release a CodeT5-base fine-tuned checkpoint (Salesforce/codet5-base-multi-sum) for multilingual code summarization.
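
A minimal sketch of multilingual code summarization with this checkpoint (the example function and generation settings are illustrative):

```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

# Load the CodeT5-base checkpoint fine-tuned for multilingual code summarization.
tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base-multi-sum")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base-multi-sum")

# An example function to summarize (illustrative).
code = (
    "def add_positive(values):\n"
    "    total = 0\n"
    "    for v in values:\n"
    "        if v > 0:\n"
    "            total += v\n"
    "    return total\n"
)
input_ids = tokenizer(code, return_tensors="pt").input_ids

# Generate a short natural language summary of the function.
generated_ids = model.generate(input_ids, max_length=20)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```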

Sep 2021

CodeT5 is now on Hugging Face! (CodeT5-small and CodeT5-base).

We add a model card for CodeT5! Please reach out if you have any questions about it.

Citation

If you find this code to be useful for your research, please consider citing:

@inproceedings{wang2021codet5,
    title={CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation},
    author={Wang, Yue and Wang, Weishi and Joty, Shafiq and Hoi, Steven C. H.},
    booktitle={EMNLP},
    year={2021}
}

@inproceedings{le2022coderl,
    title={CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning},
    author={Le, Hung and Wang, Yue and Gotmare, Akhilesh Deepak and Savarese, Silvio and Hoi, Steven C. H.},
    booktitle={NeurIPS},
    year={2022}
}

@article{wang2023codet5plus,
    title={CodeT5+: Open Code Large Language Models for Code Understanding and Generation},
    author={Wang, Yue and Le, Hung and Gotmare, Akhilesh Deepak and Bui, Nghi D.Q. and Li, Junnan and Hoi, Steven C. H.},
    journal={arXiv preprint},
    year={2023}
}

License

The code is released under the BSD-3 License (see LICENSE.txt for details), but we also ask that users respect the following:

This software should not be used to promote or profit from:

  • violence, hate, and division,
  • environmental destruction,
  • abuse of human rights, or
  • the destruction of people's physical and mental health.

We encourage users of this software to tell us about the applications in which they are putting it to use by emailing codeT5@salesforce.com, and to use appropriate documentation when developing high-stakes applications of this model.

Get Involved

Please create a GitHub issue if you have any questions, suggestions, requests, or bug reports. We welcome PRs!