update table of contents
Commit 3b529e206d (parent e156956bcd)
@@ -8,6 +8,16 @@ Find out more via our [blog post](https://blog.salesforceairesearch.com/codet5-o
*Authors*: [Yue Wang](https://yuewang-cuhk.github.io/)\*, [Hung Le](https://sites.google.com/view/henryle2018/home?pli=1)\*, [Akhilesh Deepak Gotmare](https://akhileshgotmare.github.io/), [Nghi D.Q. Bui](https://bdqnghi.github.io/), [Junnan Li](https://sites.google.com/site/junnanlics), [Steven C.H. Hoi](https://sites.google.com/view/stevenhoi/home) (* indicates equal contribution)
## Table of Contents
1. [What is this about?](#what-is-this-about)
2. [Released Models](#released-models)
3. [How to Use?](#how-to-use)
4. [Instruction Tuning to Align with Natural Language Instructions](#instruction-tuning-to-align-with-natural-language-instructions)
5. [How to Finetune Using Your Own Data?](#how-to-finetune-using-your-own-data)
6. [Reproduce the Results](#reproduce-the-results)
7. [Citation](#citation)

# What is this about?
CodeT5+ is a new family of open code large language models with an encoder-decoder architecture that can flexibly operate in different modes (i.e. _encoder-only_, _decoder-only_, and _encoder-decoder_) to support a wide range of code understanding and generation tasks.
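As a quick illustration of this flexibility, below is a minimal sketch of loading one of the smaller CodeT5+ checkpoints with Hugging Face `transformers`; the checkpoint name, prompt, and generation settings are assumptions for illustration, not taken from this page.

```python
# Minimal sketch, assuming the small CodeT5+ checkpoints on the Hugging Face Hub
# load as standard T5-style seq2seq models (larger variants may need trust_remote_code).
from transformers import AutoTokenizer, T5ForConditionalGeneration

checkpoint = "Salesforce/codet5p-220m"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

# Encoder-decoder mode: encode a code prompt and generate a continuation.
inputs = tokenizer("def print_hello_world():", return_tensors="pt")
outputs = model.generate(**inputs, max_length=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```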
@@ -106,6 +116,7 @@ Our CodeT5+ models achieve strong results on HumanEval benchmark in zero-shot s
| code-cushman-001 | 33.5 | 54.3 | 77.4 |
| StarCoder 15B | 33.6 | - | - |
| InstructCodeT5+ 16B | **36.1** | **57.1** | **80.7** |
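For context, HumanEval numbers of this kind are usually reported as pass@k, computed with the unbiased estimator of Chen et al. (2021). The sketch below shows only that estimator; it does not reproduce the table above.

```python
# Unbiased pass@k estimator (Chen et al., 2021), shown for reference only.
# n: samples generated per problem, c: samples passing the unit tests, k: evaluation budget.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:  # every size-k subset contains at least one correct sample
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 200 samples per problem, 50 correct -> pass@1 = 0.25
assert abs(pass_at_k(200, 50, 1) - 0.25) < 1e-6
```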
Please follow the instructions below to reproduce the results.
@@ -9,6 +9,45 @@ This is the official PyTorch implementation for the following EMNLP 2021 paper f
![CodeT5 demo](../codet5.gif)
## Table of Contents
1. [Introduction](#introduction)
2. [Updates](#updates)
3. [Download Pretrained and Fine-tuned Checkpoints](#download-pretrained-and-fine-tuned-checkpoints)
4. [Fine-tuning](#fine-tuning)
   1. [How to run?](#how-to-run)
   2. [How to reproduce the results using the released finetuned checkpoints?](#how-to-reproduce-the-results-using-the-released-finetuned-checkpoints)
   3. [How to fine-tune on your own task and dataset?](#how-to-fine-tune-on-your-own-task-and-dataset)
5. [Citation](#citation)
## Introduction
This repo provides the code for reproducing the experiments in [CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation](https://arxiv.org/pdf/2109.00859.pdf). CodeT5 is a new pre-trained encoder-decoder model for programming languages, pre-trained on **8.35M** functions in 8 programming languages (Python, Java, JavaScript, PHP, Ruby, Go, C, and C#). It achieves state-of-the-art results on **14 sub-tasks** of the [CodeXGLUE](https://github.com/microsoft/CodeXGLUE) code intelligence benchmark.
Paper link: https://arxiv.org/abs/2109.00859

Blog link: https://blog.salesforceairesearch.com/codet5/
The code currently includes two pre-trained checkpoints ([CodeT5-small](https://huggingface.co/Salesforce/codet5-small) and [CodeT5-base](https://huggingface.co/Salesforce/codet5-base)) and scripts to fine-tune them on 4 generation tasks (code summarization, code generation, translation, and refinement) plus 2 understanding tasks (code defect detection and clone detection) in CodeXGLUE. We also provide their fine-tuned checkpoints to facilitate easy replication of the results in our paper.
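As a minimal sketch of how these checkpoints can be loaded for masked span prediction (this mirrors the usage snippet whose tail appears in a hunk further below; the input string is illustrative):

```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

# Ask the model to fill in the masked span <extra_id_0>.
text = "def greet(user): print(f'hello <extra_id_0>!')"
input_ids = tokenizer(text, return_tensors="pt").input_ids

generated_ids = model.generate(input_ids, max_length=8)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
# this prints "{user.username}"
```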
In practice, CodeT5 can be deployed as an AI-powered coding assistant to boost the productivity of software developers. At Salesforce, we built an [AI coding assistant demo](https://github.com/salesforce/CodeT5/raw/main/codet5.gif) that uses CodeT5 as a VS Code plugin to provide three capabilities for Apex developers:
- **Text-to-code generation**: generate code from a natural language description.
- **Code autocompletion**: complete a whole function given its target function name.
- **Code summarization**: generate a natural language summary of a function.
## Updates
**July 06, 2022**
@@ -86,46 +125,8 @@ print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
# this prints "{user.username}"
```
## Download Pretrained and Fine-tuned Checkpoints
* [Pre-trained checkpoints](https://console.cloud.google.com/storage/browser/sfr-codet5-data-research/pretrained_models)
* [Fine-tuning data](https://console.cloud.google.com/storage/browser/sfr-codet5-data-research/data)
@@ -144,6 +145,14 @@ gsutil -m cp -r "gs://sfr-codet5-data-research/finetuned_models" .
## Fine-tuning
### Dependency
- PyTorch 1.7.1
- tensorboard 2.4.1
- transformers 4.6.1
- tree-sitter 0.2.2
### How to run?
Go to the `sh` folder and set `WORKDIR` in `exp_with_args.sh` to the path of your cloned CodeT5 repository.

You can use `run_exp.py` to run a broad set of experiments by simply passing the `model_tag`, `task`, and `sub_task`
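As an illustrative example (argument values here are assumptions, not the only supported combination), a code summarization run on Python could be launched with something like `python run_exp.py --model_tag codet5_base --task summarize --sub_task python`.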