update table of contents

This commit is contained in:
WANG Yue 2023-07-12 15:47:00 +08:00
parent e156956bcd
commit 3b529e206d
2 changed files with 59 additions and 39 deletions

View File

@@ -8,6 +8,16 @@ Find out more via our [blog post](https://blog.salesforceairesearch.com/codet5-o
*Authors*: [Yue Wang](https://yuewang-cuhk.github.io/)\*, [Hung Le](https://sites.google.com/view/henryle2018/home?pli=1)\*, [Akhilesh Deepak Gotmare](https://akhileshgotmare.github.io/), [Nghi D.Q. Bui](https://bdqnghi.github.io/), [Junnan Li](https://sites.google.com/site/junnanlics), [Steven C.H. Hoi](https://sites.google.com/view/stevenhoi/home) (* indicates equal contribution)
## Table of Contents
1. [What is this about?](#what-is-this-about)
2. [Released Models](#released-models)
3. [How to Use?](#how-to-use)
4. [Instruction Tuning to Align with Natural Language Instructions](#instruction-tuning-to-align-with-natural-language-instructions)
5. [How to Finetune Using Your Own Data?](#how-to-finetune-using-your-own-data)
6. [Reproduce the Results](#reproduce-the-results)
7. [Citation](#citation)
# What is this about?
CodeT5+ is a new family of open code large language models with an encoder-decoder architecture that can flexibly operate in different modes (i.e. _encoder-only_, _decoder-only_, and _encoder-decoder_) to support a wide range of code understanding and generation tasks.
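For a concrete sense of the encoder-decoder mode, the smaller CodeT5+ checkpoints can be loaded through Hugging Face `transformers` as ordinary seq2seq models. The sketch below is illustrative rather than taken from this README; the `Salesforce/codet5p-220m` checkpoint name and the infilling prompt are assumptions based on the released models.

```python
# Minimal sketch (assumption): load a small CodeT5+ checkpoint as a standard
# encoder-decoder model via Hugging Face transformers and run span infilling.
from transformers import AutoTokenizer, T5ForConditionalGeneration

checkpoint = "Salesforce/codet5p-220m"  # small released CodeT5+ variant
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

# <extra_id_0> marks the span the model should fill in.
inputs = tokenizer("def print_hello_world():<extra_id_0>", return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_length=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```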
@@ -106,6 +116,7 @@ Our CodeT5+ models achieve strong results on the HumanEval benchmark in zero-shot s
| code-cushman-001 | 33.5 | 54.3 | 77.4 |
| StarCoder 15B | 33.6 | - | - |
| InstructCodeT5+ 16B | **36.1** | **57.1** | **80.7** |
Please follow the instructions below to reproduce the results.
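The official reproduction scripts are described in the sections below. Purely as an illustrative sketch of what zero-shot generation with the instruction-tuned model looks like (this is not the repo's evaluation harness), InstructCodeT5+ 16B can be loaded in half precision roughly as follows; the dtype and generation settings are assumptions, and a large GPU is required.

```python
# Illustrative sketch only; not the official HumanEval evaluation pipeline.
# Assumes a CUDA GPU with enough memory to hold the 16B checkpoint in fp16.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "Salesforce/instructcodet5p-16b"
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,  # the large CodeT5+ models ship custom modeling code
).to(device)

prompt = "def print_hello_world():"
encoding = tokenizer(prompt, return_tensors="pt").to(device)
# Feed the prompt to the decoder as well for these decoder-heavy checkpoints.
encoding["decoder_input_ids"] = encoding["input_ids"].clone()
outputs = model.generate(**encoding, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```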

View File

@@ -9,6 +9,45 @@ This is the official PyTorch implementation for the following EMNLP 2021 paper f
![CodeT5 demo](../codet5.gif)
## Table of Contents
1. [Introduction](#introduction)
2. [Updates](#updates)
3. [Download Pretrained and Fine-tuned Checkpoints](#download-pretrained-and-fine-tuned-checkpoints)
4. [Fine-tuning](#fine-tuning)
1. [How to run?](#how-to-run)
2. [How to reproduce the results using the released finetuned checkpoints?](#how-to-reproduce-the-results-using-the-released-finetuned-checkpoints)
3. [How to fine-tune on your own task and dataset?](#how-to-fine-tune-on-your-own-task-and-dataset)
5. [Citation](#citation)
## Introduction
This repo provides the code for reproducing the experiments
in [CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation](https://arxiv.org/pdf/2109.00859.pdf).
CodeT5 is a new encoder-decoder model for programming languages, pre-trained on **8.35M**
functions in 8 programming languages (Python, Java, JavaScript, PHP, Ruby, Go, C, and C#). It achieves
state-of-the-art results on **14 sub-tasks** of the [CodeXGLUE](https://github.com/microsoft/CodeXGLUE) code intelligence benchmark.
Paper link: https://arxiv.org/abs/2109.00859
Blog link: https://blog.salesforceairesearch.com/codet5/
The code currently includes two pre-trained checkpoints ([CodeT5-small](https://huggingface.co/Salesforce/codet5-small)
and [CodeT5-base](https://huggingface.co/Salesforce/codet5-base)) and scripts to fine-tune them on 4 generation tasks
(code summarization, code generation, translation, and refinement) plus 2 understanding tasks (code defect detection and
clone detection) from CodeXGLUE. We also provide the corresponding fine-tuned checkpoints to facilitate easy replication
of our results.
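Before launching the fine-tuning scripts, a quick way to sanity-check the released checkpoints is to load CodeT5-base from the Hugging Face Hub and run masked-span prediction. The snippet below is a minimal sketch along the lines of the usage example whose tail appears in a later hunk of this diff.

```python
# Minimal sketch: load the released CodeT5-base checkpoint and fill in a
# masked span as a quick sanity check of the tokenizer/model pairing.
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

text = "def greet(user): print(f'hello <extra_id_0>!')"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Ask the model to predict the content of the masked span <extra_id_0>.
generated_ids = model.generate(input_ids, max_length=8)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```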
In practice, CodeT5 can be deployed as an AI-powered coding assistant to boost the productivity of software developers.
At Salesforce, we built an [AI coding assistant demo](https://github.com/salesforce/CodeT5/raw/main/codet5.gif) using
CodeT5 as a VS Code plugin that provides three capabilities for Apex developers:
- **Text-to-code generation**: generate code from a natural language description.
- **Code autocompletion**: complete a whole function given the target function name.
- **Code summarization**: generate a natural language summary of a function.
## Updates
**July 06, 2022**
@@ -86,46 +125,8 @@ print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
# this prints "{user.username}"
```
## Table of Contents
1. [Dependency](#dependency)
2. [Download](#download)
3. [Fine-tuning](#fine-tuning)
## Dependency
- PyTorch 1.7.1
- tensorboard 2.4.1
- transformers 4.6.1
- tree-sitter 0.2.2
## Download
## Download Pretrained and Fine-tuned Checkpoints
* [Pre-trained checkpoints](https://console.cloud.google.com/storage/browser/sfr-codet5-data-research/pretrained_models)
* [Fine-tuning data](https://console.cloud.google.com/storage/browser/sfr-codet5-data-research/data)
@@ -144,6 +145,14 @@ gsutil -m cp -r "gs://sfr-codet5-data-research/finetuned_models" .
## Fine-tuning
### Dependency
- PyTorch 1.7.1
- tensorboard 2.4.1
- transformers 4.6.1
- tree-sitter 0.2.2
### How to run?
Go to the `sh` folder and set `WORKDIR` in `exp_with_args.sh` to the path of your cloned CodeT5 repository.
You can use `run_exp.py` to run a broad set of experiments by simply passing the `model_tag`, `task`, and `sub_task`