update table of contents
Commit 3b529e206d (parent e156956bcd)
@@ -8,6 +8,16 @@ Find out more via our [blog post](https://blog.salesforceairesearch.com/codet5-o
*Authors*: [Yue Wang](https://yuewang-cuhk.github.io/)\*, [Hung Le](https://sites.google.com/view/henryle2018/home?pli=1)\*, [Akhilesh Deepak Gotmare](https://akhileshgotmare.github.io/), [Nghi D.Q. Bui](https://bdqnghi.github.io/), [Junnan Li](https://sites.google.com/site/junnanlics), [Steven C.H. Hoi](https://sites.google.com/view/stevenhoi/home) (* indicates equal contribution)
## Table of Contents
1. [What is this about?](#what-is-this-about)
2. [Released Models](#released-models)
3. [How to Use?](#how-to-use)
4. [Instruction Tuning to Align with Natural Language Instructions](#instruction-tuning-to-align-with-natural-language-instructions)
5. [How to Finetune Using Your Own Data?](#how-to-finetune-using-your-own-data)
6. [Reproduce the Results](#reproduce-the-results)
7. [Citation](#citation)

# What is this about?
CodeT5+ is a new family of open code large language models with an encoder-decoder architecture that can flexibly operate in different modes (i.e. _encoder-only_, _decoder-only_, and _encoder-decoder_) to support a wide range of code understanding and generation tasks.
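As a quick illustration of this flexibility, below is a minimal sketch of loading one of the smaller CodeT5+ checkpoints with Hugging Face `transformers`; the checkpoint name, prompt, and generation settings are assumptions for illustration, not taken from this page.

```python
# Minimal sketch, assuming the small CodeT5+ checkpoints on the Hugging Face Hub
# load as standard T5-style seq2seq models (larger variants may need trust_remote_code).
from transformers import AutoTokenizer, T5ForConditionalGeneration

checkpoint = "Salesforce/codet5p-220m"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

# Encoder-decoder mode: encode a code prompt and generate a continuation.
inputs = tokenizer("def print_hello_world():", return_tensors="pt")
outputs = model.generate(**inputs, max_length=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```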
@@ -106,6 +116,7 @@ Our CodeT5+ models achieve strong results on HumanEval benchmark in zero-shot s
| code-cushman-001 | 33.5 | 54.3 | 77.4 |
| StarCoder 15B | 33.6 | - | - |
| InstructCodeT5+ 16B | **36.1** | **57.1** | **80.7** |
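For context, HumanEval numbers of this kind are usually reported as pass@k, computed with the unbiased estimator of Chen et al. (2021). The sketch below shows only that estimator; it does not reproduce the table above.

```python
# Unbiased pass@k estimator (Chen et al., 2021), shown for reference only.
# n: samples generated per problem, c: samples passing the unit tests, k: evaluation budget.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:  # every size-k subset contains at least one correct sample
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 200 samples per problem, 50 correct -> pass@1 = 0.25
assert abs(pass_at_k(200, 50, 1) - 0.25) < 1e-6
```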
Please follow the instructions below to reproduce the results.
@@ -9,6 +9,45 @@ This is the official PyTorch implementation for the following EMNLP 2021 paper f
![CodeT5 demo](../codet5.gif)
## Table of Contents
1. [Introduction](#introduction)
2. [Updates](#updates)
3. [Download Pretrained and Fine-tuned Checkpoints](#download-pretrained-and-fine-tuned-checkpoints)
4. [Fine-tuning](#fine-tuning)
   1. [How to run?](#how-to-run)
   2. [How to reproduce the results using the released finetuned checkpoints?](#how-to-reproduce-the-results-using-the-released-finetuned-checkpoints)
   3. [How to fine-tune on your own task and dataset?](#how-to-fine-tune-on-your-own-task-and-dataset)
5. [Citation](#citation)
## Introduction
This repo provides the code for reproducing the experiments in [CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation](https://arxiv.org/pdf/2109.00859.pdf). CodeT5 is a new pre-trained encoder-decoder model for programming languages, pre-trained on **8.35M** functions in 8 programming languages (Python, Java, JavaScript, PHP, Ruby, Go, C, and C#). It achieves state-of-the-art results on **14 sub-tasks** of the [CodeXGLUE](https://github.com/microsoft/CodeXGLUE) code intelligence benchmark.
Paper link: https://arxiv.org/abs/2109.00859

Blog link: https://blog.salesforceairesearch.com/codet5/
The code currently includes two pre-trained checkpoints ([CodeT5-small](https://huggingface.co/Salesforce/codet5-small) and [CodeT5-base](https://huggingface.co/Salesforce/codet5-base)) and scripts to fine-tune them on 4 generation tasks (code summarization, code generation, translation, and refinement) plus 2 understanding tasks (code defect detection and clone detection) in CodeXGLUE. We also provide their fine-tuned checkpoints to facilitate easy replication of the results in our paper.
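As a minimal sketch of how these checkpoints can be loaded for masked span prediction (this mirrors the usage snippet whose tail appears in a hunk further below; the input string is illustrative):

```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

# Ask the model to fill in the masked span <extra_id_0>.
text = "def greet(user): print(f'hello <extra_id_0>!')"
input_ids = tokenizer(text, return_tensors="pt").input_ids

generated_ids = model.generate(input_ids, max_length=8)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
# this prints "{user.username}"
```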
In practice, CodeT5 can be deployed as an AI-powered coding assistant to boost the productivity of software developers. At Salesforce, we built an [AI coding assistant demo](https://github.com/salesforce/CodeT5/raw/main/codet5.gif) that uses CodeT5 as a VS Code plugin to provide three capabilities for Apex developers:
- **Text-to-code generation**: generate code from a natural language description.
- **Code autocompletion**: complete a whole function given its target function name.
- **Code summarization**: generate a natural language summary of a function.
## Updates
**July 06, 2022**
@@ -86,46 +125,8 @@ print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
# this prints "{user.username}"
```
## Download Pretrained and Fine-tuned Checkpoints
* [Pre-trained checkpoints](https://console.cloud.google.com/storage/browser/sfr-codet5-data-research/pretrained_models)
* [Fine-tuning data](https://console.cloud.google.com/storage/browser/sfr-codet5-data-research/data)
@@ -144,6 +145,14 @@ gsutil -m cp -r "gs://sfr-codet5-data-research/finetuned_models" .
## Fine-tuning
### Dependency
- PyTorch 1.7.1
- tensorboard 2.4.1
- transformers 4.6.1
- tree-sitter 0.2.2
### How to run?
Go to the `sh` folder and set `WORKDIR` in `exp_with_args.sh` to the path of your cloned CodeT5 repository.

You can use `run_exp.py` to run a broad set of experiments by simply passing the `model_tag`, `task`, and `sub_task`
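As an illustrative example (argument values here are assumptions, not the only supported combination), a code summarization run on Python could be launched with something like `python run_exp.py --model_tag codet5_base --task summarize --sub_task python`.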