Welcome to the source code repo of Repilot, a patch generation tool introduced in our ESEC/FSE'23 paper "Copiloting the Copilot: Fusing Large Language Models with Completion Engines for Automated Program Repair"!
Repilot leverages the synergy between a semantics-based code completion engine and an auto-regressive large language model for more efficient valid patch generation.
Important
Repilot is implemented for Java patch generation as a complex hybrid system combining a modified Eclipse JDT Language Server and Python's huggingface/transformers interface for manipulating large language models. Correctly setting up the dependencies and configurations of Repilot can be non-trivial. Therefore, we highly recommend using our out-of-the-box Docker image directly.
# Pull the image and run a container.
# This may take some time...
docker run -it --name repilot universefly/repilot:latest
# Now you will get into a "virtual environment" provided by Docker
# Enter the `Repilot` directory
cd /root/Repilot
# This is important because Repilot relies on a `meta_config.json` file to work properly
cat meta_config.json
# Generate patches with the full Repilot approach using CodeT5
ACTIVE=1 python -m repilot.cli.main repair -b "Chart-9" --method pruned-mem -d chart-9-repilot -n 5
# You will see logs about the patch generation and which tokens are accepted/rejected.
# Validate the patch generation
python -m repilot.cli.main validate -d chart-9-repilot
# Print a table of the evaluation results
python -m repilot.cli.main evaluate -d chart-9-repilot
# You'll see something like this:
# Repilot Evaluation Results
# ┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
# ┃ Tag             ┃ Average Gen Time ┃ %Compilable Patches ┃ %Plausible Patches ┃ #Plausible Fixes ┃ #Correct Fixes ┃
# ┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
# │ chart-9-repilot │ 1.33s            │ 100.0%              │ 0.000%             │ 0                │ -              │
# └─────────────────┴──────────────────┴─────────────────────┴────────────────────┴──────────────────┴────────────────┘
For more comprehensive guidance on how to use Repilot and how to reproduce the results in our paper, we strongly encourage you to check out our artifact documentation.
Warning
Building Repilot from source is NOT recommended since there are many complex dependencies and configurations to handle. It is only for advanced users who want to extend Repilot. If you want to build from source, we also encourage you to check out our Dockerfile for more details.
Important
Environment requirements
- Python 3.10 and Git LFS are required.
- All three versions of Java 8, 11, and 18 are required. For convenient management of multiple Java versions, we recommend coursier.
- (Optional) We recommend an NVIDIA GPU with more than 6 GB of memory for running Repilot with CodeT5, and more than 30 GB of memory for Incoder-6.7B.
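Part of the list above can be probed from Python before starting the build. The following is a minimal sketch, not part of Repilot: the function name `check_environment` is hypothetical, and it only checks the interpreter version and whether a `git-lfs` executable is on the PATH. The Java versions and the GPU still need to be verified by hand.

```python
import shutil
import sys


def check_environment(version_info=None, which=shutil.which):
    """Return a list of human-readable problems (empty if none were found).

    Hypothetical helper, not part of Repilot. `version_info` and `which`
    are parameters only so that the function is easy to test.
    """
    if version_info is None:
        version_info = sys.version_info
    problems = []
    # The requirements list asks for Python 3.10 specifically.
    if tuple(version_info[:2]) != (3, 10):
        problems.append(
            f"Python 3.10 is required, found {version_info[0]}.{version_info[1]}"
        )
    # Git LFS ships a `git-lfs` executable that `git lfs` dispatches to.
    if which("git-lfs") is None:
        problems.append("git-lfs was not found on PATH")
    return problems
```

Calling `check_environment()` with no arguments inspects the running interpreter; an empty list means neither of the two checks failed.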
Download and build the modified Eclipse JDT Language Server
Follow the instructions in the repo to build the modified Eclipse JDT Language Server. Note you will need Java 11:
git clone https://github.com/UniverseFly/eclipse.jdt.ls
cd eclipse.jdt.ls
JAVA_HOME=/path/to/java/11 ./mvnw clean verify -DskipTests=true
Adjust the following command according to your build to dry run the language server:
java \
-Declipse.application=org.eclipse.jdt.ls.core.id1 \
-Dosgi.bundles.defaultStartLevel=4 \
-Declipse.product=org.eclipse.jdt.ls.core.product \
-Dlog.level=ALL \
-noverify \
-Xmx1G \
--add-modules=ALL-SYSTEM \
--add-opens java.base/java.util=ALL-UNNAMED \
--add-opens java.base/java.lang=ALL-UNNAMED \
-jar ./plugins/org.eclipse.equinox.launcher_1.5.200.v20180922-1751.jar \
-configuration ./config_linux \
-data /path/to/data
If everything goes well, you can move on to the next step.
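Because the launcher jar name carries a version that differs between builds, it can be convenient to assemble the command programmatically. The sketch below is an illustration under stated assumptions, not part of Repilot: `build_language_server_cmd` is a hypothetical name, and `repository_dir` is assumed to be the build output directory that contains `plugins/` and `config_linux/`, as in the command above.

```python
from pathlib import Path


def build_language_server_cmd(java, repository_dir, data_dir=None, log_level="ALL"):
    """Assemble the language server command for a given build directory.

    Hypothetical helper, not part of Repilot. The flags are copied from
    the dry-run command in this README.
    """
    repo = Path(repository_dir)
    # The launcher version differs between builds, so search for the jar.
    launchers = sorted(repo.glob("plugins/org.eclipse.equinox.launcher_*.jar"))
    if not launchers:
        raise FileNotFoundError(f"no equinox launcher jar under {repo / 'plugins'}")
    cmd = [
        str(java),
        "-Declipse.application=org.eclipse.jdt.ls.core.id1",
        "-Dosgi.bundles.defaultStartLevel=4",
        "-Declipse.product=org.eclipse.jdt.ls.core.product",
        f"-Dlog.level={log_level}",
        "-noverify",
        "-Xmx1G",
        "--add-modules=ALL-SYSTEM",
        "--add-opens", "java.base/java.util=ALL-UNNAMED",
        "--add-opens", "java.base/java.lang=ALL-UNNAMED",
        # If several launcher jars match, take the lexicographically last one.
        "-jar", str(launchers[-1]),
        "-configuration", str(repo / "config_linux"),
    ]
    if data_dir is not None:
        cmd += ["-data", str(data_dir)]
    return cmd
```

With `data_dir=None` and `log_level="ERROR"`, the returned list has the same shape as the `language_server_cmd` entry in `meta_config.json`. Passing the list to `subprocess.run` performs the dry run.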
Download and install Repilot as a Python package including its dependencies
git clone https://github.com/UniverseFly/Repilot && cd Repilot
# Do an editable install
pip install -e .
# If you encounter any errors, consider upgrading pip and make sure you are using Python 3.10
# This command should also install all the dependencies of Repilot
Install the Defects4J dataset
Repilot is evaluated on the Defects4J dataset. Please check out its v2.0.0 release and follow its instructions to install the dataset.
Warning
If you download the release archive directly instead of checking it out with Git, you may encounter errors when running Repilot. Repilot dumps its metadata by collecting the meta information of these projects as Git repos, so it may fail if they are not Git repos.
You can check the installation by running `/path/to/defects4j info -p Chart`.
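Both checks can also be scripted. The sketch below is a hedged illustration, not part of Repilot or Defects4J: the function names are hypothetical, the Git check only looks for a `.git` entry in the given directory, and success of the `info` command is judged by its exit code alone.

```python
import subprocess
from pathlib import Path


def looks_like_git_checkout(path):
    """True if `path` contains a `.git` entry (a directory, or a file in worktrees)."""
    return (Path(path) / ".git").exists()


def defects4j_info_ok(defects4j_bin, project="Chart"):
    """Run `defects4j info -p <project>` and report whether it exited with 0."""
    try:
        result = subprocess.run(
            [str(defects4j_bin), "info", "-p", project],
            capture_output=True,
            text=True,
        )
    except OSError:
        # The executable is missing or cannot be run.
        return False
    return result.returncode == 0
```

For example, `looks_like_git_checkout("/path/to/defects4j-home")` followed by `defects4j_info_ok("/path/to/defects4j")` covers the warning and the sanity check in this step; both paths are placeholders.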
Prepare the runtime environment of Repilot
We need to prepare a `meta_config.json` file for Repilot to work properly. Please modify the following template according to your environment and save the file in the root directory of Repilot:
{
"d4j_home": "/home/yuxiang/Developer/defects4j",
"d4j_checkout_root": "/home/yuxiang/Developer/d4j-checkout",
"jdt_ls_repo": "/home/yuxiang/Developer/eclipse.jdt.ls",
"java8_home": "/home/yuxiang/.cache/coursier/arc/https/github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u181-b13/OpenJDK8U-jdk_x64_linux_hotspot_8u181b13.tar.gz/jdk8u181-b13",
"language_server_cmd": [
"/home/yuxiang/.cache/coursier/arc/https/github.com/adoptium/temurin18-binaries/releases/download/jdk-18.0.2%252B9/OpenJDK18U-jdk_x64_linux_hotspot_18.0.2_9.tar.gz/jdk-18.0.2+9/bin/java",
"-Declipse.application=org.eclipse.jdt.ls.core.id1",
"-Dosgi.bundles.defaultStartLevel=4",
"-Declipse.product=org.eclipse.jdt.ls.core.product",
"-Dlog.level=ERROR",
"-noverify",
"-Xmx1G",
"--add-modules=ALL-SYSTEM",
"--add-opens",
"java.base/java.util=ALL-UNNAMED",
"--add-opens",
"java.base/java.lang=ALL-UNNAMED",
"-jar",
"/home/yuxiang/Developer/eclipse.jdt.ls/org.eclipse.jdt.ls.product/_target/repository/plugins/org.eclipse.equinox.launcher_1.6.400.v20210924-0641.jar",
"-configuration",
"/home/yuxiang/Developer/eclipse.jdt.ls/org.eclipse.jdt.ls.product/_target/repository/config_linux"
],
"seed": 0
}
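Since every path in the template is machine-specific, a quick check of the edited file can catch typos early. The following is a minimal sketch under stated assumptions, not Repilot's own validation: `check_meta_config` is a hypothetical helper, the key names are copied from the template above, and the existence checks reflect a guess about which paths must already be present.

```python
import json
from pathlib import Path

# Keys copied from the template above. This helper is hypothetical and
# not part of Repilot; Repilot itself may validate the file differently.
REQUIRED_KEYS = (
    "d4j_home", "d4j_checkout_root", "jdt_ls_repo",
    "java8_home", "language_server_cmd", "seed",
)
# Assumption: these directories should already exist. `d4j_checkout_root`
# is left out because it may only be populated by the checkout step.
EXISTING_PATH_KEYS = ("d4j_home", "jdt_ls_repo", "java8_home")


def check_meta_config(config):
    """Return a list of problems found in a parsed meta_config.json."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in config]
    for key in EXISTING_PATH_KEYS:
        if key in config and not Path(config[key]).exists():
            problems.append(f"{key} does not exist: {config[key]}")
    cmd = config.get("language_server_cmd", [])
    # The paths following -jar and -configuration are build-specific.
    for flag in ("-jar", "-configuration"):
        if flag in cmd:
            index = cmd.index(flag) + 1
            if index >= len(cmd) or not Path(cmd[index]).exists():
                problems.append(f"path after {flag} does not exist")
    return problems


def check_meta_config_file(path="meta_config.json"):
    """Load the JSON file at `path` and check it."""
    with open(path, encoding="utf-8") as handle:
        return check_meta_config(json.load(handle))
```

Running `check_meta_config_file()` from the root directory of Repilot returns an empty list when none of these checks fail.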
Now let's `cd` back to the root directory of Repilot and run the following command to check out all the Defects4J bugs:
python -m repilot.cli.init
Do an example run
# Generate patches with the full Repilot approach using CodeT5
ACTIVE=1 python -m repilot.cli.main repair -b "Chart-9" --method pruned-mem -d chart-9-repilot -n 5
# You will see logs about the patch generation and which tokens are accepted/rejected.
# Validate the patch generation
python -m repilot.cli.main validate -d chart-9-repilot
# Print a table of the evaluation results
python -m repilot.cli.main evaluate -d chart-9-repilot
You will see a table of evaluation results if everything goes well.
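The three commands above can be chained from a small script when repeating the run for several bugs. This is a sketch rather than an official driver: `pipeline_commands` and `run_pipeline` are hypothetical names, only the subcommands and flags shown above are used, and `ACTIVE=1` is set for the repair step only, mirroring the example.

```python
import os
import subprocess
import sys


def pipeline_commands(bug_id, out_dir, n, method="pruned-mem"):
    """Return the repair, validate and evaluate argument lists shown above."""
    base = [sys.executable, "-m", "repilot.cli.main"]
    return [
        base + ["repair", "-b", bug_id, "--method", method, "-d", out_dir, "-n", str(n)],
        base + ["validate", "-d", out_dir],
        base + ["evaluate", "-d", out_dir],
    ]


def run_pipeline(bug_id, out_dir, n):
    """Run the three steps in order, stopping at the first failing step."""
    for cmd in pipeline_commands(bug_id, out_dir, n):
        env = dict(os.environ)
        if cmd[3] == "repair":
            # The example above sets ACTIVE=1 only for the repair command.
            env["ACTIVE"] = "1"
        subprocess.run(cmd, check=True, env=env)
```

Calling `run_pipeline("Chart-9", "chart-9-repilot", 5)` from the root directory of Repilot corresponds to the example run; `check=True` raises `subprocess.CalledProcessError` if a step exits with a non-zero status.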
(Optional) Unpack the pre-generated patches
The GitHub repo also contains pre-generated patches for the experiments in our paper. You can unpack them if you would like to inspect them. First, make sure you `cd` to the root directory of Repilot. Then run the following command:
tar -xvf ./data/large.tar.xz
You will then see that the `data/large` directory is populated with the pre-generated patches.
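To peek at the archive without unpacking all of it, Python's standard `tarfile` module can read xz-compressed tarballs. A minimal sketch, assuming the archive path used above; `list_archive` is a hypothetical helper name.

```python
import tarfile


def list_archive(path, limit=10):
    """Return up to `limit` member names from an xz-compressed tar archive."""
    names = []
    with tarfile.open(path, "r:xz") as archive:
        # Iterating reads members lazily, so the loop can stop early.
        for member in archive:
            names.append(member.name)
            if len(names) >= limit:
                break
    return names
```

For example, `list_archive("./data/large.tar.xz")` returns the first ten entries of the archive.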
Congratulations! You have successfully built and used Repilot from source!