Transformer-Assisted LLM-Based Source Code Summarisation: to Enable More Secure Software Development
This repository implements transformer-assisted, LLM-based source code summarisation: CodeSumBART provides one-shot examples that are used to prompt LLMs to generate source code summaries.
This work was presented at NLPAICS 2026.
We trained a CodeSumBART model on the Funcom dataset, cleaned using an updated version of JavaDatasetCleaner. We then used this model to generate method-summary predictions for a 10% evaluation split of this dataset, and used these predictions to prompt a Large Language Model to generate improved summaries.
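As a rough illustration of the one-shot prompting step, the sketch below assembles a prompt from a single method-summary example followed by the method to be summarised. The function name `build_one_shot_prompt`, the instruction wording, and the prompt layout are all assumptions made for illustration; the template actually used by run.py may differ.

```python
def build_one_shot_prompt(example_code: str, example_summary: str, target_code: str) -> str:
    """Build a one-shot prompt: one worked example, then the target method.

    Hypothetical helper: the layout and wording are illustrative assumptions,
    not the prompt used by this repository's scripts.
    """
    return (
        "Summarise the following Java method in one sentence.\n\n"
        f"Method:\n{example_code}\n"
        f"Summary: {example_summary}\n\n"
        f"Method:\n{target_code}\n"
        "Summary:"
    )


if __name__ == "__main__":
    # The example summary stands in for a CodeSumBART prediction.
    prompt = build_one_shot_prompt(
        "public int size() { return items.length; }",
        "Returns the number of items.",
        "public boolean isEmpty() { return items.length == 0; }",
    )
    print(prompt)
```

Ending the prompt with an open `Summary:` field is one common way to encourage the LLM to reply with the summary alone, in the same style as the example.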
- CodeSumBART-ForGeneration.py: script to use CodeSumBART to generate outputs in the format needed for our LLM prompting script.
- full_csb.ckpt: a CodeSumBART model trained on the full Funcom dataset. You will need to download or generate this separately, due to the model's size.
- getResults.py: a script which converts all of the LLM's output TSV files into an Excel spreadsheet.
- run.py: the script which uses an LLM to summarise source code.
- runWithShortResponses.py: as above, with the model prompted to only generate human-length responses.
- getAverageLengthsOfHumanSummaries.py: a script to find the average length of summaries written by humans in our dataset.
- getBleu.py: a script to get the BLEU-4 score for summaries generated by our model.
- run.sh: a template script to call the run.py script.
- requirements.txt: the Python package requirements for running the model.
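For readers who want a feel for the "human-length" statistic mentioned in the file list, the following is a minimal sketch of computing an average summary length. It is not the repository's getAverageLengthsOfHumanSummaries.py: the function name `average_summary_length`, the TSV layout (summary in the second column), and the choice to measure length in whitespace-separated tokens are all assumptions.

```python
import csv
import io


def average_summary_length(tsv_text: str, summary_column: int = 1) -> float:
    """Return the mean number of whitespace-separated tokens per summary.

    Assumes tab-separated rows with the human-written summary in
    `summary_column` (a hypothetical layout); shorter rows are skipped.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    lengths = [
        len(row[summary_column].split())
        for row in reader
        if len(row) > summary_column
    ]
    if not lengths:
        raise ValueError("no summaries found in input")
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    # Two made-up rows: method identifier, then a human-written summary.
    sample = "m1\treturns the user name\nm2\tcloses the open connection safely\n"
    print(average_summary_length(sample))
```

A figure like this could then be used as a target length when prompting the model for shorter responses, although the unit (tokens, words, or characters) would need to match whatever the prompting script expects.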
This code is designed to run on UCREL's Hex HPC, using Slurm.
Recommended System Requirements:
- Nvidia A5000, 24GB
- 128GB RAM
- 128GB Storage (SSD or HDD)
- 2GHz minimum multi-core CPU
@misc{UcrelHex,
title = {{UCREL - Hex}; A shared, hybrid multiprocessor system},
author = {Vidler, John AND Rayson, Paul},
abstract = {Hex is a collection of GPU equipped hosts onto which single- multi-
or GPU-processor jobs can be executed hosted at Lancaster University,
UK as part of the School of Computing and Communications and the
UCREL group.},
howpublished = {\url{https://github.com/UCREL/hex}},
note = {Accessed: 2026}
}