Paper Reading 22: arXiv Program Repair Papers

CoSIL: Software Issue Localization via LLM-Driven Code Repository Graph Searching

https://arxiv.org/abs/2503.22424

https://github.com/ZhonghaoJiang/CoSIL

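In essence this is a hierarchical search: given the issue description, it first searches the Module Call Graph for relevant code files, then builds a Function Call Graph over those files and searches it for relevant functions.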
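A minimal sketch of the two-level search, for illustration only. In the paper the search is LLM-driven; here the LLM's relevance judgement is replaced by a simple word-overlap score, and every name (`relevance`, `search_graph`, `localize`, the example files and functions) is hypothetical rather than taken from the CoSIL repository.

```python
import re


def tokens(text):
    """Lowercase alphanumeric tokens; underscores and slashes act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(issue, name):
    """Assumption: word overlap stands in for the LLM's relevance judgement."""
    return len(tokens(issue) & tokens(name))


def search_graph(graph, issue, top_k):
    """Pick the top_k best-matching nodes, then add their direct callees as context."""
    scored = sorted(graph, key=lambda n: (-relevance(issue, n), n))
    seeds = [n for n in scored if relevance(issue, n) > 0][:top_k]
    result = list(seeds)
    for seed in seeds:
        for callee in graph[seed]:
            if callee not in result:
                result.append(callee)
    return result


def localize(issue, module_graph, functions_by_file, function_calls, top_k=1):
    """Stage 1: files via the module call graph. Stage 2: functions within those files."""
    files = search_graph(module_graph, issue, top_k)
    # The function call graph is built only for the files selected in stage 1.
    nodes = {f for file in files for f in functions_by_file.get(file, [])}
    function_graph = {
        f: [c for c in function_calls.get(f, []) if c in nodes] for f in nodes
    }
    return files, search_graph(function_graph, issue, top_k)


# Toy repository (hypothetical).
module_graph = {
    "app/cli.py": ["app/date_parser.py"],
    "app/date_parser.py": ["app/tz.py"],
    "app/tz.py": [],
    "app/report.py": [],
}
functions_by_file = {
    "app/cli.py": ["main"],
    "app/date_parser.py": ["parse_date", "normalize"],
    "app/tz.py": ["apply_offsets"],
}
function_calls = {
    "main": ["parse_date"],
    "parse_date": ["normalize", "apply_offsets"],
    "normalize": [],
    "apply_offsets": [],
}

files, functions = localize(
    "Date parser crashes on timezone input",
    module_graph, functions_by_file, function_calls,
)
print(files)
print(functions)
```

Restricting the second graph to the files chosen in the first stage is what keeps the candidate set small at each level.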

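Compared with an abstract syntax tree, a function call graph does capture the contextual relationships between functions more precisely, which makes both fault localization and context collection more accurate. Expanding level by level also eases the LLM's context-window limit to some extent (especially since both call graphs in the study are built by the LLM).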

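I think a backtracking mechanism could also be added: if the target cannot be found in the function call graph, backtrack to the remaining subgraph of the module call graph and continue searching there.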
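One way this backtracking idea could look, as a rough sketch. It is not part of CoSIL; `select_files` and `search_functions` are hypothetical callbacks standing in for the module-level and function-level searches.

```python
def localize_with_backtracking(module_graph, select_files, search_functions,
                               max_rounds=3):
    """Retry on the unexplored part of the module call graph when a round finds nothing."""
    remaining = dict(module_graph)
    for _ in range(max_rounds):
        if not remaining:
            break
        files = select_files(remaining)
        if not files:
            break
        hits = search_functions(files)
        if hits:
            return hits
        # Backtrack: drop the explored files and any edges pointing at them.
        explored = set(files)
        remaining = {
            node: [c for c in callees if c not in explored]
            for node, callees in remaining.items()
            if node not in explored
        }
    return []


# Toy example (hypothetical): the target only lives in c.py.
toy_graph = {"a.py": ["b.py"], "b.py": ["c.py"], "c.py": []}
pick_first = lambda graph: sorted(graph)[:1]
find_target = lambda files: ["c.helper"] if "c.py" in files else []

print(localize_with_backtracking(toy_graph, pick_first, find_target))
```

The `max_rounds` cap bounds the extra cost of backtracking, since every round would be another LLM-driven search in practice.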

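For Defects4J-style (d4j) datasets this could be simpler still: extract the functions called in the test cases with a regular expression and search for them in the function call graph. This helps especially for tests where the function under test does not raise the error directly and therefore does not appear in the exception stack trace.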
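A rough sketch of that idea, assuming a Java test source and a call graph keyed by bare method names. The regex is a heuristic and not a Java parser; the test snippet, the call graph, and all method names are made up for illustration.

```python
import re
from collections import deque

# Heuristic: an identifier immediately followed by "(" is treated as a call.
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
NOT_CALLS = {"if", "for", "while", "switch", "catch", "return"}


def called_names(test_source):
    """Names that look like calls, without keywords and assert* helpers."""
    names = CALL_RE.findall(test_source)
    kept = [n for n in names if n not in NOT_CALLS and not n.startswith("assert")]
    return list(dict.fromkeys(kept))  # de-duplicate, keep first-seen order


def candidates(test_source, call_graph):
    """Functions named in the test plus everything reachable from them."""
    entries = [n for n in called_names(test_source) if n in call_graph]
    seen = list(entries)
    queue = deque(entries)
    while queue:
        for callee in call_graph.get(queue.popleft(), []):
            if callee not in seen:
                seen.append(callee)
                queue.append(callee)
    return seen


# Hypothetical JUnit test and call graph.
TEST_SOURCE = '''
@Test
public void testParseOffset() {
    DateParser parser = new DateParser();
    Date d = parser.parse("2020-01-01T00:00+02:00");
    assertEquals(7200, TimeZones.offsetSeconds(d));
}
'''
CALL_GRAPH = {
    "parse": ["readOffset"],
    "readOffset": [],
    "offsetSeconds": [],
    "format": [],
}

print(candidates(TEST_SOURCE, CALL_GRAPH))
```

Here `readOffset` becomes a candidate even though the test never names it, which is the case the stack trace alone would miss when only an assertion fails.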

The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models

https://arxiv.org/abs/2505.02931

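The study looks for the best generation strategy by balancing the number of repair iterations against the number of patches generated per iteration. The paper examines the following strategies: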

  • Strategy A (10×1): Generate ten outputs in a single iteration.
  • Strategy B (8-2): Generate eight outputs in the first iteration, and two outputs in the next iteration.
  • Strategy C (5×2): Generate five outputs per iteration over two iterations.
  • Strategy D (6-2-2): Generate six outputs in the first iteration, and two outputs in each of the next two iterations.
  • Strategy E (4-3-3): Generate four outputs in the first iteration, and three outputs in each of the next two iterations.
  • Strategy F (2×5): Generate two outputs per iteration over five iterations.
  • Strategy G (1×10): Generate one output per iteration over ten iterations.
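The seven strategies can be written as per-iteration schedules that all spend the same budget of ten outputs. The sketch below encodes them and runs one schedule with toy stand-ins. The `generate` and `is_plausible` callbacks, the early stop, and the way failed patches are passed back as feedback are assumptions of this sketch, not details confirmed by these notes.

```python
STRATEGIES = {
    "A": [10],
    "B": [8, 2],
    "C": [5, 5],
    "D": [6, 2, 2],
    "E": [4, 3, 3],
    "F": [2] * 5,
    "G": [1] * 10,
}


def run_strategy(schedule, generate, is_plausible):
    """Generate patches round by round; stop at the first plausible one."""
    failed = []
    for round_idx, n_outputs in enumerate(schedule, start=1):
        # Assumption: earlier failed patches are handed back as feedback.
        for patch in generate(n_outputs, failed):
            if is_plausible(patch):
                return patch, round_idx
            failed.append(patch)
    return None, len(schedule)


# Toy stand-ins (hypothetical): patches are numbered, only "patch-6" passes.
def toy_generate(n_outputs, failed):
    return [f"patch-{len(failed) + i}" for i in range(n_outputs)]


def toy_is_plausible(patch):
    return patch == "patch-6"


for name, schedule in STRATEGIES.items():
    print(name, sum(schedule), run_strategy(schedule, toy_generate, toy_is_plausible))
```

Because every schedule sums to ten, differences between strategies come from how the budget is split across iterations and not from the total number of outputs.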

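The results show that base models benefit greatly from generating patches iteratively rather than all at once. The advantage of iterative strategies is also more pronounced on complex benchmarks.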

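In my view, iterative generation can to some extent be seen as extending the LLM's thinking time and as indirectly introducing a reflection mechanism, both of which raise the final accuracy.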

SV-TrustEval-C: Evaluating Structure and Semantic Reasoning in Large Language Models for Source Code Vulnerability Analysis

https://arxiv.org/abs/2505.20630

Gradient-Based Program Repair: Fixing Bugs in Continuous Program Spaces

https://arxiv.org/abs/2505.17703