Semgrep: AutoFixes using LLMs


Semgrep is an incredible tool that allows you to search code by matching against the Abstract Syntax Tree (AST). For instance, if you want to find all method calls named get_foo, you can write a pattern such as $X.get_foo(...), where $X stands for whatever object the method is called on.
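
To make the idea of matching against the AST concrete, here is a minimal sketch that finds method calls named get_foo with Python's built-in ast module. It is only an illustration of the concept, not how Semgrep works internally; the function name find_method_calls and the SAMPLE snippet are made up for this example.

```python
import ast

def find_method_calls(source: str, name: str) -> list[tuple[int, str]]:
    """Return (line, source text) for every call shaped like <expr>.<name>(...)."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        # A method call is a Call node whose callee is an attribute access.
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == name):
            hits.append((node.lineno, ast.get_source_segment(source, node)))
    return sorted(hits)

# Hypothetical input: two real method calls, one string that merely looks like one.
SAMPLE = '''
x = client.get_foo(1, 2)
text = "client.get_foo(3)"  # a string literal, not a call
y = get_foo_bar()
z = a.b.get_foo()
'''

if __name__ == "__main__":
    for line, text in find_method_calls(SAMPLE, "get_foo"):
        print(line, text)
```

Because the search runs on the parsed tree, the string literal and the similarly named function are skipped, which a plain text search would struggle to do.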


You can test your own patterns in the Semgrep playground.

While other tools like this exist, Semgrep is currently the most capable.


Semgrep not only searches using patterns but also supports rewriting the matches. Here’s a simple rule definition from their documentation:

rules:
- id: use-sys-exit
  languages:
  - python
  message: |
    Use `sys.exit` over the python shell `exit` built-in. `exit` is a helper
    for the interactive shell and is not available on all Python implementations.
  pattern: exit($X)
  fix: sys.exit($X)
  severity: WARNING

This can be invoked by running:

semgrep --config ./rule.yml --autofix
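
For a feel of what this rule does to a file, here is a rough Python equivalent written with the standard ast module. It is a sketch of the same transform under my own assumptions, not Semgrep's implementation; the names UseSysExit and apply_fix are hypothetical.

```python
import ast

class UseSysExit(ast.NodeTransformer):
    """Rewrite bare exit(<arg>) calls to sys.exit(<arg>)."""

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)  # rewrite any nested calls first
        is_bare_exit = isinstance(node.func, ast.Name) and node.func.id == "exit"
        # Mirror the single-argument shape of the pattern exit($X).
        if is_bare_exit and len(node.args) == 1 and not node.keywords:
            node.func = ast.Attribute(
                value=ast.Name(id="sys", ctx=ast.Load()),
                attr="exit",
                ctx=ast.Load(),
            )
        return node

def apply_fix(source: str) -> str:
    """Parse the source, apply the transform, and render it back to text."""
    tree = UseSysExit().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

if __name__ == "__main__":
    print(apply_fix("if failed:\n    exit(code)\n"))
```

Note that ast.unparse regenerates the whole file and drops comments and original formatting, so this is only an illustration. Like the rule as written, it also does not add an import for sys.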


Although the built-in autofix feature is powerful, it’s limited to simple AST transforms. I’m currently exploring the idea of fixing Semgrep matches with a Large Language Model (LLM): each match is fed to the LLM individually and replaced with its response. To make this possible, I’ve created a tool called semgrepx, which can be thought of as xargs for Semgrep. I then use semgrepx to rewrite the matches with the fantastic llm tool. Here’s how it works:

semgrep -l go --pattern 'log.$A(...)' --json > matches.json
semgrepx llm 'update this go to use log.Printf' < matches.json
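
The match-and-replace loop behind this idea can be sketched in a few lines of Python. This is not how semgrepx is actually implemented; it assumes that each entry in the results list of Semgrep's JSON output carries start.offset and end.offset fields, treats those as byte offsets into the file, and uses a hypothetical fake_llm function in place of a real model call.

```python
from typing import Callable

def rewrite_matches(source: bytes, results: list[dict],
                    rewrite: Callable[[str], str]) -> bytes:
    """Replace each matched span in one file's contents with rewrite(matched text)."""
    # Go from the last match to the first so earlier offsets stay valid
    # even when a replacement changes the length of the text.
    ordered = sorted(results, key=lambda m: m["start"]["offset"], reverse=True)
    for match in ordered:
        # Assumption: offsets are byte positions in the original file.
        start, end = match["start"]["offset"], match["end"]["offset"]
        original = source[start:end].decode("utf-8")
        source = source[:start] + rewrite(original).encode("utf-8") + source[end:]
    return source

def fake_llm(snippet: str) -> str:
    """Hypothetical stand-in for an LLM call; a real one would send a prompt."""
    return snippet.replace("log.Println", "log.Printf")

if __name__ == "__main__":
    go_source = b'log.Println("a")\nx := 1\nlog.Println("b")\n'
    matches = [
        {"start": {"offset": 0}, "end": {"offset": 16}},
        {"start": {"offset": 24}, "end": {"offset": 40}},
    ]
    print(rewrite_matches(go_source, matches, fake_llm).decode("utf-8"))
```

Processing matches back to front is the main design choice here: an LLM response rarely has the same length as the text it replaces, and working backwards avoids recomputing offsets after every substitution.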