Git: programmatic staging
In the past year, I’ve been using a lot of tools to automatically rewrite/refactor code. These include semgrep, ast-grep, LLMs, and one-off scripts. After running these tools on a large code-base, you usually end up with lots of additional unintended changes. These range from formatting/whitespace to unrequested modifications by LLMs.
The subsequent “cleanup” step is a very manual and tedious process. I’m essentially running git add -p
and staging hunks one at a time. At times it feels like this step offsets the productivity gain from the rewrite tool itself.
After doing this several times, I realized that most of the hunks I was staging included some common text. If I could automatically stage hunks containing a search term, I could automate a lot of this work! Git does not natively support this, but it can be easily accomplished using the expect tool:
#!/usr/bin/expect -f
# Set timeout to prevent the script from hanging
set timeout -1
# Get the search pattern as a command line argument
if {[llength $argv] != 0} {
set pattern [lindex $argv 0]
} else {
puts "Error: search pattern not provided"
exit 1
}
# Open the interaction with git add -p
spawn git add -p
# This is the main loop that handles the user interaction
expect {
# This expect block is for the hunk that contains the provided pattern
"*$pattern*Stage this hunk*" {
send "y\r"
exp_continue
}
# This expect block is for continuing to the next hunk
"*Stage this hunk*" {
send "n\r"
exp_continue
}
eof
}
To install this script, save it in your PATH
with the name git-add-match
.
Once installed, the usage is as follows:
$ git add-match foo
After running this command, all hunks where the string “foo” is found will be staged.
Edit:
A user on lobste.rs suggested using grepdiff instead:
git diff | grepdiff --output-matching=hunk PATTERN | git apply --cached