Debugging a flaky Go test with Mozilla rr

This is how you debug a test that only fails once every 1000 times.

The Test

package my

import (
	"math/rand"
	"testing"
	"time"
)

func init() {
	rand.Seed(time.Now().UnixNano())
}

func TestRandFail(t *testing.T) {
	if n := rand.Intn(1000); n == 50 {
		t.Fatalf("finally got %d", n)
	}
}

Get the newest version of rr

git clone https://github.com/rr-debugger/rr.git
cd rr
git checkout 5.4.0 # change this to the latest release (DO NOT BUILD HEAD)
mkdir build
cd build
cmake ..
make -j8
sudo make install

Compile your failing test to a binary

go test -gcflags 'all=-N -l' -c

Install the following rrloop script.

#!/bin/sh

while :
do
  rr $@
  if [ $? -ne 0 ]; then
    echo "encountered non-zero exit code: $?";
    exit;
  fi
done

Record the test execution in a loop until it fails

echo -1 | sudo tee -a /proc/sys/kernel/perf_event_paranoid
echo 0 | sudo tee -a /proc/sys/kernel/kptr_restrict

rrloop record ./my.test

Example Output:

...
rr: Saving execution to trace directory `/home/icholy/.local/share/rr/my.test-118'.
PASS
rr: Saving execution to trace directory `/home/icholy/.local/share/rr/my.test-119'.
PASS
rr: Saving execution to trace directory `/home/icholy/.local/share/rr/my.test-120'.
PASS
rr: Saving execution to trace directory `/home/icholy/.local/share/rr/my.test-121'.
PASS
rr: Saving execution to trace directory `/home/icholy/.local/share/rr/my.test-122'.
--- FAIL: TestRandFail (0.00s)
    my_test.go:15: finally got 50
FAIL
encountered non-zero exit code: 0

Debug the execution trace

go install github.com/go-delve/delve/cmd/dlv@latest
dlv replay /home/icholy/.local/share/rr/my.test-122

Find the test function and set a breakpoint

(dlv) funcs TestRand
my.TestRandFail
(dlv) b my.TestRandFail
Breakpoint 1 set at 0x50b173 for my.TestRandFail() ./my_test.go:13
(dlv) c
> my.TestRandFail() ./my_test.go:13 (hits goroutine(6):1 total:1) (PC: 0x50b173)
Current event: 414
     8:	
     9:	func init() {
    10:		rand.Seed(time.Now().UnixNano())
    11:	}
    12:	
=>  13:	func TestRandFail(t *testing.T) {
    14:		if n := rand.Intn(100); n == 50 {
    15:			t.Fatalf("finally got %d", n)
    16:		}
    17:	}

Connect Visual Studio Code

Start delve in server mode

dlv replay /home/icholy/.local/share/rr/my.test-122 --headless --listen=:2345 --log --api-version=2

Remotely connect using the following launch.json configuration:

{
    "name": "Replay Trace",
    "type": "go",
    "request": "attach",
    "mode": "remote",
    "remotePath": "${workspaceFolder}",
    "port": 2345,
    "host": "127.0.0.1"
}