How We Found a Bug in the Go Compiler
Cloudflare engineers discovered a rare race condition in code generated by Go's arm64 compiler: asynchronous preemption in the middle of a split stack pointer adjustment caused crashes during stack unwinding. The fix shipped in Go 1.23.12, 1.24.6, and 1.25.0.
Cloudflare handles 84 million HTTP requests per second across 330 cities. At this scale, our team was able to discover a rare bug in the Go compiler for the arm64 architecture that causes a race condition in the generated code.
Investigating a Strange Panic
A core configuration service for processing Magic Transit and Magic WAN traffic began exhibiting random panics on arm64 machines. The initial error pointed to a stack unwinding problem.
The team noticed a correlation between critical panics and recovered panics in code that used the panic/recover pattern for error handling. After this pattern was removed, the critical panics disappeared, but they later returned in greater numbers, up to thirty per day across different machines.
Two classes of errors were identified:
- Crashes when accessing invalid memory (segmentation fault)
- An explicit critical error: "traceback did not unwind completely"
Both errors occurred during stack unwinding in the (*unwinder).next function.
Overview of Go Scheduler Structures
Go uses a lightweight M:N user-space scheduler to manage concurrency. The runtime is built around three main types:
- g: a goroutine
- m: a kernel thread, or "machine"
- p: a physical execution context, or "processor"
Each g has a field that points to its m while the goroutine is running; otherwise the field is nil.
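To make the relationship between the three types concrete, here is a heavily simplified sketch. The struct layouts and the run and park helpers are hypothetical stand-ins written for this article, not the runtime's actual definitions, which carry many more fields.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the runtime's g, m, and p types.
type m struct {
	id   int
	curg *g // goroutine currently running on this thread, if any
	p    *p // processor this thread holds while running Go code
}

type p struct {
	id   int
	runq []*g // goroutines waiting to run on this processor
}

type g struct {
	id int
	m  *m // thread running this goroutine; nil when it is not running
}

// run marks gp as executing on mp, which sets the g.m link.
func run(mp *m, gp *g) {
	mp.curg = gp
	gp.m = mp
}

// park detaches gp from its thread, so gp.m goes back to nil.
func park(gp *g) {
	if gp.m != nil {
		gp.m.curg = nil
		gp.m = nil
	}
}

func main() {
	proc := &p{id: 0}
	thread := &m{id: 0, p: proc}
	gp := &g{id: 1}

	fmt.Println("before run, m is nil:", gp.m == nil)
	run(thread, gp)
	fmt.Println("while running, m is set:", gp.m == thread)
	park(gp)
	fmt.Println("after park, m is nil:", gp.m == nil)
}
```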
Asynchronous Preemption
Through Go 1.13, scheduling was cooperative. Go 1.14 introduced asynchronous preemption. The sysmon thread watches for goroutines that have been running for more than 10ms and preempts them by sending SIGURG to their thread; the signal handler then modifies the program counter and stack to simulate a call to asyncPreempt.
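One way to observe asynchronous preemption from user code is to starve the scheduler on purpose. The sketch below is an illustration under stated assumptions, not part of the original investigation: wakeDelayMillis is a hypothetical helper, it assumes Go 1.19 or later (for atomic.Bool), and it assumes the atomic load is inlined so the spinning loop contains no function-call preemption point.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// wakeDelayMillis reports how long a 20ms sleep takes, in milliseconds, while
// another goroutine spins on the only available P.
func wakeDelayMillis() int64 {
	prev := runtime.GOMAXPROCS(1) // force both goroutines onto a single P
	defer runtime.GOMAXPROCS(prev)

	var stop atomic.Bool
	go func() {
		// Tight loop with no explicit yield; the sleeper can only get the
		// P back if the runtime preempts this goroutine.
		for !stop.Load() {
		}
	}()

	start := time.Now()
	time.Sleep(20 * time.Millisecond)
	elapsed := time.Since(start)
	stop.Store(true) // let the spinner exit
	return elapsed.Milliseconds()
}

func main() {
	fmt.Printf("sleeper resumed after %dms\n", wakeDelayMillis())
}
```

With asynchronous preemption the sleeper resumes shortly after its timer expires. Under a purely cooperative scheduler, a loop like this could hold the P indefinitely.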
The Key Discovery
Examining a production core dump revealed a critical detail: a goroutine had been suspended between two opcodes in the epilogue of the (*NetlinkSocket).Receive function:
ADD $80, RSP, RSP
ADD $(16<<12), RSP, RSP
Preemption between these instructions left the stack pointer in an intermediate state, causing the stack unwinder to crash when attempting to determine the parent frame.
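The intermediate state is easy to see with a small simulation. This is a sketch only: observableSP is a hypothetical helper and the starting stack pointer value is made up. It lists the stack pointer a signal handler could observe at each instruction boundary of the two-step epilogue.

```go
package main

import "fmt"

// observableSP returns the stack pointer value visible at each instruction
// boundary while the given ADD immediates are applied in order.
func observableSP(sp uint64, adds []uint64) []uint64 {
	seen := []uint64{sp}
	for _, imm := range adds {
		sp += imm
		seen = append(seen, sp)
	}
	return seen
}

func main() {
	const entrySP = 0x4000_0000 // hypothetical SP before the epilogue runs
	frame := uint64(80 + 16<<12) // total adjustment made by the two ADDs

	for i, sp := range observableSP(entrySP, []uint64{80, 16 << 12}) {
		// Only the fully unadjusted and fully adjusted values are valid.
		valid := sp == entrySP || sp == entrySP+frame
		fmt.Printf("boundary %d: sp=%#x valid=%v\n", i, sp, valid)
	}
}
```

Of the three boundaries, only the middle one, after the first ADD and before the second, yields a stack pointer that matches neither the callee's frame nor the caller's.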
Reproducing the Problem
A minimal example to reproduce the issue:
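package main

import "runtime"

// big_stack keeps a 64 KiB buffer in its frame so that the stack pointer
// adjustment is too large for a single ADD immediate.
//
//go:noinline
func big_stack(val int) int {
	var big_buffer = make([]byte, 1<<16)

	sum := 0
	// Touch every byte so the buffer cannot be optimized away.
	for i := 0; i < (1 << 16); i++ {
		big_buffer[i] = byte(val)
	}
	for i := 0; i < (1 << 16); i++ {
		sum ^= int(big_buffer[i])
	}
	return sum
}

func main() {
	// Run the garbage collector in a loop to force frequent stack scans
	// while the other goroutine keeps running big_stack.
	go func() {
		for {
			runtime.GC()
		}
	}()

	// Enter and leave big_stack continuously so its prologue and epilogue
	// run as often as possible.
	for {
		_ = big_stack(1000)
	}
}
With a frame slightly larger than 1<<16 bytes, the stack pointer adjustment no longer fits in a single ADD immediate, so the compiler splits it into two arm64 opcodes. When preemption occurs between them, the stack pointer is left in an invalid state.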
The Technical Nature of the Problem
On arm64 (an architecture with fixed 4-byte instruction length), immediate operands are limited:
- ADD: 12-bit width
- MOV: 16-bit width
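As a rough illustration of the ADD limit, the sketch below checks whether a value fits in one ADD immediate and, if not, how it divides into two. The helpers fitsAddImm and splitAddImm are hypothetical names used for illustration, not compiler functions. The check relies on the arm64 encoding rule that the 12-bit immediate can optionally be shifted left by 12 bits.

```go
package main

import "fmt"

// fitsAddImm reports whether v can be encoded in one arm64 ADD immediate:
// a 12-bit value, optionally shifted left by 12 bits.
func fitsAddImm(v uint64) bool {
	return v < 1<<12 || (v&0xfff == 0 && v < 1<<24)
}

// splitAddImm breaks v (assumed below 1<<24) into a low 12-bit part and a
// high part that is a multiple of 1<<12, so each part fits one ADD.
func splitAddImm(v uint64) (lo, hi uint64) {
	return v & 0xfff, v &^ 0xfff
}

func main() {
	for _, v := range []uint64{80, 16 << 12, 80 + 16<<12} {
		lo, hi := splitAddImm(v)
		fmt.Printf("%d: single=%v lo=%d hi=%d<<12\n", v, fitsAddImm(v), lo, hi>>12)
	}
}
```

The value 80 + 16<<12 is the adjustment seen in the production core dump: each half fits on its own, but the sum does not.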
When a larger value is needed, the instruction is split into multiple opcodes. For stacks larger than 1<<15, the compiler generated:
ADD $8, RSP, R29
ADD $(16<<12), R29, R29
ADD $16, RSP, RSP
ADD $(16<<12), RSP, RSP
The stack unwinder requires the stack pointer to always be valid, as it dereferences sp to determine the calling function. If sp is partially modified, the unwinder looks for the calling function in the middle of the stack, leading to a crash.
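A toy model shows why a partially adjusted sp is fatal for unwinding. Everything here is a simplifying assumption: the stack type, the caller method, and the addresses are hypothetical, and the real unwinder reads raw memory and consults runtime metadata rather than a map. The model captures only one property, namely that a frame record exists at valid stack pointer values and nowhere else.

```go
package main

import (
	"errors"
	"fmt"
)

// stack maps an address to the name of the function whose return address is
// saved there. It stands in for real stack memory.
type stack map[uint64]string

var errBadFrame = errors.New("no frame record at sp")

// caller dereferences sp to find the calling function.
func (s stack) caller(sp uint64) (string, error) {
	name, ok := s[sp]
	if !ok {
		return "", errBadFrame
	}
	return name, nil
}

func main() {
	const base = 0x4000_0000    // hypothetical sp inside big_stack
	const frame = 16 + 16<<12   // adjustment from the listing above
	s := stack{
		base:         "main.main",    // record saved by big_stack
		base + frame: "runtime.main", // record saved by main.main
	}

	// Before the epilogue, between the two ADDs, and after the epilogue.
	for _, sp := range []uint64{base, base + 16, base + frame} {
		name, err := s.caller(sp)
		fmt.Printf("sp=%#x caller=%q err=%v\n", sp, name, err)
	}
}
```

In this model the lookup fails cleanly. In the real runtime the unwinder reads whatever bytes sit at that address, which is why the failures showed up as both segmentation faults and the "traceback did not unwind completely" error.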
The Fix
The bug was fixed in Go 1.23.12, 1.24.6, and 1.25.0. Instead of splitting the instruction into multiple opcodes, the compiler now loads the offset into a temporary register and performs a single atomic operation:
MOVD $32, R27
MOVK $(1<<16), R27
ADD R27, RSP, RSP
This guarantees that preemption can occur before or after the stack pointer modification, but not during the process.
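The same kind of simulation can be applied to the fixed sequence. This is a sketch with hypothetical helper names (movk, adjustSP) that models only the register arithmetic: the offset is assembled in a scratch register, so the stack pointer changes in exactly one instruction.

```go
package main

import "fmt"

// movk models MOVK: replace one 16-bit slice of reg (selected by shift) with
// imm16 and keep all other bits.
func movk(reg uint64, imm16 uint16, shift uint) uint64 {
	mask := uint64(0xffff) << shift
	return reg&^mask | uint64(imm16)<<shift
}

// adjustSP runs the three-instruction sequence and returns every SP value
// visible at an instruction boundary.
func adjustSP(sp uint64) []uint64 {
	seen := []uint64{sp}

	r27 := uint64(32) // MOVD $32, R27
	seen = append(seen, sp)

	r27 = movk(r27, 1, 16) // MOVK $(1<<16), R27
	seen = append(seen, sp)

	sp += r27 // ADD R27, RSP, RSP
	return append(seen, sp)
}

func main() {
	const entrySP = 0x4000_0000 // hypothetical starting SP
	for i, sp := range adjustSP(entrySP) {
		fmt.Printf("boundary %d: sp=%#x\n", i, sp)
	}
}
```

Every boundary shows either the original stack pointer or the fully adjusted one, so a preempted goroutine always presents a frame the unwinder can walk.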


