Beware the Letter 'M': The Strangest Bug in My Life
HTTP endpoints containing the letter M returned 400 errors. The cause? A bit-shift overflow bug in Netty's HTTP codec.
I want to tell you about the strangest bug I've ever encountered in my career. It's a story about how a single letter in an HTTP endpoint name could bring your entire server to its knees.
The Mystical Letter 'M'
It all started when a colleague came to me with a bizarre observation: "If the endpoint name contains the letter M — it doesn't work." I didn't believe him at first. How could a single letter in a URL path affect whether an HTTP request succeeds or fails?
But he was right. Here's a minimal reproduction:
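import io.ktor.http.*
import io.ktor.server.application.*
import io.ktor.server.engine.*
import io.ktor.server.netty.*
import io.ktor.server.response.*
import io.ktor.server.routing.*

fun main() {
    // Two routes that differ only by the trailing 'M'.
    embeddedServer(Netty, port = 8080) {
        routing {
            get("/hello") { call.respondText("ok", ContentType.Text.Plain, HttpStatusCode.OK) }
            get("/helloM") { call.respondText("ok", ContentType.Text.Plain, HttpStatusCode.OK) }
        }
    }.start(wait = true)
}
Testing with cURL: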
curl -i localhost:8080/hello      # 200 OK
curl -i localhost:8080/helloM     # 400 Bad Request!
curl -i localhost:8080/hello%4D   # 200 OK (URL-encoded 'M')
The endpoint /hello works fine. But /helloM returns HTTP 400 Bad Request. And if you URL-encode the letter M as %4D, it works again! Something was clearly wrong with how the server parsed raw URI characters.
My Methodical Microanalysis
I used git-bisect to track down the exact commit that introduced this behavior. It turned out the problem appeared in Netty codec-http version 4.1.129.Final.
The exception was being thrown from a new validation function called validateRequestLineTokens:
static void validateRequestLineTokens(HttpVersion httpVersion, HttpMethod method, String uri) {
    if (method.getClass() != HttpMethod.class) {
        if (!isEncodingSafeStartLineToken(method.asciiName())) {
            throw new IllegalArgumentException(
                    "The HTTP method name contain illegal characters: " + method.asciiName());
        }
    }
    if (!isEncodingSafeStartLineToken(uri)) {
        throw new IllegalArgumentException("The URI contain illegal characters: " + uri);
    }
}
The Mystical Letter... 'J'!
Before diving into the code, I decided to test all letters systematically:
for c in {A..Z}; do
    _path="/hello${c}"
    code=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:8080$_path")
    echo "$code $_path"
done
The results were illuminating: every letter returned the expected 404 (endpoint not found), except for 'J' and 'M', which both returned 400. So it wasn't just 'M': the letter 'J' was also affected!
Maniacally Digging Into the Problem
The heart of the bug was in the isEncodingSafeStartLineToken method. The developers had tried to use a clever bitwise technique to check for illegal characters in the URI:
private static final long ILLEGAL_REQUEST_LINE_TOKEN_OCTET_MASK = 1L << '\n' | 1L << '\r' | 1L << ' ';
public static boolean isEncodingSafeStartLineToken(CharSequence token) {
    int i = 0;
    int lenBytes = token.length();
    int modulo = lenBytes % 4;
    int lenInts = modulo == 0 ? lenBytes : lenBytes - modulo;
    for (; i < lenInts; i += 4) {
        long chars = 1L << token.charAt(i) |
                1L << token.charAt(i + 1) |
                1L << token.charAt(i + 2) |
                1L << token.charAt(i + 3);
        if ((chars & ILLEGAL_REQUEST_LINE_TOKEN_OCTET_MASK) != 0) {
            return false;
        }
    }
    for (; i < lenBytes; i++) {
        long ch = 1L << token.charAt(i);
        if ((ch & ILLEGAL_REQUEST_LINE_TOKEN_OCTET_MASK) != 0) {
            return false;
        }
    }
    return true;
}
The idea was to create a bitmask where bits 10 (newline), 13 (carriage return), and 32 (space) are set. Then, for each character in the URI, shift 1L left by the character's ASCII value and check if it overlaps with the mask. If it does, the character is "illegal."
The mask value:
ILLEGAL_REQUEST_LINE_TOKEN_OCTET_MASK = 1L << 10 | 1L << 13 | 1L << 32
                                      = 0x0000000100002400
Maybe 6 Bits Are Too Few for a Long?
Here's where the Java Language Specification comes into play. According to JLS §15.19:
"If the promoted type of the left-hand operand is long, then only the six lowest-order bits of the right-hand operand are used as the shift distance. It is as if the right-hand operand were subjected to a bitwise logical AND operator & with the mask value 0x3f (0b111111)."
This means that for a long value, the shift distance is always taken modulo 64. Only the bottom 6 bits of the shift amount matter. Let's trace through what happens:
For the letter 'M' (ASCII 77):
- 77 & 63 = 13 (binary: 77 = 1001101, masked with 111111 = 001101 = 13)
- 1L << 77 actually executes as 1L << 13
- 1L << 13 = 8192
- 8192 & MASK = 8192: non-zero! The function returns false
- But bit 13 in the mask corresponds to '\r' (carriage return)!
For the letter 'J' (ASCII 74):
- 74 & 63 = 10
- 1L << 74 actually executes as 1L << 10
- 1L << 10 = 1024
- 1024 & MASK = 1024: non-zero!
- Bit 10 in the mask corresponds to '\n' (newline)!
For a normal letter like 'A' (ASCII 65):
- 65 & 63 = 1
- 1L << 65 executes as 1L << 1 = 2
- 2 & MASK = 0: zero, so it passes correctly
So the letters 'M' and 'J' were being falsely identified as carriage return and newline respectively, purely due to bit-shift overflow!
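The traces above can be checked directly. The following is a minimal sketch rather than Netty code: the class name ShiftDistanceDemo and the helper looksIllegal are hypothetical names introduced here, and the mask is rebuilt from the definition quoted earlier.

```java
// Minimal sketch of the shift-distance masking described in JLS §15.19.
// ShiftDistanceDemo and looksIllegal are hypothetical names, not part of Netty.
class ShiftDistanceDemo {
    // Same definition as the mask quoted earlier: bits 10, 13, and 32 are set.
    static final long MASK = 1L << '\n' | 1L << '\r' | 1L << ' ';

    // Reproduces the single-character form of the buggy check.
    static boolean looksIllegal(char ch) {
        return ((1L << ch) & MASK) != 0;
    }

    public static void main(String[] args) {
        // For a long left operand, only the low six bits of the distance are used.
        System.out.println((1L << 77) == (1L << 13)); // 'M' lands on the '\r' bit
        System.out.println((1L << 74) == (1L << 10)); // 'J' lands on the '\n' bit
        System.out.println(Long.toHexString(MASK));   // 100002400

        for (char ch : new char[] {'A', 'J', 'M'}) {
            System.out.println(ch + " (" + (int) ch + "): distance " + (ch & 63)
                    + ", flagged = " + looksIllegal(ch));
        }
    }
}
```

Running it should report 'J' and 'M' as flagged and 'A' as not flagged, matching the arithmetic in the lists above.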

The Moment of Truth — The Fix
In version 4.1.130.Final, the Netty team replaced the entire bitwise approach with a straightforward character comparison:
public static boolean isEncodingSafeStartLineToken(CharSequence token) {
    int lenBytes = token.length();
    for (int i = 0; i < lenBytes; i++) {
        char ch = token.charAt(i);
        if (ch <= ' ') {
            switch (ch) {
                case '\n':
                case '\r':
                case ' ':
                    return false;
            }
        }
    }
    return true;
}
Simple, clear, and most importantly, correct. No bit-shift tricks, no modulo surprises.
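For comparison, the bitmask idea itself could have been kept if the shift only happened for characters below 64, where the distance can never wrap around. The sketch below illustrates that alternative; it is not what Netty shipped, and GuardedMaskCheck, guardedIsSafe, and plainIsSafe are hypothetical names. It checks the guarded variant against a plain comparison for every possible char value.

```java
// Illustration only: a range-guarded bitmask check, compared against a plain comparison.
// All names here are hypothetical and not taken from Netty.
class GuardedMaskCheck {
    static final long ILLEGAL_MASK = 1L << '\n' | 1L << '\r' | 1L << ' ';

    // Bitmask variant with a range guard: characters >= 64 are never shifted.
    static boolean guardedIsSafe(CharSequence token) {
        for (int i = 0; i < token.length(); i++) {
            char ch = token.charAt(i);
            if (ch < 64 && ((1L << ch) & ILLEGAL_MASK) != 0) {
                return false;
            }
        }
        return true;
    }

    // Straightforward comparison against the three forbidden characters.
    static boolean plainIsSafe(CharSequence token) {
        for (int i = 0; i < token.length(); i++) {
            char ch = token.charAt(i);
            if (ch == '\n' || ch == '\r' || ch == ' ') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Exhaustively compare both checks over every 16-bit char value.
        int mismatches = 0;
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            String token = "/hello" + (char) c;
            if (guardedIsSafe(token) != plainIsSafe(token)) {
                mismatches++;
            }
        }
        System.out.println("mismatches: " + mismatches);
        System.out.println(guardedIsSafe("/helloM"));      // true
        System.out.println(guardedIsSafe("/hello world")); // false
    }
}
```

Even with the guard, the plain comparison is easier to read and review, which is arguably the stronger argument for the fix that was actually chosen.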
Summary
This bug is a perfect example of how a small rule buried in a language specification can lead to subtle, devastating bugs. The developers who wrote the original code clearly knew about bitwise operations, but they missed a crucial detail in the Java spec about shift distance masking.
Key takeaways:
- git-bisect is an incredibly powerful debugging tool for tracking down regressions
- Always be cautious with bitwise shift operations: check the language specification for how shift distances are handled
- Sometimes the "clever" solution is worse than the straightforward one
- Any character whose code, taken modulo 64, equals 10, 13, or 32 would trigger this bug. That means 'M' (77 mod 64 = 13), 'J' (74 mod 64 = 10), and the backtick '`' (96 mod 64 = 32) would all be incorrectly rejected
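The last takeaway can be confirmed by brute force over the ASCII range. This sketch re-implements the single-character form of the buggy check; the names AliasedCharacters and falselyRejected are hypothetical and not Netty's. It collects every ASCII character that gets flagged without being one of the three intended ones.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: enumerate ASCII characters that the unguarded shift check rejects by mistake.
// AliasedCharacters and falselyRejected are hypothetical names for illustration.
class AliasedCharacters {
    static final long MASK = 1L << '\n' | 1L << '\r' | 1L << ' ';

    static List<Character> falselyRejected() {
        List<Character> result = new ArrayList<>();
        for (char ch = 0; ch < 128; ch++) {
            boolean flagged = ((1L << ch) & MASK) != 0;               // buggy check
            boolean intended = ch == '\n' || ch == '\r' || ch == ' '; // what was meant
            if (flagged && !intended) {
                result.add(ch);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(falselyRejected()); // prints [J, M, `]
    }
}
```

Because the check shifts by the raw char value, the same pattern repeats every 64 code points beyond ASCII as well.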
The project demonstrating this bug is available on GitHub.