How I Punished Firaxis
A programmer investigates why Civilization IV consumed 900MB of RAM and builds a memory-saving patch, downloaded over 150,000 times, that fixed in half a day what took Firaxis several months despite their access to the source code.
The Problem
Before the release of Civilization IV, I was a big fan of the third installment and eagerly awaited the new game. When Civ4 finally came out in late 2005, I was puzzled by one thing: its graphics were roughly on par with games from 2002-2003, yet it consumed up to 900MB of RAM during gameplay. This caused severe swapping, which was particularly painful on notebooks and on large maps. The footprint seemed completely inexplicable given that Far Cry, a game with vastly superior graphics, ran perfectly fine on systems with just 512MB of RAM.
This situation bugged me so much that I decided to investigate what was really going on under the hood.
The Investigation
My first suspect was Python. Firaxis had made a big deal about using Python for game scripting, so it seemed like a natural place to start. I instrumented Python's DLL with memory allocation logging hooks to track every allocation it made. The result? Python was using only about 25MB. Not the culprit.
Next, I looked at the standard C runtime. Using Detours hooks on malloc, realloc, and free, I monitored every C-level allocation, suspecting a leak in the allocator itself. That trail also went cold: no meaningful leaks emerged, and nothing came close to explaining the enormous memory footprint.
So I went deeper and hooked VirtualAlloc calls to identify which modules were requesting large amounts of memory directly from the operating system. That's when I found something surprising: d3d9.dll, the Direct3D 9 runtime, was consuming the lion's share of system memory. This was baffling, because graphics resources should primarily reside in video memory, not system RAM.
The Discovery
After analyzing the DirectX resource usage patterns, I found the root cause. Civilization IV was creating all its graphics resources using D3DPOOL_MANAGED instead of D3DPOOL_DEFAULT.
What's the difference? The MANAGED pool instructs Direct3D to keep a backup copy of every video resource in system RAM. This is a convenience feature for developers: if the video device is lost (say, when the user Alt-Tabs out of fullscreen), the runtime can automatically restore every resource from its backup copy. The price is that memory usage for all graphics resources is effectively doubled, which in Civilization IV's case consumed around 500MB of system RAM.
While using D3DPOOL_MANAGED simplifies programming, it's tremendously wasteful. The typical ratio of system RAM to video RAM is about 4:1, so caching video resources in system RAM is akin to caching disk contents in RAM at a 100:1 ratio — technically possible, but rarely a good idea.
I found approximately 400MB worth of vertex buffers, with a single buffer weighing in at 280MB. To figure out what this massive buffer represented, I employed a creative technique: I deliberately corrupted the vertex data during the buffer's Unlock operation and watched what broke in the game.
The answer was terrain tiles. Unlike the pre-rendered unit animations, each map tile's geometry depends on the terrain features of its neighboring tiles — mountains, rivers, surface types, and so on. This makes identical tile configurations extremely rare. Civilization IV was creating unique vertex buffers for nearly every visible tile on the map, and thanks to D3DPOOL_MANAGED, each one was duplicated in system memory.
The Solution
I implemented a deduplication system based on hashing. The idea was simple: when a vertex buffer was unlocked (i.e., the game had finished writing to it), I would compute a hash of its contents and check if an identical buffer already existed in a cache. If so, subsequent DrawIndexedPrimitive calls would reuse the cached buffer instead of the duplicate.
For the hash function, I chose a polynomial hash with a multiplier of 5, effectively treating the vertex data as digits of a base-5 number. The choice of 5 was deliberate: it is prime, which gives reasonable mixing, and multiplying by 5 compiles to a single LEA (Load Effective Address) instruction on x86, avoiding the comparatively slow MUL entirely.
I spent about half a day writing this contraption: a mix of C++ templates, inline assembly, and COM call interception via Detours hooks. It wasn't pretty, but it worked.
The Results
Memory consumption dropped from the original 800-900MB to 300-400MB. In 2010-era terms, this was equivalent to reducing system requirements from 4GB down to about 1.5GB.
I released the patch on CivFanatics.com, the main Civilization community site. Within just a few days, it had been downloaded over 150,000 times. The community response was overwhelmingly positive — players who had struggled with swapping issues on large maps could suddenly play smoothly.
The most remarkable part? Firaxis, with full access to the game's source code, took several months to implement an equivalent fix in an official patch. Meanwhile, I had achieved the same result in half a day by working from the outside, with nothing but reverse engineering tools and a deep understanding of DirectX memory management.
Sometimes the outsider's perspective and low-level systems knowledge can outperform having the source code. The lesson: understanding how your platform works at a fundamental level is at least as important as understanding your own code.