Reverse-Engineering Binary Files

By John Graham on April 22nd, 2010

This is a guest post written by Wolf Mathwig aka rudel_ic. Wolf is a long-time forum member who's reverse-engineered Lugaru file formats to enable the modification of 3D models, animations and skeletons for the game. In real life, Wolf is a 30-year-old software developer from Germany.

Reverse-engineering binary file formats is like learning another language without a teacher by analyzing conversations between native speakers. Imagine observing the following conversation between the two exchange students Horst and Petra:

Horst: Wie spät ist es?

Petra: (looking at her watch) Es ist halb zwölf.

You can make some obvious observations right off the bat. Horst's inflection indicates that he's asking a question. Petra is answering that question. She's probably telling him the time, going by her actions and her concise reply. So it's likely that Horst asked for just that.

On a more rudimentary level, when you read that conversation, you immediately understand which letters form words and that spaces divide them, you observe some peculiar letters and you can make out what a sentence is and who speaks it. Furthermore, you might notice the fact that the words "es" and "ist" occur in both sentences, in both cases next to each other. By drawing parallels to English, you might reach the conclusion that "ist es" means the same as "is it" and vice versa.

Sure, you might get some details wrong, like mistaking "Wie spät" for "What time" since you're just assuming that asking for the time works in German just as it does in English ("Wie spät" literally translates to "How late" instead), but that's not a problem since the meaning stays the same. Having made these observations, if you encounter a German bloke in the future, you could ask him for the time just as Horst asked Petra. You might not know what time on the watch the reply indicates, but with enough German exchange students and enough time, you'd probably figure that out eventually as well, learn numbers and grammatical structures, build a dictionary and a pronunciation guide, and so forth.

File formats for video games are languages for one-sided conversations. You have words, grammatical structures and sentences, and the game takes a file in such a language and generates stuff as that file instructs. You can't just observe a conversation in that language though. Applying the example above to this, it would be like asking Petra what time it is and having her show you the watch. The game doesn't answer in the same language. So all you can do is figure out letters, words, grammar and sentences to derive the language, then throw self-made files in that language at the game to see what happens. If you've reached the right conclusions and everything works out, the game generates content as you've indicated; if you've got it wrong though, stuff looks weird, ends up invisible or too small, or the game even crashes.

Even worse, there might be details you can't figure out at all. Often enough, there are special cases in the code that aren't observable in the files the game ships with. After all, the game has been written over a long period of time by smart dudes who probably wanted tons of features in it and then scrapped some. Those features may still be handled in the game's code without ever showing up in the stock files.

Ergo, there's just no way your understanding of that language is complete, and that is a fact you'll just have to live with. However, even a rudimentary understanding of it is immensely empowering. So it certainly is worth the trouble.

When I started this reverse-engineering project, I began at a spot I knew very well: 3D models. I had modeled professionally in a research context before and had once written a 3D engine as well, so I had all the knowledge required to pull it off.

3D models are basically always the same. You've got floating-point numbers, and three of them form a 3D coordinate. 3 coordinates form a triangle. The complete mesh is an assortment of triangles.

To make such a mesh easier to deal with in practice, there usually is some indexing mechanism as well. You might index coordinates, for example. Indexing coordinates lets you describe a triangle by calling out its coordinate indices, and that makes sense because neighbouring triangles share coordinates. You save space that way: first declare all coordinates, then declare triangle after triangle by indices.
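
To illustrate the indexing idea, here is a minimal Python sketch. The square and all names in it are made up for illustration and are not taken from any Lugaru file.

```python
# Hypothetical data for illustration: a unit square split into two triangles.
vertices = [
    (0.0, 0.0, 0.0),  # index 0
    (1.0, 0.0, 0.0),  # index 1
    (1.0, 1.0, 0.0),  # index 2
    (0.0, 1.0, 0.0),  # index 3
]

# Each triangle names three vertex indices; vertices 0 and 2 are shared.
triangles = [
    (0, 1, 2),
    (0, 2, 3),
]


def expand(vertices, triangles):
    """Resolve indices back into explicit per-triangle coordinates."""
    return [tuple(vertices[i] for i in tri) for tri in triangles]


flat = expand(vertices, triangles)

# Number of floats that have to be stored with and without indexing.
floats_indexed = len(vertices) * 3
floats_unindexed = len(triangles) * 3 * 3
```

Even for this tiny mesh, the indexed form stores 12 floats plus 6 small indices instead of 18 floats, and the gap grows with the number of shared vertices.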

Textured 3D models also need some sort of texture coordinates for each triangle. These are 2D coordinates, one per triangle vertex. They describe the area of the 2D texture that is to be painted onto that triangle.

Knowing all that, you fire up your favorite HEX editor. A HEX editor displays a file byte by byte, with each byte shown in hexadecimal notation.
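
If no HEX editor is at hand, the same kind of view can be approximated in a few lines of Python. This is only a rough sketch; the function name `hex_dump` and the column layout are my own choices, not those of any particular editor.

```python
def hex_dump(data, width=16):
    """Return hex-editor style lines: offset, hex bytes, printable chars."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the char column lines up on a short last row.
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return lines
```

Calling `print("\n".join(hex_dump(open(path, "rb").read())))` with `path` pointing at a file of your choice prints the whole file in that form.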

A HEX editing plugin for jEdit.

HexEditor, which is much more useful.

The first thing you want to figure out is which "letters" this language has. Well, this is pretty easy - there are really just numbers here, and they're either integers, shorts or floats. In other formats, there might be more types, like longs, doubles, strings and so forth, but not here. How do I know that? Well, I can rule out strings because to the right of the hexadecimal display, there's the char display, and as long as there are no UTF-16 strings in this one (unlikely), I'd be able to literally read stuff there if there were strings to be found.
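
The "can I read anything in the char display?" check can also be automated, roughly the way the Unix `strings` tool does it. The sketch below is an assumption-laden helper of my own (the name `ascii_runs` and the minimum length of 4 are arbitrary); it only looks for plain ASCII and would miss UTF-16 text.

```python
import re


def ascii_runs(data, min_length=4):
    """Find runs of printable ASCII bytes, similar to the Unix `strings` tool."""
    # Bytes 0x20..0x7E are the printable ASCII characters.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]
```

If this returns nothing, or only a few short runs that look like noise, the file most likely holds no strings. Numeric data does produce the occasional printable byte sequence by chance, so a handful of short hits proves little either way.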

I can rule out doubles because doubles are required for high-precision calculations, and a game's 3D model format ordinarily doesn't require high-precision 3D coordinates.

I can rule out longs because longs are only required if the integer type can't hold large enough values, and that happens only with really big numbers; I can immediately tell from the file size that it wouldn't make any sense here (for instance, I could address each byte of the file with an integer).

The next thing to understand is which "words" this file has. Luckily, humans are really good at recognising patterns. So with a bit of luck, you just see the sections that describe something, like all floats for coordinates, for example. Three of these floats would be a word. So you recognize the section, count the bytes and divide that by 3, and if you end up with a multiple of 4, chances are that you've guessed right because a float takes 4 bytes, a coordinate component is a float and each coordinate has 3 components.
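
The divisibility check and the decoding step might look like this in Python. It is a sketch under assumptions: the function name is hypothetical, and the byte order (big-endian by default here) is something you have to find out per format by trying both and seeing which one yields sensible numbers.

```python
import struct


def read_coordinates(section, byte_order=">"):
    """Interpret a byte section as consecutive (x, y, z) triples of 32-bit floats."""
    # 3 components per coordinate, 4 bytes per float: 12 bytes per coordinate.
    if len(section) % 12 != 0:
        raise ValueError("section length is not a multiple of 12 bytes")
    count = len(section) // 12
    values = struct.unpack(f"{byte_order}{count * 3}f", section)
    # Group the flat tuple of floats into (x, y, z) triples.
    return [values[i:i + 3] for i in range(0, len(values), 3)]
```

If the resulting numbers are wildly large, tiny or NaN, the guess about the section boundaries or the byte order is probably wrong; plausible model-sized values are a good sign.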

There's a freeware Windows HEX editor called XVI32 which makes this particularly easy; all you need to do is change the window width, which makes XVI32 realign the bytes, and you can immediately see patterns once you hit the right width.
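
The window-resizing trick can be imitated in code by scoring each candidate row width. The following is my own heuristic, not something XVI32 does: it counts how often a byte equals the byte directly above it when the data is laid out at a given width. Fixed-size records with recurring bytes tend to score highest at the record size and its multiples.

```python
def column_match_score(data, width):
    """Fraction of bytes equal to the byte directly above them at this row width."""
    if width <= 0 or len(data) <= width:
        return 0.0
    matches = sum(1 for i in range(width, len(data)) if data[i] == data[i - width])
    return matches / (len(data) - width)


def best_widths(data, max_width=64, top=3):
    """Return the row widths with the highest score, smallest width first on ties."""
    scores = [(column_match_score(data, w), w) for w in range(1, max_width + 1)]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [w for _, w in scores[:top]]
```

This is no replacement for looking at the bytes yourself, but it can suggest which widths are worth trying first.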

After you "see" what words there are, you want to say which word is what data. You're really just making educated guesses. To make educated guesses, you have to think about what the file should generally describe. For example, an animation has keyframes, some sort of speed value per frame, each keyframe says where each skeleton joint is, et cetera. The complete keyframe would be a sentence, its speed value would be a word, and the 3D coordinate for a skeletal vertex would be another word. At first, I coloured sections of a HEX editor screenshot in a painting program. But that took too much effort and wasn't immediate enough. So what I ended up doing was printing out files in their HEX form multiple times and then writing stuff onto the printouts, marking sections and so on. This proved to be essential to my productivity for numerous reasons, but it's a personal preference, and others might choose other methods. I am left-handed, so that might be relevant here.

If you have the words and understand what they are, you're pretty much done. You just need to understand what sections there are and how big they're supposed to be, and that should be very easy because almost always, recognizing the words involves recognizing the sections. The next thing you want to do is to write something that generates such a file. In the case of Lugaru 3D models, I just went straight ahead and wrote a converter in Java. It converted DirectX .x files to Lugaru .solid files and was aptly named X2Solid.
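
To give an idea of what "writing something that generates such a file" involves, here is a minimal Python sketch. The layout it writes (a vertex count, the vertices, a triangle count, the triangle indices) is invented for this example and is NOT the actual .solid layout; the function name and the big-endian default are assumptions as well.

```python
import struct


def pack_mesh(vertices, triangles, byte_order=">"):
    """Serialize a mesh into a made-up layout: counts as shorts, then the data."""
    out = bytearray()
    # Section 1: number of vertices, then three floats per vertex.
    out += struct.pack(f"{byte_order}h", len(vertices))
    for x, y, z in vertices:
        out += struct.pack(f"{byte_order}3f", x, y, z)
    # Section 2: number of triangles, then three vertex indices per triangle.
    out += struct.pack(f"{byte_order}h", len(triangles))
    for a, b, c in triangles:
        out += struct.pack(f"{byte_order}3h", a, b, c)
    return bytes(out)
```

The result can be written to disk with `open(path, "wb").write(...)` and handed to the game. For a real format, every section, count and type has to match what you observed in the stock files.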

This converter wrote static texture coordinates at first, and scaling was completely off for everything but models that don't move (Lugaru scales bodies, immobile things and weapons differently). Indexing wasn't really right, there were some misunderstandings, and I could only make stuff from quads.

But it worked. The first time Lugaru displayed a custom 3D model I had modeled in Blender was magical. The good ol' teapot didn't convert, so I modeled a cross; I'm not religious at all, it was just the first thing I could come up with.

Unfortunately, the first screenshot of this is lost. A screenshot of the second model however still exists.

A town. Box.solid was replaced with a converted custom 3D model.

Seeing that there really was a way to do this led me to put a lot of time and effort into making it work well. So I fixed the errors and made it more robust. I gathered user input to make things work the way users wanted them to. Soon enough, the community adopted my converter.

The converter evolved from a Java command-line converter named X2Solid to a Java application called Brainfart, and eventually, I just wrote a Python plugin for Blender because that's the best way to get it working cross-platform and to empower artists. The cross-platform promise of Java is a lie; I had to find that out the hard way. The questions regarding Java installation and the claims that OS X just didn't start it, together with abysmal performance on Linux, were really discouraging. Cross-platform should mean "build once, run anywhere with no adjustments". It just doesn't work out that way in Java's case though.

After seeing what people could do with this initial converter, I set out to cover all other assets as well. Skeletons, animations and maps should be editable in Blender, and Lugaru should run with them.

Apart from maps, I made that happen. Just a few days ago, I finished an Import/Export pair of Blender Python plugins for Lugaru animations. You could say that I've learned to speak to Lugaru, and over the years, we've become good friends.

You can get the open-source tools Wolf wrote at his site, check out Lotus Wolf's awesome custom weapons or contribute creations in the Lugaru section of the Wolfire forums.