Game File Systems: File Paths

May 25th, 2012

Recently I've put a lot of time into the Overgrowth file system, so I would like to share the best practices I've found for efficient file management on Mac, Windows and Linux. This is a pretty broad topic, so I'll break it into parts. In this first part, we can look at file paths. That might sound like a trivial topic, but there are actually a surprising number of edge cases to worry about! Here are all of the file path concerns I've run into, from simplest to most complicated.

Slash Direction

On Unix-based systems like Linux or Mac OS X, file path directories are separated by the forward slash character '/', like this: "/Users/David/Wolfire/Project/Overgrowth.xcodeproj". Conversely, Windows file path directories are separated by the backslash character '\', like this: "C:\Users\David\Wolfire\Project\Overgrowth.vcproj". If we try to open a file on a Unix-based system using the Windows backslash, it will report that the file can't be found!

However, if we open a file on a Windows system using the Unix forward slash, it will work just fine. Since forward slashes work on Windows, Mac OS X, and Linux, it's safer to always use them instead of backslashes, and convert the backslashes to forward slashes whenever we request a file path from Windows.
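
As a minimal sketch of that conversion (the function name NormalizeSlashes is made up for this post, not taken from the Overgrowth source), we can run every path that Windows hands us through something like this:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns a copy of `path` with every Windows-style
// backslash replaced by a forward slash. Paths that already use forward
// slashes come back unchanged.
std::string NormalizeSlashes(std::string path) {
    std::replace(path.begin(), path.end(), '\\', '/');
    return path;
}
```

This should only be applied to paths that came from Windows: on Unix-based systems a backslash is a legal (if unusual) file name character, so converting it there could change which file the path refers to.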

Case Sensitivity

Most Mac and Windows filesystems are not case-sensitive, so if we have a file at "Data/flame.tga", then we can open it with a request for "Data/Flame.tga" or even "DaTa/FlAmE.TGa". If our development machine is not case-sensitive, it's easy for case-incorrect file paths to slip in unnoticed. Then when someone tries to play our game on a case-sensitive file system, it will fail!

There are several solutions to this problem. The simplest solution is to just make all of our file paths lower-case, and enforce that in code (e.g. by displaying an error if our file-loading code sees an upper-case character). Another solution is to completely bypass the native filesystem, so it doesn't matter if it's case-sensitive or not. For example, we can store all of our files in a custom archive like a .zip file, and load it using something like the PhysicsFS library.
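
The lower-case rule can be enforced with a very small check. This is only a sketch: IsLowerCasePath is a hypothetical name, and it only looks at ASCII letters, which is an assumption about how our data files are named.

```cpp
#include <string>

// Hypothetical helper: returns true if `path` contains no ASCII upper-case
// letters. The file-loading code could call this and report an error
// (or assert) whenever it returns false.
bool IsLowerCasePath(const std::string& path) {
    for (unsigned char c : path) {
        if (c >= 'A' && c <= 'Z') {
            return false;
        }
    }
    return true;
}
```

Running this check on every load request during development means a case-incorrect path shows up immediately, even on a case-insensitive machine.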

But what if we want to open loose files, and don't want to enforce lower-case? Then we can work around the case-sensitivity of the native filesystem. We can simulate case-insensitivity by walking through directory structures and doing case-insensitive comparisons (example code), and then it doesn't matter if our paths are case-correct.
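
Here is one way the directory walk could look. This is a rough sketch, not the Overgrowth code: the function names are invented, it uses the POSIX opendir()/readdir() calls (so it is meant for Linux and Mac, where the problem shows up), and the comparison only folds ASCII letters.

```cpp
#include <dirent.h>

#include <cctype>
#include <string>

// ASCII-only case-insensitive string comparison.
bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) {
            return false;
        }
    }
    return true;
}

// Looks in `dir` for an entry whose name matches `name` ignoring case.
// On success, stores the spelling that is actually on disk in `*found`.
// If several entries match, the first one listed wins.
bool FindEntryIgnoreCase(const std::string& dir, const std::string& name,
                         std::string* found) {
    DIR* d = opendir(dir.c_str());
    if (!d) return false;
    bool ok = false;
    while (dirent* e = readdir(d)) {
        if (EqualsIgnoreCase(e->d_name, name)) {
            *found = e->d_name;
            ok = true;
            break;
        }
    }
    closedir(d);
    return ok;
}

// Resolves `relative` (forward slashes only) against `root`, one path
// component at a time, replacing each component with its on-disk spelling.
bool ResolveIgnoreCase(const std::string& root, const std::string& relative,
                       std::string* resolved) {
    std::string current = root;
    std::size_t start = 0;
    while (start <= relative.size()) {
        std::size_t end = relative.find('/', start);
        if (end == std::string::npos) end = relative.size();
        const std::string part = relative.substr(start, end - start);
        if (!part.empty()) {
            std::string actual;
            if (!FindEntryIgnoreCase(current, part, &actual)) return false;
            current += "/" + actual;
        }
        start = end + 1;
    }
    *resolved = current;
    return true;
}
```

With a file at "Data/flame.tga", a call like ResolveIgnoreCase(".", "DaTa/FlAmE.TGa", &path) should leave "./Data/flame.tga" in path, which can then be passed to fopen(). Listing a directory for every component is slow, so in practice we would probably want to cache the results.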

Conversely, we can simulate case-sensitivity by double-checking every path with the native filesystem path, making sure that they match (example code). Overgrowth currently uses both methods: it will display an error message if the case is incorrect, but it will still work if we click through it.
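
The "error, but keep working" behavior comes down to a three-way comparison between the path we asked for and the spelling that is actually on disk. The sketch below assumes we have already obtained the on-disk spelling from the OS (for example, from a directory listing); the names are hypothetical and the comparison is ASCII-only.

```cpp
#include <cctype>
#include <string>

enum PathCheck {
    kPathExact,         // identical, nothing to report
    kPathCaseMismatch,  // same file, wrong case: warn, then load anyway
    kPathDifferent      // not the same path at all
};

// Compares the path our code requested with the spelling the native
// filesystem reports for the file that was actually found.
PathCheck ComparePathCase(const std::string& requested,
                          const std::string& on_disk) {
    if (requested == on_disk) return kPathExact;
    if (requested.size() != on_disk.size()) return kPathDifferent;
    for (std::size_t i = 0; i < requested.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(requested[i])) !=
            std::tolower(static_cast<unsigned char>(on_disk[i]))) {
            return kPathDifferent;
        }
    }
    return kPathCaseMismatch;
}
```

When the result is kPathCaseMismatch, the loader can show an error naming both spellings and then continue with the on-disk path.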

Working Directories

Users will usually open our game using a shortcut: from the desktop, dock, start menu, or other launcher. If the user opens the desktop shortcut, then the game might try to load a file like "Data/Flame.tga", and report that "Users/USERNAME/Desktop/Data/Flame.tga" does not exist. Of course it doesn't, that's the wrong path!

This happens because shortcuts can change our working directory (the directory that is implicitly prepended to relative paths). Typically we want it to always be something like "Programs/APPNAME/", so that it looks for "Data/Flame.tga" in "Programs/APPNAME/Data/Flame.tga".

To make sure we always have the right working directory, we can manually find the path to the executable, and set the working directory using "chdir". On Windows we can find the executable path using GetModuleFileName(), which will give us something like: "C:\Program Files (x86)\Wolfire\Overgrowth\Overgrowth.exe", and then we can just cut off the end to get: "C:\Program Files (x86)\Wolfire\Overgrowth\", and set that to be our working directory.

On Mac or Linux we search the command-line arguments and $PATH to find the working directory, but it looks like readlink() might be a simpler way to do it (see this Stack Overflow thread).

Edit: On Mac OS X you can use [[NSBundle mainBundle] resourcePath] to get the path to the application bundle's resource directory (or executablePath for the executable itself), and on Linux you can usually use readlink("/proc/self/exe").
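
To make the "cut off the end" step concrete, here is a small sketch. DirectoryOf and ChdirToExecutableDirectory are hypothetical names; the second function is Linux-only and assumes /proc is mounted, which is usual but not guaranteed.

```cpp
#include <string>

#if defined(__linux__)
#include <unistd.h>
#endif

// Strips the file name from an executable path, keeping the trailing
// separator. Accepts both '/' and '\' so it also works on the raw result
// of GetModuleFileName(). Returns "" if there is no directory part.
std::string DirectoryOf(const std::string& executable_path) {
    const std::size_t slash = executable_path.find_last_of("/\\");
    if (slash == std::string::npos) return "";
    return executable_path.substr(0, slash + 1);
}

#if defined(__linux__)
// Linux-only: asks the kernel where the running executable lives, then
// makes that directory the working directory. Returns false on failure.
bool ChdirToExecutableDirectory() {
    char buffer[4096];
    // readlink() does not null-terminate, so we keep the returned length.
    const ssize_t length = readlink("/proc/self/exe", buffer, sizeof(buffer));
    if (length <= 0) return false;
    const std::string dir = DirectoryOf(std::string(buffer, length));
    return !dir.empty() && chdir(dir.c_str()) == 0;
}
#endif
```

On Windows the same DirectoryOf() helper can be applied to whatever GetModuleFileName() returns before changing the working directory.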

Write Permissions

Once upon a time, we could just create saved-game files and config files within our game directory, and that was that. On modern operating systems, we can't do that anymore, because the OS does not typically give us permission to write to the program directories. Now each OS has a different location that it expects programs to write documents to. For example, on Mac OS X, these files are usually written to "Users/USERNAME/Library/Application Support/APPNAME/". On Windows, they go to "Users/USERNAME/Documents/APPNAME/". On Linux, they just go to "home/USERNAME/.APPNAME/".

We can handle this by creating a string to store the write directory for the current OS, and prepending this string to the path of every file write that we perform. On Windows we can use SHGetFolderPath(... CSIDL_PERSONAL ...) to get "Users/USERNAME/Documents/", and add "Wolfire/Overgrowth/" to get the complete write string. Then if we want to write to "Data/config.txt", we just add "Users/USERNAME/Documents/Wolfire/Overgrowth/" to the front, to get the complete write path: "Users/USERNAME/Documents/Wolfire/Overgrowth/Data/config.txt".

On Mac we can get the "Users/USERNAME/" from getenv("HOME"), and then just add "Library/Application Support/Overgrowth/". On Linux we can also use getenv("HOME"), and just add ".overgrowth/".

Edit: On Linux it is preferable to use getenv("XDG_DATA_HOME") + "/Overgrowth", and if that's not available, getenv("HOME") + "/.local/share" + "/Overgrowth".
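
Putting the Linux rules together might look something like the sketch below. The function names are invented for this example, and the environment values are passed in as parameters so that the fallback logic can be tested without touching the real environment.

```cpp
#include <cstdlib>
#include <string>

// Builds the write directory from the two environment values.
// A null or empty value is treated as "not set".
// Returns "" if neither variable is usable.
std::string LinuxWriteDirectory(const char* xdg_data_home, const char* home,
                                const std::string& app_name) {
    if (xdg_data_home && *xdg_data_home) {
        return std::string(xdg_data_home) + "/" + app_name + "/";
    }
    if (home && *home) {
        return std::string(home) + "/.local/share/" + app_name + "/";
    }
    return "";
}

// Convenience wrapper that reads the real environment.
std::string CurrentLinuxWriteDirectory() {
    return LinuxWriteDirectory(std::getenv("XDG_DATA_HOME"),
                               std::getenv("HOME"), "Overgrowth");
}

// Prepends the write directory to a game-relative path.
std::string WritePath(const std::string& write_directory,
                      const std::string& relative_path) {
    return write_directory + relative_path;
}
```

The directory will usually not exist the first time the game runs, so it has to be created before the first write.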

Unicode Paths

It's easy to just ignore non-ASCII characters. If all our data files and directories just use simple English text then we should be safe, right? Wrong! Remember all those USERNAME macros we have to deal with to get write directories? Those could easily be something like 成龍. If we try to write to "Users/成龍/Documents/Wolfire/Overgrowth/config.txt" on Windows without dealing with Unicode, then we'll end up creating a whole new User directory called "??" as we write to "Users/??/Documents/Wolfire/Overgrowth/config.txt".

For those not familiar with Unicode, here's a very brief overview. Unicode is a text encoding that provides a unique index for every character in every language. For example, 成龍 is character #25104 followed by character #40845. Because there are so many unique characters, we need to have an array of 32-bit ints to store them, if we want exactly one character in each array slot. This encoding is called UTF-32 (short for "Unicode Transformation Format - 32-bit").

Conveniently, the ASCII characters are unchanged in Unicode, so strings like "Data/Flame.tga" or "Hello World!\n" are the same in ASCII and in Unicode. This means that if we have existing functions that convert backslashes to forward slashes or capitalize ASCII characters, they should still work pretty well with UTF-32.

So far this is pretty simple -- we just have to use int strings instead of char strings, or a std::basic_string with a 32-bit character type instead of std::string. No big deal! However, things get more complicated when passing these strings to and from the operating system calls. This is because none of the major operating systems actually accept or provide UTF-32 strings for file paths: Mac and Linux use UTF-8, and Windows uses UTF-16.

As you might have guessed, UTF-8 uses 8-bit values, and UTF-16 uses 16-bit values. But how can we fit a character like 成 (#25104) into an 8-bit value? We can't, it actually converts to a sequence of three UTF-8 values: (#230 #136 #144). Some characters use four UTF-8 values, and some only use one (namely, the 7-bit ASCII characters). In short, you can think of UTF-8 and UTF-16 as variable-length lossless compression formats for UTF-32. If you want to convert from one format to another in C++, UTF-8 CPP is a decent light-weight library.
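
To show where those byte sequences come from, here is a hand-written sketch of the UTF-32 to UTF-8 direction. It is for illustration only (AppendUtf8 is a made-up name, and a library such as UTF-8 CPP is the better choice in real code).

```cpp
#include <cstdint>
#include <string>

// Appends the UTF-8 encoding of one Unicode code point to `*out`.
// Returns false (and appends nothing) for values that are not valid
// code points: surrogates (0xD800-0xDFFF) and anything above 0x10FFFF.
bool AppendUtf8(std::uint32_t cp, std::string* out) {
    if (cp < 0x80) {
        // 7-bit ASCII: one byte, unchanged.
        out->push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out->push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out->push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp <= 0x10FFFF) {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out->push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out->push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        return false;
    }
    return true;
}
```

Feeding in 25104 (成, or 0x6210) takes the three-byte branch and produces the bytes 0xE6 0x88 0x90, while an ASCII character like 'A' comes out as the single byte it already was.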

In Overgrowth, we use UTF-32 strings to manipulate file path strings, and convert them to and from UTF-8 for Mac and Linux API calls, and to and from UTF-16 for Windows API calls. The Mac and Linux calls like fopen(), realpath(), and so on, are unchanged for UTF-8. In fact, they always use UTF-8, and ASCII users don't notice because the common characters are unchanged. The Windows calls, on the other hand, use variations like _wfopen() or GetModuleFileNameW() to specify that they use "wide chars", which are UTF-16 on Windows.
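
The Windows direction works the same way in principle. Below is a sketch of a UTF-32 to UTF-16 conversion; Utf32ToUtf16 is a hypothetical name and this is not the Overgrowth code. Most characters fit in a single 16-bit value, and the rest are split into a "surrogate pair" of two values.

```cpp
#include <cstdint>
#include <vector>

// Converts a sequence of Unicode code points to UTF-16 code units.
// Returns false if the input contains an invalid code point
// (a surrogate value or anything above 0x10FFFF).
bool Utf32ToUtf16(const std::vector<std::uint32_t>& in,
                  std::vector<std::uint16_t>* out) {
    out->clear();
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        if (cp < 0x10000) {
            // Fits in one 16-bit unit.
            out->push_back(static_cast<std::uint16_t>(cp));
        } else {
            // Split the remaining 20 bits across a surrogate pair.
            cp -= 0x10000;
            out->push_back(static_cast<std::uint16_t>(0xD800 | (cp >> 10)));
            out->push_back(static_cast<std::uint16_t>(0xDC00 | (cp & 0x3FF)));
        }
    }
    return true;
}
```

On Windows, where wchar_t is 16 bits wide, the resulting values can be copied into a null-terminated wchar_t buffer and passed to calls like _wfopen().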

Conclusion

These are all of the file path issues that I've run into while working on Overgrowth. Planned topics for future posts include caching processed files, improving load speed, and choosing when to load files. Please let me know if there are any other topics you'd like to see, or if you've found better solutions to these file path issues!