02-04-2010
Here are some computer language memes for your enjoyment. :)



01-07-2010
As far as I can tell, many of the readers of my blog find it through my posts on a couple of mailing lists: the sweng-gamedev@midnightryder.com game development mailing list, and the Boost developer mailing list.
I've blogged a few times about shared_ptr in the hopes of bringing those two audiences to appreciate the exclusive low-level, C-style features of shared_ptr. The Boost crowd tends to think of shared_ptr as just another smart pointer that simply calls delete automatically. The typical game development programmer basically ignores shared_ptr altogether, assuming that all it can do is call delete automatically (and they know when to call delete, thank you very much!)
I'll continue illustrating my point by sketching an interface design for socket programming using shared_ptr, based entirely on the standard C socket API structs.
The typical C++ approach would be to create an "object-oriented" system of classes, with all kinds of encapsulation and safety-net goodness throughout -- yet such interfaces have a fundamental problem. The whole point of defining a C++ socket programming layer is to provide type-safety for the low-level int handles. But to make the high level C++ interface practical, we have to punch a hole in the type system by providing a "getter" (and even a "setter") for the low level int handle!
And this means that we can no longer rely on type safety.
The important thing to consider is that the C interface is standard: sockets are represented by ints and that's that. The only reasonable way to improve the C API is to make it easier to use and harder to make silly errors with. Below, I'm providing a few simple C++ functions which do just that.
First, here is how shared_ptr can be used to take care of closing a socket file descriptor when it is no longer needed:
The socket_wrapper class is an internal type, hidden in a CPP file where socket_adopt is defined. That function is intended to be called as soon as a socket file descriptor is returned from the standard C socket API, or from any other 3rd-party socket interface that opens a socket. It simply uses the shared_ptr aliasing constructor to rebind the initial shared_ptr<socket_wrapper> to point to the int stored in the socket_wrapper object. When the last shared_ptr instance expires, ~socket_wrapper() will be called automatically, yet the user sees a simple shared_ptr<int const> which points to the socket file descriptor!
The socket_create function below demonstrates how socket_adopt can be used:
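A minimal sketch of how these two functions could look, assuming std::shared_ptr in place of boost::shared_ptr and POSIX close(). The bodies and the socket_create signature are my assumptions; only the names socket_wrapper, socket_adopt and socket_create come from the text.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <sys/socket.h>
#include <unistd.h>

namespace {
    // Internal type: owns the descriptor and closes it on destruction.
    struct socket_wrapper {
        int fd;
        explicit socket_wrapper(int fd) : fd(fd) {}
        ~socket_wrapper() { ::close(fd); }
        socket_wrapper(socket_wrapper const&) = delete;
        socket_wrapper& operator=(socket_wrapper const&) = delete;
    };
}

// Takes ownership of an already open descriptor.
std::shared_ptr<int const> socket_adopt(int fd) {
    assert(fd >= 0);
    std::shared_ptr<socket_wrapper> w;
    try { w = std::make_shared<socket_wrapper>(fd); }
    catch (...) { ::close(fd); throw; }   // do not leak fd if allocation fails
    // Aliasing constructor: shares ownership with w, but points at w->fd.
    return std::shared_ptr<int const>(w, &w->fd);
}

// Opens a new socket and hands it to socket_adopt right away.
std::shared_ptr<int const> socket_create(int domain, int type, int protocol) {
    int fd = ::socket(domain, type, protocol);
    if (fd < 0) throw std::runtime_error("socket() failed");
    return socket_adopt(fd);
}
```

The caller dereferences the result wherever the C API wants an int, for example ::listen(*s, 5), and never calls close directly.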
The old school gethostbyname function is now superseded by the POSIX function getaddrinfo. It is used to convert DNS names and IP addresses from their human-readable text form to a structured binary format for the OS:
The caller passes an address and describes the type of acceptable protocols, and the function returns a linked list of addrinfo objects in res. Each returned addrinfo object's ai_family, ai_socktype and ai_protocol members can be passed to the socket() function (or to socket_create above) to open a matching new socket object. The linked list is allocated dynamically by the OS and must be disposed of by calling freeaddrinfo. As before, we define a simple wrapper function that returns shared_ptr:
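One way such a wrapper might be written, assuming std::shared_ptr and reporting failure with std::runtime_error. The parameter list is an assumption; the text only gives the name socket_getaddrinfo.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

// Calls getaddrinfo and returns the head of the list; freeaddrinfo is
// installed as the deleter, so the caller never frees the list by hand.
std::shared_ptr<addrinfo const> socket_getaddrinfo(char const* node,
                                                   char const* service,
                                                   addrinfo const& hints) {
    addrinfo* res = nullptr;
    int err = ::getaddrinfo(node, service, &hints, &res);
    if (err != 0) throw std::runtime_error(::gai_strerror(err));
    assert(res != nullptr);
    return std::shared_ptr<addrinfo const>(res, ::freeaddrinfo);
}
```

Passing a zero-initialized hints object with only ai_family and ai_socktype set is the usual starting point.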
Remember, the returned shared_ptr object points to a linked list of addrinfo objects. We could keep the original shared_ptr around and walk the nodes using raw pointers, or we can advance the shared_ptr itself to the next node:
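A sketch of the "advance the shared_ptr" option, assuming std::shared_ptr; the function name socket_next is hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <netdb.h>

// Returns a shared_ptr to the node after ai. It shares ownership (and the
// deleter) with ai, so the whole list stays alive while either one exists.
std::shared_ptr<addrinfo const> socket_next(std::shared_ptr<addrinfo const> const& ai) {
    assert(ai);
    // Aliasing constructor: same control block as ai, different pointer.
    return std::shared_ptr<addrinfo const>(ai, ai->ai_next);
}
```

At the end of the list ai_next is null, so the result tests false in a condition even though it still shares ownership of the list.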
The above function again takes advantage of the shared_ptr aliasing constructor. The returned shared_ptr object points to the next node in the addrinfo list, but when freeaddrinfo is called, it will be given the original addrinfo pointer, the one passed to the original shared_ptr object in socket_getaddrinfo.
The obvious benefit of using shared_ptr<int const> to manage the lifetime of sockets and file descriptors is that it decouples us from having to know how the file descriptor should be destroyed.
It is also possible to use type erasure to convert the shared_ptr<int const> to a shared_ptr<void const> and pass it to some kind of system which is concerned only with object lifetime management, regardless of type.
Finally, shared_ptr comes with weak_ptr support. One example where this could be handy is in a logging system, which could easily detect if the log file has been closed by the application before attempting to write something in it.
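To make the logging example concrete, here is a small sketch assuming std::shared_ptr and std::weak_ptr; the logger class is hypothetical and exists only for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>

// Hypothetical logger: it holds only a weak reference to the log file,
// so it never keeps the file open on its own.
class logger {
    std::weak_ptr<std::FILE> file_;
public:
    explicit logger(std::shared_ptr<std::FILE> const& f) : file_(f) { assert(f); }

    // Returns false if the application has already closed the file.
    bool write(char const* msg) {
        if (std::shared_ptr<std::FILE> f = file_.lock())
            return std::fputs(msg, f.get()) >= 0;
        return false;
    }
};
```

The same file can also be handed to a lifetime-only subsystem as std::shared_ptr<void const>; the conversion is implicit and the original deleter still runs when the last owner goes away.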
11-19-2009
As I'm typing this post, there is a continuing discussion on the Boost developers mailing list about what to do with compile warnings reported in Boost code. There seem to be two opinions on the subject.
In the blue corner, we have people arguing that the compiler is your friend and if it tells you that there might be a problem lurking in your code, you should appreciate that and take whatever action is needed to "fix" the warning.
In the red corner, people argue that warnings don't necessarily indicate problems, that some compilers are plain silly, and that each Boost developer should be free to pick whatever warnings level is appropriate for them.
But before anyone has an opinion on the subject of warnings, we need to be on the same page about what a warning actually is.
A common misconception is that compiler warnings indicate problems in the program that are not as severe as errors but should nevertheless be "fixed".
Actually, warnings report facts about the program being compiled for which the C/C++ standard does not require the compiler to issue an error. We can classify them in the following categories:
In a corporate environment, such classification helps managers craft a sensible warnings policy that maximizes the output of a particular development team targeting a particular set of platforms.
For a project like Boost, realistically, the classification is much simpler:
Except that by definition 1) is a subset of 2), since many Boost users compile in environments that enable all warnings.
Therefore, we're left with
which is another way of saying "any warning issued by any version of any compiler on any platform Boost is compiled on."
Realistically, the only sensible way to achieve this goal is to simply suppress (disable) all warnings in Boost releases, preferably without disabling them for Boost developers:
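A sketch of what this could look like inside a library header. The macro MYLIB_ENABLE_WARNINGS and the namespace mylib are made up for illustration; the two pragmas are the standard GCC/Clang and MSVC mechanisms.

```cpp
// Library developers define MYLIB_ENABLE_WARNINGS (hypothetical macro) to
// see every warning; users who do not define it get a quiet header.
#ifndef MYLIB_ENABLE_WARNINGS
#  if defined(__GNUC__)
#    pragma GCC system_header      // GCC/Clang: treat the rest of this header as a system header
#  elif defined(_MSC_VER)
#    pragma warning(push, 1)       // MSVC: warning level 1 until the matching pop
#  endif
#endif

namespace mylib {
    // Library code goes here.
    constexpr int twice(int x) { return x + x; }
}

#if defined(_MSC_VER) && !defined(MYLIB_ENABLE_WARNINGS)
#  pragma warning(pop)
#endif
```

Note that the GCC pragma only has an effect in an included file; compilers ignore it (with a diagnostic) in the main source file.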
The only alternative is to require Boost developers to "fix" all warnings on all versions of all compilers on all platforms Boost is compiled on; (obviously?) "fixing" only some warnings that a committee of some sort labeled as "important" still requires disabling all (other) warnings, assuming we are committed to providing a warning-free user experience.
And if that's not our goal, then we should be satisfied with the status quo: Boost developers address most warnings reported by Boost users.
10-26-2009
I'm currently working on a library in CUDA that implements the basic operations for dynamic, very large vectors and matrices. You'd think that such a basic scientific library should already be available, but I haven't been able to find one (if you know of such a CUDA/GLSL/HLSL library please do let me know!)
Most scientists find it unacceptable to have to even think about matrix multiplication. Basic things like that should just work, and so LAPACK (which of course does a lot more than multiplying matrices) was developed many years ago to solve all of these problems once and for all.
Unfortunately LAPACK is incompatible with the current GPU platforms.
LAPACK is a software library for numerical linear algebra. It is implemented in FORTRAN but bindings to virtually any programming language are available.
Because LAPACK is a rather large piece of software, it defines a minimal set of lower level functions over which all LAPACK operations are implemented. They've named those functions BLAS, or Basic Linear Algebra Subprograms. The idea is that LAPACK would run efficiently on any platform that implements BLAS efficiently.
We don't need to dig any deeper than the Wikipedia LAPACK page to see what the problem is:
Current GPU architectures are anything but cache-based architectures. Moreover, they can't efficiently share data with the CPU. So a BLAS implementation that targets a current GPU platform has no choice but to copy data to GPU memory, perform the operation (very efficiently) on the GPU, then copy the result back to CPU memory for LAPACK use.
The net effect is that while a GPU architecture can multiply large matrices way faster than a CPU architecture, the required double memory copy cripples cuBLAS (the CUDA-based BLAS implementation from NVIDIA) and, well, makes it almost useless. I'm sure there are cases where it would outperform a simple CPU-based BLAS implementation, but I suspect that an SSE2-enabled BLAS will run circles around cuBLAS with any data set.
There are two ways this issue can be resolved. One, the scientific community could move away from LAPACK. Two, the GPU platforms can become friendlier to the LAPACK/BLAS architecture.
It would be difficult for the scientific community to abandon LAPACK. Perhaps they can be persuaded by the huge speedups they would get by targeting GPU platforms (through a different API, not LAPACK), but the problem is not only inertia: there is a lot of scientific code that will have to be rewritten if it is to be ported from LAPACK to anything else.
There is some hope for LAPACK in the future though: Intel's Larrabee seems to address all problems that cripple BLAS, yet promises to be able to run massively parallel programs as efficiently as current GPUs do. The GPU companies are also slowly moving away from a completely cacheless model, but this treads on CPU territory and the thing is, a laptop only needs one chip that can run "CPU code" efficiently.
Massive data computation is memory bandwidth bound. This is even more true on GPU platforms, which are designed to tolerate high memory access latency in order to maximize bandwidth.
Currently, a good optimization strategy is to squeeze more things into available bandwidth. This is especially easy to do in graphics and image processing: depending on your requirements, you could use a variety of pixel formats, and spending fewer bits per pixel is an easy performance booster.
Note that even when input images are of reduced quality, all processing and the produced results can still use high precision without increasing (input) bandwidth. Video games use this technique a lot: typically they produce a high-quality picture, yet some of the input textures can use as few as 2 bits per pixel.
The same should be possible when working with large vectors and matrices as well: if you know that your input or intermediate or output data doesn't need the full 32-bit floating point precision, you should be able to easily use 16-bit floats. Since the computation is bandwidth bound, this gives you up to a 2x speedup with almost no effort at all.
As well, there are use cases for large matrices or vectors that have all elements either 0.0 or 1.0, so even one-bit-per-element formats should be supported. Though such cases aren't very common, a 32x speedup should not be underestimated.
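To illustrate the one-bit-per-element idea, here is a host-side C++ sketch (not CUDA, and not code from the library described above). The names pack_bits and dot_packed are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs a vector whose elements are all 0.0 or 1.0 into 32-bit words,
// one bit per element: 32x less memory traffic than 32-bit floats.
std::vector<std::uint32_t> pack_bits(std::vector<float> const& v) {
    std::vector<std::uint32_t> words((v.size() + 31) / 32, 0u);
    for (std::size_t i = 0; i != v.size(); ++i)
        if (v[i] != 0.0f)
            words[i / 32] |= std::uint32_t(1) << (i % 32);
    return words;
}

// Dot product of two packed 0/1 vectors: AND the words, count the set bits.
std::size_t dot_packed(std::vector<std::uint32_t> const& a,
                       std::vector<std::uint32_t> const& b) {
    assert(a.size() == b.size());
    std::size_t sum = 0;
    for (std::size_t i = 0; i != a.size(); ++i)
        sum += std::bitset<32>(a[i] & b[i]).count();
    return sum;
}
```

A 40-element vector packs into two words (8 bytes) instead of 160 bytes of floats.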
10-20-2009
I am guessing there is some trace of "proper" game programming mentality in me after all. :) Going against my previous rant, I've designed "yet another" vector/matrix math library and I've submitted it for a preliminary Boost review.
Why do I think that writing this library was a good idea? Because it makes no sense for any of us to have to spell out how a 3x3 matrix is multiplied by another 3x3 matrix; there should be a way to express that algorithm generically and apply it to any and all 3x3 matrix types in the world.
The only tricky part is that operations such as matrix multiplication should use operator overloads (seriously, it's silly not to) and that presents a challenge: how do you define type-safe operator overloads without using specific matrix types? That's basically what (Boost) LA pulls off, using SFINAE.
A user-defined vector type float3 can be introduced to (Boost) LA like this:
After a similar specialization of the matrix_traits template for a user-defined 3x3 matrix type float33, a full range of vector and matrix operations defined in (Boost) LA headers become available automatically:
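The general technique can be sketched as follows. This is a simplified stand-in and not the actual (Boost) LA interface: the namespace la_sketch, the is_vector flag and the at() accessors are all hypothetical, and only 3-element vectors and operator+ are covered.

```cpp
#include <cassert>
#include <type_traits>

namespace la_sketch {
    // Primary template: by default a type is not a vector.
    template <class V> struct vector_traits { static bool const is_vector = false; };
}

// A user-defined type that knows nothing about the library.
struct float3 { float a[3]; };

namespace la_sketch {
    // Registering float3: tell the library how to access its elements.
    template <> struct vector_traits<float3> {
        static bool const is_vector = true;
        static float& at(float3& v, int i) { return v.a[i]; }
        static float at(float3 const& v, int i) { return v.a[i]; }
    };

    // SFINAE: this overload exists only when both operands are registered.
    template <class A, class B>
    typename std::enable_if<vector_traits<A>::is_vector && vector_traits<B>::is_vector, A>::type
    operator+(A const& a, B const& b) {
        A r = a;
        for (int i = 0; i != 3; ++i)
            vector_traits<A>::at(r, i) = vector_traits<A>::at(a, i) + vector_traits<B>::at(b, i);
        return r;
    }
}
```

Because float3 lives outside the la_sketch namespace, user code needs a using directive (using namespace la_sketch;) before writing p + q.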
The full documentation and source code, released under the Boost Software License, is here.
09-14-2009
It is only natural for people to think with nostalgia about the good ol' days: "oh man, back in the day things were so simple!" Objectively speaking, obviously, the "good ol' days" sucked: that simplicity is only skin deep.
Yes, a telephone was much easier to use in the 70s, but then again it didn't do that much, did it? Could you imagine having to memorize the phone numbers of everyone you need to text or call? Right, that would have sucked. How about having to fast forward in order to get to the next song on a compact cassette? Ouch.
Yet, some things from the past are worth admiring. To me, a steam engine is prettier than the most advanced internal combustion engines of today. Not really practical, yet pretty.
But I'm a programmer, and a lot of the old stuff I admire is software. And while I have (if only once) programmed a computer with punchcards, those were already antiques when I was in school; the computer I grew up programming was the Apple ][.
So here we go: at the risk of wasting my time raving about something most programmers today can't appreciate, I'll try to explain just a tiny bit of Wozniak's brilliance:
Some background: on physical magnetic media, you can't "just record data". Instead, 1 is encoded as a magnetic polarity transition (also known as flux reversal), and 0 is encoded as, well, lack of transition. Basically, for each bit there is a limited time window in which flux reversal is expected to occur. If it occurs, the bit is 1, otherwise it's 0. The problem is that the time window is so small that there is a limit on the accuracy with which the data window length can be measured. So, while detecting a series of ones is not a problem, it's difficult to know how many zeroes are encoded in a measured period of lack of flux reversal.
Wozniak's original hardware design imposed two constraints on the raw data:
However, using simple FM encoding "wastes" 4 bits per raw byte, which would allow only 10 256-byte sectors per track to be recorded. Instead, he devised a more complex encoding scheme that was based on the fact that there are 34 8-bit numbers which have the top bit set and no two 0-bits in a row. That way, 13 sectors per track could be recorded. Later on the hardware design of the floppy disk controller was modified to allow no more than one occurrence of two consecutive 0-bits in a byte, which led to a different encoding called 6&2; it allowed 16 sectors per track.
A conventional floppy disk controller design would put all this bit twiddling in silicon which just DMAs the decoded bytes into memory -- but it would cost more in hardware. So what is a Wozniak to do? Obviously: screw hardware, do everything in software.
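The figure of 34 is easy to check by brute force. A small sketch (the function names are hypothetical):

```cpp
#include <cassert>

// True if the byte has its top bit set and never has two 0-bits in a row.
bool valid_raw_byte(unsigned b) {
    assert(b < 256);
    if ((b & 0x80) == 0) return false;
    for (int i = 0; i != 7; ++i)
        if (((b >> i) & 3) == 0) return false;   // bits i and i+1 are both zero
    return true;
}

// Counts the byte values that satisfy both constraints.
int count_valid_raw_bytes() {
    int n = 0;
    for (unsigned b = 0; b != 256; ++b)
        if (valid_raw_byte(b)) ++n;
    return n;
}
```

count_valid_raw_bytes() returns 34, which is enough distinct values to store 5 bits of data (32 combinations) in each raw byte on disk.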
That would have been impressive enough on a 6502 CPU, but even more impressive is what is required of the bootstrapping routine. It must:
...all in no more than 256 bytes of code, which is the addressing limit for the ROM on Apple ][ expansion cards.
By the way, there are no timers on the Apple ][, all timing is based on the CPU clock, e.g. you'd know that an INX instruction takes 2 microseconds to execute. The 6502 has three 8-bit registers: an accumulator A which can add, subtract, shift, rotate, and, or, xor, etc., and two 8-bit index registers X and Y which can do none of those operations. :)
09-07-2009
Last week, I found myself on a game development mailing list, arguing about using boost::shared_ptr. Such discussions go something like this:
GameDeveloper1: What about smart pointers?
Emil: shared_ptr pwns!!!11~`~1
GameDeveloper2: It sucks, it allocates two memory blocks!
Emil: Not if you use allocators.
GameDeveloper2: Bleah, allocators suck, they're intrusive!
Emil: Not the way shared_ptr uses them.
GameDeveloper2: Oh. Well it sucks anyway, it takes the space of two raw pointers! And I've been developing games for 15 years, never needed a smart pointer! Ha!
In last week's discussion someone asked me if I could show in concrete code how it's possible to have a shared_ptr factory which is not intrusive with regard to the type of the objects it creates, yet is able to place the shared_ptr control block in the same memory block as the object (indeed, non-intrusiveness is an important requirement because factories often operate behind a DLL boundary.)
Let's start with the factory interface. This is a complete header file, it doesn't need to #include any headers:
I'll paste the full implementation of the make_shared_int function at the end of this post; here I will describe the 4 allocation strategies it chooses from:
Here is the complete source code of int_factory.cpp:
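A much smaller sketch of the central idea, and not the post's int_factory.cpp: with std::allocate_shared (the standard counterpart of boost::allocate_shared), a non-intrusive allocator serves both the int and the control block from a single allocation. The counting_allocator type and this make_shared_int signature are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Minimal allocator that counts how many blocks it hands out.
template <class T>
struct counting_allocator {
    typedef T value_type;
    int* allocations;
    explicit counting_allocator(int* a) noexcept : allocations(a) {}
    template <class U>
    counting_allocator(counting_allocator<U> const& other) noexcept
        : allocations(other.allocations) {}
    T* allocate(std::size_t n) {
        ++*allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};
template <class T, class U>
bool operator==(counting_allocator<T> const& a, counting_allocator<U> const& b) noexcept {
    return a.allocations == b.allocations;
}
template <class T, class U>
bool operator!=(counting_allocator<T> const& a, counting_allocator<U> const& b) noexcept {
    return !(a == b);
}

// The caller sees a plain shared_ptr<int>; the allocator is not part of its type.
std::shared_ptr<int> make_shared_int(int value, int* allocations) {
    assert(allocations != nullptr);
    return std::allocate_shared<int>(counting_allocator<int>(allocations), value);
}
```

On the common standard library implementations the counter ends up at 1 after one call, meaning the object and the control block share one memory block.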
See also: shared_ptr Programming Techniques.
08-21-2009
I didn't really plan to become this focused on error handling. Who would want that? Dealing with errors is frustrating, difficult to test, and takes your attention away from all the action! But I have to admit, whatever the reason is, I do care about errors a lot; perhaps too much (though this doesn't mean that I find it any less frustrating than anyone else.)
In C, there is an obvious difficulty in reporting errors. The typical approach is to return an error code, but the problem is that this takes away the return value of the function, so any function that could fail can't return a result in case of success. You may think that this complicates the use of the API but the reality is that an API that reports errors by returning error codes is already difficult to use. At some point, you realize that writing an "if(err) return err;" check after every single call is just silly. And then if you're not stuck in C, you discover that destructors in C++ take care of freeing resources automatically, and that exception handling takes care of the "if(err) return err;" you have to sprinkle all over the place in a C program. So you're left with straightforward code that simply calls one function after another.
That said, I do like plain old C APIs. One of the reasons why I don't mind their code-based error reporting is that wrapping a C function is the easiest thing to do.
Here is how a C function -- even one that creates an object that needs to be deleted later -- can be wrapped to throw on error:
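A minimal sketch of such a wrapper, assuming std::shared_ptr and std::runtime_error in place of the Boost facilities the post refers to. Only the name my_fopen comes from the text; the signature and the error message are assumptions.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <string>

// Opens a file or throws. The returned shared_ptr calls fclose
// automatically when its last copy expires.
std::shared_ptr<std::FILE> my_fopen(char const* name, char const* mode) {
    assert(name != nullptr && mode != nullptr);
    if (std::FILE* f = std::fopen(name, mode))
        return std::shared_ptr<std::FILE>(f, [](std::FILE* p) { std::fclose(p); });
    int err = errno;   // save errno before building the message
    throw std::runtime_error(std::string("fopen failed for ") + name + ": " + std::strerror(err));
}
```

Callers pass f.get() to the rest of the C stdio API and never write an error check or an fclose call themselves.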
While this might seem a lot of code, it really isn't: all of the included Boost headers are very lean. Also, consider what the header file for my_fopen looks like:
It might be argued that wrapping every C function that could fail in this manner is rather tedious. I'd admit that this is a significant downside, but consider the upside: in any exception-neutral context (anywhere you'd have to sprinkle the "if(err) return err;"), you don't have to worry about errors! And the thing is, most contexts in a program are exception-neutral indeed.
But if the reader remembers, the topic of this post was
So how do they deal with errors in Maya? Here is an example of the member function MDagPath::hasFn:
That's just awesome, I get to decide if I want the API to tell me if the function I call succeeded. :) I don't want to sound cocky but seriously, I'm wondering if they even realize how big of a deal this is -- they openly invite everyone to ignore errors! They went the extra mile to make it easier for the programmers to be sloppy!
Should I care that much about other people writing sloppy code? After all, I could wrap the Maya API the usual way and convert failures to exceptions, right?
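"The usual way" could look roughly like this. The Status and Path types below are hypothetical stand-ins that only mimic the optional status pointer pattern; they are not the Maya classes.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in for a status object (hypothetical, not MStatus).
struct Status {
    bool ok;
    Status() : ok(true) {}
};

// Stand-in for an API class with a Maya-style member function:
// reporting the status is optional and defaults to "don't care".
struct Path {
    bool valid;
    bool hasFn(int type, Status* ReturnStatus = nullptr) const {
        if (ReturnStatus) ReturnStatus->ok = valid;
        return valid && type == 1;
    }
};

// The wrapper always asks for the status and converts failure to an exception.
bool has_fn(Path const& p, int type) {
    Status s;
    bool result = p.hasFn(type, &s);
    if (!s.ok) throw std::runtime_error("hasFn failed");
    return result;
}
```

Calling p.hasFn(1) directly on an invalid path quietly returns false, while has_fn(p, 1) throws.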
Two problems with that:
First, a big and complicated program like Maya has its own character, its own culture if you will. Even if my code deals with errors correctly, there is so much other code in the system that my efforts to do the right thing look silly. The net effect of the Maya error handling strategy is that failures are dealt with only to the point of not causing a crash. How does the user know if an operation was successful? The best way to figure this out that I've found is 1) look if it seems to have done what it was supposed to do; 2) if not, look in the Maya console for an error message; 3) if all else fails, ask a more experienced Maya user.
The second problem is actually very annoying. Turns out they also do the default MStatus pointer trick in constructors, including in constructors of non-copyable types. Here's one of the MFnCamera constructors:
(Btw, if you're a fan of NULL, you should read this.)
The thing is, while wrapping a regular (member) function and making it throw on error is trivial, wrapping such a constructor is impossible: you really have to wrap the entire type, by deriving from MFnCamera.
And then you proceed to write that derived type's constructor. Naturally, the wrapper's constructor doesn't take an MStatus argument, because it throws on error. But what can it pass as a ReturnStatus pointer to the original constructor?
It could pass a pointer to a global object, except that Maya is a multi-threaded application. It definitely can't pass the address of a MyFnCamera member, because those are initialized after the base object. So the only way is to make MyFnCamera derive from MStatus, in addition to MFnCamera.
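The workaround can be sketched with stand-in types (Status and Camera here are hypothetical and only mimic the shape of MStatus and MFnCamera):

```cpp
#include <stdexcept>

// Stand-in for the status type.
struct Status {
    bool ok;
    Status() : ok(true) {}
};

// Stand-in for a non-copyable API class whose constructor reports
// failure only through an optional status pointer.
class Camera {
public:
    explicit Camera(int object, Status* ReturnStatus = nullptr) {
        if (ReturnStatus) ReturnStatus->ok = (object >= 0);
    }
    Camera(Camera const&) = delete;
    Camera& operator=(Camera const&) = delete;
};

// Status is listed first among the bases, so it is fully constructed
// before Camera's constructor runs and can safely receive the result.
class MyCamera : private Status, public Camera {
public:
    explicit MyCamera(int object) : Status(), Camera(object, this) {
        if (!ok) throw std::runtime_error("Camera construction failed");
    }
};
```

Base classes are initialized in the order they are declared, which is what makes passing this as the status pointer safe here.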
It would have been so much simpler if Maya was a plain old C API, wouldn't it!
08-12-2009
I haven't posted much on my blog lately; one of the reasons is that I was busy traveling. Besides an interesting trip to Philadelphia for a meeting with Comcast, I presented a 3-day seminar on parallel programming using CUDA at the Interdisciplinary Mathematics Institute, University of South Carolina.
The materials from the seminar are now available for browsing and downloading here.
06-07-2009
At BoostCon, after my Boost Exception talk someone raised the question of what would happen if adding error_info to a boost::exception throws an exception. The same issue was also recently discussed on the Boost mailing list.
A bit of information for readers not familiar with Boost Exception. This is a Boost Library which provides a base exception type boost::exception which defines a type-safe interface for adding and retrieving any information in exception objects that derive from it. This can be done directly in the throw expression:
Error info can also be added at a later time to an active exception object (consider that in some contexts the file name is not available at the time of the throw) :
The issue at hand is that if you get a (possibly std::bad_alloc) exception at (1), this new exception would propagate instead of file_open_error. At (2), the intention is to re-throw the original exception object, but another exception could be emitted by the e << file_name("foo.txt") expression.
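The two places in question can be sketched with a simplified stand-in for boost::exception. The info_exception class below is hypothetical and much cruder than the real library (string keys, string values), but the throw and catch sites have the same shape.

```cpp
#include <cassert>
#include <exception>
#include <map>
#include <string>

// Simplified stand-in for boost::exception: stores string info by key.
class info_exception {
    mutable std::map<std::string, std::string> info_;
public:
    virtual ~info_exception() {}
    // May throw std::bad_alloc, like any insertion into a container.
    void set(std::string const& key, std::string const& value) const { info_[key] = value; }
    std::string const* get(std::string const& key) const {
        std::map<std::string, std::string>::const_iterator i = info_.find(key);
        if (i == info_.end()) return nullptr;
        return &i->second;
    }
};

// Stand-in for an error_info typedef.
struct file_name {
    std::string value;
    explicit file_name(std::string const& v) : value(v) {}
};

template <class E>
E const& operator<<(E const& e, file_name const& n) { e.set("file_name", n.value); return e; }

struct file_open_error : info_exception, std::exception {};

void open_file() {
    throw file_open_error() << file_name("foo.txt");   // (1) operator<< may throw first
}

void open_file_without_name() { throw file_open_error(); }

void caller() {
    try { open_file_without_name(); }
    catch (info_exception& e) {
        e << file_name("foo.txt");   // (2) may emit a different exception
        throw;                       // otherwise re-throws the original object
    }
}
```

When nothing fails, the handler that finally catches file_open_error finds the file name attached; when set() throws, that new exception is what propagates.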
People concerned with this issue propose changing the specification of boost::error_info and operator<< to guarantee nothrow behavior, by ignoring the possible failure. The rationale for this request is that perhaps it is healthier for the program to catch the exception it expects -- even without the additional information -- instead of some other (unrelated?) exception.
There is a simple argument to be made why the current Boost Exception behavior is correct: dealing with failures is difficult. Testing code that deals with failures is even more difficult. Right? What if the system designed to deal with errors fails? Isn't this a critical situation? Isn't this more important than any other error?
If your hair is on fire, should you really worry about something like a file_open_error? :)

Dexter from "Dexter's Laboratory" running in circles around his sister Dee Dee, screaming "MY HAIR IS ON FIRE MY HAIR IS ON FIRE!!"
There is some logic to saying that if a particular error_info is missing, the application could reasonably deal with the problem. However, we're asking the application to do a lot more.
Ignoring failures to add error_info to exceptions means that the application must be able to deal with any exception based on its type only. We'd also be expecting the application to deal with different permutations of availability of any error_info in exceptions it catches. That's because most containers -- including the one used internally by boost::exception -- provide only the basic exception safety guarantee, which in general means that failing to add error_info to an exception could leave the container in any state whatsoever; all we know is that we won't get a resource leak.
It's possible to use a container with the strong exception safety guarantee, combined with ignoring failures to add error_info. This would mean that any error_info already successfully added to the exception would not disappear if an attempt to add another error_info at a later time fails. We could at least add error_info in order of importance.
But this doesn't improve our worst case scenario: the program would still be required to deal with any exception only knowing its type.
Secondly, it is too simplistic to only consider the possibility of running out of memory when adding error_info to exception objects: depending on the complexity of the throw expression, we could get other exceptions as well. Consider this snippet:
The system could run out of memory in the get_file_name function. In this case (obviously?) bad_alloc should propagate and not file_open_error.
Ultimately, even if we ignore all of the issues above, if our motivation was to guarantee that a throw expression, even something as simple as:
will necessarily result in a file_open_error being propagated, we'd be out of luck.
That's because throwing an exception needs memory for the exception object. The C++ standard does not specify how this memory is to be allocated. The only requirement for the implementation is to have enough spare memory to be able to throw a std::bad_alloc, but it is possible to run out of memory when attempting to throw another exception. In that case, the implementation is allowed to throw std::bad_alloc instead.