Search

ReCode /

EMIL DOTCHEVSKI'S PROGRAMMING BLOG

Computer Language Memes

02-04-2010

Here are some computer language memes for your enjoyment. :)


Type safety as seen by fans of value-typed languages


Languages as seen by fans of other languages


Microsoft languages as seen by fans of other Microsoft languages


Post a comment


Socket programming with shared_ptr

01-07-2010

As far as I can tell, many of the readers of my blog find it through my posts on a couple of mailing lists: the sweng-gamedev@midnightryder.com game development mailing list, and the Boost developer mailing list.

I've blogged a few times about shared_ptr in the hopes of bringing those two audiences to appreciate the exclusive low level, C-style features of shared_ptr. The Boost crowd tends to think of shared_ptr as of any other smart pointer which simply calls delete automatically. The typical game development programmer basically ignores shared_ptr altogether, assuming that all it can do is call delete automatically (and they know when to call delete, thank you very much!)

I'll continue illustrating my point by sketching an interface design for socket programming using shared_ptr, based entirely on the standard C socket API structs.

The typical C++ approach would be to create an "object-oriented" system of classes, with all kinds of encapsulation and safety-net goodness throughout -- yet such interfaces have a fundamental problem. The whole point of defining a C++ socket programming layer is to provide type-safety for the low-level int handles. But to make the high level C++ interface practical, we have to punch a hole in the type system by providing a "getter" (and even a "setter") for the low level int handle!

And this means that we can no longer rely on type safety.

The important thing to consider is that the C interface is standard: sockets are represented by ints and that's that. The only reasonable way to improve the C API is make it easier to use, and harder to make silly errors. Below, I'm providing a few simple C++ functions which do just that.

Opening and closing sockets

First, here is how shared_ptr can be used to take care of closing a socket file descriptor when it is no longer needed:

namespace
{
  class socket_wrapper
  {
    socket_wrapper( socket_wrapper const & );
    socket_wrapper & operator=( socket_wrapper const & );
  public:
    int const s_;
    explicit socket_wrapper( int s ): s_(s) { }
    ~socket_wrapper()
    {
      (void) shutdown(s_,SHUT_RDWR);
      (void) close(s_);
    }
  }
}

boost::shared_ptr<int const> socket_adopt( int s )
{
  boost::shared_ptr<socket_wrapper> sw( new socket_wrapper(s) );
  return boost::shared_ptr<int const>(sw,&sw->s_);
}

The socket_wrapper class is an internal type, hidden in a CPP file where socket_adopt is defined. That function is intended to be called as soon as a socket file descriptor is returned from the standard C socket API, or from any other 3rd-party socket interface that opens a socket. It simply uses the shared_ptr aliasing constructor to rebind the initial shared_ptr<socket_wrapper> to point to the int stored in the socket_wrapper object. When the last shared_ptr instance expires ~socket_wrapper() will be called automatically, yet the user sees a simple shared_ptr<int const> which points the socket file descriptor!

The socket_create function below demonstrates how socket_adopt can be used:

boost::shared_ptr<int const> socket_create( int domain, int type, int protocol )
{
  int s = socket(domain,type,protocol);
  if( s!=-1 )
    return socket_adopt(s);
  else
    throw socket_error();
}

getaddrinfo

The old school gethostbyname function is now superseded by the POSIX function getaddrinfo. It is used to convert DNS names and IP addresses from their human-readable text form to a structured binary format for the OS:

int getaddrinfo(
  const char const * node,     // e.g. "www.example.com" or IP
  const char const * service,  // e.g. "http" or port number
  addrinfo const * hints,
  addrinfo * * res );

The caller passes an address and describes the type of acceptable protocols, and the function returns a linked list of addrinfo objects in res. Each returned addrinfo object's ai_family, ai_socktype and ai_protocol members can be passed to the socket() function (or to socket_create above) to open a matching new socket object. The linked list is allocated dynamically by the OS and must be disposed by calling freeaddrinfo. As before, we define a simple wrapper function that returns shared_ptr:

boost::shared_ptr<addrinfo const> socket_getaddrinfo( char const * node, char const * service, addrinfo const & hints )
{
  addrinfo * servinfo;
  if( int status=getaddrinfo(node,service,&hints,&servinfo) )
    throw socket_error();
  else
    return boost::shared_ptr<addrinfo>(servinfo,freeaddrinfo);
}

Remember, the returned shared_ptr object points to a linked list of addrinfo objects. We could keep the original shared_ptr around and walk the nodes using raw pointers, or we can advance the shared_ptr itself to the next node:

boost::shared_ptr<addrinfo const> next( boost::shared_ptr<addrinfo const> const & node )
{
  return boost::shared_ptr<addrinfo const>(node,node->ai_next);
}

The above function again takes advantage of the shared_ptr aliasing constructor. The returned shared_ptr object points to the next node in the addrinfo list, but when freeaddrinfo is called, it will be given the original addrinfo pointer, the one passed to the original shared_ptr object in socket_getaddrinfo.

Conclusion

The obvious benefit of using shared_ptr<int const> to manage the lifetime of sockets and file descriptors is that it decouples us from having to know how the file descriptor should be destroyed.

It is also possible to use type erasure to convert the shared_ptr<int const> to a shared_ptr<void const> and pass it to some kind of system which is concerned only with objects lifetime management regardless of their type.

Finally, shared_ptr comes with weak_ptr support. One example where this could be handy is in a logging system, which could easily detect if the log file has been closed by the application before attempting to write something in it.

Marshall — 07 January 2010, 17:46

That's a form of shared_ptr that I've never used before (heck, never _seen_ before); the sharing ownership with another shared pointer.

Cool!

Arseny Kapoulkine — 08 January 2010, 02:32

Alas, sockets in real world are not standard, even between win/*nix. The differences are not that great but they're enough for your code to not work :)

Post a comment


Warnings in Boost

11-19-2009

As I'm typing this post, there is a continuing discussion on the Boost developers mailing list about what to do with compile warnings reported in Boost code. There seem to be two opinions on the subject.

In the blue corner, we have people arguing that the compiler is your friend and if it tells you that there might be a problem lurking in your code, you should appreciate that and take whatever action is needed to "fix" the warning.

In the red corner, people argue that warnings don't necessarily indicate problems, that some compilers are plain silly, and that each Boost developer should be free to pick whatever warnings level is appropriate for them.

But before anyone has an opinion on the subject of warnings, we need to be on the same page about

What warnings are (and what they are not)

A common misconception is that compiler warnings indicate problems in the program that are not as severe as errors but should nevertheless be "fixed".

Actually, warnings report facts about the program being compiled for which the C/C++ standard does not require the compiler to issue an error. We can classify them in the following categories:

  1. Warnings that indicate definite errors, such as when a printf format specifier doesn't match the type of the variable being passed.
  2. Warnings that alert us about possibly unintended semantics, such as tricky operator precedence or implicit type conversions.
  3. Warnings that help enforce "good" coding standards, for example messages that report the use of C-style casts.
  4. Warnings that push corporate interests, such as the infamous MSVC C4996.

In a corporate environment, such classification helps managers craft a sensible warnings policy that maximizes the output of a particular development team targeting a particular set of platforms.

For a project like Boost, realistically, the classification is much simpler:

  1. Warnings that a particular Boost developer finds useful, and
  2. Warnings that Boost users find annoying.

Except that by definition 1) is a subset of 2), since many Boost users compile in environments that enable all warnings.

Therefore, we're left with

  1. Warnings that users find annoying.

which is another way of saying "any warning issued by any version of any compiler on any platform Boost is compiled on."

Realistically, the only sensible way to achieve this goal is to simply suppress (disable) all warnings in Boost releases, preferably without disabling them for Boost developers:

#ifndef BOOST_ENABLE_COMPILE_WARNINGS
//use compiler-specific directives to disable all warnings
#endif

The only alternative is to require Boost developers to "fix" all warnings on all versions of all compilers on all platforms Boost is compiled on; (obviously?) "fixing" only some warnings that a committee of some sort labeled as "important" still requires disabling all (other) warnings, assuming we are committed to provide warnings-free user experience.

And if that's not our goal, then we should be satisfied with the statu quo: Boost developers address most warnings reported by Boost users.

Post a comment


LAPACK

10-26-2009

I'm currently working on a library in CUDA that implements the basic operations for dynamic, very large size vectors and matrices. You'd think that such a basic scientific library should be already available, but I haven't been able to find one (if you know of such a CUDA/GLSL/HLSL library please do let me know!)

Most scientist find it unacceptable to have to even think about matrix multiplication. Basic things like that should just work, and so LAPACK (which of course does a lot more than multiplying matrices) was developed many years ago to solve all of these problems once and for all.

Unfortunately LAPACK is incompatible with the current GPU platforms.

LAPACK and BLAS

LAPACK is a software library for numerical linear algebra. It is implemented in FORTRAN but bindings to virtually any programming language are available.

Because LAPACK is a rather large piece of software, it defines a minimal set of lower level functions over which all LAPACK operations are implemented. They've named those functions BLAS, or Basic Linear Algebra Subprograms. The idea is that LAPACK would run efficiently on any platform that implements BLAS efficiently.

Problem 1: memory model

We don't need to dig any deeper than the Wikipedia LAPACK page to see what the problem is:

"LAPACK can be seen as the successor to the linear equations and linear least-squares routines of LINPACK and the eigenvalue routines of EISPACK. LINPACK was designed to run on the then-modern vector computers with shared memory. LAPACK, in contrast, depends upon the Basic Linear Algebra Subprograms (BLAS) in order to effectively exploit the caches on modern cache-based architectures, and thus can run orders of magnitude faster than LINPACK on such machines, given a well-tuned BLAS implementation."

Current GPU architectures are anything but cache-based architectures. Moreover, they can't efficiently share data with the CPU. So a BLAS implementation that targets a current GPU platform has no choice but to copy data to GPU memory, perform the operation (very efficiently) on the GPU, then copy the result back to CPU memory for LAPACK use.

The net effect is that while a GPU architecture can multiply large matrices way faster than a CPU architecture, the required double memory copy cripples cuBLAS (the CUDA-based BLAS implementation from NVIDIA) and, well, makes it almost useless. I'm sure there are cases where it would outperform a simple CPU-based BLAS implementation, but I suspect that a SSE2-enabled BLAS will run circles around cuBLAS with any data set.

There are two ways this issue can be resolved. One, the scientific community could move away from LAPACK. Two, the GPU platforms can become friendlier to the LAPACK/BLAS architecture.

It would be difficult for the scientific community to abandon LAPACK. Perhaps they can be persuaded by the huge speedups they would get by targeting GPU platforms (through a different API, not LAPACK), but the problem is not only inertia: there is a lot of scientific code that will have to be rewritten if it is to be ported from LAPACK to anything else.

There is some hope for LAPACK in the future though: Intel's Larrabee seems to address all problems that cripple BLAS, yet promises to be able to run massively parallel programs as efficiently as current GPUs do. The GPU companies are also slowly moving away from a completely cacheless model, but this treads on CPU territory and the thing is, a laptop only needs one chip that can run "CPU code" efficiently.

Problem 2: bandwidth

Massive data computation is memory bandwidth bound. This is even more true on GPU platforms which are designed to benefit from memory access latency to maximize bandwidth.

Currently, a good optimization strategy is to squeeze more things into available bandwidth. This is especially easy to do in graphics and image processing: depending on your requirements, you could use a variety of pixel formats, and spending fewer bits per pixel is an easy performance booster.

Note that even when input images are of reduced quality, all processing and the produced results can still use high precision without increasing (input) bandwidth. Video games use this technique a lot: typically they produce high quality picture, yet some of the input textures can use as few as 2 bits per pixel.

The same should be possible when working with large vectors and matrices as well: if you know that your input or intermediate or output data doesn't need the full 32-bit floating point precision, you should be able to easily use 16-bit floats. This gives you 2x speedup with no effort at all.

As well, there are use cases for large matrices or vectors that have all elements either 0.0 or 1.0, so even one-bit-per-element formats should be supported. Though such cases aren't very common, a 32x speedup should not be underestimated.

RamV — 28 October 2009, 15:37

Did you take a look at MAGMA: http://icl.cs.utk.edu/magma/

Its still nascent, but it is an effort by the lin. algebra guys that spend so much time on LAPACK.

Emil — 28 October 2009, 19:55

Thanks for the link. I certainly didn't study the MAGMA documentation in details but it says "The performance relies on the performance of the CPU BLAS and GPU BLAS used." Which is puzzling.

Nathan Whitehead — 05 November 2009, 20:41

The key idea you're missing is that you can leave data on the gpu. You don't have to copy data to and from the gpu for every operation. The best strategy is to copy the problem input to the gpu, do lots of matrix operations and other stuff all on the gpu, then copy back the final results.

Emil — 06 November 2009, 11:06

No, you're reiterating my point. :)

What works well on the GPU is clear, the question is how compatible that is with LAPACK/BLAS. I've talked with several scientists who have tried to replace their BLAS implementation with cuBLAS, only to see their program running slower; that is the problem I'm talking about.

Kumar — 11 December 2009, 12:02

You might also want to check out CULA, which has a bunch of useful GPU accelerated routines: http://www.culatools.com

Post a comment


Boost LA

10-20-2009

I am guessing there is some trace of "proper" game programming mentality in me after all. :) Going against my previous rant, I've designed "yet another" vector/matrix math library and I've submitted it for a preliminary Boost review.

Why do I think that writing this library was a good idea? Because it makes no sense for any of us to have to spell out how a 3x3 matrix is multiplied by another 3x3 matrix; there should be a way to express that algorithm generically and apply it to any and all 3x3 matrix types in the world.

The only tricky part is that operations such as matrix multiplication should use operator overloads (seriously, it's retarded not to) and that presents a challenge: how do you define type-safe operator overloads without using specific matrix types? That's basically what (Boost) LA pulls off, using SFINAE.

The scoop

A user-defined vector type float3 can be introduced to (Boost) LA like this:

#include <boost/la/vector_traits.hpp> //Note: this library is NOT part of Boost

struct float3 { float a[3]; };

namespace
boost
    {
    namespace
    la
        {
        template <>
        struct
        vector_traits<float3>
            {
            static int const dim=3;
            typedef float scalar_type;

            template <int I> static inline scalar_type & w( float3 & v )
                { return v.a[I]; }
            template <int I> static inline scalar_type r( float3 const & v )
                { return v.a[I]; }

            static inline scalar_type & iw( int i, float3 & v )
                { return v.a[i]; }
            static inline scalar_type ir( int i, float3 const & v )
                { return v.a[i]; }
            };
        }
    }

After a similar specialization of the matrix_traits template for a user-defined 3x3 matrix type float33, a full range of vector and matrix operations defined in (Boost) LA headers become available automatically:

float3 v;
v|X = 0;
v|Y = 0;
v|Z = 7;
float vmag = magnitude(v);
float33 m = rotx_matrix<3>(3.14159f);
float3 vrot = m * v;

The full documentation and source code, released under the Boost Software License, is here.

Arseny Kapoulkine — 20 October 2009, 22:16

This suits boost very much, but is anything but proper. IMHO.

gemicha — 20 October 2009, 23:02

v|X = 0; .... You gotta be kidding me

Emil — 21 October 2009, 01:38

I had posted a somewhat sarcastic reply to gemicha, but I realize that his comment was not unreasonable with only this example in mind.

The expression v|X should be read as "v pipe X". It is logical part of a more general system of view proxies in Boost LA. For example, you could say m|col<2>|YXZ which gets you a (mutable) reference to column 2 of m with its X and Y components swapped.

Gemicha's comment also prompted me to write this rationale.

Alex — 02 January 2010, 22:07

I've experimented with using this library in a project, and so far I like what I see. I do have to agree that the pipe syntax is rather unusual. IMHO the following would be nicer syntax and accomplish the same stuff as detailed in the rationale:

v[X] = 0;  // before v|X = 0;
m2[col<1>][YXZ] = m1 * v; // before: m2|col<1>|YXZ = m1 * v;
m2[col<1>] = m1 * v[YXZ]; // no precedence issues. before: m2|col<1> = m1 * (v|YXZ);

This would, of course, be accomplished by overloading the [] operator for typeof(XYZ) index type variants etc. In fact, this syntax can be taken slightly further:

v2 = v1[Z,X,Y]; // instead of: v2 = v1|ZXY;
v2 = v1[Y,-X];  // v1 and v2 are now perpendicular. Not possible with current syntax.

Here, the structs X, Y, and Z would inherit from a common base that overloads the comma and negation operators. I like this, not only because it's more flexible, but also because it looks more natural.

I'd love to hear your thoughts on this!

Alex — 02 January 2010, 22:15

Sorry, for the very last indexing example: v1 and v2 are 2D vectors, unlike the previous examples. I realized I should've named them differently so as to avoid any confusion.

Emil — 08 January 2010, 13:02

Using op[] is better indeed, but can not be done because it must be a member. Otherwise, it would be the natural choice, same as in HLSL.

Gregory — 14 January 2010, 16:17

From ISO-IEC 14882-2003

"13.5.5 Subscripting

operator[] shall be a non-static member function with exactly one parameter."

Hence, v1[Z,X,Y] is not possible.

Alex — 14 January 2010, 21:21

@Emil Oh, that's right. Bummer. I guess the only way around that is by way of a macro that would need to be placed inside the vector class for op[] to become available. It's slightly more intrusive than the current way of registering a custom vector class, but I can't seem to think of any real downsides other than that. What do you think?

@Gregory It is possible as the comma operator can be overloaded for *structs* X/Y/Z. So (X,Y,Z) would return some callable type that is passed to the [] operator, much like XYZ is passed to the |operator in the current implementation.

Alex — 14 January 2010, 22:02

Hm, I realized that the macro is of no use if you're employing a vector class as defined by some external library. Being able to only use op[] for user but not library classes would probably be more confusing than the pipe syntax and its precedence issues.

Ultimately, the end the user is free to define their own op[] that simply forwards to op|, and this is what I'll most likely do for my own project.

Emil — 15 January 2010, 11:05

Let's use the Boost Developers mailing list for this discussion, others might want to participate.

http://lists.boost.org/mailman/listinfo.cgi/boost

Post a comment


CALL-151

09-14-2009

It is only natural for people to think with nostalgia about the good 'ol days: "oh man, back in the day things were so simple!" Objectively speaking, obviously, the "good 'ol days" sucked: that simplicity is only skin deep.

Yes, a telephone was much easier to use in the 70s, but then again it didn't do that much, did it? Could you imagine having to memorize the phone numbers of everyone you need to text or call? Right, that would have sucked. How about having to fast forward in order to get to the next song on a compact cassette? Ouch.

Yet, some things from the past are worth admiring. To me, a steam engine is prettier than the most advanced internal combustion engines of today. Not really practical, yet pretty.

But I'm a programmer, and a lot of the old stuff I admire is software. And while I have (if only once) programmed a computer with punchcards, those were already antiques when I was in school; the computer I grew up programming was the Apple ][.

Actually, not quite: I grew up in Bulgaria which was a member of COMECON, an economic organization of the communist states. According to Wikipedia, at its peak Bulgaria produced 40% of the computers in COMECON. Some of what we did was what China does today: copying and mass-producing western technology. So I grew up programming the Pravetz-82, which was an Apple ][ clone built with Bulgarian-made clone of the 6502 CPU. :)

So here we go: at the risk of wasting my time raving about something most programmers today can't appreciate, I'll try to explain but just a tiny bit of Wozniak's brilliancy:

The Apple ][ floppy disk controller bootstrapping routine

Some background: on physical magnetic media, you can't "just record data". Instead, 1 is encoded as a magnetic polarity transition (also known as flux reversal), and 0 is encoded as, well, lack of transition. Basically, for each bit there is a limited time window in which flux reversal is expected to occur. If it occurs, the bit is 1, otherwise it's 0. The problem is that the time window is so small that there is a limit on the accuracy with which the data window length can be measured. So, while detecting a series of ones is not a problem, it's difficult to know how many zeroes are encoded in a measured period of lack of flux reversal.

Wozniak's original hardware design imposed two constraints on the raw data:

  • between any two 1-bits, there can be no more than one 0-bit, and
  • each 8-bit byte must start with a 1-bit

However, using simple FM encoding "wastes" 4 bits per raw byte, which would allow only 10 256-byte sectors per track to be recorded. Instead, he devised a more complex encoding scheme that was based on the fact that there are 34 8-bit numbers which have the top bit set and no two 0-bits in a row. That way, 13 sectors per track could be recorded. Later on the hardware design of the floppy disk controller was modified to allow no more than one occurrence of two consecutive 0-bits in a byte, which lead to different encoding called 6&2; it allowed 16 sectors per track.

A conventional floppy disk controller design would put all this bit twiddling in silicone which just DMAs the decoded bytes in memory -- but it would cost more in hardware. So what is a Wozniak to do? Obviously: screw hardware, do everything in software.

That would have been impressive enough on a 6502 CPU, but even more impressive is what is required of the bootstrapping routine. It must:

  • generate the 6&2 decoding table in RAM
  • seek the disk drive to track zero
  • look for the beginning marker of sector zero and read the raw data in RAM
  • decode the data and jump into it

...all in no more than 256 bytes of code, which is the addressing limit for the ROM on Apple ][ expansion cards.

By the way, there are no timers on the Apple ][, all timing is based on the CPU clock, e.g. you'd know that an INX instruction takes 2 microseconds to execute. The 6502 has three 8-bit registers: an accumulator A which can add, subtract, shift, rotate, and, or, xor, etc., and two 8-bit index registers X and Y which can do none of those operations. :)

]CALL-151
*C600L

C600-   A2 20       LDX   #$20
C602-   A0 00       LDY   #$00
C604-   A2 03       LDX   #$03
C606-   86 3C       STX   $3C
C608-   8A          TXA
C609-   0A          ASL
C60A-   24 3C       BIT   $3C
C60C-   F0 10       BEQ   $C61E
C60E-   05 3C       ORA   $3C
C610-   49 FF       EOR   #$FF
C612-   29 7E       AND   #$7E
C614-   B0 08       BCS   $C61E
C616-   4A          LSR
C617-   D0 FB       BNE   $C614
C619-   98          TYA
C61A-   9D 56 03    STA   $0356,X
C61D-   C8          INY
C61E-   E8          INX
C61F-   10 E5       BPL   $C606
C621-   20 58 FF    JSR   $FF58
C624-   BA          TSX
C625-   BD 00 01    LDA   $0100,X
C628-   0A          ASL
C629-   0A          ASL
C62A-   0A          ASL
C62B-   0A          ASL
C62C-   85 2B       STA   $2B
C62E-   AA          TAX
C62F-   BD 8E C0    LDA   $C08E,X
C632-   BD 8C C0    LDA   $C08C,X
C635-   BD 8A C0    LDA   $C08A,X
C638-   BD 89 C0    LDA   $C089,X
C63B-   A0 50       LDY   #$50
C63D-   BD 80 C0    LDA   $C080,X
C640-   98          TYA
C641-   29 03       AND   #$03
C643-   0A          ASL
C644-   05 2B       ORA   $2B
C646-   AA          TAX
C647-   BD 81 C0    LDA   $C081,X
C64A-   A9 56       LDA   #$56
C64C-   20 A8 FC    JSR   $FCA8
C64F-   88          DEY
C650-   10 EB       BPL   $C63D
C652-   85 26       STA   $26
C654-   85 3D       STA   $3D
C656-   85 41       STA   $41
C658-   A9 08       LDA   #$08
C65A-   85 27       STA   $27
C65C-   18          CLC
C65D-   08          PHP
C65E-   BD 8C C0    LDA   $C08C,X
C661-   10 FB       BPL   $C65E
C663-   49 D5       EOR   #$D5
C665-   D0 F7       BNE   $C65E
C667-   BD 8C C0    LDA   $C08C,X
C66A-   10 FB       BPL   $C667
C66C-   C9 AA       CMP   #$AA
C66E-   D0 F3       BNE   $C663
C670-   EA          NOP
C671-   BD 8C C0    LDA   $C08C,X
C674-   10 FB       BPL   $C671
C676-   C9 96       CMP   #$96
C678-   F0 09       BEQ   $C683
C67A-   28          PLP
C67B-   90 DF       BCC   $C65C
C67D-   49 AD       EOR   #$AD
C67F-   F0 25       BEQ   $C6A6
C681-   D0 D9       BNE   $C65C
C683-   A0 03       LDY   #$03
C685-   85 40       STA   $40
C687-   BD 8C C0    LDA   $C08C,X
C68A-   10 FB       BPL   $C687
C68C-   2A          ROL
C68D-   85 3C       STA   $3C
C68F-   BD 8C C0    LDA   $C08C,X
C692-   10 FB       BPL   $C68F
C694-   25 3C       AND   $3C
C696-   88          DEY
C697-   D0 EC       BNE   $C685
C699-   28          PLP
C69A-   C5 3D       CMP   $3D
C69C-   D0 BE       BNE   $C65C
C69E-   A5 40       LDA   $40
C6A0-   C5 41       CMP   $41
C6A2-   D0 B8       BNE   $C65C
C6A4-   B0 B7       BCS   $C65D
C6A6-   A0 56       LDY   #$56
C6A8-   84 3C       STY   $3C
C6AA-   BC 8C C0    LDY   $C08C,X
C6AD-   10 FB       BPL   $C6AA
C6AF-   59 D6 02    EOR   $02D6,Y
C6B2-   A4 3C       LDY   $3C
C6B4-   88          DEY
C6B5-   99 00 03    STA   $0300,Y
C6B8-   D0 EE       BNE   $C6A8
C6BA-   84 3C       STY   $3C
C6BC-   BC 8C C0    LDY   $C08C,X
C6BF-   10 FB       BPL   $C6BC
C6C1-   59 D6 02    EOR   $02D6,Y
C6C4-   A4 3C       LDY   $3C
C6C6-   91 26       STA   ($26),Y
C6C8-   C8          INY
C6C9-   D0 EF       BNE   $C6BA
C6CB-   BC 8C C0    LDY   $C08C,X
C6CE-   10 FB       BPL   $C6CB
C6D0-   59 D6 02    EOR   $02D6,Y
C6D3-   D0 87       BNE   $C65C
C6D5-   A0 00       LDY   #$00
C6D7-   A2 56       LDX   #$56
C6D9-   CA          DEX
C6DA-   30 FB       BMI   $C6D7
C6DC-   B1 26       LDA   ($26),Y
C6DE-   5E 00 03    LSR   $0300,X
C6E1-   2A          ROL
C6E2-   5E 00 03    LSR   $0300,X
C6E5-   2A          ROL
C6E6-   91 26       STA   ($26),Y
C6E8-   C8          INY
C6E9-   D0 EE       BNE   $C6D9
C6EB-   E6 27       INC   $27
C6ED-   E6 3D       INC   $3D
C6EF-   A5 3D       LDA   $3D
C6F1-   CD 00 08    CMP   $0800
C6F4-   A6 2B       LDX   $2B
C6F6-   90 DB       BCC   $C6D3
C6F8-   4C 01 08    JMP   $0801
C6FB-   00          BRK
C6FC-   00          BRK
C6FD-   00          BRK
C6FE-   00          BRK
C6FF-   00          BRK

Christer Ericson — 15 September 2009, 11:33

The Apple II disk controller and associated hardware is probably the most impressive design I know.

You forgot to mention there's a second 256 byte "program" which is the controller state machine for shifting bits in (and out) of the memory mapped register(s) that the boot ROM addresses (the $C08C,X location, etc).

Changing both programs is what allowed the transition from DOS 3.2 to DOS 3.3 (13 to 16 sectors per track).

Vladimir Strinski — 15 September 2009, 14:40

To add on top of that - he not only coded everything within the 256 bytes, he had a few bytes at the end to spare!

AND on top of that - the first two bytes are a reserved value so the BIOS would recognize the controller, so you can't really use them.

Emil — 15 September 2009, 15:29

Here are some pictures I found of Bulgarian-made Apple ][ clones: the first two are of Pravetz-82 from the late 70s/early 80s, the third picture is of Pravetz-8C from the late 80s/early 90s. These Bulgarian computers were compatible with Apple ][, but their evolution didn't entirely follow the evolution of Apple ][, e.g. some models didn't have an exact original.

Post a comment


More on shared_ptr

09-07-2009

Last week, I found myself on a game development mailing list, arguing about using boost::shared_ptr. Such discussions go something like this:

GameDeveloper1: What about smart pointers? Emil: shared_ptr pwns!!!11~`~1 GameDeveloper2: It sucks, it allocates two memory blocks! Emil: Not if you use allocators. GameDeveloper2: Bleah, allocators suck, they're intrusive! Emil: Not the way shared_ptr uses them. GameDeveloper2: Oh. Well it sucks anyway, it takes the space of two raw pointers! And I've been developing games for 15 years, never needed a smart pointer! Ha!

In the last week's discussion someone asked me if I could show in concrete code how it's possible to have a shared_ptr factory which is not intrusive with regard to the type of the objects it creates, yet is able to place the shared_ptr control block in the same memory block as the object (indeed, non-intrusiveness is an important requirement because factories often operate behind a DLL boundary.)

Let's start with the factory interface. This is a complete header file, it doesn't need to #include any headers:

//int_factory.h:

namespace boost { template <class> class shared_ptr; }
boost::shared_ptr<int> make_shared_int( int value );
Why is it an int factory? Basically, to make the point that shared_ptr allows a factory to control the allocation strategy without being intrusive (on the other hand, this doesn't illustrate the other very important shared_ptr feature, that the type of the pointee can be incomplete or even void.)

I'll paste the full implementation of the make_shared_int function at the end of this post, here I will describe the 4 allocation strategies it chooses from:

  • The default allocation type just news an int and places it in a shared_ptr using its most default behavior.
  • Case 1 calls boost::make_shared, which places both the object and the shared_ptr control block in the same memory block. The benefit of this approach is that it reduces memory fragmentation. In practice, this strategy almost makes the shared_ptr control block disappear since it's rather small compared to the typically large managed object. The downside is that the memory occupied by the object cannot be reclaimed until the control block expires, which in general could outlive the object due to weak_ptr references.
  • Case 2 is the trickiest, almost all of the machinery in the make_shared_int implementation is needed to support it. This case separates all control blocks into a memory pool of static size, which could reduce fragmentation (or not, because it's possible that the built-in memory manager already implements small objects optimization). Another potential benefit is that it improves cache coherency in use cases heavy on copying shared_ptr objects, since all refcounts are placed close together.
  • Case 3 demonstrates another powerful shared_ptr feature, called aliasing. In this case, the factory actually creates an object of type widget, which is entirely unrelated to the type of the objects (int) the factory returns. How is this safe? Well, the returned shared_ptr points a member of widget, which is of type int. While the factory still returns shared_ptr<int>, in this case the widget destructor will be called when the object expires.

Here is the complete source code of int_factory.cpp:

#include "boost/make_shared.hpp"
#include <vector>
#include <stdlib.h>
#include <assert.h>

namespace
  {
  //This class implements a simple memory manager
  //which allocates memory in blocks of size S.
  //All blocks are placed sequentially in memory.
  template <std::size_t S>
  class
  array_storage
    {
    array_storage( array_storage const & );
    array_storage & operator=( array_storage const & );

    std::vector<char> storage_;
    std::vector<void *> free_;

    public:

    explicit
    array_storage( size_t max_capacity )
      {
      storage_.reserve(S*max_capacity);
      }

    ~array_storage()
      {
      assert(free_.size()*S==storage_.size());
      }

    void *
    allocate()
      {
      if( free_.empty() )
        {
        size_t ss=storage_.size();
        assert(ss+S<=storage_.capacity());
        storage_.resize(ss+S);
        return &storage_[ss];
        }
      else
        {
        void * p=free_.back();
        free_.pop_back();
        return p;
        }
      }

    void
    deallocate( void * p )
      {
      free_.push_back(p);
      }
    };

  //An allocator for control blocks. While this class template
  //doesn't satisfy all requirements for allocators, it defines
  //everything that boost::shared_ptr needs.
  template <class T>
  class
  cb_allocator
    {
    typedef array_storage<sizeof(T)> as_type;

    static
    as_type &
    memory()
      {
      static as_type m(256);
      return m;
      }

    public:

    template <class U>
    struct
    rebind
      {
      typedef cb_allocator<U> other;
      };

    cb_allocator()
      {
      }

    template <class U>
    cb_allocator( cb_allocator<U> const & )
      {
      }

    T *
    allocate( std::size_t n, void const * )
      {
      assert(n==1);
      T * cb=static_cast<T *>(memory().allocate());
      return cb;
      }

    void
    deallocate( T * p, std::size_t n )
      {
      assert(n==1);
      memory().deallocate(p);
      }
    };

  //This is a custom object deleter which simply calls delete.
  //It is needed because the user is required to also supply
  //a deleter when using an allocator.
  struct
  delete_deleter
    {
    template <class T>
    void
    operator()( T * p )
      {
      delete p;
      }
    };

  struct
  widget
    {
    int value_;

    explicit
    widget( int value ):
      value_(value)
      {
      }

    ~widget()
      {
      }
    };

  boost::shared_ptr<int>
  make_shared_widget( int value )
    {
    boost::shared_ptr<widget> x(new widget(value));
    return boost::shared_ptr<int>(x,&x->value_);
    }
  } //unnamed namespace

boost::shared_ptr<int>
make_shared_int( int value )
  {
  switch(rand()%4)
    {
    //Just a plain old shared_ptr using new, calls delete
    //to dispose.
    default:
      return boost::shared_ptr<int>(new int(value));

    //Combines the control block and the object in a single memory block.
    //The downside is that the single memory block is not freed until
    //the control block expires (in general, the control block may
    //outlive the object due to weak_ptr references.)
    case 1:
      return boost::make_shared<int>(value);

    //Allocates the object using new, calls delete to dispose.
    //All control blocks are placed in sequential memory, which
    //could reduce fragmentation.
    case 2:
      return boost::shared_ptr<int>(
               new int(value),
               delete_deleter(),
               cb_allocator<int>() );

    //Allocates an object of type widget, and returns a shared_ptr
    //that points to its value_. Correctly calls ~widget upon disposal.
    case 3:
      return make_shared_widget(value);
    }
  }

See also: shared_ptr Programming Techniques.

Christer Ericson — 07 September 2009, 16:21

Wow, 200 lines of code to avoid having to do one line of delete. No thanks.

Emil — 07 September 2009, 16:54

Not even that, turns out all 4 cases call delete. What a waste! :)

mbrodersen — 08 September 2009, 00:40

Yep I get it. This clearly demonstrates why it makes sense to write convoluted code to "improve" a simple solution (calling delete).

(For the sarcasm illiterate out there: the previous paragraph is rated "highly sarcastic" and might offend some readers)

Sylvain Glaize — 08 September 2009, 01:25

I would rather say it's misreading the code. The 200 lines of code are mainly recreating a small environment to demonstrate the usage of shared_ptr through 20 lines for four differents usages.

Considering the coding style of Emil (lots of n), it's a pretty small code.

Thanks Emil for all these explanations, here and on the ML.

Joacim Jacobsson — 10 September 2009, 05:51

http://www.youtube.com/watch?v=3vzcMdkvPPg

The executable rendering that "movie" is 256 bytes large. I wonder how large your code is when it's compiled.

Emil — 10 September 2009, 15:09

You want to talk about amazing 256-byte code? The top of my list has to be the 254-byte program in the Apple ][ floppy disk controller ROM, which reads sector 0 of track 0 of the disk and runs it. Naturally, very few people appreciate that because most have no idea how difficult the problem is to solve, even without the size restriction.

For this post, it would be more relevant to consider what this code would add to the size of the executable of a typical game. More relevant still would be to also consider the upside: being able to selectively optimize the memory allocation for any object instance by only changing a single 200-line cpp file, without a need to re-compile any other file.

Joseph Garvin — 05 October 2009, 13:41

Wow, talk about being blinded by zealotry.

Compiled this will be 0 bytes, because it's a template! And once it's specialized, it's practically all going to get thrown away and inlined at compile time anyway.

And you don't even have to write 200 lines! In boost 1.39+ the make_shared function will do this for you.

And using a smart pointer doesn't just save you one call to delete! Once you make your 'one call' to delete exception safe, you're either going to have made an equally 'burdensome' RAII scope guard or you're going to have a dozen deletes.

Oh, and you can check out unintrusive_ptr at toast.sourceforge.net. It uses just one raw pointer, but doesn't support multiple inheritance.

Andy P — 02 November 2009, 08:12

> Wow, 200 lines of code to avoid having to do one line of delete. No thanks.

> Yep I get it. This clearly demonstrates why it makes sense to write convoluted code to "improve" a simple solution (calling delete).

Yep, because no-one writing C++ has ever either failed to call delete (leaking memory) or called it twice (causing a crash) or deleted something in one place while it's still being pointed at from another (causing, uh... well, anything from a crash to a memory scribble to "apparently" normal behaviour until something goes wrong in a completely unrelated area of code). Lucky delete is simple, hey!

(PS. if you claim it is simple and you've never done any of the above, or that no competent programmer would do any of the above, you're a liar).

Post a comment


Error Handling, the Maya SDK Way

08-21-2009

I didn't really plan to become this focused on error handling. Who would want that? Dealing with errors is frustrating, difficult to test, and takes your attention away from all the action! But I have to admit, whatever the reason is, I do care about errors a lot; perhaps too much (though this doesn't mean that I find it any less frustrating than anyone else.)

In C, there is an obvious difficulty in reporting errors. The typical approach is to return an error code, but the problem is that this takes away the return value of the function, so any function that could fail can't return a result in case of success. You may think that this complicates the use of the API but the reality is that an API that reports errors by returning error codes is already difficult to use. At some point, you realize that writing

int err=foo(....);
if( err )
    {
    free-my-own-resources;
    return err;
    }

is just silly. And then if you're not stuck in C, you discover that destructors in C++ take care of freeing resources automatically, and that exception handling takes care of the "if(err) return err;" you have to sprinkle all over the place in a C program. So you're left with the straight forward

foo(....);

That said, I do like plain old C APIs. One of the reasons why I don't mind their code-based error reporting is that wrapping a C function is the easiest thing to do.

Making a C function throw on error

Here is how a C function -- even one that creates an object that needs to be deleted later -- can be wrapped to throw on error:

#include "boost/exception/errinfo_file_name.hpp"
#include "boost/exception/errinfo_file_open_mode.hpp"
#include "boost/exception/errinfo_errno.hpp"
#include "boost/exception/info.hpp"
#include "boost/throw_exception.hpp"
#include "boost/shared_ptr.hpp"
#include <stdio.h>

struct error: virtual std::exception, virtual boost::exception { };
struct file_open_error: virtual error { };

boost::shared_ptr<FILE>
my_fopen( char const * name, char const * mode )
    {
    if( FILE * f=fopen(name,mode) )
        return boost::shared_ptr<FILE>(f,fclose);
    else
        BOOST_THROW_EXCEPTION(
            file_open_error() <<
            boost::errinfo_file_name(name) <<
            boost::errinfo_file_mode(mode) <<
            boost::errinfo_errno(errno) );
    }

While this might seem a lot of code, it really isn't: all of the included Boost headers are very lean. Also, consider what the header file for my_fopen looks like:

namespace boost { template <class> class shared_ptr; }
#include <stdio.h>
boost::shared_ptr<FILE> my_fopen( char const *, char const * );

It might be argued that wrapping every C function that could fail in this manner is rather tedious. I'd admit that this is a significant downside, but consider the upside: in any exception-neutral context (anywhere you'd have to sprinkle the "if(err) return err;"), you don't have to worry about errors! And the thing is, most contexts in a program are exception-neutral indeed.

But if the reader remembers, the topic of this post was

Error Handling, the Maya SDK Way

So how do they deal with errors in Maya? Here is an example of the member funciton MDagPath::hasFn:

bool hasFn( MFn::Type type, MStatus * ReturnStatus=NULL ) const;

That's just awesome, I get to decide if I want the API to tell me if the function I call succeeded. :) I don't want to sound cocky but seriously, I'm wondering if they even realize how big of a deal this is -- they openly invite everyone to ignore errors! They went the extra mile to make it easier for the programmers to be sloppy!

Should I care that much about other people writing sloppy code? After all, I could wrap the Maya API the usual way and convert failures to exceptions, right?

Two problems with that:

First, a big and complicated program like Maya has its own character, its own culture if you will. Even if my code deals with errors correctly, all the other code in the system is so much more that it makes my efforts to do the right thing silly. The net effect of the Maya error handling strategy is that failures are dealt with only to the point of not causing a crash. How does the user know if an operation was successful? The best way to figure this out that I've found is 1) look if it seems to have done what it was supposed to do; 2) if not, look in the Maya console for an error message; 3) if all fails, ask a more experienced Maya user.

The second problem is actually very annoying. Turns out they also do the default MStatus pointer trick in constructors, including in constructors of non-copyable types. Here's one of the MFnCamera constructors:

MFnCamera( MObject & object, MStatus * ReturnStatus=NULL );

(Btw, if you're a fan of NULL, you should read this.)

The thing is, while wrapping a regular (member) function and making it throw on error is trivial, wrapping such a constructor is impossible: you really have to wrap the entire type, by deriving from MFnCamera.

And then you proceed to write that derived type's constructor. Naturally, the wrapper's constructor doesn't take an MStatus argument, because it throws on error. But what can it pass as a ReturnStatus pointer to the original constructor?

MyFnCamera::
MyFnCamera( MObject & object ):
    MFnCamera(object,?????)
    {
    }

It could pass a pointer to a global object, except that Maya is a multi-threaded application. It definitely can't pass the address of a MyFnCamera member, because those are initialized after the base object. So the only way is to make MyFnCamera derive from MStatus, in addition to MFnCamera.

It would have been so much simpler if Maya was a plain old C API, wouldn't it!

Reg — 23 August 2009, 06:03

Yes it would, but it would be even more simple if every C++ code used exception handling to handle errors. That's what they are designed for! One of the reasons why I dislike C++ is that there is no common convention and thus almost no libraries or API-s make use of full C++ including std::strings, containers, streams, exceptions etc. C# or Java programmers don't have such problems :/

Emil — 23 August 2009, 23:04

Right, it is ridiculous that C++ programmers can't even agree on a string type nevermind more advanced features.

Still, I don't have much respect for Java and C#. I mean, how can you respect a language that copied the C++ syntax, of all things. :)

Denis — 27 August 2009, 11:14

Well, considering the size of Maya codebase, I'd expect exceptoion handling everywhere leading to serious code bloat (the call stack often has so many unknown function calls leading to the dungeons of internal Maya engines). I'm just using 2 macros anytime I'm calling anything from Maya API (which call assert and show a lot of info in the output whenever any returned status code is not successful). I am not a very elegant guy, so no headache regarding "underused power of C++ and macro hell" 8)

Emil — 27 August 2009, 13:28

You're using Maya yet are concerned about code bloat? :) Seriously though, exception handling doesn't add much, it certainly has a lot less influence on code size compared to, say, good design.

The purpose of assert() is to indicate undefined behavior (crash) as result of violated preconditions. It isn't always reasonable for Maya to crash when some function call was not successful.

Denis — 29 August 2009, 03:33

Alright, in my case it is somewhat different assert (let's not go into philosophical disputes regarding the usags of asserts). This is a standard windows assert and I had to make similar thing on Linux. So instead of crashing, it just shows me which action and where returned an error code and leaves current function. And on windows I can also attach with debugger when I am running in debug mode to see the "context".

Post a comment


Parallel Programming Seminar

08-12-2009

I haven't posted much on my blog lately, one of the reasons is that I was busy traveling. Besides an interesting trip to Philadelphia for a meeting with Comcast, I presented a 3-day seminar on parallel programming using CUDA at the Interdisciplinary Mathematics Institute, University of South Carolina.

The materials from the seminar are now available for browsing and downloading here.

Post a comment


boost::error_info Exception Safety

06-07-2009

At BoostCon, after my Boost Exception talk someone raised the question what would happen if adding error_info to a boost::exception throws an exception. The same issue was also recently discussed on the Boost mailing list.

The issue

A bit of information for readers not familiar with Boost Exception. This is a Boost Library which provides a base exception type boost::exception which defines a type-safe interface for adding and retrieving any information in exception objects that derive from it. This can be done directly in the throw expression:

struct file_open_error: virtual std::exception, virtual boost::exception { };
typedef boost::error_info<struct tag_file_name,std::string> file_name;

....
throw file_open_error() << file_name("foo.txt"); //(1)

Error info can also be added at a later time to an active exception object (consider that in some contexts the file name is not available at the time of the throw) :

catch( boost::exception & e )
    {
    e << file_name("foo.txt"); //(2)
    throw;
    }

The issue at hand is that if you get a (possibly std::bad_alloc) exception at (1), this new exception would propagate instead of file_open_error. At (2), the intention is to re-throw the original exception object, but another exception could be emitted by the e << file_name("foo.txt") expression.

People concerned with this issue propose changing the specification of boost::error_info and operator<< to guarantee nothrow behavior, by ignoring the possible failure. The rationale for this request is that perhaps it is healthier for the program to catch the exception it expects -- even without the additional information -- instead of some other (unrelated?) exception.

Critical error

There is a simple argument to be made why the current Boost Exception behavior is correct: dealing with failures is difficult. Testing code that deals with failures is even more difficult. Right? What if the system designed to deal with errors fails? Isn't this a critical situation? Isn't this more important than any other error?

If your hair is on fire, should you really worry about something like a file_open_error? :)

Dexter from "Dexter's Laboratory" running in circles around his sister Dee Dee, screaming "MY HAIR IS ON FIRE MY HAIR IS ON FIRE!!"

What if operator<< is nothrow?

There is some logic to saying that if a particular error_info is missing, the application could reasonably deal with the problem. However, we're asking the application to do a lot more.

Ignoring failures to add error_info to exceptions means that the application must be able to deal with any exception based on its type only. We'd also be expecting the application to deal with different permutations of availability of any error_info in exceptions it catches. That's because most containers -- including the one used internally by boost::exception -- provide only basic exception safety guarantee, which in general means that failing to add error_info to an exception could leave the container in any state whatsoever; all we know is that we won't get a resource leak.

It's possible to use a container with strong exception safety guarantee, combined with ignoring failures to add error_info. This would mean that any error_info already successfully added to the exception would not disappear if an attempt to add another error_info at a later time fails. We could at least add error_info in order of importance.

But this doesn't improve our worst case scenario: the program would still be required to deal with any exception only knowing its type.

Secondly, it is too simplistic to only consider the possibility of running out of memory when adding error_info to exception objects: depending on the complexity of the throw expression, we could get other exceptions as well. Consider this snippet:

file_name get_file_name();

....
throw file_open_error() << get_file_name();

The system could run out of memory in the get_file_name function. In this case (obviously?) bad_alloc should propagate and not file_open_error.

Anyway, it wouldn't work!

Ultimately, even if we ignore all of the issues above, if our motivation was to guarantee that a throw expression, even something as simple as:

throw file_open_error();

will necessarily result in a file_open_error being propagated, we'd be out of luck.

That's because throwing an exception needs memory for the exception object. The C++ standard does not specify how this memory is to be allocated. The only requirement for the implementation is to have enough spare memory to be able to throw a std::bad_alloc, but it is possible to run out of memory when attempting to throw another exception. In that case, the compiler is allowed to throw std::bad_alloc instead.

weak egg — 29 June 2009, 02:28

If you really want to report an out of memory error you should alloc some memory beforehand and when it cames to the to the OOM handling you must free the memory and report the error as last effort.

But OOM reporting is the task for the OS if you think about it.

Emil — 01 July 2009, 11:18

Well in this case the issue is not reporting OOM condition, it's ending up throwing std::bad_alloc when the programmer intended to throw something else. Otherwise yes, the OS would report OOM to the C++ runtime, which in turn would throw std::bad_alloc, for which it is required to have enough available pre-allocated memory, as you're suggesting.

zilion — 21 September 2009, 09:18

Hello

I tried a little bit with boost::exception and tried to nest exceptions: http://www.copypastecode.com/11142/comment-page-1/#comment-17659

But why doesn't this output "Something went wrong"? Thanks for your time.

Emil — 21 September 2009, 13:19

Please use <boost-users@lists.boost.org> for support questions on Boost.

Post a comment


View all posts