Archive for the ‘C’ Category

Local variables are free

Saturday, December 19th, 2009

This is part II of my irregularly scheduled series on compiler optimization. In part I, I explained how the compiler can optimize away return statements, resulting in missed breakpoints. My given workaround to that problem, though effective, was very ugly and architecture-dependent, much like Cowboys Stadium.

(gdb) break *0x00001fc5 if $eax != 0

Although there’s not much we can do to prevent the compiler optimization, we can greatly simplify our conditional breakpoint. I had suggested rewriting the source code, which was awe-inspiringly prescient, because that’s what I’m going to do now. Here’s the original code:

8	if (ShouldReturn())
9		return;

And here’s the revised code:

8	int localVar = ShouldReturn();
9	if (localVar)
10		return;

The return at line 10 will still be optimized away. However, the revised code allows us to set a simple breakpoint at line 9 that will stop when we want:

(gdb) break 9 if localVar != 0

No knowledge of the architecture, machine registers, or assembly language is required.

From the beginning of time (January 1970, of course), programmers have struggled over coding style. Objective-C programmers, for example, expend undue effort arranging their brackets. (I have [NSMutableArray array] going to the Final Four.) For some, bracket-making becomes a kind of game or contest.

[[[[[[[[[[[[[See how] many] method] calls] we] can] fit] on] one] line] of] source] code];

I’ve changed my coding style over the years, but I’ve settled on one fundamental principle: write your code so that it’s easy to debug. All your fancy margin-aligning isn’t going to help when you need to figure out why your app keeps exploding. If you have nested method calls on one line of code, you can’t easily set a breakpoint in the middle. That’s why I prefer as much as possible to have only one method call per line of code, and create a local variable to store the return value.

There is a misconception that local variables are expensive, in terms of either computation or memory. The truth is that local variables are very cheap, the value meals of the computing world. (Would you like trans fat with your saturated fat?) It only takes one machine instruction to store a pointer address to a local variable. One machine instruction is really quite fast, about as fast as you can get — at least with restrictor plates. With regard to memory, local variables only take up stack space. To create a local variable, you simply move the stack a little. When the method or function returns, the stack is moved back, and thereby the space reserved for local variables is automatically recovered. Of course, you don’t want to create large C arrays on the stack, but a pointer to an Objective-C object only takes 4 bytes on the stack for 32-bit, 8 bytes for 64-bit. The default 32-bit stack size is 8MB, so you’re not going to run out of space unless you have deeply recursive calls.

Even these small costs are only relevant in the context of your app’s unoptimized, debug configuration. For your customers, on the other hand, local variables are free. As in Mumia, or Bird. When you compile your app using the release configuration, the local variables disappear, the compiler optimizes them away. (By the way, this is one of the reasons that debugging the release build of your app can be a frustrating and/or wacky experience.) To see the optimization in action, let’s consider some sample code:

1  #import <Foundation/Foundation.h>
2
3  @interface MyObject : NSObject {}
4  @end
5
6  @implementation MyObject
7
8  -(NSString *)myDirectProcessName {
9  	return [[[NSProcessInfo processInfo] processName] lowercaseString];
10 }
11
12 -(NSString *)myRoundaboutProcessName {
13 	NSString *myRoundaboutProcessName = nil;
14 	NSProcessInfo *processInfo = [NSProcessInfo processInfo];
15 	NSString *processName = [processInfo processName];
16 	NSString *lowercaseString = [processName lowercaseString];
17 	myRoundaboutProcessName = lowercaseString;
18 	return myRoundaboutProcessName;
19 }
20
21 @end
22
23 int main(int argc, const char *argv[]) {
24 	NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
25 	MyObject *myObject = [[[MyObject alloc] init] autorelease];
26 	NSLog(@"My direct process name: %@", [myObject myDirectProcessName]);
27 	NSLog(@"My roundabout process name: %@", [myObject myRoundaboutProcessName]);
28 	[pool release];
29 	return 0;
30 }

The above code is obviously contrived and useless. It only has value for explanatory purposes, and perhaps in the app store for $0.99. The methods -myRoundaboutProcessName and -myDirectProcessName do the same thing, the former with and the latter without local variables. Here’s the i386 disassembly for the methods when compiled using the debug configuration:

-[MyObject myDirectProcessName]:
00001d2a	nop
00001d2b	nop
00001d2c	nop
00001d2d	nop
00001d2e	nop
00001d2f	nop
00001d30	pushl	%ebp
00001d31	movl	%esp,%ebp
00001d33	pushl	%ebx
00001d34	subl	$0x14,%esp
00001d37	calll	0x00001d3c
00001d3c	popl	%ebx
00001d3d	leal	0x000012e8(%ebx),%eax
00001d43	movl	(%eax),%eax
00001d45	movl	%eax,%edx
00001d47	leal	0x000012e4(%ebx),%eax
00001d4d	movl	(%eax),%eax
00001d4f	movl	%eax,0x04(%esp)
00001d53	movl	%edx,(%esp)
00001d56	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001d5b	movl	%eax,%edx
00001d5d	leal	0x000012e0(%ebx),%eax
00001d63	movl	(%eax),%eax
00001d65	movl	%eax,0x04(%esp)
00001d69	movl	%edx,(%esp)
00001d6c	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001d71	movl	%eax,%edx
00001d73	leal	0x000012dc(%ebx),%eax
00001d79	movl	(%eax),%eax
00001d7b	movl	%eax,0x04(%esp)
00001d7f	movl	%edx,(%esp)
00001d82	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001d87	addl	$0x14,%esp
00001d8a	popl	%ebx
00001d8b	leave
00001d8c	ret
-[MyObject myRoundaboutProcessName]:
00001d8d	nop
00001d8e	nop
00001d8f	nop
00001d90	nop
00001d91	nop
00001d92	nop
00001d93	pushl	%ebp
00001d94	movl	%esp,%ebp
00001d96	pushl	%ebx
00001d97	subl	$0x24,%esp
00001d9a	calll	0x00001d9f
00001d9f	popl	%ebx
00001da0	movl	$0x00000000,0xe8(%ebp)
00001da7	leal	0x00001285(%ebx),%eax
00001dad	movl	(%eax),%eax
00001daf	movl	%eax,%edx
00001db1	leal	0x00001281(%ebx),%eax
00001db7	movl	(%eax),%eax
00001db9	movl	%eax,0x04(%esp)
00001dbd	movl	%edx,(%esp)
00001dc0	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001dc5	movl	%eax,0xec(%ebp)
00001dc8	movl	0xec(%ebp),%edx
00001dcb	leal	0x0000127d(%ebx),%eax
00001dd1	movl	(%eax),%eax
00001dd3	movl	%eax,0x04(%esp)
00001dd7	movl	%edx,(%esp)
00001dda	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001ddf	movl	%eax,0xf0(%ebp)
00001de2	movl	0xf0(%ebp),%edx
00001de5	leal	0x00001279(%ebx),%eax
00001deb	movl	(%eax),%eax
00001ded	movl	%eax,0x04(%esp)
00001df1	movl	%edx,(%esp)
00001df4	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001df9	movl	%eax,0xf4(%ebp)
00001dfc	movl	0xf4(%ebp),%eax
00001dff	movl	%eax,0xe8(%ebp)
00001e02	movl	0xe8(%ebp),%eax
00001e05	addl	$0x24,%esp
00001e08	popl	%ebx
00001e09	leave
00001e0a	ret

As expected, -myRoundaboutProcessName makes more room on the stack than -myDirectProcessName:

00001d34	subl	$0x14,%esp
00001d97	subl	$0x24,%esp

At 00001da0, -myRoundaboutProcessName sets the value of the local variable to nil, as in line 13 of the source code. The interesting differences, though, are immediately after the calls to objc_msgSend(). By the standard ABI, the register eax contains the return value of objc_msgSend(). In -myDirectProcessName, the value in eax is simply moved to the register edx:

00001d5b	movl	%eax,%edx

In contrast, -myRoundaboutProcessName first stores the value on the stack before moving it to edx. The address on the stack is the space reserved for the local variable:

00001dc5	movl	%eax,0xec(%ebp)
00001dc8	movl	0xec(%ebp),%edx

After the final objc_msgSend() call, -myDirectProcessName doesn’t bother to do much, because the return value in eax will become the return value of the whole method. In -myRoundaboutProcessName, it needs to store values in local variables as in lines 16 and 17 of the source code:

00001df9	movl	%eax,0xf4(%ebp)
00001dfc	movl	0xf4(%ebp),%eax
00001dff	movl	%eax,0xe8(%ebp)
00001e02	movl	0xe8(%ebp),%eax

So that’s how the methods differ in the unoptimized build. Now let’s see what happens when we use the release configuration. Here’s the optimized disassembly for -myDirectProcessName:

-[MyObject myDirectProcessName]:
00001dce	pushl	%ebp
00001dcf	movl	%esp,%ebp
00001dd1	subl	$0x18,%esp
00001dd4	movl	0x00003000,%eax
00001dd9	movl	%eax,0x04(%esp)
00001ddd	movl	0x0000302c,%eax
00001de2	movl	%eax,(%esp)
00001de5	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001dea	movl	0x00003004,%edx
00001df0	movl	%edx,0x04(%esp)
00001df4	movl	%eax,(%esp)
00001df7	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001dfc	movl	0x00003008,%edx
00001e02	movl	%edx,0x0c(%ebp)
00001e05	movl	%eax,0x08(%ebp)
00001e08	leave
00001e09	jmpl	0x0000400a	; symbol stub for: _objc_msgSend

The optimized method is significantly shorter, as expected from the compiler option -Os. First, you’ll notice that all those pesky nop instructions have been deleted. Stallman put them in unoptimized builds just to annoy us. (Or they may have been for Fix and Continue, but I always assume the worst.) There are additional optimizations as well that I won’t belabor here, because I’m eager to get to the climax. (Sorry, dear.) For your enlightenment and enjoyment, here’s the optimized disassembly for -myRoundaboutProcessName:

-[MyObject myRoundaboutProcessName]:
00001e0e	pushl	%ebp
00001e0f	movl	%esp,%ebp
00001e11	subl	$0x18,%esp
00001e14	movl	0x00003000,%eax
00001e19	movl	%eax,0x04(%esp)
00001e1d	movl	0x0000302c,%eax
00001e22	movl	%eax,(%esp)
00001e25	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001e2a	movl	0x00003004,%edx
00001e30	movl	%edx,0x04(%esp)
00001e34	movl	%eax,(%esp)
00001e37	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001e3c	movl	0x00003008,%edx
00001e42	movl	%edx,0x0c(%ebp)
00001e45	movl	%eax,0x08(%ebp)
00001e48	leave
00001e49	jmpl	0x0000400a	; symbol stub for: _objc_msgSend

Identical! Ah, that’s nice. Smoke ‘em if you got ‘em.

In conclusion, feel free to sprinkle, pepper, dash, or even drown your code with local variables. And with the engineering hours of debugging time you save, get me a nice (not free) present. I’m partial to flavored coffee and unflavored MacBooks.

Why did my breakpoint not get hit?

Monday, November 16th, 2009

This is part I of a II+ (take that, trademark trolls) part series on compiler optimization. For the gcc compiler, you can specify the level of optimization with various -O options. The default for compiling is -O0, which means do not optimize. As we shall see, however, the compiler always optimizes to an extent. That is to say, gcc -O0, you lie!

The primary reason for using the -O0 option (besides to avoid compiler optimization bugs) is to facilitate debugging of your code. With higher levels of optimization, the compiler is given more freedom to ‘ignore’ your source code in writing machine instructions, as long as the results are the same. Although it is possible to debug optimized binaries, the experience is often confusing and unhelpful for the programmer (much like reading cocoa-dev). Turning off optimization gives the closest correlation between source code and machines instructions. Yet even with no optimization, the correlation is not perfect, and this can lead to debugging problems.

Let’s consider a simple example:

$ cat > returnbreak.c
#include <stdio.h>

int ShouldReturn(void) {
	return 1;
}

void HelloWorld(void) {
	if (ShouldReturn())
		return;

	printf("Hello, World!\n");
}

int main(int argc, const char *argv[]) {
	HelloWorld();
	return 0;
}
$ gcc -g -O0 -o returnbreak returnbreak.c
$ gdb returnbreak
GNU gdb 6.3.50-20050815 (Apple version gdb-966) (Tue Mar 10 02:43:13 UTC 2009)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-apple-darwin"...Reading symbols for shared libraries ... done

(gdb) list HelloWorld
2
3	int ShouldReturn(void) {
4		return 1;
5	}
6
7	void HelloWorld(void) {
8		if (ShouldReturn())
9			return;
10
11		printf("Hello, World!\n");
(gdb) break 9
Breakpoint 1 at 0x1fc9: file returnbreak.c, line 9.
(gdb) run
Starting program: /Users/jeff/Desktop/returnbreak
Reading symbols for shared libraries ++. done

Program exited normally.

WTF?!? Why did my breakpoint not get hit?

(gdb) info break
Num Type           Disp Enb Address    What
1   breakpoint     keep y   0x00001fc9 in HelloWorld at returnbreak.c:9

Hmm, that seems ok. Let’s try something else.

(gdb) break HelloWorld
Breakpoint 2 at 0x1fc0: file returnbreak.c, line 8.
(gdb) info break
Num Type           Disp Enb Address    What
1   breakpoint     keep y   0x00001fc9 in HelloWorld at returnbreak.c:9
2   breakpoint     keep y   0x00001fc0 in HelloWorld at returnbreak.c:8
(gdb) run
Starting program: /Users/jeff/Desktop/returnbreak 

Breakpoint 2, HelloWorld () at returnbreak.c:8
8		if (ShouldReturn())
(gdb) c
Continuing.

Program exited normally.

Odd, it hits the breakpoint at line 8 but not at line 9. The breakpoint on line 9 is at address 0x00001fc9, so let’s look at the (i386) disassembly for that:

(gdb) disassemble 0x00001fc9
Dump of assembler code for function HelloWorld:
0x00001fb3 <HelloWorld+0>:	push   %ebp
0x00001fb4 <HelloWorld+1>:	mov    %esp,%ebp
0x00001fb6 <HelloWorld+3>:	push   %ebx
0x00001fb7 <HelloWorld+4>:	sub    $0x14,%esp
0x00001fba <HelloWorld+7>:	call   0x1fbf <HelloWorld+12>
0x00001fbf <HelloWorld+12>:	pop    %ebx
0x00001fc0 <HelloWorld+13>:	call   0x1fa6 <ShouldReturn>
0x00001fc5 <HelloWorld+18>:	test   %eax,%eax
0x00001fc7 <HelloWorld+20>:	jne    0x1fd7 <HelloWorld+36>
0x00001fc9 <HelloWorld+22>:	lea    0x30(%ebx),%eax
0x00001fcf <HelloWorld+28>:	mov    %eax,(%esp)
0x00001fd2 <HelloWorld+31>:	call   0x3005 <dyld_stub_puts>
0x00001fd7 <HelloWorld+36>:	add    $0x14,%esp
0x00001fda <HelloWorld+39>:	pop    %ebx
0x00001fdb <HelloWorld+40>:	leave
0x00001fdc <HelloWorld+41>:	ret
End of assembler dump.

When ShouldReturn() returns, the return value is in the register eax. The test instruction at 0x00001fc5 performs a bitwise AND of the two operands — which in this case are the same. If the result is non-zero — and in this case the result is 1 — the Zero Flag in the EFLAGS register is set to 0. This instruction corresponds to evaluating the conditional on line 8 of our source code. Then the jne instruction at 0x00001fc7 jumps to a certain address if the Zero Flag is 0. In our source code, the flow of control should move to the return statement on line 9 when the conditional evaluates to non-zero. According to the machine instructions, on the other hand, it jumps to 0x1fd7 when the conditional evaluates to non-zero. This address is the beginning of the standard function epilog, which restores the stack and registers to their previous state before returning.

The problem here is that while the function HelloWorld() has two exit points in our source code, it only has one exit point in the machine instructions. In essence, the compiler has optimized for size, despite our use of the -O0 option. Given the generated machine instructions, there is nowhere to put a breakpoint that will only be hit when the conditional at line 8 evaluates to non-zero. A breakpoint at 0x00001fc5 or 0x00001fc7 would be hit whenever the conditional is evaluated, which is always. A breakpoint at 0x00001fd7 would be hit whenever the function returns, which is always as well. Unfortunately, gdb places the breakpoint at 0x00001fc9, which is actually the opposite of what we intended, because it only gets hit when the conditional evaluates to zero. This is why the program exits normally without ever hitting the breakpoint. I consider this to be a bug in gdb; it would be better, I think, if it would just fail and give an error when we try to set the breakpoint. Of course, it may be a bug in gcc that it optimizes away our multiple exit points with optimization off. But hey, what do you expect from free software?

There are several workarounds for this problem. One would be to re-write your source code. (No, that’s not a joke. See Part II of this series.) Another workaround, if you only want to break on the result of a conditional, is to use a conditional breakpoint:

(gdb) delete break
Delete all breakpoints? (y or n) y
(gdb) break *0x00001fc5 if $eax != 0
Breakpoint 1 at 0x1fc5: file returnbreak.c, line 8.
(gdb) info break
Num Type           Disp Enb Address    What
1   breakpoint     keep y   0x00001fc5 in HelloWorld at returnbreak.c:8
	stop only if $eax != 0
(gdb) run
Starting program: /Users/jeff/Desktop/returnbreak 

Breakpoint 1, 0x00001fc5 in HelloWorld () at returnbreak.c:8
8		if (ShouldReturn())
(gdb) c
Continuing.

Program exited normally.

To summarize, if you find that your breakpoints are not getting hit, you now know who to blame. Namely, yourself. It’s almost certain that your Xcode project settings are wrong.

It’s over

Thursday, August 6th, 2009

I figured I’d cruise, at least through the Spring. However, the wheels on the bus go round and round.

rdar://problem/7125338

I am still master of my domain. Although I need to renew before it expires in three weeks.

Compiler indirectives and metaphorical keypaths

Wednesday, July 2nd, 2008

I am, like, literally ROTFRTFMIMHOLOLYMMVIIRCFUBAROTOH!

[[NSNotificationCenter defaultCenter] addObserver:self selector:@selector(likeNoWay:) name:@"ROTFRTFMIMHOLOLYMMVIIRCFUBAROTOH" object:nil];

A string literal is a sequence of characters enclosed in quote-unquote ‘\”quotation marks\”‘ (wiggles index and middle fingers). In the C programming language — so-named because its inventors lacked imagination — a string literal represents an array of char terminated by a null (or by a comma when the feeling’s not that strong). In the Objective-C programming language — otherwise known as The O.C. — a string literal in a compiler directive (e.g., @"NSBirdJustFlewIntoMyWindowException") defines a constant NSString object.

Chris Hanson and Sanjay Samani have offered some excellent advice about avoiding the use of these NSString literals in your method calls. The problem is that the compiler will accept pretty much any directive: with Objective-C 1.0 the compiler only warns about non-ASCII characters, and with Objective-C 2.0 it doesn’t even do that (for better or worse, but that’s a subject for a different post). Thus, if you happen to misspell a notification name, you’ll never know until your app misbehaves at runtime. I’m sorry, Dave, I’m afraid I can’t do that.

I recommend replacing NSString literals with macros or constant variables (huh?) wherever spelling matters. (Spelling matters everywhere. I’ve seen some pretty bad method names.) Here’s a little trick for handling arbitrary keypaths:

#define KEY1 @"key1"
#define KEY2 @"key2"
#define KEY3 @"key3"
#define DOT @"."

[self valueForKeyPath:KEY1 DOT KEY2 DOT KEY3];

Unfortunately, we’re still at the mercy of misspellings in nib bindings. Yet another reason to do without nibs. But that’s also a subject for a different post and horse of a different color (dried poop).

Logging in Leopard

Sunday, January 6th, 2008

The release of Leopard has given third-party developers a lot to do: attempting to restore features lost from Tiger, for instance. (By the way, where is the second party, and why am I never invited?) My friend Rainer Brockerhoff has provided a way, or Quay, to display hierarchical popup menus in the Dock again. One of my most missed features in Leopard is using NSLog to spew output exclusively to Xcode’s console log. When you debug or run your app in Xcode on Tiger, you can put NSLog calls everywhere without worrying about polluting console.log. In my opinion, console.log is only for important messages and errors. I frequently ask users to consult it if they’re experiencing a problem with an app. Either that or the Oracle at Delphi.

Leopard dispenses completely with console.log, though there is a “Console Messages” database query in Console. Whereas on Tiger stdout and stderr standardly go to console.log, on Leopard they boldly go to system.log (as well as to the “Console Messages” query). On either version of Mac OS X, Xcode redirects stdout and stderr to its own console log, so they don’t appear in Console at all.

According to the documentation, NSLog sends a message to stderr. This is true for Tiger, and it’s also true for Leopard, but Leopard’s NSLog has the additional behavior of sending a message to system.log regardless of whether stderr is redirected. Thus, when you debug or run your app in Xcode (these may amount to the same thing in Xcode 3), messages from NSLog appear both in Xcode’s console log and in system.log! Curiously, there is no duplication of NSLog messages in system.log when stderr is not redirected.

If you prefer to keep your debug output out of system.log, the workaround for this new NSLog behavior is to abandon NSLog for debugging purposes on Leopard. :-( After much experimentation with asl, I realized that our old faithful printf would work. Since printf writes to stdout, its output is redirected by Xcode. Plus, when you’re debugging your app in Xcode you don’t really need NSLog to tell you the name of your app, the date, or your shoe size.

A limitation of printf is that it doesn’t handle the format specifier %@ for an Objective-C object. With Cocoa, therefore, we want an Objective-C wrapper around printf (like, um, NSLog). If you add the following code to your target’s .pch file, you’ll have an Objective-C debug logging function JJLog available throughout your target’s code. To enable logging in your app’s debug build, just add JJLOGGING to the GCC_PREPROCESSOR_DEFINITIONS setting (AKA “Preprocessor Macros”) in the debug build configuration.


#ifdef __OBJC__
	#import <Cocoa/Cocoa.h>
	#ifdef JJLOGGING
		#define JJLog(...) (void)printf("%s:%i %s: %s\n", __FILE__, __LINE__, __PRETTY_FUNCTION__, [[NSString stringWithFormat:__VA_ARGS__] UTF8String])
	#else
		#define JJLog(...)
	#endif
#endif

In your app’s release build, the debug function is a NOP that the compiler will almost certainly optimize out. This conditional code should not cause problems when using GCC_PRECOMPILE_PREFIX_HEADER, because Xcode already generates a separate precompiled prefix header for each build configuration. See the .pch.gch.hash-criteria files in /Library/Caches/com.apple.Xcode.###/SharedPrecompiledHeaders.

You can send gobs of gab to JJLog without repercussion or remorse. However, you’ll still want to use NSLog (sparingly, please) for runtime errors in your release build. Now to continue in the spirit of this post, I’ll redirect the epilogue to /dev/null.

How not to fix a build warning

Saturday, December 22nd, 2007

The Hollywood writers strike continues, and the desperation grows for alternative sources of entertainment. Fortunately, we programmers can find entertainment in our own sources. I’ve got some reality programming for you! The following snippet of code is taken from an actual CVS commit. (Yes, CVS. Don’t laugh. Do cry for me, Argentina.) This build warning ‘fix’ was made by some contractor for some project that I worked on at some point in time for some company. To protect the innocent and/or guilty, I won’t say who, what, when, or where. As for why, I wish I knew. Or maybe not.


NSEnumerator* fileEnum = [fileArray objectEnumerator];
NSDictionary* aDict = nil;
//Changed to Remove the Build Warnings
//while(aDict = [fileEnum nextObject])
while(aDict == [fileEnum nextObject])

Let this example serve as a lesson. Not for programmers — the one who wrote it is probably hopeless — but rather for managers. Please do not just hire the lowest bidder!

BOOLing for Dollars

Sunday, September 30th, 2007

While we’re all aiting-way or-fay eopard-Lay, I’d like to share a pointer that I picked up while mugging a C library. (I have no idea what that means. It seemed witty when I wrote it.) As you know, I’m always ahead of the curve, setting the trends, framing the public discourse. Thus, I should add my 1.2 cents — the dollar is weak, and I’m a little short this month — on a hot topic discussed on the Cocoa-dev mailing list recently (in geological time, anyway): the use of the Objective-C BOOL type.

If you look in the header file /usr/include/objc/objc.h, you can see how BOOL is defined:


	typedef signed char		BOOL;
	// BOOL is explicitly signed so @encode(BOOL) == "c" rather than "C"
	// even if -funsigned-char is used.

	#define YES             (BOOL)1
	#define NO              (BOOL)0

A char type — e.g., char, signed char, unsigned char — is always one byte, i.e., sizeof(signed char) == 1, whereas in most implementations an int type is more than one byte. A byte standardly consists of 8 bits, or 12 nibbles. What happens to the extra bits if you convert an int into a BOOL? According to the wacky rules of C type conversion, the result is implementation-dependent. Many implementations simply throw away the highest bits. (Other implementations recycle them into information superhighway speed bumps.) As a consequence, it’s possible that myIntVar != 0 && (BOOL)myIntVar == NO.

Usually we don’t have to worry about this, because ‘boolean’ operators in C, such as == and !, always return 1 or 0. When we use bitwise operators, on the other hand, the problem does come into play. Suppose, for example, that we’re testing whether the option key is down. The method -[NSEvent modifierFlags] returns a bit field indicating the modifier keys that are pressed, and bit masks can be used to test for specific keys. Consider the following code; there are situations where doSomethingAfterEvent: does something, yet doSomethingElseAfterEvent: does nothing.


	-(void) doSomethingAfterEvent:(NSEvent *)anEvent
	{
		if (anEvent)
		{
			if ([anEvent modifierFlags] & NSAlternateKeyMask)
			{
				[self doSomething];
			}
		}
	}

	-(void) doSomethingElseAfterEvent:(NSEvent *)anEvent
	{
		if (anEvent)
		{
			BOOL shouldDoSomethingElse = [anEvent modifierFlags] & NSAlternateKeyMask;
			if (shouldDoSomethingElse)
			{
				[self doSomethingElse];
			}
		}
	}

It has been suggested on the mailing list that the type conversion could be handled by


	BOOL shouldDoSomethingElse = !!([anEvent modifierFlags] & NSAlternateKeyMask);

or


	BOOL shouldDoSomethingElse = ([anEvent modifierFlags] & NSAlternateKeyMask) != 0;

However, these approaches would only work for single-bit masks. What if we wanted to test both the option key and the shift key?

The point I wish to make here actually has little to do with the BOOL type. (Say what?!?) Bitwise operators are not boolean operators. A boolean operator only returns 1 or 0. A bitwise operator, in contrast, can return any bit field. The proper way to handle a bitmask is to test whether the resulting bit field has the desired value:


	unsigned int myMask = NSAlternateKeyMask | NSShiftKeyMask;
	BOOL isMyKeyComboPressed = ([anEvent modifierFlags] & myMask) == myMask;

Yes, I know Robot Chicken already covered this subject a ha-while ago. I didn’t really care for it.

Invoking errors

Sunday, April 1st, 2007

If I had to name my specialty as a developer, it would be error code. I’ve reached a point in my career where I can invoke errors almost instinctively. This post is about NSErrors, however, which differ from regular errors in one crucial respect: the more NSErrors in your app, the better. Yes, even when Apple makes errors, they’re appealing. This post is also about NSInvocations, which sound arcane, and they are. I use them, but I’m afraid of them … kind of like microwave ovens. I have no idea what’s going on inside, but they make good Cocoa. (Cat’s breath, charm of sleep and bath, thy omen of scratching.)

There will come a time, maybe not today, maybe not tomorrow, maybe not soon, maybe not for the rest of your life, but someday anyway, when you’ll want to use an NSError as an argument to an NSInvocation. Or maybe never, but you’ll certainly want to use a pointer to a pointer to an NSError. That’s an error’s cousin, once removed; they can marry in selected states. The standard pattern for handling errors with Cocoa methods is to directly return a BOOL indicating success or failure and to return ‘by reference’ (look that up in your Funk & Wagnalls) an NSError that provides additional information about the failure, e.g., You’re a loser, and who does your hair, Floyd the barber? If you call the method by name — heyJoe — you can just pass the address (&) of your NSError object as an argument to the method. What if you want the method to vary at runtime, though?

Suppose that your app needs to perform a series of disparate tasks, and each task requires error checking afterward to determine whether the series should continue. Using conventional methods, your code could become a winding mess. All roads lead to Rome, just not in a straight line. The more pleasant alternative, in my opinion, is to wrap up everything into NSInvocations at the beginning to store in an NSArray for iteration. And rather than having a monstrous error checking method with endless if else clauses, you could have an aptly named error checker corresponding to an individual task, just as in key-value coding you have an aptly named setter corresponding to the getter. For example, we have a sample error checking method:


-(BOOL) didSucceedAtJoe:(id)result error:(NSError **)error {
	BOOL didSucceed = result != nil;
	if (!didSucceed && error) {
		*error = [NSError errorWithDomain:@"MyErrorDomain" code:69 userInfo:nil];
	}
	return didSucceed;
}
	

Such a method would be invoked by a generic error checker:


-(BOOL) didSucceedAtTask:(NSDictionary *)taskInfo {
	BOOL didSucceed = NO;
	NSError * error = nil;
	NSInvocation * invocation = [self invocationForDidSucceedAtTaskWithName:[taskInfo objectForKey:@"name"]];
	if (invocation) {
		id result = [taskInfo objectForKey:@"result"];
		[invocation setArgument:&result atIndex:2];
		NSError ** errorPointer = &error;
		[invocation setArgument:&errorPointer atIndex:3];
		[invocation invokeWithTarget:self];
		[invocation getReturnValue:&didSucceed];
	}
	NSLog(@"error:%@", error);
	return didSucceed;
}
	

It’s crucial to the invocation that you declare, and thus allocate memory for, both NSError * error and NSError ** errorPointer. It’s not sufficient simply to declare the latter, and you definitely can’t use a construction such as &(&error), because &error basically just gives a number, which has no address itself. Finally, we have the method for creating the invocation:


-(NSInvocation *) invocationForDidSucceedAtTaskWithName:(NSString *)name {
	NSInvocation * invocation = nil;
	if (name) {
		SEL selector = NSSelectorFromString([NSString stringWithFormat:@"didSucceedAt%@:error:", name]);
		if (selector) {
			if (ivarInvocation) {
				invocation = [[ivarInvocation retain] autorelease];
				[invocation setSelector:selector];
			} else {
				NSMethodSignature * signature = [self methodSignatureForSelector:selector];
				if (signature) {
					invocation = [NSInvocation invocationWithMethodSignature:signature];
					if (invocation) {
						ivarInvocation = [invocation retain];
						[invocation setSelector:selector];
					}
				}
			}
		}
	}
	return invocation;
}
	

The invocation is stored as an instance variable because the method signature depends only on the argument and return types, so it will remain the same for all of our selectors. Now our apps can invoke errors at a rate matched only by analysts, economists, and pundits.

On a personal note, I’d like to invite my faithful readers (all three, including myself) to my wedding today. Although the location is unusual — the Virgin Megastore in Orange County — it will nonetheless be a traditional ceremony. Pamela Anderson is signing.

WordPress Bug Fix!

Saturday, January 20th, 2007

This is the first post in what I hope is a series, which I’ll call “WordPress Bug Fix Near Saturday”. And while I’m naming things, I’ll call this first post “Episode IV: A New Hope” (to be followed, no doubt, by “Episode V: All Hope Dashed”). I’m very happy to report that the ETag parsing bug has been fixed in WordPress 2.0.7. Thus, if you’re running WordPress 2.0.7 on your site, you no longer have to comment out the following line in the file wp-includes/classes.php:

@header("ETag: $wp_etag");

My logs confirm that WordPress is now correctly parsing its own ETags and sending out HTTP 200 and 304 responses as appropriate. The Penultimate Warrior is victorious and undefeated! Note, however, that none of the bugs that I reported before the Warrior started Running Wild® (cp. Going Wild®) have been fixed yet.

Speaking of my web site logs, they contain a list of phrases that were used to find my site from internet search engines. I’d like to share a few of them that have caught my eye:

  1. instructions+for+using+the+thighmaster

    Squeeze, release, repeat. You’re welcome.

  2. how+do+u+declare+a+pointer+to+an+array+of+pointers+to+int%3f%3f+in+c+language

    int * (* ptr)[];

  3. talk+to+cat+software

    That doesn’t exist, Dr. Doolittle. Try meowing.

  4. cat+lederhosen

    I’ve given your IP to the SPCA.

  5. betamax+the+sausage+and+the+mouse

    These aren’t the droids we’re looking for.

  6. what+does+ns+stand+for+i+cocoa

    NeXTSTEP. Next!

  7. circle+k+employee+uniforms

    I’m so very sorry, dude.

  8. if+jeff+s+usual+is+a+hint+for+a+password+what+is+the+password

    Stop trying to hack into my account, you scoundrel!

Bring back STX and ETX

Saturday, December 30th, 2006

Some people think that XML is the greatest invention since sliced bread. (I won’t name names, to protect the non-existent.) In contrast, I think that it’s just a symptom of a disease, a terminal illness infecting the entire computing world. Programmers are supposed to be smart, but we’re actually the ones who are responsible for the spread of the disease. We started and promoted the practice of using printable text to delimit printable text. In my haughty opinion, this was one of the worst ideas in the history of computers (surpassed only by SMTP and MacAppADay). It was doomed to fail from the beginning, kind of like the paradox of the liar. The use of printable text to delimit printable text has been the cause of countless bugs — let’s say 500 billion — and indeed, countless security vulnerabilities. It continues to plague us today.

Having worked on a feed reader, I do know a thing or two about this issue. I can tell you that it’s a major pain to parse XML feeds. Parsing HTML is even worse, but thankfully we can leave the majority of that to WebKit. It’s hard enough when everything is perfect, but we inevitably run into issues where the text is improperly escaped or not properly escaped. This is no fun for anyone.

Since the beginning, ASCII contained a number of non-printing control characters, but for some reason they have fallen out of favor. Among the control characters are STX (0x2) and ETX (0x3). Their position in the list of character codes indicates their importance: they were used to delimit text. With character codes such as these, parsing data into strings becomes trivial:

  1. Start parsing a string when you see a STX code.
  2. Continue until you see a ETX code, you see a non-character code, you reach a preset maximum length, or you run out of data.
  3. If the last code was ETX, you’ve got a good string. Otherwise, you’ve encountered an error, and you can do whatever error handling you like.
  4. There are no more steps. The characters in the string are all literal, no unescaping necessary.

Unicode has added some similar codes such as SOS and ST. I’d like to see even more control codes, to allow for fine-grained specification of the structure of the text. For example, we could have control codes to delimit words, sentences, paragraphs, etc. This would be similar to tags in HTML but without the use of printable characters to represent the tags.

Why don’t we do this now? One objection is that files containing control characters are not human readable. I think that this is a lame excuse, because no computer file is human readable. Although my hard drive is enclosed, preventing me from examining the files on there, I have burned text files to DVD, and no matter how long I stare and squint at the shiny bottom, all I can see is my own reflection. Anyway, a lot of markup is human readable only in the sense that Derrida is human readable: there is a series of legible text characters, but do you really want to wade through all the crap to make sense of it?

Perhaps the real point underlying this objection is that control-character delimited text would not be readable by simple (i.e., dumb) text editors. This is true, but why should we be ruled by the lowest common denominator? Many modern text editors are quite intelligent and could handle the new format easily. They can already parse various forms of syntax and highlight them for the user. Let’s not let backward compatibility hold us back. That’s certainly not the Apple Way. It’s not entirely the Microsoft Way either; after all, the Word file format makes no concession to simple text editors. Neither does the cross-platform Adobe PDF.

The most powerful objection to using control characters as text delimiters is that we shouldn’t force users to learn how to input control characters along with text. I agree, which is why I think the burden should be placed on computer programs — the text editors and command line interpreters — rather than on users. When taking text input from users, an app should do the following:

  1. Use the context to guess the user’s intention.
  2. Give a visual indication of the guess to the user. Syntax coloring is one example, but the possibilities are endless. Be creative.
  3. Make it easy for the user to correct bad guesses.

In command line interpreters, by the way, there’s no good reason why the space key needs to separate arguments, as opposed to a key for a non-printable character such as escape. It’s the 21st century, by Jove, and we should be finally be able to use any printable character in a file name, including colons, quotes, slashes, and spaces, without having to do voodoo on the command line just to refer to it! (I won’t even mention hierarchical file systems, which are themselves a bad idea. Oops, I just did. Since I mentioned it, the ideal behavior when a user enters a file name is to quickly find the named file or files, which any decent file system should be able to do, and show a visual preview so that the user can verify or choose the correct file, if necessary.)

This rant has been brought to you by BBEdit. The makers of BBEdit, I assume, take no responsibility or credit for the content here, nor do they endorse the opinions I’ve expressed. (Or do they?)

(No, not as far as I know, which is nothing. In any case, I do endorse BBEdit.)