Archive for the ‘Unix’ Category

Working with GitHub

Sunday, August 22nd, 2010

This blog post is intended to be a reference not only for you but for me, because I always forget how it works. “It”, in this case, is Git, or more specifically, GitHub. GitHub is a web site that hosts public Git repositories. It allows anyone in the world to collaborate on any GitHub project without having to worry about repository commit privileges or patch files. If you want to work on a GitHub project, you simply fork the project’s GitHub repository, creating a new GitHub repository in your name. After you make changes to your GitHub repository, you can send pull requests to other GitHub users to notify them of the changes, and they can pull your changes into their own repositories.

While GitHub does make social coding convenient, there is still a significant learning curve. I won’t discuss the process of installing Git or signing up for a GitHub account; you can find documentation for those elsewhere. The most confusing aspect of working with GitHub, in my opinion, is repository management, and that’s what I’ll explain here. My explanation will give you the steps of the GitHub workflow using an example project. I’ve chosen ClickToFlash as my example, because people are familiar with it, and I’ve contributed code to the project.

Mystery surrounds the origin of ClickToFlash. The project first appeared on Google Code, posted by an anonymous donor. Not long thereafter, the project disappeared without a trace. We still don’t know the identity of ClickToFlash’s author. (I suspect Holtzman, Holtzmann, or Holzmann.) Fortunately, several developers including ‘Wolf’ Rentzsch preserved the source code, and Rentzsch’s GitHub repository has become ‘official’. Other GitHub repositories such as my own and Simone Manganelli’s are forked from Rentzsch’s. That’s the place to start.

If you haven’t already forked ClickToFlash, you’ll see a “Fork” button on http://github.com/rentzsch/clicktoflash. When I clicked that button, it created a fork of the project at http://github.com/lapcat/clicktoflash. The fork is my public repository, which GitHub users can pull from. The catch is that I can’t work directly on my public repository, because I don’t have shell access to GitHub’s servers where the repository resides. Besides, I wouldn’t be able to run Xcode there anyway. Thus, I have to clone the repository on my own local Mac. The URL of my public repository is listed on http://github.com/lapcat/clicktoflash. Actually, there are multiple URLs, but you’ll want to make sure to clone the SSH version, which is read-write. If you clone the read-only version, then you won’t be able to push changes back to the public repository.

$ git clone git@github.com:lapcat/clicktoflash.git

Your private, local clone automatically has a master branch that matches the master branch of your public, remote repository.

$ cd clicktoflash
$ git branch
* master
$ git status
# On branch master
nothing to commit (working directory clean)

Your local clone gives the remote repository it was cloned from the special name origin.

$ git remote
origin

The local master branch also tracks the remote repository, so that git fetch, git pull, and git push automatically apply to origin when run with master checked out.

$ git remote show origin
* remote origin
  Fetch URL: git@github.com:lapcat/clicktoflash.git
  Push  URL: git@github.com:lapcat/clicktoflash.git
  HEAD branch: master
  Remote branches:
    cutting-edge tracked
    master       tracked
  Local branch configured for 'git pull':
    master merges with remote master
  Local ref configured for 'git push':
    master pushes to master (up to date)

Despite the fact that the local repository is a clone of origin, and origin is a fork of rentzsch, the local repository knows nothing of rentzsch. [Expletives censored.] If you want to pull changes from rentzsch, you need to add it as a remote repository. In this case, you can use the read-only URL, because you can’t push changes to his repository.

$ git remote add rentzsch git://github.com/rentzsch/clicktoflash.git
$ git remote
origin
rentzsch

Some people suggest upstream for the name of the remote repository, but I find this needlessly confusing. The name rentzsch tells me exactly where the changes are coming from. Unlike upstream, it’s not abstract or subject to misinterpretation with origin.

Note that unless you pass the -f option to git remote add, the rentzsch remote is not fetched automatically, so you’ll need to fetch it manually. I also find this needlessly confusing and wish the default behavior were to fetch rather than not fetch. You might find yourself perplexed, for example, if you try to create a new branch from the remote.

$ git branch rentzsch-master rentzsch/master
fatal: Not a valid object name: 'rentzsch/master'.
$ git branch -r
  origin/HEAD -> origin/master
  origin/cutting-edge
  origin/master
$ git fetch rentzsch
$ git branch -r
  origin/HEAD -> origin/master
  origin/cutting-edge
  origin/master
  rentzsch/1.4.2-64bit
  rentzsch/cutting-edge
  rentzsch/gh-pages
  rentzsch/master
$ git branch rentzsch-master rentzsch/master
Branch rentzsch-master set up to track remote branch master from rentzsch.

I recommend that you create a branch specifically to track the forked repository, as I do in the last instruction above. Then no matter what changes you make, you can still look at the ‘official’ version of the project by checking out the rentzsch-master branch. If you use the remote branch rentzsch/master as the starting point for the local branch rentzsch-master, the local branch automatically tracks the remote repository rentzsch, just as the local master automatically tracks the remote origin.

$ git remote show rentzsch
* remote rentzsch
  Fetch URL: git://github.com/rentzsch/clicktoflash.git
  Push  URL: git://github.com/rentzsch/clicktoflash.git
  HEAD branch: master
  Remote branches:
    1.4.2-64bit  tracked
    cutting-edge tracked
    gh-pages     tracked
    master       tracked
  Local branch configured for 'git pull':
    rentzsch-master merges with remote master
  Local ref configured for 'git push':
    master pushes to master (fast-forwardable)

When changes occur in the master branch of the remote rentzsch repository, here is the procedure for merging them:

$ git checkout rentzsch-master
$ git fetch
$ git merge rentzsch/master
$ git checkout master
$ git merge rentzsch-master
$ git push

You could use the one-step git pull instead of the two steps git fetch and git merge rentzsch/master. However, I’ve heard it suggested that git pull sometimes causes problems, though that issue is beyond the scope of this blog post. Anyway, what these steps do is first merge the remote rentzsch repository’s changes into the local rentzsch-master branch, then merge the local rentzsch-master branch into the local master branch, and finally push the local changes to the remote origin repository. The somewhat convoluted procedure is necessary because you cannot pull the rentzsch changes directly into origin; they have to go through the local repository.

The key to successful repository management, I believe, is to never write code on the local master branch. I’ve learned this important lesson by trial and error. In particular, if you try to merge changes from a remote repository into master while you have local changes on master that haven’t yet been pushed to origin, everything can blow up. It’s best to keep master as pure as possible. In fact, it’s best to keep all your tracking branches as pure as possible. With Git, branches are cheap. When you want to make local changes, always create and check out a new branch, and then merge the changes back into the tracking branch when you want to push.

As far as I can tell, origin by default will contain the same branches that existed in the rentzsch repository at the time you forked it. Consequently, http://github.com/lapcat/clicktoflash only has 2 branches, whereas Rentzsch’s GitHub repository has 4. In any case, the local clone only has master by default. New branches created in the local repository with local starting points are not automatically pushed to origin. This means you can safely hack on local code changes in a branch without exposing your mess to the public. If you want to work on another public ClickToFlash branch, such as rentzsch/cutting-edge instead of rentzsch/master, you’ll need to create a new local branch.

$ git branch rentzsch-cutting-edge rentzsch/cutting-edge
Branch rentzsch-cutting-edge set up to track remote branch cutting-edge from rentzsch.
$ git branch cutting-edge rentzsch-cutting-edge
$ git checkout cutting-edge
Switched to branch 'cutting-edge'

Again, we have both a branch rentzsch-cutting-edge that is a duplicate of rentzsch/cutting-edge and a branch cutting-edge that includes your changes. This mirrors the arrangement of the branches rentzsch-master and master.

If origin already contains a cutting-edge branch, then git push should be sufficient to push your local changes. (Beware: in another maddening default behavior, git push with no arguments pushes every local branch that has a same-named branch on origin, i.e., master and cutting-edge, not just the currently checked out branch.) On the other hand, if origin does not yet contain a cutting-edge branch, you’ll need to use git push origin cutting-edge to create the branch on origin.

I hope this mini tutorial helps you to work with GitHub more efficiently and with fewer headaches (from banging your head against the wall). If you have further questions, feel free to ask … someone else, because I don’t know the answer.

Local variables are free

Saturday, December 19th, 2009

This is part II of my irregularly scheduled series on compiler optimization. In part I, I explained how the compiler can optimize away return statements, resulting in missed breakpoints. My given workaround to that problem, though effective, was very ugly and architecture-dependent, much like Cowboys Stadium.

(gdb) break *0x00001fc5 if $eax != 0

Although there’s not much we can do to prevent the compiler optimization, we can greatly simplify our conditional breakpoint. I had suggested rewriting the source code, which was awe-inspiringly prescient, because that’s what I’m going to do now. Here’s the original code:

8	if (ShouldReturn())
9		return;

And here’s the revised code:

8	int localVar = ShouldReturn();
9	if (localVar)
10		return;

The return at line 10 will still be optimized away. However, the revised code allows us to set a simple breakpoint at line 9 that will stop when we want:

(gdb) break 9 if localVar != 0

No knowledge of the architecture, machine registers, or assembly language is required.
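For completeness, here is a sketch of the revised returnbreak.c from Part I, laid out so that its line numbers match the fragment above; the comments sit at the ends of lines so they don't shift the numbering. main() is omitted here: paste the unchanged one from Part I after HelloWorld(), which leaves lines 8 to 10 where they are. Your compiler may still lay out the instructions a little differently, so it's worth running list HelloWorld in gdb before setting the breakpoint.

```c
#include <stdio.h>

int ShouldReturn(void) {
	return 1;
}

void HelloWorld(void) {
	int localVar = ShouldReturn();  /* line 8: the result gets a name */
	if (localVar)                   /* line 9: break 9 if localVar != 0 */
		return;                     /* line 10: still folded into the epilog */

	printf("Hello, World!\n");
}
```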

From the beginning of time (January 1970, of course), programmers have struggled over coding style. Objective-C programmers, for example, expend undue effort arranging their brackets. (I have [NSMutableArray array] going to the Final Four.) For some, bracket-making becomes a kind of game or contest.

[[[[[[[[[[[[[See how] many] method] calls] we] can] fit] on] one] line] of] source] code];

I’ve changed my coding style over the years, but I’ve settled on one fundamental principle: write your code so that it’s easy to debug. All your fancy margin-aligning isn’t going to help when you need to figure out why your app keeps exploding. If you have nested method calls on one line of code, you can’t easily set a breakpoint in the middle. That’s why I prefer as much as possible to have only one method call per line of code, and create a local variable to store the return value.
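To make the principle concrete, here is a small sketch in plain C (so it builds without Foundation). The helpers default_name(), name_length(), and doubled() are hypothetical, invented only for this illustration. Both functions compute the same value, but only the second gives you a separate line, and a named local, at every step.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers, for illustration only. */
static const char *default_name(void) { return "ClickToFlash"; }
static size_t name_length(const char *s) { return strlen(s); }
static size_t doubled(size_t n) { return 2 * n; }

/* Nested style: compact, but there is nowhere to stop between calls. */
size_t nested_style(void) {
	return doubled(name_length(default_name()));
}

/* One call per line: each intermediate result is stored in a local,
   so a breakpoint on any line can inspect what the previous call returned. */
size_t one_call_per_line(void) {
	const char *name = default_name();
	size_t length = name_length(name);
	size_t result = doubled(length);
	return result;
}
```

In gdb you could break on the line that computes result and print length, which the nested version offers no clean way to do.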

There is a misconception that local variables are expensive, in terms of either computation or memory. The truth is that local variables are very cheap, the value meals of the computing world. (Would you like trans fat with your saturated fat?) It only takes one machine instruction to store a pointer address to a local variable. One machine instruction is really quite fast, about as fast as you can get — at least with restrictor plates. With regard to memory, local variables only take up stack space. To create a local variable, you simply move the stack a little. When the method or function returns, the stack is moved back, and thereby the space reserved for local variables is automatically recovered. Of course, you don’t want to create large C arrays on the stack, but a pointer to an Objective-C object only takes 4 bytes on the stack for 32-bit, 8 bytes for 64-bit. The default 32-bit stack size is 8MB, so you’re not going to run out of space unless you have deeply recursive calls.
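If you'd like to check both numbers on your own machine, here is a quick sketch, assuming a POSIX system (getrlimit() and RLIMIT_STACK come from sys/resource.h). The soft limit it reports typically governs the main thread's stack; secondary threads may be given different stack sizes, so treat the figure as a rough guide rather than a per-thread guarantee.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Bytes a local pointer variable occupies on the stack:
   4 on a 32-bit build, 8 on a 64-bit build. */
size_t local_pointer_size(void) {
	void *local = NULL;
	return sizeof local;
}

/* Soft stack-size limit of the calling process in bytes,
   or 0 if it cannot be read or is unlimited. */
unsigned long long stack_soft_limit(void) {
	struct rlimit rl;
	if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
		return 0;
	return (unsigned long long)rl.rlim_cur;
}
```

In a shell, ulimit -s reports the same soft limit, in kilobytes.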

Even these small costs are only relevant in the context of your app’s unoptimized, debug configuration. For your customers, on the other hand, local variables are free. As in Mumia, or Bird. When you compile your app using the release configuration, the local variables disappear; the compiler optimizes them away. (By the way, this is one of the reasons that debugging the release build of your app can be a frustrating and/or wacky experience.) To see the optimization in action, let’s consider some sample code:

1  #import <Foundation/Foundation.h>
2
3  @interface MyObject : NSObject {}
4  @end
5
6  @implementation MyObject
7
8  -(NSString *)myDirectProcessName {
9  	return [[[NSProcessInfo processInfo] processName] lowercaseString];
10 }
11
12 -(NSString *)myRoundaboutProcessName {
13 	NSString *myRoundaboutProcessName = nil;
14 	NSProcessInfo *processInfo = [NSProcessInfo processInfo];
15 	NSString *processName = [processInfo processName];
16 	NSString *lowercaseString = [processName lowercaseString];
17 	myRoundaboutProcessName = lowercaseString;
18 	return myRoundaboutProcessName;
19 }
20
21 @end
22
23 int main(int argc, const char *argv[]) {
24 	NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
25 	MyObject *myObject = [[[MyObject alloc] init] autorelease];
26 	NSLog(@"My direct process name: %@", [myObject myDirectProcessName]);
27 	NSLog(@"My roundabout process name: %@", [myObject myRoundaboutProcessName]);
28 	[pool release];
29 	return 0;
30 }

The above code is obviously contrived and useless. It only has value for explanatory purposes, and perhaps in the app store for $0.99. The methods -myRoundaboutProcessName and -myDirectProcessName do the same thing, the former with and the latter without local variables. Here’s the i386 disassembly for the methods when compiled using the debug configuration:

-[MyObject myDirectProcessName]:
00001d2a	nop
00001d2b	nop
00001d2c	nop
00001d2d	nop
00001d2e	nop
00001d2f	nop
00001d30	pushl	%ebp
00001d31	movl	%esp,%ebp
00001d33	pushl	%ebx
00001d34	subl	$0x14,%esp
00001d37	calll	0x00001d3c
00001d3c	popl	%ebx
00001d3d	leal	0x000012e8(%ebx),%eax
00001d43	movl	(%eax),%eax
00001d45	movl	%eax,%edx
00001d47	leal	0x000012e4(%ebx),%eax
00001d4d	movl	(%eax),%eax
00001d4f	movl	%eax,0x04(%esp)
00001d53	movl	%edx,(%esp)
00001d56	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001d5b	movl	%eax,%edx
00001d5d	leal	0x000012e0(%ebx),%eax
00001d63	movl	(%eax),%eax
00001d65	movl	%eax,0x04(%esp)
00001d69	movl	%edx,(%esp)
00001d6c	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001d71	movl	%eax,%edx
00001d73	leal	0x000012dc(%ebx),%eax
00001d79	movl	(%eax),%eax
00001d7b	movl	%eax,0x04(%esp)
00001d7f	movl	%edx,(%esp)
00001d82	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001d87	addl	$0x14,%esp
00001d8a	popl	%ebx
00001d8b	leave
00001d8c	ret
-[MyObject myRoundaboutProcessName]:
00001d8d	nop
00001d8e	nop
00001d8f	nop
00001d90	nop
00001d91	nop
00001d92	nop
00001d93	pushl	%ebp
00001d94	movl	%esp,%ebp
00001d96	pushl	%ebx
00001d97	subl	$0x24,%esp
00001d9a	calll	0x00001d9f
00001d9f	popl	%ebx
00001da0	movl	$0x00000000,0xe8(%ebp)
00001da7	leal	0x00001285(%ebx),%eax
00001dad	movl	(%eax),%eax
00001daf	movl	%eax,%edx
00001db1	leal	0x00001281(%ebx),%eax
00001db7	movl	(%eax),%eax
00001db9	movl	%eax,0x04(%esp)
00001dbd	movl	%edx,(%esp)
00001dc0	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001dc5	movl	%eax,0xec(%ebp)
00001dc8	movl	0xec(%ebp),%edx
00001dcb	leal	0x0000127d(%ebx),%eax
00001dd1	movl	(%eax),%eax
00001dd3	movl	%eax,0x04(%esp)
00001dd7	movl	%edx,(%esp)
00001dda	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001ddf	movl	%eax,0xf0(%ebp)
00001de2	movl	0xf0(%ebp),%edx
00001de5	leal	0x00001279(%ebx),%eax
00001deb	movl	(%eax),%eax
00001ded	movl	%eax,0x04(%esp)
00001df1	movl	%edx,(%esp)
00001df4	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001df9	movl	%eax,0xf4(%ebp)
00001dfc	movl	0xf4(%ebp),%eax
00001dff	movl	%eax,0xe8(%ebp)
00001e02	movl	0xe8(%ebp),%eax
00001e05	addl	$0x24,%esp
00001e08	popl	%ebx
00001e09	leave
00001e0a	ret

As expected, -myRoundaboutProcessName makes more room on the stack than -myDirectProcessName:

00001d34	subl	$0x14,%esp
00001d97	subl	$0x24,%esp

At 00001da0, -myRoundaboutProcessName sets the value of the local variable to nil, as in line 13 of the source code. The interesting differences, though, are immediately after the calls to objc_msgSend(). By the standard ABI, the register eax contains the return value of objc_msgSend(). In -myDirectProcessName, the value in eax is simply moved to the register edx:

00001d5b	movl	%eax,%edx

In contrast, -myRoundaboutProcessName first stores the value on the stack before moving it to edx. The address on the stack is the space reserved for the local variable:

00001dc5	movl	%eax,0xec(%ebp)
00001dc8	movl	0xec(%ebp),%edx

After the final objc_msgSend() call, -myDirectProcessName doesn’t bother to do much, because the return value in eax will become the return value of the whole method. -myRoundaboutProcessName, by contrast, stores values in its local variables, as in lines 16 and 17 of the source code:

00001df9	movl	%eax,0xf4(%ebp)
00001dfc	movl	0xf4(%ebp),%eax
00001dff	movl	%eax,0xe8(%ebp)
00001e02	movl	0xe8(%ebp),%eax

So that’s how the methods differ in the unoptimized build. Now let’s see what happens when we use the release configuration. Here’s the optimized disassembly for -myDirectProcessName:

-[MyObject myDirectProcessName]:
00001dce	pushl	%ebp
00001dcf	movl	%esp,%ebp
00001dd1	subl	$0x18,%esp
00001dd4	movl	0x00003000,%eax
00001dd9	movl	%eax,0x04(%esp)
00001ddd	movl	0x0000302c,%eax
00001de2	movl	%eax,(%esp)
00001de5	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001dea	movl	0x00003004,%edx
00001df0	movl	%edx,0x04(%esp)
00001df4	movl	%eax,(%esp)
00001df7	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001dfc	movl	0x00003008,%edx
00001e02	movl	%edx,0x0c(%ebp)
00001e05	movl	%eax,0x08(%ebp)
00001e08	leave
00001e09	jmpl	0x0000400a	; symbol stub for: _objc_msgSend

The optimized method is significantly shorter, as expected from the compiler option -Os. First, you’ll notice that all those pesky nop instructions have been deleted. Stallman put them in unoptimized builds just to annoy us. (Or they may have been for Fix and Continue, but I always assume the worst.) There are additional optimizations as well that I won’t belabor here, because I’m eager to get to the climax. (Sorry, dear.) For your enlightenment and enjoyment, here’s the optimized disassembly for -myRoundaboutProcessName:

-[MyObject myRoundaboutProcessName]:
00001e0e	pushl	%ebp
00001e0f	movl	%esp,%ebp
00001e11	subl	$0x18,%esp
00001e14	movl	0x00003000,%eax
00001e19	movl	%eax,0x04(%esp)
00001e1d	movl	0x0000302c,%eax
00001e22	movl	%eax,(%esp)
00001e25	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001e2a	movl	0x00003004,%edx
00001e30	movl	%edx,0x04(%esp)
00001e34	movl	%eax,(%esp)
00001e37	calll	0x0000400a	; symbol stub for: _objc_msgSend
00001e3c	movl	0x00003008,%edx
00001e42	movl	%edx,0x0c(%ebp)
00001e45	movl	%eax,0x08(%ebp)
00001e48	leave
00001e49	jmpl	0x0000400a	; symbol stub for: _objc_msgSend

Identical! Ah, that’s nice. Smoke ’em if you got ’em.
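If you want to repeat the experiment without Foundation, here is a plain C analogue under stated assumptions: process_name(), first_char(), and to_lower_char() are hypothetical stand-ins for the three Objective-C messages, and the file name locals.c is arbitrary. Compile it twice with gcc -S -O0 locals.c -o locals-O0.s and gcc -S -Os locals.c -o locals-Os.s, then compare the two functions in each file. I'd expect the -Os bodies to match, but compilers vary; in particular, gcc may inline these tiny helpers and fold both functions to a constant, so moving the helpers into a separate source file gives a comparison closer to the Objective-C one.

```c
#include <ctype.h>

/* Hypothetical stand-ins for the three messages in the Objective-C sample. */
const char *process_name(void) { return "Finder"; }
char first_char(const char *s) { return s[0]; }
char to_lower_char(char c) { return (char)tolower((unsigned char)c); }

/* Analogue of -myDirectProcessName: nested calls, no locals. */
char direct_initial(void) {
	return to_lower_char(first_char(process_name()));
}

/* Analogue of -myRoundaboutProcessName: one local per step. */
char roundabout_initial(void) {
	char result = '\0';
	const char *name = process_name();
	char initial = first_char(name);
	char lowercase = to_lower_char(initial);
	result = lowercase;
	return result;
}
```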

In conclusion, feel free to sprinkle, pepper, dash, or even drown your code with local variables. And with the engineering hours of debugging time you save, get me a nice (not free) present. I’m partial to flavored coffee and unflavored MacBooks.

Why did my breakpoint not get hit?

Monday, November 16th, 2009

This is part I of a II+ (take that, trademark trolls) part series on compiler optimization. For the gcc compiler, you can specify the level of optimization with various -O options. The default for compiling is -O0, which means do not optimize. As we shall see, however, the compiler always optimizes to an extent. That is to say, gcc -O0, you lie!

The primary reason for using the -O0 option (besides to avoid compiler optimization bugs) is to facilitate debugging of your code. With higher levels of optimization, the compiler is given more freedom to ‘ignore’ your source code in writing machine instructions, as long as the results are the same. Although it is possible to debug optimized binaries, the experience is often confusing and unhelpful for the programmer (much like reading cocoa-dev). Turning off optimization gives the closest correlation between source code and machine instructions. Yet even with no optimization, the correlation is not perfect, and this can lead to debugging problems.

Let’s consider a simple example:

$ cat > returnbreak.c
#include <stdio.h>

int ShouldReturn(void) {
	return 1;
}

void HelloWorld(void) {
	if (ShouldReturn())
		return;

	printf("Hello, World!\n");
}

int main(int argc, const char *argv[]) {
	HelloWorld();
	return 0;
}
$ gcc -g -O0 -o returnbreak returnbreak.c
$ gdb returnbreak
GNU gdb 6.3.50-20050815 (Apple version gdb-966) (Tue Mar 10 02:43:13 UTC 2009)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-apple-darwin"...Reading symbols for shared libraries ... done

(gdb) list HelloWorld
2
3	int ShouldReturn(void) {
4		return 1;
5	}
6
7	void HelloWorld(void) {
8		if (ShouldReturn())
9			return;
10
11		printf("Hello, World!\n");
(gdb) break 9
Breakpoint 1 at 0x1fc9: file returnbreak.c, line 9.
(gdb) run
Starting program: /Users/jeff/Desktop/returnbreak
Reading symbols for shared libraries ++. done

Program exited normally.

WTF?!? Why did my breakpoint not get hit?

(gdb) info break
Num Type           Disp Enb Address    What
1   breakpoint     keep y   0x00001fc9 in HelloWorld at returnbreak.c:9

Hmm, that seems ok. Let’s try something else.

(gdb) break HelloWorld
Breakpoint 2 at 0x1fc0: file returnbreak.c, line 8.
(gdb) info break
Num Type           Disp Enb Address    What
1   breakpoint     keep y   0x00001fc9 in HelloWorld at returnbreak.c:9
2   breakpoint     keep y   0x00001fc0 in HelloWorld at returnbreak.c:8
(gdb) run
Starting program: /Users/jeff/Desktop/returnbreak 

Breakpoint 2, HelloWorld () at returnbreak.c:8
8		if (ShouldReturn())
(gdb) c
Continuing.

Program exited normally.

Odd, it hits the breakpoint at line 8 but not at line 9. The breakpoint on line 9 is at address 0x00001fc9, so let’s look at the (i386) disassembly for that:

(gdb) disassemble 0x00001fc9
Dump of assembler code for function HelloWorld:
0x00001fb3 <HelloWorld+0>:	push   %ebp
0x00001fb4 <HelloWorld+1>:	mov    %esp,%ebp
0x00001fb6 <HelloWorld+3>:	push   %ebx
0x00001fb7 <HelloWorld+4>:	sub    $0x14,%esp
0x00001fba <HelloWorld+7>:	call   0x1fbf <HelloWorld+12>
0x00001fbf <HelloWorld+12>:	pop    %ebx
0x00001fc0 <HelloWorld+13>:	call   0x1fa6 <ShouldReturn>
0x00001fc5 <HelloWorld+18>:	test   %eax,%eax
0x00001fc7 <HelloWorld+20>:	jne    0x1fd7 <HelloWorld+36>
0x00001fc9 <HelloWorld+22>:	lea    0x30(%ebx),%eax
0x00001fcf <HelloWorld+28>:	mov    %eax,(%esp)
0x00001fd2 <HelloWorld+31>:	call   0x3005 <dyld_stub_puts>
0x00001fd7 <HelloWorld+36>:	add    $0x14,%esp
0x00001fda <HelloWorld+39>:	pop    %ebx
0x00001fdb <HelloWorld+40>:	leave
0x00001fdc <HelloWorld+41>:	ret
End of assembler dump.

When ShouldReturn() returns, the return value is in the register eax. The test instruction at 0x00001fc5 performs a bitwise AND of the two operands — which in this case are the same. If the result is non-zero — and in this case the result is 1 — the Zero Flag in the EFLAGS register is set to 0. This instruction corresponds to evaluating the conditional on line 8 of our source code. Then the jne instruction at 0x00001fc7 jumps to a certain address if the Zero Flag is 0. In our source code, the flow of control should move to the return statement on line 9 when the conditional evaluates to non-zero. According to the machine instructions, on the other hand, it jumps to 0x1fd7 when the conditional evaluates to non-zero. This address is the beginning of the standard function epilog, which restores the stack and registers to their previous state before returning.
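Since the flag logic is easy to misread, here is a sketch of it in C. It models only the Zero Flag (the real test instruction also sets SF and PF and clears CF and OF), and the function names are illustrative rather than anything from gdb or the processor manuals.

```c
#include <stdbool.h>
#include <stdint.h>

/* test a, b: computes a & b, throws the result away, and sets ZF = 1
   only if that result is zero. Other flags are not modeled. */
bool zero_flag_after_test(uint32_t a, uint32_t b) {
	return (a & b) == 0;
}

/* jne ("jump if not equal") is taken when ZF == 0. */
bool jne_taken(bool zero_flag) {
	return !zero_flag;
}

/* test %eax,%eax followed by jne: with eax holding ShouldReturn()'s
   result, the jump to the epilog is taken exactly when it is non-zero. */
bool jumps_to_epilog(uint32_t eax) {
	return jne_taken(zero_flag_after_test(eax, eax));
}
```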

The problem here is that while the function HelloWorld() has two exit points in our source code, it only has one exit point in the machine instructions. In essence, the compiler has optimized for size, despite our use of the -O0 option. Given the generated machine instructions, there is nowhere to put a breakpoint that will only be hit when the conditional at line 8 evaluates to non-zero. A breakpoint at 0x00001fc5 or 0x00001fc7 would be hit whenever the conditional is evaluated, which is always. A breakpoint at 0x00001fd7 would be hit whenever the function returns, which is always as well. Unfortunately, gdb places the breakpoint at 0x00001fc9, which is actually the opposite of what we intended, because it only gets hit when the conditional evaluates to zero. This is why the program exits normally without ever hitting the breakpoint. I consider this to be a bug in gdb; it would be better, I think, if it would just fail and give an error when we try to set the breakpoint. Of course, it may be a bug in gcc that it optimizes away our multiple exit points with optimization off. But hey, what do you expect from free software?
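To picture the single exit point, here is the control flow of the -O0 disassembly written back out as C. This is an illustrative rendering, not compiler output; HelloWorldAsCompiled() is a hypothetical name, and the comments map each statement to the addresses in the dump above.

```c
#include <stdio.h>

int ShouldReturn(void) {
	return 1;
}

/* There is no separate "return" for line 9: a non-zero result jumps
   straight to the shared epilog, so only the printf path has
   instructions of its own for gdb to attach "break 9" to. */
void HelloWorldAsCompiled(void) {
	if (ShouldReturn() != 0)        /* 0x1fc0-0x1fc7: call, test, jne */
		goto epilog;
	printf("Hello, World!\n");      /* 0x1fc9: emitted as a puts() call */
epilog:
	return;                         /* 0x1fd7: add, pop, leave, ret */
}
```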

There are several workarounds for this problem. One would be to re-write your source code. (No, that’s not a joke. See Part II of this series.) Another workaround, if you only want to break on the result of a conditional, is to use a conditional breakpoint:

(gdb) delete break
Delete all breakpoints? (y or n) y
(gdb) break *0x00001fc5 if $eax != 0
Breakpoint 1 at 0x1fc5: file returnbreak.c, line 8.
(gdb) info break
Num Type           Disp Enb Address    What
1   breakpoint     keep y   0x00001fc5 in HelloWorld at returnbreak.c:8
	stop only if $eax != 0
(gdb) run
Starting program: /Users/jeff/Desktop/returnbreak 

Breakpoint 1, 0x00001fc5 in HelloWorld () at returnbreak.c:8
8		if (ShouldReturn())
(gdb) c
Continuing.

Program exited normally.

To summarize, if you find that your breakpoints are not getting hit, you now know who to blame. Namely, yourself. It’s almost certain that your Xcode project settings are wrong.

Apple hot-swapped Mac OS X 10.5.8

Tuesday, September 1st, 2009

There has been some confusion in the net-o-sphere over the existence of two Mac OS X 10.5.8 builds: 9L30 and 9L31a. I think it’s time to clear up that confusion, now that Mac OS X 10.6.0 has been released and nobody cares anymore about 10.5.x.

The Mac OS X build version is stored in the following file on your hard drive:

/System/Library/CoreServices/SystemVersion.plist
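If you'd rather not open the plist by hand, here is a naive C sketch that pulls the build out of it. It assumes the file is in XML form and that the build string follows the ProductBuildVersion key; it is a text scan, not a real plist parser, so a binary plist would defeat it. (On the command line, sw_vers -buildVersion reports the same value.) The function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Reads up to bufsize-1 bytes of path into buf, NUL-terminated.
   Returns 0 on success, -1 on error. */
int read_text_file(const char *path, char *buf, size_t bufsize) {
	if (bufsize == 0)
		return -1;
	FILE *f = fopen(path, "r");
	if (f == NULL)
		return -1;
	size_t n = fread(buf, 1, bufsize - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}

/* Copies the <string> value following <key>ProductBuildVersion</key>
   into out. Returns 0 on success, -1 if it is missing or does not fit. */
int product_build_version(const char *plist, char *out, size_t outlen) {
	const char *key = strstr(plist, "<key>ProductBuildVersion</key>");
	if (key == NULL)
		return -1;
	const char *start = strstr(key, "<string>");
	if (start == NULL)
		return -1;
	start += strlen("<string>");
	const char *end = strstr(start, "</string>");
	if (end == NULL)
		return -1;
	size_t len = (size_t)(end - start);
	if (len + 1 > outlen)
		return -1;
	memcpy(out, start, len);
	out[len] = '\0';
	return 0;
}
```

For example, read the file above into a 4096-byte buffer with read_text_file(), then call product_build_version() on the buffer. The same function works on the SystemVersion.plist files extracted from the installers later in this post.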

On my machine, the build is 9L30. I installed 10.5.8, as usual, via the combo updater (because I’m a paranoid superfreak who also repairs permissions and offers sacrifices to Demeter). This was a prudent three days after 10.5.8 was released, though the combo updater was downloaded two days after release.

Lately (in the Holocene epoch), the Mac OS X installers have come in ‘flat’ package format (i.e., “smooth and even; without marked lumps or indentations” or “lacking interest or emotion; dull and lifeless”). This makes them opaque to the casual observer. Fortunately, I am Klondike Kat. The installer .pkg file is actually a xar archive that can be read and extracted with /usr/bin/xar. To extract the package contents into the current working directory:

$ xar -xf /Volumes/Mac\ OS\ X\ Update\ Combined/MacOSXUpdCombo10.5.8.pkg

This gives us an ‘old-style’ (Pleistocene) .pkg file whose contents we can view in Finder. The package contains, among other things, a Payload. Be careful not to ignite it, otherwise you may require intensive care, if not AppleCare. The Payload is a gzip archive, so I slapped a .gz extension on the file and gunzip’ed it. After extracting that archive, you’re left with … yet another archive. (Apparently the Matryoshka method of software distribution.) The new Payload can be read and extracted with /bin/pax:

$ pax -f Payload -r '*SystemVersion.plist'

The SystemVersion.plist from my original installer is for build 9L30, but the one from the installer I downloaded today is for build 9L31a. Thus, we have to conclude that Apple ‘hot swapped’ Mac OS X 10.5.8. That is, they switched Mac OS X builds after release without bumping the version number.

Why Apple did this remains a mystery. Usually software developers do it when they discover an issue shortly after release but don’t want to go to the trouble of making a public announcement of a new version. What was the issue, and do those of us who have build 9L30 installed still suffer from the issue? For the answer to those questions, you’ll have to read the release notes. ;-)

It’s over

Thursday, August 6th, 2009

I figured I’d cruise, at least through the Spring. However, the wheels on the bus go round and round.

rdar://problem/7125338

I am still master of my domain. Although I need to renew before it expires in three weeks.

dSYM in your bundle or just happy to see me

Tuesday, January 20th, 2009

It’s been a while since I posted last. Rest assured that I did survive the Y2K9 disaster, though not unscathed. Since bloggers and other entertainers — such as Brian Williams — are required by law to offer a retrospective at the end of a year, I’ve been scanning the Top 10 lists of Top 10 lists of things that we have gained and lost in 2008. Next to our collective sanity, the most significant loss of the year was STABS. Actually, it wasn’t so much lost as deprecated. This means that we can’t expect any new features (or bugs!), and support for STABS debugging symbols may disappear in some future operating system, say, Windows -400. (I assume that the countdown of version numbers from 95 to 7 is intended to accurately represent the software’s regression.) In the transition from STABS to DWARF, it was thought (by the people who matter, viz., me) that we also lost the ability to ship debugging symbols with our apps. Luckily, it was discovered (again, by the people who matter) that we did not lose this ability.

Developers sometimes need to give users a debug version of an application. For example, a user may be experiencing an exception or crash that the developer cannot reproduce. Including debugging symbols with the app allows the reports to be fully symbolized. With STABS, the symbols reside within the app’s executable, so shipping them is trivial. The DWARF with dSYM format, on the other hand, puts the debugging symbols in a separate file. (To be accurate, a separate file within a separate bundle, but we’ll ignore that fact for this sentence.) By default, Xcode creates MyApp.app.dSYM in the same folder as MyApp.app, and indeed, Leopard’s crash reporter can locate MyApp.app.dSYM in the same folder as MyApp.app regardless of which folder they’re in on disk. Theoretically, then, you could have the user put a .dSYM in the same folder as the app. However, making the user do this would be, in a word, lame. In two words, pretty lame. Moreover, it doesn’t work at all on Tiger. Pretty, pretty lame.

When I face an insoluble problem, my tendency is to step back and get philosophical. Why do I exist? Why does the universe hate me? Who was the real Darrin? More to the point: what is an app? Essentially, an app is a command-line tool in a box with a pretty bow. (Another iSweater, just what I needed!) An app’s main executable file is located in the directory Contents/MacOS of the .app bundle. You can even launch an app from the command line, e.g.,

/Applications/Safari.app/Contents/MacOS/Safari

assuming that you haven’t deleted Safari for security reasons. So how does this information help us? It doesn’t — I’m just killing time here. However, it’s worth noting that if you build the Release configuration of a command-line tool project, Xcode by default creates MyTool.dSYM in the same folder as MyTool. In both Leopard and Tiger, the crash reporter can locate the .dSYM there. Thus, you would expect that the crash reporter can also locate MyApp.app/Contents/MacOS/MyApp.dSYM when your app crashes. And you would be right! (Of course, you would expect this because I just told you, whereas originally you would have expected to try a bunch of stuff and fail, like putting MyApp.app.dSYM in MyApp.app/Contents/MacOS.)
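Here is a small shell sketch of that lookup rule: given an app bundle, it prints where a dSYM would have to sit beside the main executable and reports whether one is there. It assumes the executable is named after the bundle (MyApp.app, Contents/MacOS/MyApp), which isn't guaranteed, since the Info.plist's CFBundleExecutable can differ. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: where would an embedded dSYM live for an app?
# ASSUMPTION: the main executable is named after the bundle
# (MyApp.app -> Contents/MacOS/MyApp); check CFBundleExecutable if not.

# Print the path of the app's main executable.
main_executable() {
    app="${1%/}"                      # drop a trailing slash, if any
    name=$(basename "$app" .app)      # MyApp.app -> MyApp
    printf '%s\n' "$app/Contents/MacOS/$name"
}

# Print the dSYM path beside an executable; succeed only if it exists.
dsym_beside() {
    dsym="$1.dSYM"
    printf '%s\n' "$dsym"
    [ -d "$dsym" ]
}

# Usage (paths are examples):
#   dsym_beside "$(main_executable build/Release/MyApp.app)"
```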

The beauty of this technique is that it works not only for the app’s main executable but also for other embedded executables such as dynamic libraries and frameworks. When a crash occurs involving MyFramework.framework, the crash reporter will find

MyApp.app/Contents/Frameworks/MyFramework.framework/Versions/A/MyFramework.dSYM

You can build the framework in a separate Xcode project and copy the product along with its embedded dSYM into your app’s bundle, and the symbols will be found at crashtime. (That’s runtime with a bang.) In Tiger, the line numbers of the source code files can sometimes be a little off in the crash reports; this may be due to bugs in the handling of stripped binaries by atos, which I mentioned in my earlier post.
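A minimal sketch of the copy step, assuming the framework was built elsewhere with its dSYM already embedded next to its executable. The trick is simply to copy the whole .framework directory so that the dSYM rides along; embed_framework and the paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy a framework built in a separate project into an
# app bundle's Contents/Frameworks, including the dSYM that sits beside its
# executable. cp -R copies symlinks (Versions/Current, etc.) as symlinks.
embed_framework() {
    framework="${1%/}"   # e.g. ../MyFramework/build/Release/MyFramework.framework
    app="${2%/}"         # e.g. build/Release/MyApp.app
    if [ -z "$framework" ] || [ -z "$app" ]; then
        echo "usage: embed_framework FRAMEWORK APP" >&2
        return 2
    fi
    dest="$app/Contents/Frameworks"
    mkdir -p "$dest" || return 1
    rm -rf "$dest/$(basename "$framework")"   # replace any stale copy
    cp -R "$framework" "$dest/" || return 1
}
```

On the Mac, ditto should do the same job as cp -R here.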

Now that we know where to put the debugging symbols in the app bundle, how do we get them there? Manual copying is unthinkable (like giving David Pogue a copy of OS X GM before ADC members, or putting Leon Panetta in charge of the CIA). If your entire build process is not automated, you should give up software development immediately and look for another career; I recommend professional ice dancing. You could write a shell script to copy MyApp.app.dSYM from the build directory, but that’s only slightly less annoying than having your users copy it to /Applications, because it’s something that Xcode should do itself.

Fortunately, the Xcode build setting reference tells us how to configure this. Or so one would think. Well, at least the relevant build settings are found in the environment variables … after you’ve written your shell script. The Xcode build transcript normally doesn’t show environment variables, but you can add a run script build phase to your target and check the option “Show environment variables in build log”. The environment variables reveal the default values for DWARF_DSYM_FOLDER_PATH and DWARF_DSYM_FILE_NAME, which Xcode uses in creating the dSYM file. Although you won’t find them in the target’s list of build settings, you can create them yourself in the User-Defined section. To embed the dSYM within the app bundle, just set DWARF_DSYM_FOLDER_PATH to $(CONFIGURATION_BUILD_DIR)/$(EXECUTABLE_FOLDER_PATH) and DWARF_DSYM_FILE_NAME to $(EXECUTABLE_NAME).dSYM. These settings should work for both apps and frameworks.
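To see where those two settings send the dSYM, here is a shell sketch that expands them the same way, using the variables Xcode exports to run script phases (outside Xcode, pass them in yourself). The sample value MyFramework.framework/Versions/A for a framework's EXECUTABLE_FOLDER_PATH is my assumption; check your own build log. Since I'm not certain where a Run Script phase falls relative to dSYM generation, testing for the directory is safest after the build finishes.

```shell
#!/bin/sh
# Mirrors the two User-Defined settings described above:
#   DWARF_DSYM_FOLDER_PATH = $(CONFIGURATION_BUILD_DIR)/$(EXECUTABLE_FOLDER_PATH)
#   DWARF_DSYM_FILE_NAME   = $(EXECUTABLE_NAME).dSYM
# and prints the resulting dSYM path (embedded_dsym_path is hypothetical).
embedded_dsym_path() {
    if [ -z "$CONFIGURATION_BUILD_DIR" ] || [ -z "$EXECUTABLE_FOLDER_PATH" ] ||
       [ -z "$EXECUTABLE_NAME" ]; then
        echo "embedded_dsym_path: build settings not set" >&2
        return 1
    fi
    printf '%s/%s/%s.dSYM\n' \
        "$CONFIGURATION_BUILD_DIR" "$EXECUTABLE_FOLDER_PATH" "$EXECUTABLE_NAME"
}

# After a build, with the variables set:
#   [ -d "$(embedded_dsym_path)" ] || echo "dSYM is not embedded"
```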

My beard has grown longer over the course of this post, and my knees are starting to ache, so it’s time to wrap it up, tip my hat to the new year, and meet the new boss.

Review of PGP boot disk encryption

Sunday, November 23rd, 2008

This is my first official software review. I normally don’t review software other than my own — Radioshift, five thumbs up, buy now! — because there’s no profit in it (like US auto makers). However, Dave Dribin asked me to do it, and apparently Dave gets whatever he asks for.

PGP Whole Disk Encryption introduced pre-boot authentication for Intel Macs in version 9.9. Pre-boot authentication allows you to encrypt your Mac’s entire internal hard drive. I wrote a form of whole disk encryption myself in Knox, but that was for non-boot disks. Prior to installing PGP 9.9, I had been using Apple’s built-in FileVault to encrypt the home directory of my MacBook Pro. I became interested in whole disk encryption for the laptop after I discovered that neither third-party developers nor Apple itself could be trusted not to write personal data outside your home directory.

This review is not intended to be comprehensive, because again, I’m not being paid for it … though if a certain corp whose name is a certain acronym would send a certain something my way, I would certainly be appreciative, wink, wink, nudge, nudge, say no more. Before you charge the software to Mr. Underhill’s American Express card (want the number?), I highly recommend that you study the user guide for important caveats. My aim is simply to describe my experience and to pass along some undocumented tips I picked up along the way.

I purchased Whole Disk Encryption for Mac, affectionately known as WDE4M, from PGP’s online store for 119 US Dollars (more than a bread box, less than a nano), and I received my license key by email within 10 minutes, so no problems there. It took slightly longer to encrypt my boot disk. The entire process required around 8 hours for the MBP’s 200 GB internal HD. (Actually, according to Mac OS X, it’s 186.3 GB. These are sometimes given the label GiB, which stands for Grrrr, ithoughtihadmore Bytes.) Obviously, you’ll want to let it run overnight, unless you need a break from watching your grass grow.
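The GB-versus-GiB discrepancy is easy to check: drive makers count 10^9 bytes per gigabyte, while the figure Mac OS X reports here is in binary units of 2^30 bytes. A tiny sketch (gb_to_gib is a hypothetical name); gb_to_gib 200 prints 186.3.

```shell
#!/bin/sh
# Convert a drive maker's decimal gigabytes (10^9 bytes) into binary
# gibibytes (2^30 = 1073741824 bytes), rounded to one decimal place.
gb_to_gib() {
    awk -v gb="$1" 'BEGIN { printf "%.1f\n", gb * 1000000000 / 1073741824 }'
}
```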

In reviewing WDE4M, the first concern is security. When you boot your Mac from the internal drive, you get the PGP login screen. At this point, the Mac OS X volume has not yet been mounted. Until you enter your password at the PGP login screen, the entire boot volume remains encrypted. As long as you choose a good password (mine is Joshua), all of your data is safe. Note that it is still possible to boot your Mac from a different disk such as a DVD or an external hard drive. It’s even possible to boot into FireWire target disk mode (assuming you have a FireWire port: ha, ha!). However, you won’t be able to mount the Mac OS X volume on the internal drive, because without PGP running, you have nothing more than a partition full of encrypted bytes. Indeed, PGP modifies the partition table of your disk to add its special boot partition, so I would recommend starting with a single volume of data. I previously had multiple partitions and volumes on the MBP, but I found that to be a PITA regardless of PGP.

After you authenticate successfully at the PGP screen, the computer boots normally into Mac OS X. It is crucial to realize that when you’re booted into Mac OS X, your data is vulnerable. PGP will decrypt on the fly any bytes that the OS asks for. Thus, if someone steals your laptop while it’s running OS X, you’re screwed. You can try logging out or setting a screensaver password, but those types of protection can often be defeated. The only way to guarantee safety is to shut down or reboot. Thankfully, WDE4M protects against so-called ‘cold boot’ attacks (unlike FileVault).

The next issue for WDE4M beyond security is performance. On my MBP with a 2.33 GHz Intel Core 2 Duo and 2 GB RAM, I’ve found performance to be a non-issue. Admittedly, I’ve never done speed tests, but I don’t perceive my system to be sluggish or slower because of PGP WDE. It seems as Zippy™ as ever. I’ve heard from some sources (e.g., the shoe shine guy) that PGP’s encryption / decryption is much faster than FileVault’s. The only operations that seem a little slow are copying extremely large, multi-GB files from another disk; the entire contents of these files must be encrypted as they’re copied onto the internal drive.

The final issue I’ll discuss is backups. If you care about your data, you must back it up; otherwise, you will lose it at some point. If your data is important enough to protect with WDE4M, it’s important enough to back up. (Note that I made two full backups of my internal drive before attempting to encrypt it. I also downloaded my brain into an android.) No backup strategy is perfect for everyone, so we must each follow one that fits our needs. For example, the majority of computer users follow the strategy that experts term ‘Divine Intervention’. I had to experiment quite a bit before I found something that worked for me: in the end I turned to good ol’ dd.

My procedure for backing up my PGP-encrypted internal hard drive is simple. Even a caveman could do it. (Yes, Unix has been around that long.) First I connect an external backup drive that has enough free space to fit my entire internal drive. Then I boot into the Mac OS X installer: this can be done from a partition on the external drive, from a DVD, or from a USB stick. A Mac OS X installer volume is not required to perform the backup (you could use another Mac, for example), but I use an installer so that I can boot from the MBP and take advantage of its FireWire 800 port. Finally I launch Terminal and enter the following:

dd if=/dev/disk0 of=/Volumes/backups/disk0.dmg

Running dd takes 5 to 7 hours to back up the MBP’s 186 GiB HD to a FireWire 800 external HD. I might be able to expedite the process by tweaking the bs operand of dd, but I’m running the backup overnight anyway, so I favor simplicity and reliability over speed. Afterward, I have a byte-for-byte backup of my entire internal drive. Any machine running PGP can mount the dmg with the correct password, so the backup is suitable for file-based restoration. A machine without PGP installed, in contrast, will fail to mount the dmg, finding no mountable file systems, because the entire file system is encrypted.
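For anyone who does want to try the bs operand, here is a hedged sketch. dd's default block size is 512 bytes, so bs=1048576 (1 MiB) issues far fewer read and write calls; whether that actually shortens the overnight run on this hardware is something I haven't measured. image_copy and image_verify are hypothetical names, and the same function handles the restore by swapping the arguments.

```shell
#!/bin/sh
# Sketch of the dd backup with an explicit 1 MiB block size.
# WARNING: dd overwrites DST without asking. Double-check device names
# (e.g. with diskutil list) before pointing it at a /dev/disk node.
image_copy() {
    src="$1"
    dst="$2"
    if [ -z "$src" ] || [ -z "$dst" ] || [ "$src" = "$dst" ]; then
        echo "usage: image_copy SRC DST (SRC and DST must differ)" >&2
        return 2
    fi
    dd if="$src" of="$dst" bs=1048576 || return 1
}

# Optional byte-for-byte comparison; it reads both sides again, so it
# roughly doubles the total time.
image_verify() {
    cmp -s "$1" "$2"
}

# Backup:  image_copy /dev/disk0 /Volumes/backups/disk0.dmg
# Restore: image_copy /Volumes/backups/disk0.dmg /dev/disk1
```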

From a security standpoint, a byte-for-byte backup is not ideal, because it has the same encryption key as the original. Once you start modifying files on your internal drive again, it’s conceivable that a diff between the backup and original could reveal something interesting. However, few people in the world have any hope of success in extracting readable information through such an investigation, certainly not the casual thief, and of course backing up your files unencrypted would be infinitely worse! I’m not trying to keep any state secrets (my WMD is curled up sleeping on his cat bed), but if you’re the paranoid type — and my hidden video cameras show me that you are — you should be able to encrypt your backup drive with a different key before you create the dmg with dd. Indeed, you could create one big encrypted dmg with Disk Utility and put the backup dmg inside it. I haven’t tried this myself, so I’d be interested to hear whether it’s viable. Anyway, this Russian doll approach would provide ample protection if your data were stolen by the Russian mafia, or if you were a member of it.

In the event of catastrophic data loss, e.g., my laptop is swallowed by a whale, I can use the backup to easily transform some other disk into a bootable clone of the laptop:

dd if=/Volumes/backups/disk0.dmg of=/dev/disk1

If you have an external drive the same size as, or slightly larger than, your internal drive, you can skip the dmg and create a bootable clone directly:

dd if=/dev/disk0 of=/dev/disk1

The disadvantage of this procedure is that any extra space on the backup drive would be unusable. I have a few 500 GB (465 GiB, sigh) external HDs, so it makes more sense for me to save multiple backups on each drive.

You can boot a clone of your PGP-encrypted drive from another machine regardless of whether the machine has PGP installed on its internal drive. However, it may take a couple of spontaneous reboots before you can log in to Mac OS X, much like a software update, so you need to be patient. (Perhaps it’s updating the boot cache?) Also, you should avoid booting the clone on the original machine. As a test of my backup procedure, I cloned my MBP to an external drive and then booted the MBP from the clone. The MBP did successfully boot from the external drive, and I was able to log in to Mac OS X, but I was surprised to find that the Mac OS X volume was mounted from the internal rather than the external drive. This bizarre behavior puzzled me until I read Secrets of the GPT, which I already mentioned in my last post. The technical note warns, “Be careful when doing a block-for-block copy of a GPT disk. The GUID in the partition table header that identifies the disk (and the GUIDs in each partition entry) are meant to be globally unique, and Apple’s system software relies on this feature.” If you do what I did, “the computer might boot from either the original or the copy in an unpredictable fashion (perhaps toggling from boot to boot).” Oops! That reminds me of the time I got mount to show two volumes with the same BSD name … but that’s a tale for another day.

WDE4M comes with PGP Desktop, which has a number of useful features such as handling public-private key-pairs and allowing encryption of AOL Instant Message sessions between PGP users. PGP Desktop can automatically encrypt email as well, but one thing to look out for is that it attempts this by default. I kept getting “Invalid Authentication Certificate” warnings in Mail.app, and I initially blamed this on Leopard, because the warning window did not indicate that it was from PGP, and I had just installed Leopard prior to installing PGP. You can turn off the email encryption feature in the Messaging Security preferences of PGP.app. Hopefully PGP will put its name on the warning window in the next software update to PGP 9.9, so that it’s clear to the user where the warning is coming from.

Overall, in summary and conclusion, to wrap it all up, finally: I find WDE4M to be a well-engineered product, it does what it’s supposed to do, viz., protect all of your data, I have no regrets about buying it, and I have no reservations about encouraging other people to buy it too.

P.S. If you like WDE4M from PGP, you might also enjoy Airfoil from Rogue Amoeba. Nudge, nudge, say no more.

SECURITY ALERT: Mac OS X 10.5.2 subverts FileVault

Saturday, May 10th, 2008

I apologize for not posting this earlier. I’ve been extremely busy lately, and I had discussed the issue with someone who said that he or she (he) was going to post about it (but hasn’t).

The security alert is for FileVault users running Mac OS X 10.5.2. You thought that FileVault encrypted your personal data, right? Wrong! In Mac OS X 10.5.2, the location of the CFNetwork caches was moved from ~/Library/Caches, which is within your home directory and thus encrypted, to /private/var/folders, which is not within your home directory and thus not encrypted. This means that anyone with physical access to your hard drive could, for example, determine which URLs you’ve loaded, even if your computer is shut down.
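To check whether a given cache location is actually covered by FileVault, here is a rough shell sketch. FileVault encrypts only the home directory, so the test is simply whether the path lies under $HOME. The function name is hypothetical, and it compares path strings without resolving symlinks.

```shell
#!/bin/sh
# Succeed if the path is $HOME itself or lies beneath it, i.e. inside the
# part of the disk that FileVault encrypts. String comparison only.
inside_home() {
    case "$1" in
        "$HOME"|"$HOME"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# On Leopard, getconf DARWIN_USER_CACHE_DIR should print the per-user
# cache folder under /var/folders (an assumption worth verifying):
#   inside_home "$(getconf DARWIN_USER_CACHE_DIR)" || echo "not protected"
```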

Note that nothing about this change was mentioned in the Mac OS X 10.5.2 release notes.

For further reference on this issue, see the thread that began in the WebKit SDK mailing list and was moved by me to the Macintosh Network Programming mailing list. Thanks to Eric Long, who noticed the change in the first place, and Ron Hunsinger, who performed testing that I was too lazy to do. (In my defense, I haven’t yet migrated my FileVault account from Tiger to Leopard, so the issue doesn’t affect me directly.)

Stabs is deprecated

Sunday, March 9th, 2008

This post is dedicated to E. Gary Gygax, the second greatest corrupter of youth in history. It’s about D&D, that is, DWARF and dSYM. As of 2008-02-27, the STABS debugging symbols format has been deprecated by Apple. The default value for the DEBUG_INFORMATION_FORMAT build setting in Xcode projects had been stabs, but now it’s time to move on. (I’m talking to you, Justin Long.) Our other options are dwarf or dwarf-with-dsym. Also cake or death.
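Since stabs is on its way out, a Run Script build phase could flag any target still using it. This is a hedged sketch: I'm assuming Xcode exports DEBUG_INFORMATION_FORMAT to run scripts like other build settings (the "Show environment variables in build log" option will confirm), and check_debug_format is a hypothetical name.

```shell
#!/bin/sh
# Complain when a target still builds with the deprecated stabs format.
# The three accepted values are the ones named in the post.
check_debug_format() {
    case "$DEBUG_INFORMATION_FORMAT" in
        stabs)
            echo "warning: DEBUG_INFORMATION_FORMAT is stabs, which is deprecated" >&2
            return 1 ;;
        dwarf|dwarf-with-dsym)
            return 0 ;;
        *)
            echo "warning: unexpected DEBUG_INFORMATION_FORMAT '$DEBUG_INFORMATION_FORMAT'" >&2
            return 1 ;;
    esac
}

# In the Run Script phase: check_debug_format || exit 1   (fails the build)
```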

With STABS, you could build the release version of your app with debugging symbols, make a copy of the executable MyApp.app/Contents/MacOS/MyApp to keep, strip the executable for shipping, and then use the unstripped executable for symbolizing crash reports by giving a space-separated list of stack trace addresses to the command-line tool atos. Unfortunately, atos cannot currently serve this purpose with DWARF. Unlike STABS, DWARF does not include the debugging symbols in the executable itself but merely includes references to the intermediate object files, which do contain debugging symbols. You can usually find these .o files in a sub-directory of the build/MyApp.build directory. If you delete the object files after building with dwarf, you won’t be able to step through your app’s code. (With stabs, the object files are refuse.) You also won’t be able to step through the code if you strip debugging symbols from your app, even if you keep the object files, because the references to the object files will be gone from the executable.
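The STABS workflow above needs a space-separated list of addresses for atos. Here is a sketch that pulls them out of a crash report for one binary. The backtrace line format in the comment is my assumption about Leopard-era crash logs, and crash_addresses, com.example.MyApp, and MyApp.unstripped are hypothetical names.

```shell
#!/bin/sh
# Extract the stack addresses of one binary from a crash report.
# ASSUMPTION: backtrace lines look like
#   0   com.example.MyApp   0x00002cf4 -[MyController crash:] + 20
# i.e. frame number, binary identifier, address.
crash_addresses() {
    ident="$1"     # binary identifier as it appears in the report
    report="$2"    # path to the crash report
    awk -v id="$ident" '
        $1 ~ /^[0-9]+$/ && $2 == id && $3 ~ /^0x/ { printf "%s%s", sep, $3; sep = " " }
        END { print "" }' "$report"
}

# Usage, following the STABS workflow described above:
#   cp MyApp.app/Contents/MacOS/MyApp MyApp.unstripped   # keep a copy
#   strip MyApp.app/Contents/MacOS/MyApp                 # ship this one
#   atos -o MyApp.unstripped $(crash_addresses com.example.MyApp MyApp.crash)
```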

To avoid losing the debugging symbols for your app after stripping, you want to use the option dwarf-with-dsym. The DWARF with dSYM option performs an additional step beyond ordinary DWARF: it creates a separate MyApp.app.dSYM file that contains all of the debugging symbols for your app. In fact, the DWARF with dSYM option allows you to step through your code regardless of whether the executable is stripped! This is possible because gdb will look for the .dSYM file in the same directory as your app. It doesn’t need to know the name or location of the object files. If you don’t strip debugging symbols, you can use either the .o files or the .dSYM file for debugging, but for the local debug build of your app there’s no point in using dSYM, since that would just prolong your build time. You have better things to do than wait for builds, such as writing comments on Slashdot.

The trouble with atos is that it does not reliably find debugging information in .dSYM files for stripped executables. Although Apple’s documentation (as of 2007-04-02) says, “If you’re using DWARF dSYM files, you must be using the version of atos included in Xcode 3 (Mac OS X version 10.5)”, Apple’s engineers say, “The underlying framework that atos uses doesn’t support loading symbol names from dSYM files in Leopard.” In my testing, however, there doesn’t seem to be a difference between Leopard and Tiger, at least not with Xcode 2.5 on Tiger. On both Leopard and Tiger, atos successfully loads symbol names from .dSYM files (I deleted the .o files) for unstripped executables. For stripped executables, in contrast, atos frequently fails to load the symbol names, or even gives inaccurate results.

The CrashReporter Technical Note suggests loading your app and its .dSYM in gdb to translate stack trace addresses from crash reports. That’s like having to start your car in order to read the odometer. (Oh wait, I have to do that, Nissan!) An alternative method is the command-line tool dwarfdump. It requires only the .dSYM file, not a copy of your app, and its --lookup option will do the same job as gdb without the overhead.
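A hedged wrapper for the dwarfdump approach runs one --lookup per address against the .dSYM. The "--lookup ADDRESS FILE" argument order is an assumption; check dwarfdump --help for the exact syntax your Xcode version accepts. lookup_addresses is a hypothetical name, and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Run dwarfdump --lookup for each address against a .dSYM bundle.
# ASSUMPTION: the "--lookup ADDRESS FILE" argument order.
lookup_addresses() {
    dsym="$1"
    shift
    for addr in "$@"; do
        if [ -n "$DRY_RUN" ]; then
            echo "dwarfdump --lookup $addr $dsym"
        else
            dwarfdump --lookup "$addr" "$dsym" || return 1
        fi
    done
}

# Usage: lookup_addresses MyApp.app.dSYM 0x00002cf4 0x00001f2a
```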

Please note that by breathing, blinking, or moving at all, even to command-w this page, you thereby register your agreement not to disclose or discuss this information anywhere with anyone at any time, no matter the duress, torture, or water-boarding you may undergo to extract it. This agreement holds despite the fact that the information is publicly available on the internet for every person in the world to read. Failure to uphold this agreement will result in multiple, painful cat scratches, in certain cases leading to cat scratch fever.

FUD from Rixstep: NSDocumentController in Leopard

Saturday, February 2nd, 2008

At the risk of provoking their ire and being branded a moron, I wish to dispute a claim that has been made several times by Rixstep about a supposed security vulnerability in Leopard’s NSDocumentController: Cocoa’s document controller overrides file system permissions without authentication. As far as I can tell, this claim is false.

I should note at the outset that I have nothing personal against Rixstep. Their syndicated feed has long been among my (numerous) favorites that I subscribe to in Vienna, as you can see by downloading the exported opml from my blog. I welcome legitimate criticism of Apple, and I have found Rixstep’s articles entertaining in the past, though of late they have become overly juvenile. As far as the whole ‘Cross-Platform Bait & Switch’ incident is concerned, I can’t comment on the legality of reproducing the quotations, because I’m not a lawyer, but I don’t think that Rixstep can be accused of misquoting or quoting out of context in this case.

Anyway, I’ve included the full text below of the email I sent to Rixstep a month ago. Retraction was perhaps too much to hope for, but I thought they would at least stop making the false claim after reading my email. They haven’t, which is why I’m now ‘going public’. (This should knock Britney off the front page.)

Hi. I read your article at <http://rixstep.com/2/1/20071227,00.shtml>, and it got me a little worried, so I did some testing. My results are that NSDocumentController in Leopard does not allow you to override Unix permissions. In fact, NSDocumentController in Leopard is more strict than the Unix permissions: it won’t allow you to save a writable document when you don’t have write permissions for the enclosing directory.

You are correct that saving a document always deletes the existing file and creates a new one with a different inode, and you are also correct that some of the user-visible NSDocumentController warning messages are misleading. However, if you don’t have write permissions for a directory, then NSDocumentController won’t let you delete or add a file, and even if you do have write permissions for the directory, NSDocumentController won’t let you delete someone else’s file or add a file with the same name if the sticky bit is set.

Thus, I believe that the ramifications for system administrators are negligible. With certain Unix permissions, it has always been possible to delete a file in a directory and replace it with a new file of the same name. Leopard has not changed that at all, so system administrators should take the same precautions against this scenario that they have always taken.

-Jeff

You don’t have to take my conclusions for granted, though. In a matter of minutes, you should be able to throw together a bare-bones document-based application suitable for testing the behavior yourself.
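For the Unix side of the argument, no Cocoa is needed. This shell approximation replaces a file instead of rewriting it: it writes a new file, then renames it over the old name. Both steps are governed by the directory's permissions, not the old file's, so a read-only file in a writable directory gets replaced with a new inode. The sticky-bit case needs a second user and isn't shown. This is an illustration with hypothetical names, not NSDocumentController's actual implementation.

```shell
#!/bin/sh
# Save by replacing: create a new file in the same directory, then rename
# it over the old name. Requires write permission on the directory only.
save_by_replacing() {
    file="$1"
    contents="$2"
    dir=$(dirname "$file")
    scratch=$(mktemp "$dir/.save.XXXXXX") || return 1   # fails if dir is not writable
    printf '%s\n' "$contents" > "$scratch" || { rm -f "$scratch"; return 1; }
    mv -f "$scratch" "$file"                             # new inode takes the old name
}

# Inode number of a file (ls -i works on both BSD and GNU systems).
inode_of() {
    ls -i "$1" | awk '{ print $1 }'
}
```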

I have no desire to discourage criticism of Apple or Leopard. There are major problems in Leopard, and I don’t yet find it acceptable for use as my primary operating system. I just think that this spurious security issue obscures the real issue of whether the new Leopard NSDocumentController behavior is desirable.

P.S. If you really want to talk about a waste of precious disk space, /Applications/Mail.app/Contents/Resources is an astounding 277 MB on Leopard.