Articles index

Better disassembly on macOS Big Sur

June 27 2020 by Jeff Johnson
Support this blog: Link Unshortener, StopTheMadness, Underpass, PayPal.Me

This is the third part of what is now a three-part series on disassembling system libraries on macOS 11 Big Sur. Part 1 explains how to extract the system libraries from the dyld shared cache, and Part 2 explains some difficulties in disassembling Objective-C in those extracted libraries. Part 3 will provide a solution to those difficulties!

Static disassembly tools such as otool and llvm-objdump have not been updated to handle the dyld shared cache on Big Sur. However, one tool that does handle it is lldb, the debugger. Thus, you'd think a simple solution to disassembling a system library on BS is to load the library in lldb, do image dump sections to find the start and end addresses of the __text section, and then do disassemble --start-address [start] --end-address [end] to disassemble the library. Alas, it's not that simple! Unfortunately, the lldb disassembler stops prematurely when it hits an opcode that it doesn't understand. (Why must all the Apple tools be so bad?) With otool you see output like this for such an opcode:

00007fff235f68d9	.byte 0xfe #bad opcode

Fortunately, I thought of a workaround (AKA terrible hack) for this. I created a little command-line tool that gets the output of /usr/bin/nm -n [extracted library] -s __TEXT __text and transforms it into a series of lldb disassemble commands such as di -n '[symbol]'. These lldb commands will allow us to disassemble every function and method in the library. I call my tool bsnm, and here's the source code in all its glory, which you are free to use under my standard SHAG software license (search my web site for the terms).

// Copyright 2020 Jeff Johnson. All rights reserved.

#import <Foundation/Foundation.h>

int main(int argc, const char *argv[]) {
	@autoreleasepool {
		if (argc != 2) {
			printf("Usage: %s <object file>\n", argv[0]);
			return EXIT_FAILURE;
		}
		NSString *path = [NSString stringWithUTF8String:argv[1]];
		if (path == nil) {
			printf("invalid path: %s\n", argv[1]);
			return EXIT_FAILURE;
		}
		
		NSTask *task = [[NSTask alloc] init];
		[task setLaunchPath:@"/usr/bin/nm"];
		[task setArguments:@[@"-n", path, @"-s", @"__TEXT", @"__text"]];
		NSPipe *pipe = [NSPipe pipe];
		[task setStandardOutput:pipe];
		NSFileHandle *fileHandle = [pipe fileHandleForReading];
		NSError *error = nil;
		if (![task launchAndReturnError:&error]) {
			NSLog(@"launch error: %@", error);
			return EXIT_FAILURE;
		}
		NSData *data = [fileHandle readDataToEndOfFile];
		if ([data length] == 0) {
			NSLog(@"no output");
			return EXIT_FAILURE;
		}
		NSString *string = [[NSString alloc] initWithData:data encoding:NSMacOSRomanStringEncoding];
		if (string == nil) {
			NSLog(@"not NSMacOSRomanStringEncoding: %@", data);
			return EXIT_FAILURE;
		}
		
		[string enumerateLinesUsingBlock:^(NSString *line, BOOL *stop) {
			if (![line hasPrefix:@"00007fff"]) {
				return;
			}
			if ([line length] > 20) {
				NSUInteger symbolIndex = 19;
				NSString *type = [line substringWithRange:NSMakeRange(16, 3)];
				if ([type isEqualToString:@" T "] || [type isEqualToString:@" t "]) {
					NSString *symbol = [line substringFromIndex:symbolIndex];
					if ([symbol hasPrefix:@"_"])
						symbol = [symbol substringFromIndex:1];
					printf("di -n '%s'\n", [symbol UTF8String]);
					return;
				}
			}
			NSLog(@"Unexpected line: %@", line);
			exit(EXIT_FAILURE);
		}];
	}
	return EXIT_SUCCESS;
}
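
To make the column arithmetic in bsnm concrete, here's a minimal sketch of the same per-line transform in plain C. The function name nm_line_to_lldb is hypothetical, and the layout it assumes (16 hex digits, a space, a type letter, a space, then the symbol) is simply the one the offsets 16 and 19 in bsnm imply. Unlike bsnm, it doesn't check for the 00007fff address prefix.

```c
#include <stdio.h>
#include <string.h>

/*
 * Turn one line of `nm -n` output into an lldb command.
 * Assumed layout: 16 hex digits, space, type letter, space, symbol.
 * Returns 1 and fills `out` for text symbols (T or t), 0 otherwise.
 */
int nm_line_to_lldb(const char *line, char *out, size_t out_size) {
    const size_t address_width = 16; /* columns 0-15 hold the address */
    const size_t symbol_start = 19;  /* 16 is a space, 17 the type, 18 a space */

    if (strlen(line) <= symbol_start) {
        return 0;
    }
    if (line[address_width] != ' ' || line[address_width + 2] != ' ') {
        return 0;
    }
    char type = line[address_width + 1];
    if (type != 'T' && type != 't') {
        return 0;
    }
    const char *symbol = line + symbol_start;
    if (symbol[0] == '_') {
        symbol++; /* drop the leading underscore, as bsnm does */
    }
    int written = snprintf(out, out_size, "di -n '%s'", symbol);
    return written > 0 && (size_t)written < out_size;
}
```

Given an invented line such as 00007fff00001000 t -[ExampleClass exampleMethod], this fills the buffer with di -n '-[ExampleClass exampleMethod]'. A line whose type is anything other than T or t is rejected.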

You can redirect the output of bsnm to a text file for convenience. Then create a test project that loads the relevant system library (not the extracted library). This is easy to do with dlopen. For example:

void *handle = dlopen("/System/Library/Frameworks/AppKit.framework/AppKit", RTLD_NOW);

Although there's no executable at that path, just a link, Big Sur knows how to load the library from the dyld shared cache. Run your test project in lldb, and break after loading the library. You'll want to do this in Terminal rather than in Xcode, because the Xcode debugger console doesn't handle pasted newlines correctly. Finally, copy all the previously generated lldb commands from the text file, paste them into lldb, and let lldb do its thing. If it's a large library, this may take a while!
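
If you'd like the loading step to tell you when something goes wrong, here's a minimal sketch that wraps the dlopen call and prints the dlerror message on failure. The function name load_library is hypothetical, and this is just one way to structure the test project.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Load a library by path and report any failure via dlerror().
 * Returns the dlopen handle, or NULL if loading failed.
 */
void *load_library(const char *path) {
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        const char *message = dlerror();
        fprintf(stderr, "dlopen failed: %s\n", message ? message : "unknown error");
    }
    return handle;
}
```

Call load_library with the framework path from main, and set your breakpoint on the line after the call, so the library is already loaded by the time you paste in the generated commands.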

I hope that my little hack helps you to disassemble system libraries on Big Sur. It's a bit tedious, but it mostly works, and you only have to do it once for each library you're interested in. One known issue with the bsnm tool is that the lldb disassemble commands don't work in a few cases, such as for Objective-C block invocations and .cold. paths generated by LLVM hot/cold splitting. I suspect that the leading "_" character shouldn't be trimmed from these symbols, so perhaps we can fix up bsnm to handle these special cases too. Let the BS be with you.

Addendum

My suspicion was wrong. The problem wasn't the leading "_" character getting trimmed in the cases where lldb can't disassemble. This actually appears to be a bug in the debugger! When C and Objective-C symbols represent block invocations or cold paths, lldb is unable to locate the symbols. This bug doesn't occur with mangled C++ symbols, which lldb handles fine. So at the moment, I don't think I can do anything to fix these cases automatically. I can't disassemble them by address either, because the address of the symbol differs in the dyld shared cache and the extracted library.
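
Since these cases can't be fixed automatically, one option is to at least flag them, so you know which di -n commands are likely to come back empty. Here's a minimal sketch of such a check. It assumes that block invocation symbols contain "_block_invoke", that cold paths contain ".cold.", and that mangled C++ names start with "_Z" or "__Z". The function name lldb_may_not_find is hypothetical, and the check is only a heuristic based on symbol names.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Heuristic: does this symbol look like one that `di -n` fails to find?
 * Assumes block invocations contain "_block_invoke" and hot/cold split
 * fragments contain ".cold."; mangled C++ names are treated as fine.
 */
bool lldb_may_not_find(const char *symbol) {
    /* Mangled C++ symbols reportedly work, so never flag them. */
    if (strncmp(symbol, "_Z", 2) == 0 || strncmp(symbol, "__Z", 3) == 0) {
        return false;
    }
    return strstr(symbol, "_block_invoke") != NULL ||
           strstr(symbol, ".cold.") != NULL;
}
```

You could call this on each symbol before printing its command, and write the flagged names to a separate list to look at by hand.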
