
Sunday, December 02, 2007

The Case Against Insensitivity  

One of the most controversial parts of my earlier post, Don't Be a ZFS Hater, was when I mentioned off-handedly in the comments that I don't like case-insensitivity in filesystems.

Boy, did that spur a storm of replies.

I resolved to not pollute the ZFS discussion with a discussion of case-insensitivity and promised to make a separate blog post about it. It took a while, but this is that post. I blame a busy work schedule and an even busier travel schedule. (Recently in the span of two weeks I was in California, Ohio, London, Liverpool, London, Bristol, London, Amsterdam, London, then back to Ohio. Phew!)

Here's Why Case-Insensitive Filesystems Are Bad

I've worked in and around filesystems for most of my career; if not in the filesystem itself then usually a layer just above or below it. I'm speaking from experience when I tell you:

Case-insensitivity is a bad idea in filesystems.

And here's why:

  1. It's poorly defined.
  2. Every filesystem does it differently.
  3. Case-insensitivity is a layering violation.
  4. Case-insensitivity forces layering violations upon other code.
  5. Case-insensitivity is contagious.
  6. Case-insensitivity adds complexity and provides no actual benefit.

I'll expand on each of these below.

It's poorly defined

When I say "case-insensitive", what does that mean to you?

If you only speak one language and that language is English, it probably seems perfectly reasonable: map the letter a to A, b to B, and so on through z to Z. There, you're done. What was so hard about that?

But that's ASCII thinking; the world left that behind a long time ago. Modern systems are expected to deal with case differences in all sorts of languages. Instead of a simple 26-letter transformation, "case insensitivity" really means handling all the other alphabets too.

The problem with doing that, however, is that it brings language and orthography into the picture. And human languages are inherently vague, large, messy, and constantly evolving.

Can you make a strict definition of "case insensitivity" without any hand-waving?

One way to do it is with an equivalence table: start listing all the characters that are equal to other characters. We can go through all the variants of Latin alphabets, including a huge list of accents: acute, grave, circumflex, umlaut, tilde, cedilla, macron, breve, dot, ring, ogonek, hacek, and bar. Don't forget to find all the special ligatures and other letters, too, such as Æ vs æ and Ø vs ø.

Okay, our table is pretty big so far. Now let's start adding in other alphabets with case: Greek, Armenian, and the Cyrillic alphabets. And don't forget the more obscure ones, like Coptic. Phew. It's getting pretty big.

Did we miss any? Well, for any given version of the Unicode standard it's always possible to enumerate all letters, so it's certainly possible to do the legwork and prove that we've got all the case mappings for, say, Unicode 5.0.0, which is the latest at the time of this writing. But Unicode is an evolving standard, and new characters are added frequently. Every time a new script with case is added, we'll need to update our table.

There are also some other hard questions for case insensitivity:

  • Digraph characters may have three equivalent mappings, depending on how they are being written: all-lowercase, all-uppercase, or title-case. (For example: dz, DZ, or Dz.) But this breaks some case-mapping tables which didn't anticipate the need for an N-way equivalence.

  • The German letter ß is considered equal to lowercase ss. Should "Straße" and "STRASSE" be considered equivalent? They are in German. But this breaks some case-mapping tables which didn't anticipate the need for an N-to-M character translation (1:2, in this case).

  • Capital letters can significantly alter the meaning of a word or phrase. In German, capital letters indicate nouns, so the word Essen means "food", while the word essen means "to eat". We make similar distinctions in English between proper nouns and regular nouns: God vs god, China vs china, Turkey vs turkey, and so on. Should "essen" and "Essen", or "china" and "China" really be considered equivalent?

  • Some Hebrew letters use different forms when at the end of a word, such as פ vs ף, or נ vs ן. Are these equivalent?

  • In Georgian, people recently experimented with using an obsolete alphabet called Asomtavruli to reintroduce capital letters to the written language. What if this had caught on?

  • What about any future characters which are not present in the current version of the Unicode standard?

Case is a concept that is built into written languages. And human language is inherently messy. This means that case-insensitivity is always going to be poorly defined, no matter how hard we try.
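A couple of these edge cases are visible directly in any modern Unicode implementation. Here's a quick illustration in Python, whose `str.casefold` implements Unicode case folding:

```python
# The German ß problem: uppercasing is a 1:2 mapping, and
# lowercasing doesn't undo it.
assert "Straße".upper() == "STRASSE"   # ß uppercases to SS...
assert "STRASSE".lower() == "strasse"  # ...but lowercasing can't restore ß

# Unicode case folding maps both spellings to a single key -- exactly the
# kind of equivalence table a case-insensitive filesystem must commit to,
# and then freeze, forever.
assert "Straße".casefold() == "STRASSE".casefold() == "strasse"
```

Note that the fold table here belongs to whatever Unicode version the Python runtime ships with, which is precisely the versioning problem described above.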

Every filesystem does it differently

Unfortunately, filesystems can't engage in hand-waving. Filesystem data must be persistent and forward-compatible. People expect that the data they wrote to a disk last year should still be readable this year, even if they've had an operating system upgrade.

That's a perfectly reasonable expectation. But it means that the on-disk filesystem specification needs to freeze and stop changing when it's released to the world.

Because our notion of what exactly "case-insensitive" means has changed over the past twenty years, however, we've seen a number of different methods of case-insensitivity emerge.

Here are a handful of the most popular case-insensitive filesystems and how they handle case-mapping:

  • FAT-32: ASCII upper- and lower-case letters, but a-z and A-Z are considered identical. Also variable IBM code pages in high ASCII.
  • HFS: ASCII upper- and lower-case letters, but a-z and A-Z are considered identical. Also variable Mac encodings in high ASCII.
  • NTFS: Case-insensitive in different ways depending on the version of Windows that created the volume.
  • HFS+: Case-insensitive with a mapping table which was frozen circa 1996, and thus lacks case mappings for any newer characters.

None of these — except for NTFS created by Vista — are actually up-to-date with the current Unicode specification. That's because they all predate it. Similarly, if a new filesystem were to introduce case-insensitivity today, it would be locked into, say, Unicode 5.0.0's case mappings. And that would be all well and good until Unicode 5.1.0 came along.

The history of filesystems is littered with broken historical case mappings like a trail of tears.

Case-insensitivity is a layering violation

When people argue for case-insensitivity in the filesystem, they almost always give user interface reasons for it. (The only other arguments I've seen are based on contagion, which I'll talk about in a moment.) Here is the canonical example:

My Aunt Tillie doesn't know the difference between letter.txt and Letter.txt. The filesystem should help her out.

But in fact this is a UI problem. The problem relates to the display and management of information, not the storage of this information.

Don't believe me?

  • When any application displays items in a window, who sorts them case-insensitively? The filesystem? No! The application does it.

  • When you type-select, typing b-a-b-y to select the folder "Baby Pictures" in an application, who does the case-insensitive mapping of the letters you type to the files you select? The filesystem? No! The application again.

  • When you save or copy files, who does the case-insensitive test to warn you if you're creating "file.txt" when "File.txt" already exists? The filesystem? Yes!

Why does the third question have a different answer than the rest?

And we've already talked about how filesystems are chronically out-of-date with their case mappings. If your aunt is a Turkish Mac user, for example, she's probably going to notice that the behavior of the third one is different for no good reason. Why are you confusing your Aunt Tülay?

One last point was summarized nicely by Mike Ash in the comments of Don't Be a ZFS Hater. I'll just quote him wholesale here:

Yes, Aunt Tillie will think that "Muffin Recipe.rtf" and "muffin recipe.rtf" ought to be the same file. But you know what? She'll also think that "Muffin Recipe .rtf" and "Recipe for Muffins.rtf" and "Mufin Recipe.txt" ought to be the same file too.

Users already don't generally understand how the OS decides whether two files are the same or not. Trying to alleviate this problem by mapping names with different case to the same file solves only 1% of the problem and just isn't worth the effort.

I agree completely.

Case-insensitivity forces layering violations upon other code

All too often, pieces of code around the system are required to hard-code knowledge about case-insensitive filesystem behavior. Here are a few examples off the top of my head:

  • Collision prediction. An application may need to know if two files would conflict before it actually writes either of them to disk. If you are writing an application where a user creates a group of documents — a web page editor, perhaps — you may need to know when banana.jpg and BANANA.JPG will conflict.

    The most common way that programmers solve this is by hard-coding some knowledge about the case-insensitivity of the filesystem in their code. That's a classic layering violation.

  • Filename hashing. If you are writing code to hash strings that are filenames, you probably want equivalent paths to generate the same hash. But it's impossible to know which files are equivalent unless you know the filesystem's rules for case-mapping.

    Again, the most common solution is a layering violation. You either hard-code some knowledge about the case-insensitivity tables, or you hard-code some knowledge about your input data. (For example, you may just require that you'll never, never, ever have multiple access paths for the same file in your input data. Like all layering violations, that might work wonderfully for a while ... right up until the day that it fails miserably.)
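To make the hashing example concrete, here's a minimal sketch in Python. The function names are mine, and the `casefold` call stands in for whatever case-mapping table the programmer hard-codes; the point is that the "fixed" version has baked filesystem knowledge into application code:

```python
import hashlib

def path_hash(path: str) -> str:
    # Naive approach: hash the raw bytes. On a case-insensitive volume,
    # "Banana.jpg" and "banana.jpg" name the same file but hash differently.
    return hashlib.sha256(path.encode("utf-8")).hexdigest()

def folded_path_hash(path: str) -> str:
    # The common "fix": fold case before hashing. But now this code has
    # hard-coded an assumption about the filesystem's case rules --
    # the layering violation in question.
    return hashlib.sha256(path.casefold().encode("utf-8")).hexdigest()

assert path_hash("Banana.jpg") != path_hash("banana.jpg")
assert folded_path_hash("Banana.jpg") == folded_path_hash("banana.jpg")
```

And of course `casefold` here is Python's table, not HFS+'s frozen 1996 table or NTFS's per-volume table, so the hash can still disagree with the actual filesystem on edge cases.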

I'm sure there are more examples out there.

Case-insensitivity is contagious

This is the worst part. It's all too easy to accidentally introduce a dependence on case-insensitivity: just use an incorrect path with bad case.

The moment somebody creates an application or other system that inadvertently depends on case-insensitivity, it forces people to use a case-insensitive filesystem if they want to use that app or system. And that's one of the major reasons why case-insensitivity has stuck around — because it's historically been very difficult to get rid of.

I've seen this happen with:

  • Source code. Some bozo writes #include "utils.h" when the file is named Utils.h. Sounds innocent enough, until you find that it's repeated dozens of times across hundreds of files. Now that project can only ever be compiled on a case-insensitive filesystem.

  • Game assets. A game tries to load lipsync.dat instead of LIPSYNC.DAT. Without knowing it, the artist or developer has accidentally locked that game so that it can only run on a case-insensitive filesystem. (This causes real, constant problems in game pipelines; teams create and test their games on case-insensitive NTFS and don't notice such problems until it's burned to a case-sensitive UDF filesystem on DVD or Blu-Ray.)

  • Application libraries. DLLs and shared library references are sometimes generated by a build script which uses the wrong case. When that happens, the application may simply fail to launch from a case-sensitive filesystem.

  • Miscellaneous data files. Sometimes an application will appear to run on a case-sensitive filesystem but some feature will fail to work because it fails to load a critical data file: the spell-checking dictionary, a required font, a nib, you name it.

Happily, since Mac OS X shipped in 2001, Apple has been busy solving its own problems with case-insensitivity and encouraging its developers to test with case-sensitive filesystems. Two important initiatives in this direction have been NFS home directories and case-sensitive HFSX.

The upshot of it is that Mac OS X is actually very friendly to case-sensitive disks these days; very little that's bad happens when you use case-sensitive HFSX today.

Case-insensitivity adds complexity with no actual benefit

I'm going to make an assertion here:

ONE HUNDRED PERCENT of the path lookups happening on your Mac right now are made with correct case.

Think about that for a moment.

First off, you may think this contradicts the point I just made in the previous section. Nope; I'm simply rounding. The actual figure is something like 99.999%, and I'd probably get tired of typing 9's before I actually approached the real number. There are infinitesimally few path accesses made with incorrect case compared to the ones that are made with the proper case.

Modern computers make hundreds of filesystem accesses per second. As I type this single sentence in MarsEdit on Mac OS X 10.4.11, my computer has made 3692 filesystem accesses by path. (Yes, really. MarsEdit's "Preview" window is invoking Perl to run Markdown, which loads a handful of modules, and then WebKit re-renders the page. That's a lot of it, but meanwhile there's background activity from Mail, Activity Monitor, iChat, SystemUIServer, iCalAlarmScheduler, AirPort Base Station Agent, Radioshift, NetNewsWire, Twitterrific, and Safari.)

Under Mac OS X you can measure it yourself with this command in Terminal:

  sudo fs_usage -f filesys | grep / > /tmp/accesses.txt

The vast majority of file accesses are made with paths that were returned from the filesystem itself: some bit of code read the contents of a directory, and passed the results on to another bit of code, which eventually decided to access one of those files. So most of the time the filesystem is getting back the paths that it has returned earlier. Very very few accesses are made with paths that come directly from an error-prone human, which is why essentially 100% of filesystem accesses are made with correct case.

But if essentially all filesystem accesses are made with the correct case to begin with, why do we even have case-insensitivity at all?

We've already discussed the problems of contagion, which is a circular justification: we have to do it because someone else did it first. We've also discussed UI decisions being incorrectly implemented in the bottommost layer of the operating system. Other than those two, what good is it?

I don't have an answer to that. For the life of me I can't come up with any reason to justify case-insensitive filesystems from a pure design standpoint. That leads me to my closing argument, which is...

A thought experiment

Suppose case-insensitive filesystems had never been invented. You're the leader of a team of engineers in charge of XYZZYFS, the next big thing in filesystems. One day you tell the other people who work on it:

"Hey! I've got this great idea! It's called case-insensitivity. We'll take every path that comes into the filesystem and compare it against a huge table to create a case-folded version of the path which we'll use for comparisons and sorting. This will add a bunch of complexity to the code, slow down all path lookups, increase our RAM footprint, make it more difficult for users of our filesystem to handle paths, and create a compatibility nightmare for future versions if we ever decide to change the table. But, you see, it'll all be worth it, because... _________________."

Can you fill in the blank?
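For concreteness, the complexity being pitched looks roughly like this toy directory lookup. This is a hypothetical sketch, not any real filesystem's code; a case-sensitive lookup would be a single hash probe, while folding drags a case-mapping table into every comparison:

```python
def lookup(directory: dict, name: str) -> int:
    """Toy case-insensitive lookup: fold the name, then compare against
    every folded entry. Real filesystems key their B-trees by folded
    names instead, but either way the fold table is baked into the
    on-disk format."""
    folded = name.casefold()
    for stored_name, inode in directory.items():
        if stored_name.casefold() == folded:
            return inode
    raise FileNotFoundError(name)

# Every lookup now pays for case folding, correct case or not.
assert lookup({"Letter.txt": 42}, "LETTER.TXT") == 42
```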

Wednesday, October 10, 2007

ZFS Hater Redux  

MWJ has responded to my last post, Don't Be a ZFS Hater, with a post of their own: You don't have to hate ZFS to know it's wrong for you.

I don't like the point-by-point quote and response format — it's way too much like an old-school Usenet flamewar. So I will simply try to hit the high points of their arguments.

Where we agree

  • ZFS is not ready to deploy to the entire Mac OS X user base today. There's still some work to be done.

  • ZFS isn't necessary for most of today's Macintosh computers. If you have been using your Mac with no storage-related problems, then you can keep on using it that way. Perform regular backups and you'll be just fine.

  • It would be an absolutely terrible idea to take people's perfectly working HFS+ installations on existing computers and forcibly convert them to ZFS, chuckling evilly all the while. Not quite sure where that strawman came from.

  • ZFS fatzaps are expensive for small files. If it were true that 20% of the files in a Mac OS X installation required a fatzap (pdf link to ZFS-on-disk specification), that would indeed be unnecessarily wasteful.

  • A typical Mac OS X 10.4.x installation has on the order of about 600,000 files.

I think that's about it. But of course there are a number of places where we disagree too.

ZFS would be awfully nice for a small segment of the Mac OS X user base if it were ready today.

If you spend any amount of time managing storage — if drives have gone bad on you, if you have ever run out of space on a desktop system and needed to add a drive (or two), if you have a RAID array — then you are the sort of user that could see some immediate benefit.

But of course as we already agreed, it's not ready today. You haven't been "cheated" and I'm sure you don't feel that way. But feel free to look forward to it: I sure am.

ZFS — or something with all the features of ZFS — will be more than nice, it will be necessary for tomorrow's Macintosh computers.

Both storage sizes and consumer consumption of storage grow exponentially. I tried to make this point last time, but MWJ seems to have misunderstood and accused me of misquoting. Let's try again.

In 1997, 20GB of storage meant a server RAID array. Ten years later, in 2007, 20GB of storage is considered "not enough" by most people. Across my entire household I have drives larger than that in my computer, in my TiVo, in my PlayStation 3, and even in my iPod. Now let's extrapolate that into the future.

In 2007, 20TB of storage means a server RAID array. Ten years from now, in 2017, 20TB of storage will similarly be considered "not enough". MWJ scoffed at ZFS because it's really pretty good at the problems of large storage. But you know what? A solution to managing that much data will need to be in place in Mac OS X well before 20TB drives become the norm. Better hope someone's working on it today.

Meanwhile — and this is what scares the pants off me — the reliability numbers for hard drives have improved much more slowly than capacity.

Here's a fairly typical Seagate drive with a capacity of ~150GB = ~1.2 x 10^12 bits. The recoverable error rate is listed as 10 bits per 10^12 bits. Let's put those numbers together. That means that if you read the entire surface of the disk, you'll typically get twelve bits back that are wrong and which a retry could have fixed. (Updated Oct 11 2007: In the comments, Anton corrected me: I should've used the unrecoverable error rate here, not the recoverable error rate. The net result is that in ideal operating conditions bit errors occur over 100x less frequently than I originally suggested. However, it's still not zero, and it's still a looming problem when you scale it across (installed base) x (storage consumption) x (time). See the comment thread.)
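In round numbers, the arithmetic as originally presented works out like this (see the update above regarding the recoverable vs. unrecoverable rates):

```python
capacity_bits = 150e9 * 8            # ~150 GB drive = ~1.2 x 10^12 bits
recoverable_rate = 10 / 1e12         # spec sheet: 10 errors per 10^12 bits read

errors_per_full_read = capacity_bits * recoverable_rate
assert round(errors_per_full_read) == 12   # ~twelve correctable bit errors per pass
```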

Yes, really. Did you catch the implications of that? Silent single-bit errors are happening today. They happen much more often at high-end capacities and utilizations, and we often get lucky because some types of data (video, audio, etc) are resistant to that kind of single-bit error. But today's high end is tomorrow's medium end, and the day after tomorrow's low end. This problem is only going to get worse.

Worse, bit errors are cumulative. If you read and get a bit error, you might wind up writing it back out to disk too. Oops! Now that bit error just went from transient to permanent.

Still think end-to-end data integrity isn't worth it?

Apple using ZFS rather than writing their own is a smart choice.

As I hope I made abundantly clear in the last post, extending HFS+ to the future that we can see looming is just not an option — its structure is simply too far removed from these problems. It's really just not worth it. It's pretty awesome that the original HFS design scaled as far as it did: how many people can come up with a 20-year filesystem? But you have to know when to throw in the towel.

So if you accept that the things I described above are real, looming problems, then Apple really does need a filesystem with at least several of the more important attributes of ZFS.

The choices at this point are essentially twofold: (1) start completely from scratch, or (2) use ZFS. There's really no point in starting over. ZFS has a usable license and has been under development for at least five years by now. By the time you started over and burned five years on catching up it would be too late.

And I really do want to reiterate that the shared community of engineers from Apple, Sun, and FreeBSD working on ZFS is a real and measurable benefit. I've heard as much from friends in CoreOS. I can't understand the hostility to this very clear and obvious fact. It's as if Apple suddenly doubled or tripled the number of filesystem engineers it has available, snagging some really brilliant guys at the top of their profession in the process, and then multiplied its testing force by a factor of 10.

(To respond to a query voiced by MWJ, HFS+ never gathered that community when it was open-sourced because the design was already quite old at that point. It frankly didn't have anything new and exciting to offer, and it was saddled with performance problems and historical compromises of various kinds, so very few people were interested in it.)

ZFS fatzaps are unlikely to be a significant problem.

This gets a bit technical. Please skip this section if you don't care about this level of detail.

MWJ really pounded on this one. That was a bit weird to me, since it seemed to be suggesting that Apple would not expend any engineering effort on solving any obvious glaring problems with ZFS before releasing it. That's not the Apple I know.

But okay, let's suppose that we're stuck with ZFS and Mac OS X both frozen as they stand today. Let's try to make an a priori prediction of the actual cost of ZFS fatzaps on a typical Mac OS X system.

  • Classic HFS attributes (FinderInfo, ExtendedFinderInfo, etc) are largely unnecessary and unused today because the Finder uses .DS_Store files instead. In the few cases where these attributes are set and used by legacy code, they should fit easily in a small number of microzaps.

  • Extended attributes may create fatzaps. Today it seems like extended attributes are typically used on large files: disk images, digital photos, etc. This may provoke squawking from the peanut gallery, but once a file is above a certain size — roughly a couple of megabytes — using an extra 128KiB is negligible. If you have a 4MiB file and you add 128KiB to track its attributes, big deal: you've added 3%. It's not nothing, but it's hardly a significant problem.

  • Another likely source of fatzaps in ZFS on Mac OS X is the resource fork. But with Classic gone, new Macs ship with virtually no resource forks on disk. There are none in the BSD subsystem. There are a handful in /System and /Library, mostly fonts. The biggest culprits are large old applications like Quicken and Microsoft Office. A quick measurement on my heavily-used one-year-old laptop shows that I have exactly 1877 resource forks out of 722210 files — that's 0.2%, not 20%.

    (Fun fact: The space that would be consumed by fatzap headers for these resource files comes out to just 235 MiB, or roughly six and a half Keyboard Software Updates. Again: not nothing, but hardly a crisis to scream about.)
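The back-of-the-envelope numbers above are easy to check, assuming one 128 KiB fatzap header per resource fork:

```python
resource_forks = 1877                      # measured on the laptop above
total_files = 722210
fatzap_header_bytes = 128 * 1024           # 128 KiB per fatzap

total_mib = resource_forks * fatzap_header_bytes / (1024 * 1024)
assert round(total_mib) == 235             # ~235 MiB of fatzap overhead
assert resource_forks / total_files < 0.003  # well under 0.3% of files, not 20%
```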

Want to measure it yourself? Amit Singh's excellent hfsdebug utility will show you a quick summary. Just run "sudo hfsdebug -s" and look at the numbers for "files" and "non-zero resource forks". Or try "sudo hfsdebug -b attributes -l any | less" to examine the files which have extended attributes on your disk.

ZFS snapshots don't have to be wasteful

The cheesesteak analogy was cute. But rather than imagining that snapshots just eat and eat and eat storage until you choke in a greasy pile of death, it would help if we all understood how hard drive storage is actually used in practice, and how ZFS can work with that.

There are three major classes of stored data.

  • Static data is data that you want to keep and almost never modify. This is your archive. Photographs, music, digital video, applications, email, etc. Archives are additive: unless you really run out of room, you rarely delete the old — you only add new stuff. You want the contents safe and immediately accessible, but they are essentially unchanging.

Snapshotting static data is close enough to free that you won't notice: the only cost is the basic cost of the snapshot. No extraneous data copies are ever created, because you never modify or delete this stuff anyway.

  • Dynamic data is data that you want to keep, but are modifying with some frequency. This is whatever you are working on at the moment. It might be writing a novel, working in Photoshop, or writing code: in all cases you keep saving new versions over the old.

Snapshotting dynamic data is more expensive, because if you do it too much without recycling your old snapshots then you can build up a large backlog.

  • Transient data is data that should not be persistent at all. These are your temporary files: local caches, scratch files, compiler object files, downloaded zip files or disk images, etc. These may be created, modified, or deleted at any moment.

Snapshotting transient data is generally a bad idea — by definition you don't care that much about it and you'd prefer it to be deleted immediately.

Got all that? Okay. Now I need to make a couple of points.

First, I assert that virtually all of the data on personal computer hard drives is static most of the time. Think about that. The operating system is static the whole time you are using it, until you install a system update. (And even then, usually just a few hundred megabytes change out of several gigabytes.) Your /Applications folder is static. Your music is static. And so on. Usually a few percent of your data is dynamic, and a few more percent is transient. But in most cases well over 95% is static. (Exceptions are easy to come up with: Sometimes you generate a large amount of transient data while building a disk image in iDVD or importing DV footage. That can shift the ratio below 95%. But once that task is complete you're back to the original ratio.)

Second, the biggest distinction that matters when snapshotting is separating persistent data from transient data. Taking snapshots of transient data is what will waste disk space in a hurry. Taking snapshots of dynamic data as a local backup is often valuable enough that it's okay to burn the small amount of disk space that it takes, because remember: that's the actual data that you're actively working on. And as we already mentioned, snapshots of static data are free.

Now here's where it gets interesting.

With ZFS, snapshots work on the filesystem level. Because it no longer uses the "big floppy" model of storage, new filesystems are very cheap to create. (They are almost as lightweight as directories, and often used to replace them.) So let's create one or more special filesystems just for transient data and exclude them from our regular snapshot process. In fact on Mac OS X that's easy: we have well-defined directories for transient data: ~/Library/Caches, /tmp, and so on. Link those all off to one or more transient filesystems and they will never wind up in a snapshot of the important stuff. I wouldn't expect users to do this for themselves, of course — but it could certainly be set up that way automatically by Apple.

Once the transient data is out of the picture, our snapshots will consist of 95% or more static data — which is not copied in any way — and a tiny percentage of dynamic data. And remember, the dynamic data is not even copied unless and until it changes. The net effect is very similar to doing an incremental backup of exactly and only the files you are working on. This is essentially a perfect local backup: no duplication except where it's actually needed.

Will you want to allow snapshots to live forever? Of course not. One reasonable model for taking backup snapshots might be to remember 12 hourly snapshots, 7 daily snapshots, and 4 weekly snapshots. If you are getting tight on storage the system could take new snapshots less frequently and expire them more aggressively. Remember: when nothing is changing the snapshots don't take up any space.
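A retention policy like that is a thin layer on top of the snapshot mechanism, not part of ZFS itself. Here's a hypothetical sketch of the "12 hourly, 7 daily, 4 weekly" policy; the function and its inputs are mine, and a real system would then destroy any snapshot not in the returned set:

```python
from datetime import datetime, timedelta

def snapshots_to_keep(snaps):
    """Given snapshot timestamps sorted newest first, return the set to
    retain under a 12-hourly / 7-daily / 4-weekly policy. Everything
    else is eligible for expiry."""
    keep = set()
    for count, key_fn in (
        (12, lambda s: (s.date(), s.hour)),   # 12 most recent distinct hours
        (7,  lambda s: s.date()),             # 7 most recent distinct days
        (4,  lambda s: s.isocalendar()[:2]),  # 4 most recent distinct weeks
    ):
        seen = set()
        for s in snaps:
            key = key_fn(s)
            if key not in seen:
                seen.add(key)
                keep.add(s)       # keep the newest snapshot in each bucket
                if len(seen) == count:
                    break
    return keep
```

Because unchanged blocks are shared, the retained snapshots cost space only in proportion to what actually changed between them.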

Wrap-up: Listen to the smart guys

Some very smart people at Sun started the ball rolling by putting an awful lot of thought into the future of storage, and they came up with ZFS.

After they announced it and started talking about it, other brilliant people at Apple (and FreeBSD, and NetBSD) paid attention to what they were doing. And they listened, and thought about it, and looked at the code, and wound up coming around to the side of ZFS as well.

If you think I'm smart, just know that I'm in awe of some of the guys who've been involved with this project.

If you think I'm stupid, why, I look forward to hearing from you in the comments.

Saturday, October 06, 2007

Don't be a ZFS Hater  

John Gruber recently linked to — and thus gave credibility to — a MWJ post ripping on a fairly reasonable AppleInsider post about ZFS. Representative quote:

“We don't find HFS Plus administration to be complex, and we can't tell you what those other things mean, but they sound really cool, and therefore we want them. On the magic unlocked iPhone. For free.”

Har har har. Wait. Hold on a minute. Why is it suddenly fashionable to bash on ZFS?

Part of it is a backlash to the weird and obviously fake rumor about it becoming the default in Leopard, I guess. (No thanks to Sun's CEO Jonathan Schwartz here, who as far as I know has never publicly said anything about why he either misspoke or misunderstood what was going on back in June.)

But don't do that. Don't be a ZFS hater.

A word about my background

Let's get the credentials out of the way up front. Today I work on a file I/O subsystem for PlayStation 3 games. Before that, I worked in Apple's CoreOS filesystems group. Before that, I worked on DiscRecording.framework, and singlehandedly created the content subframework that streamed out HFS+, ISO-9660, and Joliet filesystems. Before that, I worked on the same thing for Mac OS 9. And before that, I worked on mass storage drivers for external USB/FireWire drives and internal ATA/ATAPI/SCSI drives.

You might say I know a thing or two about filesystems and storage.

What bugged me about the article

  1. ZFS is a fine candidate to replace HFS+ eventually. It's not going to happen overnight, no. And it'll be available as an option for early adopters way before it becomes the default. But several years from now? Absolutely.

  2. The bizarre rants about ZFS wasting processor time and disk space. I'm sorry, I wasn't aware that we were still using 30MHz machines with 1.44MB floppies. ZFS is great specifically because it takes two things that modern computers tend to have a surplus of — CPU time and hard disk space — and borrows a bit of it in the name of data integrity and ease of use. This tradeoff made very little sense in, say, 1992. But here in 2007 it's brilliant.

  3. Sneeringly implying that HFS+ is sufficient. Sure, HFS+ administration is simple, but it's also inflexible. It locks you into what I call the "big floppy" model of storage. This only gets more and more painful as disks get bigger and bigger. Storage management has come a long way since the original HFS was created, and ZFS administration lets you do things that HFS+ can only dream of.

  4. Claiming that RAID-Z is required for checksums to be useful. This is flat-out wrong. Sure, RAID-Z helps a lot by storing an error-correcting code. But even without RAID-Z, simply recognizing that the data is bad gets you well down the road to recovering from an error — depending on the exact nature of the problem, a simple retry loop can in fact get you the right data the second or third time. And as soon as you know there is a problem you can mark the block as bad and aggressively copy it elsewhere to preserve it. I suppose the author would prefer that the filesystem silently returned bad data?

  5. Completely ignoring Moore's Law. How dumb do you need to be to willfully ignore the fact that the things that are bleeding-edge today will be commonplace tomorrow? Twenty gigabytes was a massive server array ten years ago. Today I use a hard drive ten times that size just to watch TV.
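On point 4: a checksum turns silent corruption into a detectable error, and a detectable error can often be recovered with a simple retry. Here's a minimal sketch of that idea — this is not ZFS's actual code; the flaky_read device and the use of SHA-256 as the checksum are stand-ins for illustration:

```python
import hashlib

def read_block_with_retry(read_fn, expected_checksum, max_retries=3):
    """Read a block and verify it against a stored checksum.

    Many disk errors are transient, so a failed verification is worth
    retrying before we give up and mark the block bad.
    """
    for _ in range(max_retries):
        data = read_fn()
        if hashlib.sha256(data).hexdigest() == expected_checksum:
            return data  # checksum matched: the data is good
    raise IOError("block failed verification: mark it bad and copy elsewhere")

# A pretend flaky device: garbage on the first read, good data afterward.
good_data = b"precious user data"
reads = []
def flaky_read():
    reads.append(1)
    return b"\x00" * len(good_data) if len(reads) == 1 else good_data

stored_checksum = hashlib.sha256(good_data).hexdigest()
recovered = read_block_with_retry(flaky_read, stored_checksum)
print(recovered == good_data)  # the second read succeeded
```

Without the stored checksum, the first (bad) read would have been returned to the application as if nothing were wrong — which is exactly the failure mode the article was dismissing.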

Reading this article made me feel like I was back in 1996 listening to people debate cooperative vs preemptive multitasking. In the Mac community at that time there were, I'm ashamed to say, a lot of heated discussions about how preemptive threading was unnecessary. There were some people (like me) who were clamoring for a preemptive scheduler, while others defended the status quo — claiming, among other things, that Mac OS 8's cooperative threads "weren't that bad" and were "fine if you used them correctly". Um, yeah.

Since then we've thoroughly settled that debate, of course. And if you know anything about technology you might be able to understand why there's a difference between "not that bad" and "a completely new paradigm".

ZFS is cool

Let's do a short rundown of reasons why I, a qualified filesystem and storage engineer, think that ZFS is cool. I'll leave out some of the more technical reasons and just try to keep it in plain English, with links for further reading.

  • Logical Volume Management. Hard disks are no longer big floppies. They are building blocks that you can just drop in to add storage to your system. Partitioning, formatting, migrating data from old small drive to new big drive -- these all go away.

  • Adaptive Replacement Caching. ZFS uses a smarter cache eviction algorithm than OSX's UBC, which lets it deal well with data that is streamed and only read once. (Sound familiar? It could eliminate the need for F_NOCACHE.)

  • Snapshots. Think about how drastically the trash can metaphor changed the way people worked with files. Snapshots are the same concept, extended system-wide. They can eliminate entire classes of problems.

    I don't know about you, but in the past year I have done all of the following, sometimes more than once. Snapshots would've made each of these a non-issue:

    • installed a software update and then found out it broke something
    • held off on installing a software update because I was afraid something might break
    • lost some work in between backups
    • accidentally deleted an entire directory with a mistyped rm -rf or SCM delete command

  • Copy-on-write in the filesystem makes snapshots super-cheap to implement. No, not "free", just "so cheap you wouldn't possibly notice". If you are grousing about wasted disk space, you don't understand how it works. Mac OS X uses copy-on-write extensively in its virtual memory system because it's both cheap and incredibly effective at reducing wasted memory. The same thing applies to the filesystem.

  • End-to-end data integrity. Journaling is the only thing HFS+ does to prevent data loss. This is hugely important, and big props are due to Dominic Giampaolo for hacking it in. But journaling only protects the write stage. Once the bits are on the disk, HFS+ simply assumes they're correct.

    But as disks get larger and cheaper, we're finding that this isn't sufficient any more. The odds of any one bit being wrong are very small. And yet the 200GB hard disk in my laptop has a capacity of about 1.6 trillion bits. The cumulative probability that EVERY SINGLE ONE of those bits is correct is effectively zero. Zero!

    Backups are one answer to this problem, but as your data set gets larger they get more and more expensive and slow. (How do you back up a terabyte's worth of data? How long does it take? Worse still, how do you really know that your backup actually worked instead of just appearing to work?) So far, disk capacity has consistently grown faster than disk speed, meaning that backups will only continue to get slower and slower. Boy, wouldn't it be great if the filesystem — which is the natural bottleneck for everything disk-related — helped you out a little more on this? ZFS does.

  • Combinations of the above. There are some pretty cool results that fall out of having all of these things together in one place. Even if HFS+ supported snapshots, you'd still be limited by the "big floppy" storage model. It really starts to get interesting when you combine snapshots with smart use of logical volume management. And we've already discussed how RAID-Z enhances ZFS's basic built-in end-to-end data integrity by adding stronger error correction. There are other cool combinations too. It all adds up to a whole which is greater than the sum of its parts.
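To put a number on the end-to-end integrity point above: even a tiny per-bit error probability compounds across a large disk. A back-of-the-envelope sketch — the 1e-12 per-bit error rate here is a made-up illustrative figure, not a measured drive spec:

```python
import math

def p_all_bits_correct(capacity_bits, per_bit_error_rate):
    # (1 - p)^n, computed in log space so a huge n doesn't underflow
    return math.exp(capacity_bits * math.log1p(-per_bit_error_rate))

bits_200gb = 1.6e12  # a 200GB disk is roughly 1.6 trillion bits
p_ok = p_all_bits_correct(bits_200gb, 1e-12)
p_ok_bigger = p_all_bits_correct(10 * bits_200gb, 1e-12)
print(round(p_ok, 2))      # ~0.20: an 80% chance at least one bit is bad
print(p_ok_bigger < p_ok)  # and it only gets worse as disks grow
```

The exact numbers depend entirely on the assumed error rate, but the shape of the curve is the point: the probability of a perfectly clean disk decays exponentially with capacity.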

Is any of this stuff new and unique to ZFS? Not really. Bits and pieces of everything I've mentioned above have shown up in many places.

What ZFS brings to the table is that it's the total package — everything all wrapped up in one place, already integrated, and in fact already shipping and working. If you happened to be looking for a next-generation filesystem, and Apple is, you wouldn't need to look much further than ZFS.

Still not convinced?

Okay, here are three further high-level benefits of ZFS over HFS+:

  • Designed to support Unix. There are a lot of subtleties to supporting a modern Unix system. HFS+ was not really designed for that purpose. Yes, it's been hacked up to support Unix permissions, node locking, lazy zero-fill, symbolic links, hard links, NFS readdir semantics, and more. Some of these were easy. Others were painful and exhibit subtle bugs or performance problems to this day.

  • Designed to support modern filesystem concepts. Transactions. Write cache safety. Sparse files. Extended metadata attributes. I/O sorting and priority. Multiple prefetch streams. Compression. Encryption. And that's just off the top of my head.

    HFS+ is 10-year-old code built on a 20-year-old design. It's been extended to do some of this, and could in theory be extended to do some of the others... but not all of them. You'll just have to trust me that it's getting to the point where some of this stuff is just not worth the engineering cost of hacking it in. HFS+ is great, but it's getting old and creaky.

  • Actually used by someone besides Apple. Don't underestimate the value of a shared standard. If both Sun and Apple start using the same open-source filesystem, it creates a lot of momentum behind it. Having more OS clients means more eyes on the code, which means more bugfixes and performance enhancements. That makes the code better, which makes ZFS more attractive to new clients, which means still more eyes on the code... it's a virtuous circle.

Is ZFS the perfect filesystem? I doubt it. I'm sure it's got its limitations just like any other filesystem. In particular, wrapping a GUI around its administration options and coming up with good default parameters will be an interesting trick, and I look forward to seeing how Apple does it.

But really, seriously, dude. The kid is cool. Don't be like that.

Don't be a ZFS hater.

Updates:

MWJ's response: You don't have to be a ZFS hater to know it's wrong for you.
My followup: ZFS Hater Redux

Friday, August 31, 2007

What should TV shows cost?  

While reading Daring Fireball today, I saw that Apple has posted a press release scolding NBC for wanting to more than double the cost of their content.

Apple® today announced that it will not be selling NBC television shows for the upcoming television season on its online iTunes® Store (www.itunes.com). The move follows NBC's decision to not renew its agreement with iTunes after Apple declined to pay more than double the wholesale price for each NBC TV episode, which would have resulted in the retail price to consumers increasing to $4.99 per episode from the current $1.99. ABC, CBS, FOX and The CW, along with more than 50 cable networks, are signed up to sell TV shows from their upcoming season on iTunes at $1.99 per episode.

'We are disappointed to see NBC leave iTunes because we would not agree to their dramatic price increase,' said Eddy Cue, Apple's vice president of iTunes. 'We hope they will change their minds and offer their TV shows to the tens of millions of iTunes customers.'

First of all, I'm totally amused that Apple is dragging this dirty laundry out in public to shame NBC. It's a smart business move, and I have no objections to it: I just get a kick out of big corporations fighting.

Second, it brings up an excellent question about the value of content. Particularly content that is sometimes expensive to produce. What exactly do consumers perceive as the "right price" for something like a TV show?

What Content Producers Think

I think content producers operate on a couple of different mental models when it comes to pricing their content for digital distribution. Most of these models are terribly flawed, because they're not looking at the consumer's point of view.

  Production Cost: "The more expensive it is to produce, the higher the price should be."

  Length: "A one-hour TV show should cost around the same as a one-hour CD."

  Cross-Media Comparisons: (To the most expensive alternative.) "We can sell a full season of DVDs for $60, which works out to $5 per show."

  Threat of Withholding: "You're going to buy it at whatever price we ask, or else you won't get it at all."

These models probably only make sense to you if you're a content producer talking about your own content. As a consumer, they probably make you angry.

What Consumers Think

As a consumer, these are the things I think about when I am considering buying content.

  Total Usage Time: "How much total time will I spend enjoying this content?"

Total usage time is very different from the length of the content. Total usage time is equal to the length, multiplied by the number of uses I expect to get from it.

For a CD or album, the total usage time is vastly different from that of a TV show. When I buy music, I expect to listen to it again and again. Most TV shows I only watch once. Sure, if it's a download I could watch it again and again, but odds are that the 45 minutes of content in your hour-long show is only going to get 45 minutes of my time. Or less. Movies and concert videos are somewhere in the middle: particularly good movies will sometimes get a second or third viewing, but still nothing approaching what a good CD gets.
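That length-times-uses arithmetic is easy to make concrete. A quick sketch, with made-up prices and replay counts:

```python
def cost_per_hour(price_dollars, length_hours, expected_plays):
    # Total usage time = content length x how many times you'll actually enjoy it
    return price_dollars / (length_hours * expected_plays)

album = cost_per_hour(12.99, 1.0, 50)   # music gets replayed for years
episode = cost_per_hour(1.99, 0.75, 1)  # most TV shows are watched exactly once
print(f"album:   ${album:.2f}/hour")    # album:   $0.26/hour
print(f"episode: ${episode:.2f}/hour")  # episode: $2.65/hour
```

Even at $1.99, the once-watched episode costs roughly ten times more per hour of enjoyment than the album — which is why the two media can't be priced by length alone.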

  Quality and Extras: "Is this particularly high-quality content? Does it have something extra that I'm interested in, like commentary or subtitles?"

Although this isn't my primary factor, I'm generally willing to pay a little more for content that goes the extra mile. Sometimes that means higher resolution (Blu-Ray vs DVD, or SACD or DVD-Audio vs CD vs MP3), sometimes that means particularly extraordinary special effects, sometimes it's commentary from a director or actor that I admire, sometimes it's a behind-the-scenes "Making Of" featurette.

For example, I almost never buy TV shows on DVD because they're just too expensive. But in the few rare cases where I do, it's because it's a show that I particularly enjoy and would like in a high-quality format with lots of extras.

  Cross-Media Comparisons: (To the least expensive alternative.) "Can I get everything I want somewhere else, for less money?"

Sometimes I simply want to catch up on a show's plot developments. Reality shows are a good example: I sure don't watch them for the quality of the writing. If I'm traveling and I miss a showing of American Idol or Who Wants to Be A Superhero?, I'm not going to spend any significant amount of money at all on finding out who got kicked off this week. I'd rather just look it up online.

If it's a show like Battlestar Galactica that I actually do want to watch and appreciate, well, I own a DVR which lets me record the "free" showings from network or cable TV. Chances are unless I'm in a big hurry, I'll just watch the version that was automatically recorded for me once I get home.

This is also where piracy comes in. As Steve Jobs has said, content producers have to compete with piracy as a business model. Video-sharing sites and BitTorrent servers may offer the content at no cost. Piracy is a funny thing. It isn't attractive to most people ... until you start withholding or overcharging for content. Then it spikes. Personally I don't have any qualms about watching all or part of a TV show on YouTube if it isn't available anywhere else.

You can go into great depth parsing the legal arguments around both of those last two, but you'll be wasting your time. Consumers don't. The fact is that in our minds — yours and mine — TV show downloads are essentially competing with "free", twice over.

  Convenience vs Importance: "Is the price I'm paying (in both money and inconvenience) proportionate to how much I care?"

Sometimes I'm only mildly interested in your content. I think most web browsing falls under this umbrella, which is why it's so difficult to make money by charging for content on the web. If I'm only mildly interested, I'm not going to pay you anything at all for it, but I might spend a few minutes watching it ... if it's free and easy to do so.

Of course, there are times when I'm extremely interested in your content. I'm generally more willing to pay money for it in that case. But the number of things that are quite that important to me necessarily has to be kept very small — because I don't have an infinite amount of money. (IMPORTANT NOTE: If you would like to give me an infinite amount of money, e-mail me and we'll talk.)

Resolving the difficulties

That's it. Those four criteria are what I use to evaluate prices as a consumer, and I would be willing to bet that I'm not alone. Realistically, I think the consumer models are what will ultimately drive the cost.

So what's the fair price for a TV show? If you go by total usage time, the value of a TV show is substantially less than the value of a single track of music, despite the fact that it costs more to produce. The convenience factor of getting it whenever you want it may bump up the price you're willing to pay, but I think that $2 is actually already too expensive in the minds of many consumers. Shows would sell better at $1 ... or even free.

What's the fair price of a TV show to you?

Sunday, August 19, 2007

BASIC for the iPhone  

This year C4[1] played host to Iron Coder Live. The API was "iPhone" and the theme was "conspiracy". The winning entry deserved its victory. But I had fun doing a hack of my own.

There was just one problem: I don't have an iPhone.

As I was driving from Cleveland to Chicago, I had a lot of time—about five hours—to think about what I might do for the contest. Without an actual iPhone it had to be something that could be largely developed offline and only run on the phone at the contest. That meant that I would need to stick to the, ahem, "official API" of the iPhone: HTML and JavaScript.

Although the phone is cool, I think it's even cooler that a whole community has built up to create the SDK that Apple failed to provide. After talking with some old friends at the conference I decided that it would be fun to join the conspiracy and create my own SDK for the iPhone, offering the ultimate retro-cool language: BASIC for the iPhone.

The version I showed at the conference garnered a very respectable 5th place, which I'm more than satisfied with considering its far more polished competition. I've since cleaned it up, GPLed it, and rechristened it. And here it is:

ippleSoft BASIC

The interface has been scaled for the iPhone, but it works just fine in Safari and Firefox.

Notes

Naming: I tried to pick a name that wouldn't infringe on anyone's existing trademark. That's harder than you might think: "iPhone BASIC" would surely lead to a cease-and-desist letter from the hounds. iBASIC, IP-BASIC, etc were all taken. Then I ran into a stroke of luck: It turns out "AppleSoft" is no longer a registered trademark of Apple. They let it expire in 2001. Thus: ippleSoft.

Entirely JavaScript, entirely free. I've made it available under GPL v2. There's no public repository yet, but hey, it's GPL, so you can go and create one yourself. I've released code into the public domain before, but this is the first project I've ever released under GPL.

I'm not a JavaScript programmer. I learned about the language and tried to follow best practice, though. Well, as much as I could during a single beer-soaked weekend.

Autoscroll: The output display will autoscroll in Mac Safari, but not iPhone Mobile Safari. You can flick the output display to scroll it, though. If anyone with an iPhone has a suggestion on how to make it autoscroll properly please let me know!

Language: For now I've kept it pretty close to a strict subset of AppleSoft BASIC. There's a lot it doesn't do yet: graphics and text positioning are not supported, and neither are INPUT or GET. Those are pretty high on the list of priorities though.

One final thought

You know what? Text-based interfaces suck on the iPhone. There's just no getting around this. It's not the output so much as the input. The iPhone feels great when you're pushing buttons and flicking the display to scroll. But the keyboard is really lame.

I'm not sure if that's a feature or a bug for ippleSoft BASIC. Personally I find it kind of perversely amusing to use the iPhone to run a language originally designed for teletypes. But it would be interesting to explore ways to take the text out of BASIC.

Loading and running programs from elsewhere on the web would be a natural. A keyword-board like the one on the Timex-Sinclair TS1000 might be useful if you really must write code on the iPhone. INPUT and GET may need to be supplemented with some type of dialog interface. Anything else?

What do you think?

Tuesday, August 08, 2006

New Parallels Desktop Beta  

A new beta of Parallels Desktop for Mac has been released, updating it to Build 1862.

This update fixes the disk caching policy problem that PD Tweaker was created for. If you were wondering, PD Tweaker is perfectly safe to run with the new update and doesn't conflict with it at all. But it will be unnecessary once you have configured your VM properly.

Curiously, instead of just fixing it they decided to make it an option under the virtual machine settings -> Options -> VM flags:

Choose virtual hard disk cache policy for better performance of:

[X] Virtual machine
[ ] Mac OS X

"Virtual machine" is the default setting. For the technically-minded among you, this is like having a radio button to select between two behaviors: "incorrect" and "correct", and defaulting to "incorrect".

In this case the correct behavior is to select the second option, "Mac OS X". There are rare scenarios where you might want the other behavior, for example if you were setting up a dedicated server box that does nothing except run virtual machines. But in almost all other cases the second option is correct. I'll blog more about this later.

There are several other significant improvements, including video acceleration -- which completely eliminates the display-update lag when you're typing in Windows, hooray! All in all it seems like a very worthwhile update so far. Remember to re-install the Parallels Tools on your guest OS after updating so that you get the new drivers.

Friday, June 30, 2006

CLImax 1.5d2 re-released  

CLImax

Any old-school scripters out there might be happy to hear that I've re-released CLImax 1.5d2, which I recently rediscovered on a long-lost backup.

CLImax is an AppleScript command-line interface for the Mac OS. It was originally written in 1996 for System 7.5, and development stopped in 1998. Ten years after its first release, version 1.5d2 still runs under Classic today. It's not as convenient as it was running natively under Mac OS 9.2.2, but it's not bad either.

I've received a bunch of positive comments about CLImax from people including Bill Cheeseman (of AppleScript Sourcebook), Ray Barber (of MacScripter), and Peter Hosey (of Adium X). Peter was the one who actually got me to re-release it.

CLImax 1.5d2 is now FREEWARE. Be sure to visit the CLImax homepage for more information, including screenshots.

I'm currently soliciting feedback on whether it'd be worth porting it to Mac OS X and creating a modern version. Thoughts?

Thursday, June 22, 2006

PD Tweaker 1.0  

For all you fans of Parallels Desktop out there, I've released a quick little hack to fix some problems in their initial release (aka: build 1848). It's called PD Tweaker.

It's very simple and only does two things.

  • Optimizes caching for HDD and SAV files

    Letting Mac OS X cache these large files is actually harmful to your Mac's overall performance. HDD files don't need host-side caching because the guest OS already maintains its own cache layer. SAV files are streamed in and out and don't need to reside in the disk cache.

  • Always writes HDD and SAV files all the way to disk

    Your data is precious. Especially data like a HDD file that took you hours to install and configure. Shouldn't you treat it that way?

I wrote it primarily because, although Parallels Desktop is a great product, I got fed up with the way it wasted my entire 2GB of RAM caching the HDD file and made my machine virtually unusable. With PD Tweaker installed, you will not only notice greatly improved performance from your Mac as a whole, but your HDD files will be safer from data corruption as well. And best of all, it's entirely free and the source code is available.

Download Now!

It uses Unsanity's Application Enhancer 2.0 to do its thing.

More information (including some rationale and a technical explanation) is available on the PD Tweaker website. What are you waiting for? Go there now!

Thursday, May 18, 2006

Delaying the xnu-x86 source release  

[Updated Aug 7 2006; see below.]

Tom Yager at InfoWorld points out that Apple has released the source code to everything in the x86 build of OSX Darwin except for the kernel, xnu.

He also makes what appears to be a completely and utterly unsubstantiated statement about why:

Thanks to pirates, or rather the fear of them, the Intel edition of Apple’s OS X is now a proprietary operating system.

First of all, what? That's a bold statement. Got any source for that claim? That seems like sheer FUD, deliberately sensationalized to create a stir and bring people to the site (and therefore sell ads). From everything I've seen, and I've worked with these people, Apple's security team and upper management know better than to rely on security through obscurity. And to date I don't know of any official or even unofficial statement from Apple about xnu-x86 -- just the fact that several people have noticed that the source still hasn't been released.

[Update 5/22 via John Gruber: Apple's Product Manager for Open Source reiterates that Apple has not made any announcement yet, and drops what sounds like a big hint that xnu-x86 will eventually be released.]

Let me offer two guesses at the real reason why we haven't seen source to xnu-x86 yet:

  1. The xnu-x86 source might leak information about a future product. The obvious candidate is the only missing link in the x86 line, the pro desktops. You know, the ones that will replace the "PowerMac". Let's call them "Mac Pro" for lack of a better name.

    For example, if the highest end Mac Pro desktop machines were planned to have, say, four Core Duos packed into them for a whopping 8 CPU cores, then chances are you would see traces of that support show up in several parts of xnu. And when Apple releases a major rev of xnu, there are always some people who pore over it looking at the diffs.

    Apple does have ways to keep prerelease stuff out of the source release, of course, but it adds a layer of complication and risk to go back and hack that up after the fact. Maybe this time they decided it was just simpler to drag their feet for a while until the entire new line has been announced.

  2. The xnu-x86 source might currently contain a small amount of licensed proprietary code that does not belong to Apple. If that's the case, they simply might not be legally allowed to release it in its current form.

    Maybe it's virtualization code from Intel, or some sort of Trusted Computing gobbledygook which is currently dormant. If they can't negotiate terms to release it, then they might have to factor the sourcebase somehow to link that other code in separately. Factoring that out seems like it would be totally possible, but kind of a messy task since it's at such a low level in the kernel and they can't sacrifice any performance to do so.

Personally I think #1 fits Apple's modus operandi perfectly. But #2 is also the kind of real-world consideration that could delay a source release for an unknown period of time while the lawyers work things out. I will grant that "fear of pirates" is another technically possible reason why we haven't seen the source, but then again the same could be said for "fear of ninjas".

We'll see what happens. Either way my gut feeling is that this is just a delay in the source release, not a permanent switch to a completely closed-source kernel.

[Update: Aug 7 2006: xnu-x86 has now been released. According to Kevin van Vechten's post at kernel.macosforge.org:

Several changes were made in order to publish the kernel (xnu) sources. As a result, the kernel built from these sources differs from the one found in the 10.4.7 software update. In order to accommodate these changes, several kernel extensions were also modified and must be downloaded and installed in order to run a kernel built from these sources on Mac OS X 10.4.7 for Intel.

Based on that comment, it sounds like the answer is #2 and for the xnu-x86 release they moved some code from the kernel into a kext.]

Thursday, January 12, 2006

Apple's new laptop  

It's now been two days since Apple announced the first machines ever to ship with Intel processors, due to ship in February. (See my earlier post: Apple's future with Intel.) Steve said something about a new Intel-based iMac or something, I guess, but nobody cares because he also announced a new pro laptop that uses an Intel chip.

The usual flurries of oohs and ahhs, glowing reviews, playful cynicism, and plugs from unexpected directions have all played out. Now people are starting to look a little more closely at the offerings.

Missing Pieces

Finally, for the first time I have no connection whatsoever to Apple and I can talk about a new release without the fear of getting arbitrarily fired. So I'll dig in from my new perspective as an Apple outsider and take a look.

Rosyna examines what's not in the new laptop in his enigmatically titled Lost in Transition: Overcane of Antflower Milk. Don't worry about the title; he's just like that.

To sum up: no S-video, no FireWire 800, no modem, slightly lower resolution (1440x900, when the previous laptop was 1440x960), no dual-layer DVD burning. And from looking at the battery and the power brick, it seems likely that power consumption is actually higher on this machine than the previous iteration. (Not a big deal to me, since that's the natural progression anyway -- but worth noting.)

I'd add two things to that list:

  • No PC card slot. It was traded up for an ExpressCard/34 slot, which will ultimately be a good thing... but first, people will have to start making cards that fit it. To call the current offerings sparse would be generous. That'll be fixed in a year, but it may be a concern for a handful of early adopters who need PC cards for one reason or another.
  • No two-button trackpad. This doesn't matter when you're running Mac OS X, because OSX uses control-click. But tough luck if you were hoping to dual-boot into Windows; you simply can't run Windows usefully with a one-button mouse.

Many people have noticed that Apple is being suspiciously quiet about battery life: no figures at all, which is very odd for a laptop announcement. My sources suggest that they're simply not done with the final power management code, so they don't want to release actual metrics yet. If I may insert a side comment here, my own experience and tendency toward cynicism suggest that they will ship a half-assed version of power management with the machine, then patch it later with a couple of system updates to get to a version that actually works. Call it a hunch.

Performance

As the saying goes, "There are lies, damn lies, and benchmarks." Apple is claiming a whopping 4x speed boost over the previous PowerBook G4. But take a look at this breakdown of the benchmarks from Apple themselves:

MacBook Pro Benchmarks

Notice how only Modo -- an application that is heavily tuned for Intel chips -- is listed as about 4x faster, and everything else gets in the neighborhood of 2x. From Luxology's site:

The modo rendering engine deeply leverages various Intel Technologies to improve scalability and performance. Our bucket rendering provides near linear scales in performance with multiple processors and Intel® Pentium 4 processors with HT Technology.

Everything else, lacking that degree of hand-tuning, lands around 2x. Funny thing, really... 2x is just about the performance boost you'd get going from a single G4 to a dual G4.

From examining those benchmarks, my professional opinion is that it looks like a single core of the new Intel chip runs a little faster in these tests than a comparably-clocked G4, or roughly equal to a comparably-clocked G5.

The real performance boost comes from the fact that there are two cores. It's like the laptop went from a single-processor G4 to a dual-processor G5. That's a real boost, of course -- the machine really does seem to be twice as fast as the old machine, which is awesome and worthy of praise. But it's not because it's Intel vs PowerPC; most of the boost is due to the upgrade from single-processor to dual-processor.

As for people getting over-excited about Intel's SSE vs PowerPC's Altivec, my personal opinion -- as someone who's written code for both -- is that in the final analysis they're really quite similar. Sure, there are differences. Each is good at different things, and optimizing for one is not the same as optimizing for the other. Altivec has more registers, but then again you pay for those extra registers during context switches. Switching to SSE is more of a lateral move than a step back. It'll take a while for Apple and other vendors to convert everything that was Altivec-optimized to be fully SSE-optimized, but they'll get there.

Update: Ars Technica comes through with a hands-on look at the new iMac, and had a chance to run some benchmarks. Since they use the same CPU, the iMac probably has performance very similar to the new laptop. Check it out for more details.

Dreams of Dual-Booting

One of the reasons I was really interested in the new laptop was the possibility of dual-booting into Windows, and later, when the software became available, running Windows in a VMware-style shell.

My current work (contracting for Sony on PSP/PS3) requires me to use Windows XP on a daily basis. While I've mostly managed to customize my Windows system to the point where it satisfies my needs, I still really miss a lot of the nicer little features of the Mac: iChat, iCal, iPhoto, even AppleScript. And when I'm traveling I can't really work on Mac software without bringing two laptops along -- which is obnoxious both because of the extra weight and the extra hassle at the dog-and-pony show that passes for airport "security" in this country.

A single dual-booting laptop would have been a great solution for me -- I'd sign up to buy it right now if I thought the new laptop would deliver on that front, even without a two-button trackpad.

But it's been reported that Windows XP won't boot the laptop, because it uses EFI rather than a BIOS to boot. Longhorn, aka Windows Vista, should work... but who knows when that's coming out? Right now it's supposed to be the end of 2006, but it appears to be even money on whether Microsoft will actually make that date. The Longhorn schedule and magical ballooning feature list has looked a lot like Copland's so far, which isn't very heartening.

In summary, don't get this laptop for its ability to run two OSes; it will be a while before it can. Perhaps once there's a consumer VMware product for Mac OS X that boots XP, I'll take a second look at it.

Update: In an interesting twist, Intel has firmly stated that you can definitely boot Windows XP with an EFI Core Duo system if the vendor provides a BIOS compatibility shim. Interesting! However, Ars Technica reports that they had no joy installing either XP or Vista on the new Intel iMac, so the shim is either not there right now or just not provided with the iMac.

But now that it seems at least technically possible, I bet we'll see a BIOS shim for the new laptop -- either factory installed, or as a download. Keep your eyes peeled; it may not be over yet!

The Good Stuff

So far this has seemed way too negative. That's not really how I feel. Overall I have to say I really do like the laptop as a Mac.

Things like the lack of S-video and a modem don't bother me personally. Now that wi-fi has really taken off, being stuck in a hotel room without a modem isn't the crisis it used to be -- you just have to find the nearest Starbucks. If you're one of the 2% of users who really need S-video or a modem regularly, you can get an adapter dongle. No, it's not as nice. But it works fine, and it helps bring the price of the laptop down for the rest of us.

I think the built-in iSight is nice; while not always useful, it's one of those things that's nice to have standard. Video chat and other applications of the camera (think Flickr) will increase dramatically if the iSight is now going to be standard on every new machine. I don't like bringing my iSight with me when I travel, but I'd definitely use a built-in one to video chat with my wife if I had it.

And the speed, ah, the speed. This machine really is fast. If you are looking for a fast new Mac laptop, this is what you've been waiting for.

What's up with the name?

I haven't even talked about the name yet. Apple used to sell the PowerBook. I guess they still do, at least for a while. The brand had good name recognition despite that awkward capital B in the middle that nobody actually bothered to type.

But this new laptop has been rebranded and is no longer a PowerBook. Instead, it's called a "MacBook Pro". Hardly anybody I've talked to likes the name, myself included. But I might be able to guess at the rationale behind it. Bear with me.

"PowerBook" is a brand name. A PowerBook actually has three unique brands associated with it: "PowerBook", "Mac", and the superbrand "Apple". Personally, I think it's a very strong name, and it has the benefit of 15 years of brand-building success behind it. People know what you're talking about when you say PowerBook.

MacBook Pro, on the other hand, is a brand extension. A double brand extension, really. "MacBook Pro" is an extension of "MacBook", which itself is an extension of "Mac". On the face of it, from everything The 22 Immutable Laws of Branding tells us, this is a much, much weaker name. Why on earth did Apple decide to make this change?

The only reason that makes sense to me is that Apple must have decided to start consolidating the Mac brand. Rather than having separate brands under the Mac umbrella, everything Mac is now going to include Mac in the name. If that's true, then the "iBook" name will go away, replaced by just plain "MacBook". "PowerMac" will go away too, replaced by something like "Macintosh Pro".

Why consolidate the Mac brand? Perhaps, and just perhaps -- this is wild speculation -- it's to get ready for a possible future move to OSX running on non-Apple PCs. Let me be clear here: I don't expect such a move for at least five years. But Apple of course has to be thinking about the possibility of competing with Microsoft in the future.

If they were to start letting OSX run on non-Apple PCs, they might want to rebrand "Mac OS X" to "Apple OS X" to make a distinction between the high-end Mac computer brand and other non-Mac computers. But doing that would weaken the Mac brand, right? So that may be why we're seeing the brand consolidation now, years before any of these other changes take place. It's about strengthening the brand in anticipation of possible weakening later.

Just a thought.

Wednesday, November 30, 2005

I'm so totally Extreme  

This weekend I finally upgraded my wireless network at home. For the past three or four years I've been using a second generation AirPort base station, which was 11Mbps with 40-bit WEP. It's been thoroughly out-of-date for a while now, but it met my needs, was rock-solid, and worked like a charm so I didn't feel the need to upgrade. And after all, 5.5Mbps is still more bandwidth than my real-world connection to the internet, so I knew that switching to the higher-bandwidth AirPort Extreme wasn't going to make a difference in simple web browsing.

But this week I finally broke down and made the jump to the latest and greatest. A combination of two things finally pushed me over the edge:

  1. I've been doing a lot more work lately that involves transferring things between machines in my office. Local transfers from machine-to-machine are where 54Mbps can really make a difference.

  2. The signal strength wasn't so great everywhere in the house, and I wanted to use WDS with an AirPort Express to extend the range of my network.

Another small factor was that if I upgraded, I could have iTunes send its output directly to my stereo via AirTunes. Personally I think AirTunes is kind of a gratuitous and somewhat goofy feature. It's certainly not enough to make me spend a couple of hundred dollars on new equipment. But I have to admit that once the other (much better) reasons pushed me over the edge, I was looking forward to giving it a shot.

So I made the plunge and bought an AirPort Extreme Base Station and an AirPort Express, and hooked everything up.

Perhaps because I like pain, or maybe just because I wanted to see if it would work, I decided to set up the AirPort network from a Windows laptop. Okay, the real reason is that my old TiBook recently died from catastrophic hinge failure and I couldn't find where I'd left the charger for the iBook, which meant that I didn't have a wireless-capable Mac handy.

Setting up the main base station worked like a charm. I plugged it in, ran the Windows version of the admin utility -- which I have to say is really very nice -- and saved the configuration from the old base station into a file on my desktop, then imported it into the new base station. Happily, it copied out my DSL PPPoE account and password. In the process I upgraded the wireless security from WEP to WPA, set my server box as the default host (DMZ), and configured the base station so that it would syslog to one of my machines. Restarted the base station, connected with the new password, and everything worked flawlessly. Great!

Getting the Express set up to extend my network, however, was a little bit trickier. I started at the obvious place with the AirPort Express Assistant for Windows. But somewhere in the middle of the setup as it jiggled Windows XP and base station settings, it failed while reading from the base station with an error -4: "bad param". Tried again several times with the same result. Hmmm. Not so great, and virtually impossible to diagnose what went wrong.

Why, no problem, says I, I'll just configure it directly with the admin utility.

That's much easier said than done. I did get it done in the end, but it certainly didn't go as smoothly as I was hoping it would. There are a lot of non-obvious details that need to be just right before everything works. Most of the answers can be found in the admin utility's help if you know where and how to look, but it takes some digging.

Here are some tips from my experience with setting up WDS manually:

  • Terminology. You have the main, relay, and remote stations.

    • The main base station is the one connected to the internet via Ethernet.
    • Relay stations connect between base stations, and do not have an Ethernet connection.
    • Remote stations provide services to clients, and do not have an Ethernet connection.
    As far as I can tell there's not a lot of practical difference between relay and remote, since both main and relays can be configured to accept client connections too. It may be the case that remote stations are able to dedicate more bandwidth to clients than main or relay stations, but I'm not sure about that. In any event it seems more like something you'd only be concerned about for a large-scale installation with lots of client computers and extremely heavy traffic -- I doubt it matters for home networks.
  • MAC addresses. Before you start, you need to write down the AirPort MAC addresses of all the base stations involved. It's on the outside of the base station, or you can get it from the base station chooser. If you do that, though, remember that the stations will be broadcasting different wireless networks at first. So you need to join each one in turn, then select it in the chooser (no, not that chooser) and write the AirPort MAC address down somewhere.
  • "Distribute IP addresses" should only be set on the main base station, not on relays and remotes. Thankfully, the admin utility warns you about this.
  • All stations must use the same channel. Pick a channel (I like 3 and 10) and set both base stations to it. The admin utility tells you that you can't use 'Automatic', but neglects to mention that all base stations have to use the same channel -- which is kind of an important detail.
  • Set up the main base station first, then relays, then remotes. When you set up a WDS main base station, you'll need to enter the MAC addresses for the WDS remote and relay stations that will be allowed to connect. It doesn't work if you go the other way, because a remote base station won't be able to connect to the main base station until the main base station has been configured.
  • Use different SSIDs (network names) at first. It doesn't matter whether base stations connected by WDS have the same network name or different names. Tom's Networking recommends that you use different SSIDs, while Apple's Designing AirPort Extreme Networks for Windows recommends on page 38 that you use the same SSID. But if you plan on giving them the same name eventually, don't start out that way! Give the networks different names so that you can be sure you're connecting to the right base station when testing. Once everything is working you can then decide to set them to the same SSID; it's up to you.
  • Know where the reset button is. If you make a mistake and can't find one of the base stations on the wireless network anymore, hold down the reset button for about seven seconds (until it starts flashing quickly) to give it a hard reset. The button is recessed, but can be pushed with a paperclip, staple, ball-point pen, or stereo miniplug.
  • Double-check all of your settings if you are importing settings from an older base station to a newer base station. At one point in the process I noticed that my main base station's transmission strength was apparently set to the lowest setting -- 10%, instead of 100%. I can only speculate about why that happened. The older base station didn't have adjustable signal strength, so perhaps it pulled in a zero value rather than the default of 100% when it imported my old configuration to the new base station.
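To make the rules in the tips above concrete, here's a toy sketch that checks a WDS layout against the three hard constraints I ran into: exactly one main station, every station on the same fixed channel, and "distribute IP addresses" enabled only on the main. This is not a real AirPort API -- the function and field names are all hypothetical, just a way of writing the constraints down.

```python
# Toy model of the WDS rules above. All names here are hypothetical;
# the real configuration happens in the AirPort admin utility.

def check_wds(stations):
    """Return a list of problems for a list of station dicts, each like
    {'role': 'main'|'relay'|'remote', 'channel': int,
     'distributes_ip': bool, 'mac': str}."""
    problems = []

    # Exactly one station is the main (the one wired to the internet).
    mains = [s for s in stations if s['role'] == 'main']
    if len(mains) != 1:
        problems.append('exactly one main base station required')

    # All stations must share one fixed channel (no "Automatic").
    if len({s['channel'] for s in stations}) != 1:
        problems.append('all stations must use the same channel')

    # Only the main station should hand out IP addresses.
    for s in stations:
        if s['distributes_ip'] and s['role'] != 'main':
            problems.append(f"{s['mac']}: only the main should distribute IPs")

    return problems

net = [
    {'role': 'main',   'channel': 3, 'distributes_ip': True,  'mac': '00:11:22:33:44:55'},
    {'role': 'remote', 'channel': 3, 'distributes_ip': False, 'mac': '00:11:22:33:44:56'},
]
print(check_wds(net))  # → [] -- a clean two-station setup
```

If the admin utility had run a check like this up front, my error -4 afternoon would have been a lot shorter.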

Phew. Anyway, after a little bit of futzing around, I've got it working and I'm happy. The speed on the local network is much better, and the WDS extension has made the signal strength much stronger throughout the house. It wasn't too much of an ordeal and I got it all sorted out in an hour or two, but it was certainly harder than it should have been. Apparently I'm not the only one who thinks so, either.

But as a bonus to reward all my hard work, I can now have iTunes play through my stereo via AirTunes. Dude, I'm like so extreme.

Thursday, July 28, 2005

Apple's future with Intel  

Or, how I learned to stop worrying and love x86

It's been a while since I've posted. Work and family have kept me very busy lately. But since my last post the Mac world has received big news that would be impossible to let pass without comment: Apple is switching the Macintosh to Intel processors. The keynote video stream where Steve made the announcement is still worth watching if you haven't seen it yet.

Enough time has passed that I'm going to fast-forward through the obvious comments:

  • Why switch? PowerPC clock speeds were stagnant, and temperatures and power consumption were staying too high. Fixing those problems was possible, but would have required a lot of expensive investment from Apple. And the problems wouldn't stay fixed: Apple would have to keep paying to push the development of the PowerPC, as they've been doing for years now. It finally reached the point where the price just wasn't worth it. Every other reason suggested is secondary.
  • Did I see it coming? Like a lot of people, I knew how much of the OS ran on x86 to begin with, and was well aware that it was an open possibility. Personally I figured it was inevitable that Mac OS X would run on x86 eventually. It's just happening a little sooner and in a more dramatic fashion than I had thought. I didn't anticipate the abandonment of PowerPC.
  • The leaks beforehand had all the hallmarks of being authentic and deliberate. As soon as I saw them coming out I felt that they were probably true. The leaks went to real news sources first, not the rumor sites. I believe the first mention came in the Wall Street Journal, and the WSJ's factual reporting is second to none. They would not have printed it if it hadn't been confirmed in some way. (Their opinion page is another matter.) And the Journal's Walt Mossberg seems to be a friend of Steve Jobs; he's been granted special early access to all sorts of Apple technologies and usually gives them glowing reviews. The WSJ is a friendly voice to Apple and they would be an excellent place to leak and start creating a buzz.

Now that that's all out of the way, let's move on. By the way, although I have friends at Apple and I'm probably covered by a ton of NDAs over the years, I want to emphasize that I'm not disclosing any confidential information here. This is all personal speculation based on publicly-available information.

Why Intel?

Why did Apple switch to Intel specifically? And what about the details? A lot of folks have been freaking out and wondering why Apple didn't say anything specifically about x86-64 or AMD.

First up, I'm pretty sure the cost savings to Apple will be more than just the CPU. The talk I've heard says that Intel offers bulk discounts to vendors who switch to Intel across the board, and Apple seems like a likely candidate for such a switch. Intel makes a lot more than just CPUs, after all -- audio chips, SCCs, ethernet controllers, I/O processors, SATA controllers, and PCI chipsets, just to name a few. Apple already uses a lot of non-CPU Intel chips in Macs, AirPort base stations, and iPods. It's certainly feasible that Apple might go all-Intel-all-the-time.

Nor is it just chips, either. Remember, Intel designs motherboards too. Right now Apple does its board design in-house, which creates a significant lead time for product development as new boards are tested and reworked. Discounted Intel boards -- perhaps fully tested and qualified before they even reach Infinite Loop -- could be yet another potential cost savings. Apple has a penchant for pushing design boundaries with iMacs and laptops, true, but the company still does a brisk business selling desktop machines where space is not at a premium. Besides, Intel has been stuffing Pentiums into tiny spaces lately -- witness their recent clone of the Mac mini.

Could AMD compete with all that Intel has to offer? Frankly, it seems unlikely.

The door will still be open for AMD and other x86 chip vendors. I don't think Apple is going to start using Intel-specific features that AMD can't compete with -- that would be a foolish return to the single-chip-source problems that plagued Apple with the PowerPC. It's likely that the companies have made a deal where Apple commits to Intel for some number of years on negotiated terms, and thereafter the door is open to renegotiate or seek a better deal. So AMD might still get their foot in the door eventually. If Intel starts causing problems you bet Apple will switch to AMD or someone else; but as long as they offer a good deal they will be difficult to beat.

64-bitness

What about 64-bit support? Some folks were upset because Apple had done all this work toward 64-bit PowerPC and seems to now be ditching it. That work absolutely is not wasted. Anyone who thinks Apple is suddenly ditching 64-bit computing should put down the crack pipe -- it's not gonna happen. Sure, after Intel came up with the all-but-failed and incompatible IA-64/Itanium architecture, AMD turned around and created a much more popular backward-compatible architecture called x86-64. Point to AMD. But Intel at least recognized its mistake and cloned x86-64 for use in recent chips.

Apple may not make the jump to 64-bit Intel chips in its very first release, though I think it's at least possible that they might. But if they start with 32-bit chips, I will guarantee that they'll be on 64-bit chips within the year. And further, that they'll be going with the x86-64 architecture rather than the Itanium. It's the only decision that makes sense.

Should you care?

My personal opinion is that as a consumer you probably won't need to care about the transition.

Think about it: why do you buy a Macintosh? Do you really care about the chip inside it? No. What you want from a Mac is the nice user experience: you want your apps to work, and you don't want the machine to stink up your desktop, crash, and be virus-ridden like Windows.

Apple's switch to x86 will not change anything about the operating system or the applications: it will look and work exactly the same as your current Mac. It will still be just as stable and easy-to-use, all of your software will still run, all of your data will copy over and work just fine, and no viruses are going to magically jump over from Windows onto your Mac just because the hardware is the same.

Should you delay hardware purchases while you wait for the new machines? Well, you could. Apple seems to be expecting a certain number of consumers to make that choice -- they have more cash on hand than ever before ($7.5 billion USD), which it's pretty clear is a buffer against an anticipated temporary decline in Mac sales.

But the new machines won't be out for a couple of years. The first new ones might not be all that great, either; you may want to wait a few months for the second generation. Personally I'm planning on following my normal upgrade schedule: I'll keep my dual 2GHz G5 til it's on its last legs or until I get new free hardware, whichever comes first. The typical Mac lifetime is around four years (as opposed to two for PCs), and that's lengthened a bit by the way PowerPC clock speeds have been lagging, so I probably won't need to upgrade until the new machines are out anyway. As for my laptop, a late-model TiBook, it's getting pretty elderly. I wouldn't mind getting a new PowerPC laptop right now if I could afford it -- but so far furniture and family stuff has delayed me and I don't really need a new one badly enough yet.

So for now I think it's not a big deal: buy a new Mac if you want one and don't worry about it. As we get closer to the release of the first Intel Macs, though, the dropoff in Mac sales will be steeper. People will naturally wait for the new machines.

It'll be interesting to see what Jobs does with the release of the new machines: Will the release date be known or widely anticipated, for example Macworld SF 2007? Or will they get sprung on everyone unexpectedly? Several years of watching how Steve Jobs does things leads me to suspect the latter -- they will probably release the machines at least four months earlier than anyone expects. But it's an open question.

Compatibility and speed

Apple is including a way to run PowerPC apps which it is calling Rosetta. As reported by C|Net and Wired, this is Transitive's multipurpose DR emulator.

In an ironic twist, I've heard that Microsoft is also using Transitive's emulator ... but to translate x86 to PowerPC so that old games can run on the next-gen Xbox. Man. What's next -- cats and dogs living together?

I've tried out the emulator on a development system. Rosetta delivers great performance for simple apps. For a geek like me it's really cool: PowerPC apps just work transparently, and they run fast too!

However, anything that is heavily Altivec-enhanced will probably take a big hit when running under the emulator compared to running on a G5. The emulator does not emulate Altivec, so the app will first fall back to unvectorized floating-point, which could drop its speed to 25% of the vectorized floating-point. Then it would incur the cost of emulation on top of that. So some specialized operations running through the emulator might in theory be as much as 5x slower than on a comparable G5 system. As soon as the app goes native, however, it will gain all of its performance back and more.
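Here's the back-of-envelope arithmetic behind that "5x" figure. The 25% scalar-vs-Altivec ratio comes from the paragraph above; the emulation overhead factor is my own illustrative assumption, picked so the total lands at the roughly 5x the text estimates.

```python
# Rough worst-case estimate for Altivec-heavy code under Rosetta.
# The 0.25 figure is from the text; the overhead factor is an assumption
# chosen to match the ~5x total, not a measurement.

altivec_native = 1.0        # normalized speed of Altivec code on a G5
scalar_fraction = 0.25      # scalar FP runs at ~25% of vectorized speed (4x hit)
emulation_overhead = 1.25   # assumed extra cost of running emulated

scalar_native = altivec_native * scalar_fraction       # 4x slower already
scalar_emulated = scalar_native / emulation_overhead   # emulation on top

slowdown = altivec_native / scalar_emulated
print(f"~{slowdown:.0f}x slower than native Altivec")  # → ~5x slower than native Altivec
```

The point being: the two penalties multiply, which is why a vector-heavy codec can fall off a cliff under emulation while a plain word processor barely notices.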

But that only affects heavily vectorized applications. Your word processor will continue to work fine under emulation and will in fact probably be faster than before: the extra CPU speed will give it a boost. Games are mostly pumping video textures out using OpenGL, so they will probably not be affected.

Probably the most visible area where you'll see applications slow down is anything that uses QuickTime to compress or decompress video. That includes video playback; frame rates may drop significantly under some codecs. Why? Remember, Rosetta is not a mixed-mode architecture; it runs an entire app from top to bottom. And QuickTime loads and runs inside your application. So even though a native version of QuickTime will be available to native apps, an emulated app will run emulated QuickTime which will be much slower. If I've got all that right, then you can bet Apple is making an extra push to evangelize QuickTime developers to port as quickly as possible.

G4 emulation: Someone asked me the other day whether Apple might upgrade the emulator, which currently emulates a PowerPC G3, to emulate a G4 with Altivec. Frankly I doubt that will happen. It would take a big investment of time and money. And the benefit Apple would get from doing that is only in the short term, during the transition period where you want all the extra speed possible out of the emulator. After a while CPU speeds will increase so much that it just won't matter. The ROI (return on investment) just isn't there; too much work and too little benefit.

Classic

Apple has not announced plans for making the Classic environment run on the Intel machines. As a practical matter the current release of x86 Tiger does not support Classic. But have you noticed that they have not vocally announced that Classic is dead, either?

My guess is that Apple may be working on porting Classic, unannounced. There's a not-insignificant minority of Mac software that still needs it. And Apple has a long history of unbroken compatibility which it would be a shame to end.

In a lot of ways Classic is just a normal application: a lot of hardware-specific details were abstracted out of OS9 during its final years. The emulator doesn't need to be ported, after all: there's no reason why you couldn't run Classic's 68K-to-PPC emulator inside the new PPC-to-x86 emulator.

Of course, in other ways Classic is still very 'special'; there are many hacks all over xnu for it that give it special access to supervisor-level PowerPC stuff which Rosetta isn't going to emulate. Still, it seems like it'd be well worth Apple's time to sic a couple of guys on it for a year or two -- the potential gain is huge.

I suspect there are probably some doubts about whether it's technically possible. That's why there hasn't been a big announcement one way or another. But I'll go out on a limb and say there's a good chance someone will figure out a way to make it work.

Windows emulation

Here's an interesting one. Running x86 will make it a lot easier to run Windows applications on your Mac. Things like Virtual PC will run just about at full speed. Of course, Microsoft might not be very interested in porting Virtual PC to the new architecture, since Mac OS X just got a lot closer to becoming a competitor to Windows. If past history is any indication they'll probably do it eventually, but I bet they will drag their heels.

Other options exist, of course, such as VMware and WINE. It seems very likely that VMware is working on porting their software to x86 Tiger right now, and DarWINE is already underway.

Personally I think this would be an interesting thing for Apple to investigate. They might choose to be hands-off and just bundle VMware, Virtual PC, or DarWINE, but all of these are a little clunkier than what Apple prefers. It's possible that Apple might choose to enter the market themselves and deliver integrated Win32 emulation in some way.

Dual-booting into Windows: This sort of falls under this category. As others have noted, in addition to virtual PCs, it will be possible to dual-boot or triple-boot your Mac. Darwin can easily support the partition map styles used on Windows, so there's no reason you couldn't have a Linux partition, a Windows partition, and a Mac OS X partition. Mac OS X's built-in BSD and X11 are so good, however, that there's probably not a lot of reason for anyone who's not a Linux developer to dual-boot into Linux.

HP, Dell, and more

What about all the talk about whether Apple could or should release Mac OS X in a general release to other PC makers, like HP and Dell? Some people think that would've been a better idea. This attitude is exemplified by the somewhat goofy article Apple's Colossal Disappointment which was posted to Slashdot recently.

I call the article goofy because the author pays no attention whatsoever to business realities, nor does he seem to grasp the concept that policies may change over time. His basic complaint is that Apple is limiting its OS to run on its own hardware for the time being, and he thinks that's a mistake. (Or a "colossal disappointment", to use his exaggerated phrase.)

Here's my take. There were three possible ways Apple could have used its x86 code.

  1. Switch the Mac hardware to x86

  2. Allow a select few PC vendors to ship hardware that runs Mac OS X.

  3. Release Mac OS X widely for anyone and everyone with a PC, the way Microsoft releases Windows.

Apple started with item one for the reasons mentioned above -- the PowerPC has been weighed in the balance and found wanting. But the latter two courses of action are not immediately feasible, nor are they wise business decisions right now.

Sure, they are potentially desirable goals that are at least on Apple's radar. I guarantee you Steve Jobs is thinking about them; when interviewed by Fortune magazine he mentioned that three of the biggest PC makers have repeatedly asked him to license Mac OS X for them to bundle. (My guess: the big three are HP/Compaq, Dell, and Sony.)

But it makes absolutely no sense to do all of those things at once.

What do you think would happen if Apple switched to x86 and immediately let others release x86 machines with Mac OS X too? It would undercut Mac hardware sales in a big way. Apple has been burned by cloning before, in the 1990s. The circumstances are different enough now that they may try it again, but they will definitely start out slowly and carefully. It will be several years before you see anyone but Apple shipping Mac OS X.

How about making a widespread general release? First of all, you have the same problem as above where Mac hardware sales are undercut. But okay, perhaps you might make that up with increased software sales. No big whoop. But the real cost comes in the support infrastructure. Mac hardware is fairly dependable and predictable, and Apple does a good job with the small amount of tech support that is needed. But making a widespread release would suddenly skyrocket the number of support calls being made -- people would buy it for their JalopyTech 3000 computer and something would break. Apple gets a call. Frankly, Apple doesn't have the support infrastructure to handle a twenty-fold increase in phone calls. It might be possible to get there in the long run, but it won't happen overnight.

The correct course of action from a business perspective is to do exactly what Apple is doing -- switch their own machines only at first. The other things will still remain possible future directions, and I think that both are likely in the long run. But Apple is an "old" company by tech industry standards, and doesn't plan on biting off more than it can chew at any one time.

The Lockdown

Will Apple use LaGrande as part of a scheme to make sure Mac OS X is running on Mac hardware? Maybe. But it seems very likely to me that Apple won't try TOO hard. They want to discourage the casual user from running an unsupported configuration, and they will. But it's not worth the effort (and would frankly be counterproductive) to lock out the hardcore geeks.

A small amount of 'geek piracy' will cost Apple practically nothing in hardware sales. Contrary to their huge presence on the internet, the actual real-world market share of geeks is fairly small. The vast majority of Apple's and Microsoft's markets are people who wouldn't know how to perform that sort of hack, and wouldn't want to, because they know they wouldn't get support for it. Overall it's far simpler to just buy a Mac in the first place.

Honestly, it's like free advertising... let the geeks hack OSX to run on their machines. Once most geeks start using OSX their prejudices evaporate and they quickly get hooked on it, talk to their friends about it, recommend it, and so on. And that will ultimately grow the Mac's marketshare.

Further Reading

A few interesting articles in case you missed them: