Wednesday, March 10, 2010

4K alignment for disks: IMPORTANT!!!

This is the type of performance you can get when fdisk from util-linux-ng 2.17.1 or later is PROPERLY used to align a partition along 4K boundaries in an SSD (Micron/Crucial C300 in this case):

tar -xzf ./linux-kernel-tarball.tar.gz
real 0m6.682s
user 0m5.783s
sys 0m1.680s

And this is what happens on a different partition of the exact same SSD that I forgot to manually align to 4K boundaries:

tar -xzf ./linux-kernel-tarball.tar.gz
real 0m13.317s
user 0m5.806s
sys 0m1.673s

Sure, it only sounds like a difference of about 6.6 seconds, but look more carefully: If you factor out CPU time, the decompress is happening WELL OVER 10 TIMES FASTER on the aligned partition!

Here's the problem: While the newest version of fdisk will align the FIRST partition for you with the -c option, all SUBSEQUENT partitions have to be HAND ALIGNED to the correct boundaries. The first test you saw was from /dev/sdb1, the first partition... the second was from a logical partition that was NOT properly aligned by me.

Lesson #1: EVERY partition has to be aligned (I'm still working on how this affects the extended partition + logical sub-partitions, but I'm guessing both have to be aligned right).

So: Even though I knew about the 4K alignment problem, I was only 1/2 clever, not clever enough!

Lesson #2: Linux NEEDS BETTER TOOLS THAT WORK WITH 4K BY DEFAULT AT ALL TIMES (INCLUDING PESKY EXTENDED PARTITIONS)! Oh, and believing what the drive tells you is out, because drives will lie and report 512 byte sectors even when they are designed for 4K underneath! MAKE A MODE THAT IGNORES THE DRIVE AND ALIGNS EACH AND EVERY PARTITION ON 4K BOUNDARIES, NO FUSS, NO MUSS, WE DON'T CARE ABOUT DOS COMPATIBILITY.


I managed to move the two problematic partitions to a conventional drive, re-partition and re-format the misaligned partitions, and then move everything back: ALL the partitions are now operating at the same (incredibly fast) speed that the C300 is capable of providing.

The technique to actually get to the 4K alignment on every partition is tricky, but here's the end result:

>fdisk -ucl /dev/sdb
Disk /dev/sdb: 128.0 GB, 128035676160 bytes
255 heads, 63 sectors/track, 15566 cylinders, total 250069680 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xd510b84e

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *        2048    19538582     9768267+  83  Linux
/dev/sdb2        19539968   250069679   115264856    5  Extended
/dev/sdb5        19542016   204163072    92310528+  83  Linux
/dev/sdb6       204167168   243240236    19536534+  83  Linux
/dev/sdb7       243243008   250069679     3413336   82  Linux swap / Solaris

/dev/sdb1 is the primary partition, and it was the one partition that fdisk aligned properly originally. With the new version of fdisk, using the -c option is absolutely critical to having fdisk at least align the first partition properly. The other important option is -u, which displays units in sectors... not particularly human readable, but useful.

A little more discussion of the above partition table follows. First, note that each and every number under the "Start" column is evenly divisible by the magic 2048 number needed to generate the appropriate alignment for a 4K sector drive. Since the units are 512 byte sectors, a multiple of 2048 puts every partition start on a 1024KB boundary (4K aligned too, by nature of 1024KB being 256 * 4KB). This may be larger than what is actually needed, but fdisk defaults to this value and the effect on wasted disk space is minimal. The vital thing is that every "Start" sector be a number that is divisible by 2048. This includes the extended partition /dev/sdb2, which acts as a container for logical partitions sdb5 - sdb7... no exceptions. The "End" sector does not have to be divisible by 2048, although this might be a good idea in order to minimize the number of sectors that are placed between partitions (the amount of wasted space is very small on a modern drive). Incidentally, the "Blocks" column numbers listed above are just 1/2 the number of sectors in each partition (End - Start + 1 == Blocks * 2, where a trailing "+" marks an odd sector count, i.e. an extra half block).
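For anyone who wants to double-check the arithmetic, here is a minimal Python sketch that re-runs it on the numbers from the listing above. The function and variable names are my own, and the reading of the "+" suffix as "odd number of sectors" is my understanding of the fdisk output, not something the tool states in the listing.

```python
ALIGN_SECTORS = 2048  # 2048 sectors * 512 bytes = 1024KB

# (start, end, blocks, has_plus) copied by hand from the fdisk listing.
PARTITIONS = {
    "/dev/sdb1": (2048, 19538582, 9768267, True),
    "/dev/sdb2": (19539968, 250069679, 115264856, False),
    "/dev/sdb5": (19542016, 204163072, 92310528, True),
    "/dev/sdb6": (204167168, 243240236, 19536534, True),
    "/dev/sdb7": (243243008, 250069679, 3413336, False),
}


def is_aligned(start_sector, align=ALIGN_SECTORS):
    """True when the start sector is an exact multiple of `align`."""
    return start_sector % align == 0


def sector_count(start, end):
    """Number of sectors in a partition; both Start and End are inclusive."""
    return end - start + 1


def check_table(partitions):
    """Return {name: (start_is_aligned, blocks_column_matches)}."""
    report = {}
    for name, (start, end, blocks, has_plus) in partitions.items():
        sectors = sector_count(start, end)
        # Blocks are 1KB units, so two 512 byte sectors per block;
        # an odd sector count leaves half a block over (shown as "+").
        blocks_ok = sectors // 2 == blocks and (sectors % 2 == 1) == has_plus
        report[name] = (is_aligned(start), blocks_ok)
    return report


if __name__ == "__main__":
    for name, (aligned, blocks_ok) in check_table(PARTITIONS).items():
        print(name,
              "aligned" if aligned else "MISALIGNED",
              "blocks ok" if blocks_ok else "blocks mismatch")
```

The same is_aligned() check works with 8 instead of 2048 if all you care about is the bare 4K boundary.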

In the new "-c" mode, fdisk does attempt to provide 2048 sectors between partitions, but there is NO guarantee that it will start SUBSEQUENT partitions at a sector number that is evenly divisible by 2048... I had to enter all the numbers you see above by hand after doing the math myself. I'm not saying that this is rocket science, but it is WAY beyond what I would expect even an experienced user to have to go through in order to actually enjoy the benefits of technologies that are actually supported by the Linux kernel.... sorry guys, but until this becomes automatic, Windows 7 and even Vista are clearly better than Linux in this area.
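The math itself boils down to rounding up to the next multiple of 2048. Here is a rough Python sketch of that calculation; next_aligned_start and its gap parameter are names I made up for illustration, and this is not what fdisk does internally. The gap is there because a logical partition needs at least one free sector in front of it for its extended boot record, and you may want to leave more.

```python
ALIGN_SECTORS = 2048  # 1024KB in 512 byte sectors


def next_aligned_start(prev_end, align=ALIGN_SECTORS, gap=0):
    """Smallest multiple of `align` at or after prev_end + 1 + gap.

    prev_end -- last sector used by the previous partition (inclusive)
    gap      -- hypothetical extra sectors to leave free before the
                new partition (e.g. room in front of a logical partition)
    """
    first_free = prev_end + 1 + gap
    # Ceiling division without floats, then scale back up to sectors.
    return -(-first_free // align) * align


if __name__ == "__main__":
    # End of /dev/sdb1 from the listing above.
    print(next_aligned_start(19538582))
    # End of /dev/sdb5, leaving a 2048 sector gap.
    print(next_aligned_start(204163072, gap=2048))
```

Fed the "End" of /dev/sdb1, this gives 19539968, which is the start I entered for /dev/sdb2. With a 2048 sector gap after /dev/sdb5 it gives 204167168, the start of /dev/sdb6.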

I'm a Linux veteran but I was used to doing partitioning using the (slightly) easier to use cfdisk instead of the old-school fdisk utility. Unfortunately, cfdisk has not been updated with this new option. I've heard that parted & GParted are semi-smart about this, but my tests showed they have issues as well. The problem with parted is that it is too smart for its own good... it can query a device to determine if the device has 512 byte or 4096 byte sectors in hardware. Sounds great until you actually use a 4K device in early 2010, when the devices will actively lie to you in order to allow Windows XP to use the drive (albeit with big performance drops). The parted guys need to have the automatic option I request above: IGNORE what the drive says and align EACH AND EVERY partition to 4K to solve the problem. I think that this would even be safe to do on older 512 byte drives, since 4K is just 8 of the old 512 byte sectors. I know that DOS might have issues, but unless a DeLorean shows up and kidnaps me back to 1985, I'm not too worried about that.
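Until the tools catch up, you can at least audit an existing disk without trusting the sector size the drive reports. The Python sketch below assumes the usual Linux sysfs layout, where each partition directory under /sys/block/<disk>/ has a "start" file holding the start sector in 512 byte units; the function names and the sysfs_root parameter (there so it can be pointed at a test directory) are my own.

```python
import os

ALIGN_BYTES = 4096        # the boundary we insist on, whatever the drive says
SYSFS_SECTOR_BYTES = 512  # sysfs "start" values are in 512 byte units


def partition_starts(disk, sysfs_root="/sys/block"):
    """Return {partition_name: start_sector} for one disk, e.g. "sdb"."""
    disk_dir = os.path.join(sysfs_root, disk)
    starts = {}
    for entry in sorted(os.listdir(disk_dir)):
        start_file = os.path.join(disk_dir, entry, "start")
        # Only partition subdirectories carry a "start" file.
        if os.path.isfile(start_file):
            with open(start_file) as fh:
                starts[entry] = int(fh.read().strip())
    return starts


def misaligned(disk, align_bytes=ALIGN_BYTES, sysfs_root="/sys/block"):
    """List partitions whose byte offset is not a multiple of align_bytes."""
    return [name
            for name, start in partition_starts(disk, sysfs_root).items()
            if (start * SYSFS_SECTOR_BYTES) % align_bytes != 0]


if __name__ == "__main__":
    import sys
    disk = sys.argv[1] if len(sys.argv) > 1 else "sdb"
    bad = misaligned(disk)
    print("misaligned:", ", ".join(bad) if bad else "none")
```

An empty list means every partition on that disk starts on a 4K boundary. Pass align_bytes=1024 * 1024 if you want to hold yourself to the same 1024KB boundaries fdisk uses.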


  1. I'm confused. The legacy sector size is 512 bytes, the new sector size is 4096 bytes.

    Doesn't that mean your partitions' starting sectors should be a multiple of 4096/512, or 8? Representing 8*512 byte sectors?

    (Luckily newer versions of fdisk allow you to specify the sector size with -b.)

  2. @Rich: True, 8 sectors = 4KB, which is enough for alignment. But most people go beyond that and align on 2048 sectors = 1MB. Since 1MB is a multiple of 4KB, both provide proper alignment. But 1MB is more "human readable", and it's nice to make all partitions be a multiple of 1MB.

  3. Hi, Rich -

    Thanks for this useful blog entry.

    You mention the "new" fdisk. I'm a Linux novice, so I'm not sure what you mean. fdisk 2.2 is the version on my system; is it a recent one? What's different about the newer version?

  4. The "newer" version has the -b option. See the man page.

  5. Yikes, think I'll sit on the fence for SSDs & Linux support for now.