Wednesday, March 10, 2010

4K alignment for disks: IMPORTANT!!!

This is the type of performance you can get when fdisk from util-linux-ng 2.17.1 or later is PROPERLY used to align a partition along 4K boundaries in an SSD (Micron/Crucial C300 in this case):


tar -xzf ./linux-kernel-tarball.tar.gz
real 0m6.682s
user 0m5.783s
sys 0m1.680s


And this is what happens on a different partition of the exact same SSD that I forgot to manually align to 4K boundaries:


tar -xzf ./linux-kernel-tarball.tar.gz
real 0m13.317s
user 0m5.806s
sys 0m1.673s


Sure, it only sounds like 6.7 seconds, but look more carefully: if you factor out CPU time, the decompress is happening WELL OVER 10 TIMES FASTER on the aligned partition!
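For anyone who wants to check the arithmetic, here is a minimal Python sketch that uses only the numbers from the two runs above. The function name is made up for illustration, and treating "real minus (user + sys)" as time spent waiting on the disk is an assumption, not something the time command reports directly.

```python
# Timings copied from the two tar runs above (seconds).
aligned = {"real": 6.682, "user": 5.783, "sys": 1.680}
misaligned = {"real": 13.317, "user": 5.806, "sys": 1.673}


def non_cpu_seconds(t):
    """Wall-clock time not covered by CPU time (assumed here to be I/O wait)."""
    return t["real"] - (t["user"] + t["sys"])


wall_ratio = misaligned["real"] / aligned["real"]
aligned_wait = non_cpu_seconds(aligned)
misaligned_wait = non_cpu_seconds(misaligned)

print(f"wall-clock ratio:        {wall_ratio:.2f}x")
print(f"aligned non-CPU time:    {aligned_wait:.3f}s")
print(f"misaligned non-CPU time: {misaligned_wait:.3f}s")
```

On these numbers the aligned remainder comes out slightly below zero (CPU time can exceed wall-clock time when work overlaps, possibly tar and gzip running side by side), so essentially nothing was spent waiting on the disk, while the misaligned run has nearly six seconds that CPU time does not account for.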

Here's the problem: While the newest version of fdisk will align the FIRST partition for you with the -c option, all SUBSEQUENT partitions have to be HAND ALIGNED to the correct boundaries. The first test you saw was from /dev/sdb1, the first partition... the second was from a logical partition that I had NOT properly aligned.

Lesson #1: EVERY partition has to be aligned (I'm still working on how this affects the extended partition + logical sub-partitions, but I'm guessing both have to be aligned right).

So: Even though I knew about the 4K alignment problem, I was only 1/2 clever, not clever enough!

Lesson #2: Linux NEEDS BETTER TOOLS THAT WORK WITH 4K BY DEFAULT AT ALL TIMES (INCLUDING PESKY EXTENDED PARTITIONS)! Oh, and believing what the drive tells you is out, because drives will lie and report 512-byte sectors even when they are designed for 4K underneath! MAKE A MODE THAT IGNORES THE DRIVE AND ALIGNS EACH AND EVERY PARTITION ON 4K BOUNDARIES, NO FUSS, NO MUSS, WE DON'T CARE ABOUT DOS COMPATIBILITY.




UPDATE



I managed to move the two problematic partitions to a conventional drive, re-partition and re-format the misaligned partitions, and then move everything back: ALL the partitions are now operating at the same (incredibly fast) speed that the C300 is capable of providing.

The technique to actually get to the 4K alignment on every partition is tricky, but here's the end result:



>fdisk -ucl /dev/sdb
Disk /dev/sdb: 128.0 GB, 128035676160 bytes
255 heads, 63 sectors/track, 15566 cylinders, total 250069680 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xd510b84e

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *        2048    19538582     9768267+  83  Linux
/dev/sdb2        19539968   250069679   115264856    5  Extended
/dev/sdb5        19542016   204163072    92310528+  83  Linux
/dev/sdb6       204167168   243240236    19536534+  83  Linux
/dev/sdb7       243243008   250069679     3413336   82  Linux swap / Solaris


/dev/sdb1 is the primary partition, and it was the one partition that fdisk aligned properly originally. With the new version of fdisk, using the -c option is absolutely critical to having fdisk at least align the first partition properly. The other important option is -u, which displays units in sectors... not particularly human readable, but useful.

A little more discussion of the above partition table follows. First, note that each and every number under the "Start" column is evenly divisible by the magic number 2048, which is what generates the appropriate alignment for a 4K sector drive. With 512-byte sectors, a multiple of 2048 puts every partition start on a 1024KB boundary (which is 4K aligned too, since 1024KB is 256 * 4KB). This may be larger than what is actually needed, but fdisk defaults to this value and the wasted disk space is minimal. The vital thing is that every "Start" sector be a number that is divisible by 2048. This includes the extended partition /dev/sdb2, which acts as a container for logical partitions sdb5 - sdb7... no exceptions. The "End" sector does not have to be divisible by 2048, although ending each partition just before the next 2048 boundary might be a good idea in order to minimize the number of sectors left unused between partitions (the amount of wasted space is very small on a modern drive). Incidentally, the "Blocks" column numbers listed above are just 1/2 the number of sectors in each partition (End - Start + 1 == Blocks * 2, where a trailing "+" marks an odd sector count, i.e. an extra half block).
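The two checks described here (every Start divisible by 2048, and Blocks being half the sector count) are easy to verify mechanically. A minimal Python sketch follows, with the values copied from the fdisk listing above. The function and constant names are made up for illustration, and a trailing "+" in the Blocks column is recorded as an extra 0.5.

```python
ALIGN_SECTORS = 2048  # 2048 sectors * 512 bytes = 1024KB

# (device, start, end, blocks) copied from the fdisk listing above.
# A trailing "+" in the Blocks column is written here as an extra 0.5.
PARTITIONS = [
    ("/dev/sdb1", 2048, 19538582, 9768267.5),
    ("/dev/sdb2", 19539968, 250069679, 115264856),
    ("/dev/sdb5", 19542016, 204163072, 92310528.5),
    ("/dev/sdb6", 204167168, 243240236, 19536534.5),
    ("/dev/sdb7", 243243008, 250069679, 3413336),
]


def is_aligned(start, align=ALIGN_SECTORS):
    """True if the start sector is an exact multiple of the alignment."""
    return start % align == 0


def sector_count(start, end):
    """Number of sectors in a partition; both endpoints are inclusive."""
    return end - start + 1


for dev, start, end, blocks in PARTITIONS:
    status = "aligned" if is_aligned(start) else "MISALIGNED"
    blocks_ok = sector_count(start, end) == blocks * 2
    print(dev, status, "blocks match" if blocks_ok else "blocks MISMATCH")
```

Swapping in the Start and End numbers from any other "fdisk -ul" listing should work the same way, as long as the drive uses 512-byte logical sectors.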

In the new "-c" mode, fdisk does attempt to provide 2048 sectors between partitions, but there is NO guarantee that it will start SUBSEQUENT partitions at a sector number that is evenly divisible by 2048... I had to enter all the numbers you see above by hand after doing the math myself. I'm not saying that this is rocket science, but it is WAY beyond what I would expect even an experienced user to have to go through in order to actually enjoy the benefits of technologies that are actually supported by the Linux kernel.... sorry guys, but until this becomes automatic, Windows 7 and even Vista are clearly better than Linux in this area.
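The by-hand math can be approximated with a few lines of Python. This is a sketch under assumptions: the function name and the reserve parameter are hypothetical, and reserving 2048 sectors ahead of each logical partition is only a guess that happens to reproduce the sdb6 and sdb7 start sectors in the table above.

```python
def next_aligned_start(prev_end, alignment=2048, reserve=0):
    """Smallest multiple of `alignment` that lies past `prev_end`.

    prev_end: last sector used by the previous partition.
    reserve:  extra sectors to leave free before the new partition
              (an assumption used here for logical partitions).
    """
    first_free = prev_end + 1 + reserve
    # Round up to the next multiple of `alignment` using integer math.
    return -(-first_free // alignment) * alignment


# End sectors taken from the fdisk listing above.
print(next_aligned_start(19538582))                 # after sdb1
print(next_aligned_start(204163072, reserve=2048))  # after sdb5
print(next_aligned_start(243240236, reserve=2048))  # after sdb6
```

With the End sectors from the listing, these three calls give 19539968, 204167168 and 243243008, which match the Start sectors of sdb2, sdb6 and sdb7.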

I'm a Linux veteran, but I was used to doing partitioning using the (slightly) easier to use cfdisk instead of the old-school fdisk utility. Unfortunately, cfdisk has not been updated with this new option. I've heard that parted & GParted are semi-smart about this, but my tests showed they have issues as well. The problem with parted is that it is too smart for its own good... it can query a device to determine if the device has 512-byte or 4096-byte sectors in hardware. Sounds great until you actually use a 4K device in early 2010, where the devices will actively lie to you in order to allow Windows XP to use the drive (albeit with big performance drops). The parted guys need to have the automatic option I request above: IGNORE what the drive says and align EACH AND EVERY partition to 4K to solve the problem. I think that this would even be safe to do on older 512-byte drives, since 4K is just 8 of the old 512-byte sectors. I know that DOS might have issues, but unless a DeLorean shows up and kidnaps me back to 1985, I'm not too worried about that.
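For what it's worth, the sizes a drive reports can be read straight from sysfs without any partitioning tool. The sketch below assumes a kernel recent enough to expose queue/logical_block_size and queue/physical_block_size under /sys/block/<device>; the function names and the sysfs_root parameter are made up for illustration.

```python
import os


def reported_sector_sizes(device, sysfs_root="/sys/block"):
    """Return (logical, physical) sector sizes in bytes as the drive reports them."""
    queue = os.path.join(sysfs_root, device, "queue")
    sizes = []
    for name in ("logical_block_size", "physical_block_size"):
        with open(os.path.join(queue, name)) as f:
            sizes.append(int(f.read().strip()))
    return tuple(sizes)


def claims_512_only(sizes):
    """True if the drive claims plain 512-byte sectors for both values."""
    return sizes == (512, 512)
```

Calling reported_sector_sizes("sdb") on the C300 above should give the same 512 / 512 pair that fdisk prints on its "Sector size" line, which is exactly why a tool that trusts these numbers will not align to 4K on its own.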

Monday, February 15, 2010

Howto (mostly) use a WINS server from a Linux client

The Back Story


I just set up a VPN with the office using OpenVPN. So far I'm really happy with OpenVPN, but OpenVPN (or any VPN for that matter) only serves to bring a remote machine into a LAN... the rest of the configuration builds on top of the VPN.



In my case I'm joining a small office network that offers the standard NT services including a PDC for NT domain authentication, WINS, and file sharing. We also have network printers, but as I've recently found out, they are not going through any centralized print server, which may be why several client machines inside the LAN have problems printing. Oh... did I mention that the PDC and Windows file server aren't running Windows at all, but are actually Samba?


Yup... Trying to get a Linux client to talk to a Linux server using Windows protocols. We truly live in a bizarre world, but I'm not the only one in this situation. This blog post will be the first in a series of HowTo reports on getting stuff working in a sane manner. For reference, as of this writing I'm using SAMBA 3.4.5 on the client, and the Samba server is running an older 3.0.x series install.



Name Resolution using NORMAL Linux tools


This won't be a revelation for the SAMBA experts out there, but to be blunt, while SAMBA is a very powerful software package, the documentation and interfaces are lacking when it comes to doing anything even remotely complicated. I'm not even talking about a cute GUI, I'm talking about docs missing for simple use cases like: I'm a Linux client querying a WINS server.. how can I get normal programs to use the WINS server for name resolution? I'm not talking about using a specialized utility like nmblookup.. I want it to "just work" for normal programs.



The good news is that I found a partial solution after hunting around. Before beginning, make sure you have at least the client packages for Samba installed. I am using Arch Linux so your paths for config files may vary slightly in different distros.


  • First: edit /etc/samba/smb.conf and add the IP address of your WINS server in the [global] section. For example: wins server = 172.16.42.1

  • Next: update a config file called "nsswitch.conf" (/etc/nsswitch.conf). I have been using Linux for 10 years and had never messed with this file before, but it basically lets you tell the system which name resolution services to try, and in what order. It goes way beyond the simple task of resolving host names to IP addresses that we address here, but for our purposes the fix is simple. Add an entry for "wins" to the hosts line like so:
    hosts: files dns wins


That's it for the basic configuration. The final entry on the hosts line in nsswitch.conf tells the name resolver to use WINS resolution last, if files (e.g. /etc/hosts) or a standard DNS query cannot resolve a name. The configuration of the WINS server in smb.conf from step 1 ensures that there is a valid WINS server to query.
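A quick way to double-check the second step is to pull the sources off the hosts line and confirm that "wins" is there. Here is a minimal Python sketch; the function name is made up for illustration, and the parsing is deliberately naive (it simply splits the line on whitespace after dropping comments).

```python
def hosts_sources(nsswitch_text):
    """Return the sources listed on the 'hosts:' line, in order ([] if absent)."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []


# Sample content mirroring the hosts line shown above.
example = "passwd: files\nhosts: files dns wins\n"
sources = hosts_sources(example)
print(sources)
print("wins" in sources)
```

To check the real file, pass in open("/etc/nsswitch.conf").read() instead of the sample string.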



What does & does not work


So, after tweaking nsswitch.conf as described above, any program that is set up to use proper name resolution will automagically work with the WINS server in addition to the existing DNS setup! This includes (but is not limited to): ssh, ping, wget, CUPS (specifying a printer with a WINS name), konqueror & dolphin (smb:// protocol). Even a 2-line Python program can use WINS once you do the configuration:



import socket
print(socket.gethostbyname("WINS_name_or_DNS_name_it_does_not_matter"))
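A slightly more defensive variant, as a sketch: gethostbyname raises socket.gaierror when none of the configured services can resolve a name, so a small wrapper can return None instead. The wrapper name is made up for illustration.

```python
import socket


def resolve(name):
    """Return the IPv4 address for `name`, or None if the resolver gives up."""
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None


# An IP literal is handed back as-is, with no lookup needed.
print(resolve("127.0.0.1"))
```

With the nsswitch.conf change in place, passing a WINS name to resolve() should go through the same files, dns, wins order as any other program that uses the system resolver.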



Unfortunately, some utilities and programs do not use /etc/nsswitch.conf properly. Some are network-specific utilities like "host" & nslookup that are specifically designed to use DNS. More notoriously... a certain web browser named after an ignited carnivorous quadruped also fails to resolve names properly. Some browsers that do work properly under Linux include Arora and Chromium if you like a Google-browser experience.



In summary: While Linux does have a very robust and flexible system for using different services to resolve names... not all software on Linux actually wants to do things the easy way. However, for the purposes of the LAN at work, I can now use WINS to resolve names. This is very useful, not only because it is easier than typing in dotted-quads, but also because DHCP means those dotted-quads are not necessarily stable, while names are. I've already gotten network shares to mount, and I'm looking forward to getting my home PC set up even better than some of the local machines on our LAN while being secure at the same time.