[parted-devel] [rfc] SSD partition alignment
Matt Domsch
Matt_Domsch at dell.com
Sun Feb 22 18:09:18 UTC 2009
On Sun, Feb 22, 2009 at 01:27:40PM +0000, Daniel J Blueman wrote:
> On Sun, Feb 22, 2009 at 11:39 AM, Jim Meyering <jim at meyering.net> wrote:
> > Colin Watson wrote:
> >> On Sun, Feb 22, 2009 at 01:40:16AM +0100, Jim Meyering wrote:
> >>> Daniel J Blueman wrote:
> >>> > I've checked into this, and since libparted sees the SATA block device
> >>> > as SCSI, it doesn't perform the expected ATA 'identify' command to
> >>> > fill out the 512 bytes of device info, of which (short) word 217 is
> >>> > device RPM, defined to be 1 on newer compliant SSDs. The kernel uses
> >>> > this word to detect if a device is an SSD or not, so I suggest we use
> >>> > the same.
> >>> >
> >>> > Anyone think of objections to calling the ATA identify ioctl to fill
> >>> > out the structure, then storing this flag for later use in constraint
> >>> > checking? If the SCSI device supports it also, fine, else nothing
> >>> > lost.
> >>> >
> >>> > For now, a 1MB starting offset for an SSD seems safest, and is what MS
> >>> > Windows 7 and Server 2008 use, thus a number of vendors will also be
> >>> > testing/optimising with this case too.
> >>>
> >>> Does this really need to be SSD-specific?
> >>>
> >>> I hear that this (alignment) is high priority also for many
> >>> of the big new disks, since they have 4k-byte sectors.
> >>> Without better alignment, their performance will suffer, too.
> >>
> >> Well, one step at a time. We can detect SSD; can we detect those big new
> >> disks (or, in general, the desired sector size)?
> >
> > Alignment-related changes that are useful for SSDs will also benefit
> > other hardware, so we'd be remiss not to consider that up-front.
>
> > I think we agree:
> > Writing a "device-is-an-SSD" function that queries the kernel, and
> > then having some caller use that to look up reasonable-for-SSD
> > alignment parameters would be useful. Just make it general
> > enough to also work with e.g., a "device-has-4kb-sector" function.
> >
> > However, I'm beginning to wonder if that'd be more appropriate
> > at a higher level than parted, like in gparted. More below.
>
> For the next few years of users having current SSDs, RAID arrays etc,
> the kernel detection isn't working out and won't. Kernel developers
> disabled Compact Flash cards being marked as non-rotational also,
> since a number of microdrives are CFA devices. I've read the SSDs in
> most netbooks also don't report rotational RPM (word 217) as 1, and of
> course are most performance sensitive (being slower), so we need to
> consider this.

The Linux kernel SD/MMC drivers, as well as other non-rotational
drivers such as nbd, have started tagging their devices as
blk_queue_nonrot()==1, in addition to the libata test for
word 217 == 0x01. I don't believe this is being exported from the
kernel (yet).
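
As a rough sketch of the word 217 test, assuming the caller already
has the raw 512-byte identify data in hand (how it was obtained is out
of scope here). Python purely for illustration; the function and
constant names are made up and are not libparted or kernel API:

```python
import struct

IDENTIFY_SIZE = 512        # identify data is 256 little-endian 16-bit words
ROTATION_RATE_WORD = 217   # nominal media rotation rate
NON_ROTATING = 0x0001      # value a compliant SSD reports in word 217


def identify_word(identify_data, index):
    """Return 16-bit word `index` from a raw identify buffer."""
    if len(identify_data) != IDENTIFY_SIZE:
        raise ValueError("expected a 512-byte identify buffer")
    return struct.unpack_from("<H", identify_data, index * 2)[0]


def is_non_rotational(identify_data):
    """True only if the device explicitly reports itself as non-rotating."""
    return identify_word(identify_data, ROTATION_RATE_WORD) == NON_ROTATING


# Synthetic buffers for illustration only, not real device data.
ssd = bytearray(IDENTIFY_SIZE)
struct.pack_into("<H", ssd, ROTATION_RATE_WORD * 2, NON_ROTATING)

hdd = bytearray(IDENTIFY_SIZE)
struct.pack_into("<H", hdd, ROTATION_RATE_WORD * 2, 7200)

unreported = bytes(IDENTIFY_SIZE)  # word 217 left at 0
```

Note that a device which leaves word 217 at 0 is treated as rotational
here, which is exactly the netbook SSD case raised above, so a check
like this can only ever be one input among several.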
>
> >> Or are you saying that we should increase the alignment to 4KB in
> >> general?
>
> 4KB, 128KB or 1MB, but I think 128KB is enough for the erase block
> alignment, and will help most RAID arrays I believe. I've found
> misaligned striping on eg RAID-5 arrays can hurt with certain
> workloads.

There are ATA identify fields defined now to expose more
characteristics of the underlying disk, such as physical sector size,
logical sector size, and alignment of the underlying block. Even
alignment means LBA 0 and Physical Block 0 both start at the same
boundary; odd alignment means LBA 0 may be "offset" into Physical
Block 0 by some amount, such that a partition starting at LBA 63 sits
at the beginning of an underlying Physical Block. As these are
per-disk specific, if parted is going to use this information to make
alignment decisions, parted will need to query either the disk or the
kernel (once the kernel exposes it somehow).

Windows XP still starts MS-DOS partition 1 at LBA 63, hence drive
manufacturers have devices that optimize for that case. Vista (per
Microsoft KB articles) aligns partitions on 1MB boundaries (which was
the case hard drive manufacturers expected to align on).
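
For reference, the arithmetic behind the boundaries discussed in this
thread, as a minimal sketch assuming 512-byte logical sectors (the
helper name `align_up` is made up for this example):

```python
KIB = 1024
MIB = 1024 * KIB


def align_up(lba, alignment_bytes, sector_size=512):
    """Round lba up to the next multiple of alignment_bytes, in sectors."""
    if alignment_bytes % sector_size != 0:
        raise ValueError("alignment must be a multiple of the sector size")
    step = alignment_bytes // sector_size
    return (lba + step - 1) // step * step


# Where the traditional LBA 63 start would move under each proposal.
start_4k = align_up(63, 4 * KIB)      # 64
start_128k = align_up(63, 128 * KIB)  # 256
start_1m = align_up(63, 1 * MIB)      # 2048
```

A 1MB-aligned start is also 128KB- and 4KB-aligned, so the larger
boundary covers the smaller ones at the cost of under 1MB of unused
space at the front of the disk.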

I'm not sure if the size of the erase block is exposed by drive
firmware yet, but if it is, the kernel will need to know that value,
even more so than parted. We should encourage the kernel to expose
such discovered information in a manner that parted can easily get at
as well.
And yes, sorry I've been out of touch for a few weeks. Life
happens. :-)
--
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux