[debian-mysql] Bug#591271: Explanation and fix
Trey Raymond
aradapilot at gmail.com
Fri Mar 9 21:43:31 UTC 2012
So this is what's happening here. I'll explain what it's doing for each
query, then show you the optimal way.
SELECT MAX(sample.id) FROM sample JOIN interface ON (sample.interface=
interface.id) WHERE interface.name='accounting-total'
-- This is just a symptom of the optimizer being stupid. It starts looking
at the keys for the table, sees the first one, and notices that it is a
covering index - and therefore it choses to use it without considering that
there might be a better one. It is a covering index because in innodb, the
primary key is appended to every secondary key, so KEY
(`interface`,`timestamp`) actually stores
(`interface`,`timestamp`,`id`)...also note that where you have KEY
(`interface`,`id`), you really only need to say KEY (`interface`), for this
same reason. The problem is that it's caring so much about using a key to
filter that it doesn't notice there's one that could be used to filter AND
get the max(id) easily. That's just the mysql optimizer for you. Newer
versions tend to be a bit smarter, but you'll always see this kind of
thing. I bet if you switched what order those keys were in in the file,
it'd find the correct one first and use it.
SELECT MAX(sample.id) FROM sample WHERE interface=(SELECT interface.id FROM
interface WHERE interface.name='accounting-total')
-- This one is using a one-time (independent) subquery, or "derived table."
As it only has to execute that statement once, it's very fast. This is
basically what mysql would do if the optimizer handled it right, but
without join syntax optimization, so it's still not perfect. It's the
closest you can get without index hints or changing the indexing.
SELECT MAX(sample.id) FROM sample WHERE interface IN (SELECT
interface.idFROM interface WHERE
interface.name='accounting-total')
-- This is a BIG no-no. It's using a dependent subquery - note how saying
IN implies multiple values could be returned and a range check is needed.
It also implies that the subquery cannot be solved before the main table
is accessed, so no "derived table" use. In this case, it has to scan the
ENTIRE key on the sample table, and for EVERY record it must run that
subquery - you're basically executing over 6 million separate queries in
this one statement. This is the slowest method. Beware when you see
"Dependent subquery" in an explain plan - there is usually a better way to
go about it.
So in conclusion, the best way to fix it is to just tell the optimizer to
use the correct index. Either of the following two should work:
SELECT MAX(sample.id) FROM sample FORCE INDEX (sample_interface_id) WHERE
interface=(SELECT interface.id FROM interface WHERE interface.name
='accounting-total')
-- explicitly specifying the key
SELECT MAX(sample.id) FROM sample IGNORE INDEX (sample_timestamp) WHERE
interface=(SELECT interface.id FROM interface WHERE interface.name
='accounting-total')
-- telling it to skip that first key and find the next usable one
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.alioth.debian.org/pipermail/pkg-mysql-maint/attachments/20120309/4cb32fdd/attachment.html>
More information about the pkg-mysql-maint
mailing list