If you're using lots of partitions, you'll want to make sure that partition pruning is used, and you aren't always guaranteed that it will be. We had a situation where we have a set of very large (potentially >1B rows, currently ~250M rows) tables that are partitioned on "Unix days" (Unix time rounded to days), that have several hundred partitions, using "hash" partitioning to simplify maintenance. So, to simplify, our table looks something like
create table mytable (id integer, unix_day mediumint, primary key (id, unix_day**)) engine=innodb partition by hash (unix_day) partitions 500;
Let's load up a with 1B records. Assuming even distribution, we'll have 2M records in each partition. Now, let's run the following query, to fetch everything in the past 3 days (ignoring timezones for simplicity):
set @today = unix_timestamp(now()) div 86400;
select id from mytable
where unix_day >= @today - 3;
Since we're using "hash" partitioning, this will end up being a monster table scan on the entire table, taking hours to complete. The problem is this query can't be "pruned" because "hash" partitioning doesn't have an implied order.
However, if we run this query
select id from mytable where unix_day in (@today-3, @today - 2, @today - 1, @today);
we'll get good pruning and only four partitions will be searched. Note that MySQL is also good at exploding small integer ranges (ie, ranges less than the number of partitions) into what amounts to an IN-list for partition pruning purposes, so the above query, which is difficult to code in an app, can be converted to
select id from mytable where unix_day between @today - 3 and @today;
and only four partitions will be searched.
**Note that this "stupid" PRIMARY KEY declaration is necessary to have InnoDB let us use unix_day as a partition key, as partition keys must be part of an existing key (primary or otherwise). Making it a separate KEY - and additional index - would be hugely expensive and unnecessary in our app.
No comments:
Post a Comment