Tuesday, September 6, 2011

INNODB: When do partitions improve performance?

PARTITIONs are one of the least understood tools in the database design toolkit.

As an index-organized storage mechanism, InnoDB's partitioning scheme can be extremely helpful for both insert/load performance and read performance in queries that leverage the partition key.

In InnoDB, the best way to think of a partitioned table is as a hash table with separate B-trees in the individual partitions, with the individual B-trees structured around the PRIMARY KEY (as is always the case with InnoDB-managed tables). The partition key is the "bucketizer" for the hash table.

Since the main performance issue in loading data into an InnoDB table is the need to figure out where to put new records in an existing B-Tree structure, a partition scheme that allows new data to go into "new" partitions, with "short" B-trees, will dramatically speed up performance. Also, the slowest loads will be bounded by the maximum size of an individual partition.

A partition scheme I generally like to use is one with several hundred partitions, driven by "Unix days" (ie, Unix time divided by 86400 seconds). Since data is typically obsoleted or archived away after a couple of years, this will allow at most a couple of days of data in a partition.

The partition key needs to be explicitly used in the WHERE clauses of queries in order to take advantage of partition pruning. Note that even if you can figure out how to derive the partition key from other data in the WHERE clause, the query optimizer often can't (or at least isn't coded to do so), so it's better to explicitly include it in the WHERE clause by itself, ANDed with the rest of the WHERE.

Some notes on using MySQL partitions:

1. Secondary indexes won't typically go much faster. Shorter B-trees will help some, but not all that much, particularly since it's hard to do both partition pruning and use a secondary index against the same table in a single query.
2. In MySQL, don't create a secondary index on the partition key, as it will effectively "hide" the partition key from MySQL and result in worse performance than if the partition is directly used.
3. Partitions can't be easily "nested". While you can use a computed expression to map your partition in more recent versions of MySQL, it is difficult for MySQL to leverage one well for the purposes of partition pruning. So, keeping your partitions simple and just using them directly in queries is preferable.
4. If you use "Unix day" as a partition, you'll want to explicitly include it in a query, even if you already have a datetime value in the query and the table. This may make queries look slightly ugly and redundant, but they'll run a lot faster.
5. The biggest "win" in searching from partitioning will be searches on the PRIMARY KEY that also use the partition key, allowing for partition pruning. Also, searches that otherwise would result in a table scan can at least do a "short" table scan only in the partitions of interest if the partition key is used in the query.

No comments: