Thursday, September 22, 2011

Partition pruning and WHERE clauses

If you're using lots of partitions, you'll want to make sure that partition pruning is used, and you aren't always guaranteed that it will be. We had a situation involving a set of very large tables (potentially >1B rows, currently ~250M rows) that are partitioned on "Unix days" (Unix time rounded to days) into several hundred partitions, using "hash" partitioning to simplify maintenance. Simplified, our table looks something like

create table mytable (id integer, unix_day mediumint, primary key (id, unix_day**)) engine=innodb partition by hash (unix_day) partitions 500;

Let's load it up with 1B records. Assuming even distribution, we'll have 2M records in each partition. Now, let's run the following query to fetch everything in the past 3 days (ignoring timezones for simplicity):

set @today = unix_timestamp(now()) div 86400;

select id from mytable
where unix_day >= @today - 3;


Since we're using "hash" partitioning, this will end up being a monster table scan on the entire table, taking hours to complete. The query can't be "pruned" because "hash" partitioning doesn't have an implied order, so an open-ended range tells MySQL nothing about which partitions it can skip.
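
You can see the lack of pruning with EXPLAIN PARTITIONS (available since MySQL 5.1). A minimal sketch, using a literal day number since pruning is keyed off constants (the day value is illustrative):

explain partitions select id from mytable where unix_day >= 15236;

The "partitions" column will list every partition (p0,p1,...,p499), confirming that nothing gets pruned.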

However, if we run this query

select id from mytable where unix_day in (@today-3, @today - 2, @today - 1, @today);

we'll get good pruning and only four partitions will be searched. Note that MySQL is also good at exploding small integer ranges (i.e., ranges spanning fewer values than the number of partitions) into what amounts to an IN-list for partition pruning purposes, so the above query, which is awkward to generate in application code, can be converted to

select id from mytable where unix_day between @today - 3 and @today;

and only four partitions will be searched.
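
Again, a quick hypothetical check with EXPLAIN PARTITIONS and literal day numbers (with HASH(unix_day) and 500 partitions, day N lands in partition N mod 500):

explain partitions select id from mytable where unix_day between 15236 and 15239;

Here the "partitions" column should show only p236,p237,p238,p239.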

**Note that this "stupid" PRIMARY KEY declaration is necessary to get MySQL to let us use unix_day as a partition key: on a partitioned table, every unique key (the primary key included) must contain all columns in the partitioning expression. Making unix_day a separate KEY - an additional index - would be hugely expensive and unnecessary in our app.
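
For the curious, here's roughly what happens if you leave unix_day out of the PRIMARY KEY (the error text is from memory and may differ slightly between versions):

create table mytable (id integer, unix_day mediumint, primary key (id)) engine=innodb partition by hash (unix_day) partitions 500;
ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function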

Wednesday, September 21, 2011

InnoDB versus MyISAM: large load performance

The conventional wisdom regarding InnoDB versus MyISAM is that InnoDB is faster in some contexts, but MyISAM is generally faster, particularly in large loads. However, we ran an experiment, a large bulk load of a mysqldump output file (basically plain SQL: some CREATE TABLE and CREATE INDEX statements, plus a whole lot of huge INSERT statements), in which InnoDB was the clear winner over MyISAM.

We have a medium-sized database that we have to load every month or so. When loaded, the database is about 6GB in MyISAM and about 11GB in InnoDB, and has a couple hundred million smallish records. The MySQL dump file itself is about 5.6GB, and had "ENGINE=MyISAM" in its CREATE TABLE statements, which we sed-substituted to "ENGINE=InnoDB" for the InnoDB test.
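
The substitution itself is a one-liner along these lines (the file and database names are made up):

sed -e 's/ENGINE=MyISAM/ENGINE=InnoDB/' dump.sql | mysql mydatabase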

Load time with ENGINE=MyISAM: 203 minutes (3 hours 23 minutes)
Load time with ENGINE=InnoDB: 40 minutes

My guess is that the performance difference is due to the fact that these tables have lots of indexes: every table has a PRIMARY KEY and at least one secondary index. InnoDB is generally better at large index loads than MyISAM in our experience, so the extra time MyISAM spends populating indexes swamps its advantage in raw load time into the base table storage.

Given our experimental results, we'll now use InnoDB for this database.
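
As an aside, if you try to reproduce this, make sure the usual bulk-load session settings are in effect. mysqldump output normally sets some of these for you; the sketch below just illustrates the kind of thing we mean:

set unique_checks = 0;      -- skip secondary unique-key checks during the load
set foreign_key_checks = 0; -- skip FK validation during the load
set autocommit = 0;         -- commit in big batches rather than per statement
-- ... run the INSERTs ...
commit;
set unique_checks = 1;
set foreign_key_checks = 1;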

Saturday, September 10, 2011

Big Data: "Telemetry" schemas

Many database schemas have similar characteristics, and one common - and important - schema type is what I call the "telemetry" schema. Telemetry schemas have certain things in common:

1. They have a small number of relatively simple tables. I'll call these the "telemetry log tables".
2. They have a time component, and are often queried using the time component.
3. They have at least one score or amount associated with the data, which is also queried frequently.
4. Telemetry log tables are often very large, with hundreds of millions or billions of records.
5. While there may be summary tables that are updated frequently, records in the telemetry log tables are rarely or never updated.
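
To make the list above concrete, here's a minimal sketch of a telemetry log table (all names are made up for illustration):

create table sensor_log (
    sensor_id    integer not null,       -- which sensor sent the reading
    unix_day     mediumint not null,     -- the time component, in Unix days
    reading_time datetime not null,      -- full-resolution timestamp
    reading      decimal(12,4) not null, -- the score/amount that gets queried
    primary key (sensor_id, reading_time, unix_day)
) engine=innodb;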

Examples of telemetry schemas include:

1. Actual telemetry data from various types of sensors.
2. Wall Street-style trading data.
3. Other transactional data, such as banking activity, point-of-sale activity, etc.

As mentioned above, telemetry schemas often have a time-series component, but they also need to be queried in interesting ways that go beyond the simple time component.

Telemetry schemas pose several challenges to standard schema design strategies:

1. Telemetry log tables are typically too large for relational joins to perform well, even when well-indexed, particularly on "bulk" searches that visit many records.
2. Most database engines will "anchor" a search on one index, use it to fetch records out of the base table, and finish qualifying the search using values from those records, even if other indexes exist on the qualifying columns. This search strategy performs awfully on huge tables unless the engine gets lucky and picks a highly selective index, or the search happens to be on a very selective value in the index. Statistics-based query optimizers can help some, but they can still "whiff" badly, and this can result in what I call the database "death penalty": a badly-optimized "loser" query that takes days to run.

In a future blog post, I'll talk about a search strategy I've successfully used for large telemetry schemas.

Tuesday, September 6, 2011

InnoDB: When do partitions improve performance?

PARTITIONs are one of the least understood tools in the database design toolkit.

As an index-organized storage mechanism, InnoDB's partitioning scheme can be extremely helpful for both insert/load performance and read performance in queries that leverage the partition key.

In InnoDB, the best way to think of a partitioned table is as a hash table with separate B-trees in the individual partitions, with the individual B-trees structured around the PRIMARY KEY (as is always the case with InnoDB-managed tables). The partition key is the "bucketizer" for the hash table.

Since the main performance issue in loading data into an InnoDB table is the need to figure out where to put new records in an existing B-Tree structure, a partition scheme that allows new data to go into "new" partitions, with "short" B-trees, will dramatically speed up performance. Also, the slowest loads will be bounded by the maximum size of an individual partition.

A partition scheme I generally like to use is one with several hundred partitions, driven by "Unix days" (i.e., Unix time divided by 86400 seconds). Since data is typically obsoleted or archived away after a couple of years, this allows at most a couple of days of data in any one partition.
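
On the write side, this is cheap to maintain; a hypothetical insert (table and column names are illustrative) just computes the day at write time:

insert into mytable (id, unix_day)
values (42, unix_timestamp(now()) div 86400);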

The partition key needs to be explicitly used in the WHERE clauses of queries in order to take advantage of partition pruning. Note that even if you can figure out how to derive the partition key from other data in the WHERE clause, the query optimizer often can't (or at least isn't coded to do so), so it's better to include the partition key explicitly, ANDed with the rest of the WHERE clause.
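
For example, here's a hypothetical query against a table that stores both a full datetime and a unix_day partition key (names made up). The second predicate is logically redundant given the first, but it's the one that enables pruning:

select id
from event_log
where event_time >= '2011-09-19 00:00:00'
  and unix_day >= unix_timestamp('2011-09-19 00:00:00') div 86400;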

Some notes on using MySQL partitions:

1. Queries that use secondary indexes won't typically get much faster. Shorter B-trees help some, but not all that much, particularly since it's hard for a single query to do both partition pruning and use a secondary index against the same table.
2. In MySQL, don't create a secondary index on the partition key, as it will effectively "hide" the partition key from MySQL and result in worse performance than if the partition is directly used.
3. Partitions can't be easily "nested". While more recent versions of MySQL let you partition on a computed expression, it is difficult for MySQL to leverage one well for partition pruning, so keeping your partitions simple and just using them directly in queries is preferable.
4. If you use "Unix day" as a partition key, you'll want to explicitly include it in a query even if you already have a datetime value in the query and the table (as sketched above). This may make queries look slightly ugly and redundant, but they'll run a lot faster.
5. The biggest "win" in searching from partitioning will be searches on the PRIMARY KEY that also use the partition key, allowing for partition pruning. Also, searches that otherwise would result in a table scan can at least do a "short" table scan only in the partitions of interest if the partition key is used in the query.

Saturday, September 3, 2011

MySQL 5.1 to 5.5 upgrade

We are in the midst of a MySQL 5.1 to MySQL 5.5 upgrade. Nobody upgrades the database engine on a running production system without a Darn Good Reason, but we had recently run into one with a bug in MySQL 5.1 that was exploding MySQL RAM use on our production server and threatening to crash the production database.

Ultimately, the problem was that we wanted to use "statement batching" with UPDATE, which packs a large number of UPDATE statements into a single long string, dramatically improving performance. After a dialog with MySQL's QA people, it turned out that we were hitting a bug in 5.1's query parser in which a lot of RAM was allocated for every statement and not freed until the entire set of queries in the string was parsed. So this wasn't, strictly speaking, a "memory leak", but it was still an open-ended memory allocation that caused the RAM assigned to MySQL to grow enormously.
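
By "statement batching" we mean sending many statements to the server as one string, with the client's multi-statement option enabled (e.g. CLIENT_MULTI_STATEMENTS in the C API). A hypothetical batch, with made-up table and column names:

update account set balance = balance - 10 where id = 101;
update account set balance = balance + 10 where id = 102;
update account set balance = balance - 25 where id = 103;
-- ...thousands more statements in the same string...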

5.5 appears to have solved this problem, and we're well along in our acceptance process. Other than a couple of very small issues with syntax that was deprecated in 5.1 but is now an actual error in 5.5, all is going well. We'll go live shortly.