Wednesday, March 28, 2012

Partitioning perils, and loading Really Big MySQL tables

We discovered an interesting problem in our environment. Our schema and partitioning rules were originally designed for the single-customer version of our product. When we moved to a multi-customer hosted SaaS product, the simplest approach was to change our ID generator to bake the customer ID into the lower bytes of all our IDs and use the high bytes for the sequence part. This worked well and let our code and schema move to a multi-customer environment easily.
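
To make that concrete, here is roughly what a "cooked" ID looks like. The multiplier below is made up for illustration (two low bytes for the customer ID); our real constant is different:

-- illustration only: pretend cust_multiplier is 65536 (2^16)
-- sequence in the high bytes, customer ID in the low bytes
select 42 * 65536 + 137             as packed_id,      -- 2752649
       (42 * 65536 + 137) div 65536 as sequence_part,  -- 42
       mod(42 * 65536 + 137, 65536) as customer_part;  -- 137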

But one interesting point we missed: we had partitioned some of our biggest tables using simple hashes on ID. That gave a nice, even distribution of data across partitions before we started "cooking" our IDs. However, MySQL uses a simple "mod(value, n_partitions)" as the hash function for hashed partitions, and we use "mod(id, customer_multiplier)" to pull the customer ID out of a general ID (with customer_multiplier greater than n_partitions). So the partition hash now keys on the low-order part of every ID, which is exactly where the customer ID lives, and we ended up effectively partitioning these tables by customer ID.
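
A toy example of the degenerate case, assuming the customer multiplier is an exact multiple of the partition count (numbers made up; the effect is what matters):

-- with 1024 partitions and a multiplier of 65536 (a multiple of 1024),
-- the sequence part contributes nothing to the partition hash:
select mod(    1 * 65536 + 137, 1024) as p1,  -- 137
       mod(50000 * 65536 + 137, 1024) as p2,  -- 137
       mod(99999 * 65536 + 137, 1024) as p3;  -- 137
-- every row for customer 137 lands in partition 137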

This wouldn't be so bad, except that our customers vary drastically in size, so customer ID makes a poor partitioning key (which is exactly why we didn't choose to explicitly use customer ID as a partition rule, even though it was discussed).

Soooo....

We get to rebuild some huge tables in our archive, using a new partition rule that strips off the customer ID portion of the ID before doing the partition hash. I've been doing experiments, and we'll go with a copy, catchup, and replace strategy. The "copy" bit can be done with the db and apps live, and takes a couple of days to do. The catchup bit needs to happen with the apps offline, but is fairly fast. Once this is done, we can rename the copy table over the existing table, and we've effectively done the "alter table" operation.
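
The catchup-and-swap at the end looks roughly like this. It's a sketch, assuming the archive table is effectively insert-only and that a simple high-water mark captures everything the bulk copy missed; the table names and the @last_copied_id bookmark are placeholders:

-- with the apps offline, copy whatever arrived after the bulk copy
insert into copytab
select * from source_tab where id > @last_copied_id;

-- then swap the rebuilt table into place with an atomic rename
rename table source_tab to source_tab_old, copytab to source_tab;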

As for building the copy table, I found I got the fastest loads when doing a Create Table as Select with the Select part ordered by the partition rule itself:

-- cust_multiplier here stands in for our literal multiplier constant
create table copytab (cols)
engine=innodb
partition by hash(id div cust_multiplier) partitions 1024
as select cols
from source_tab
order by mod(id div cust_multiplier, 1024), id;

This "order by" groups all the rows by partition, which results in each partition being built completely before the next partition is built, and the secondary ordering by ID causes each partition B-tree to be inserted in sort order. As for MySQL configuration parameters, I temporarily set sort_buffer_size to 8GB and lowered the buffer pool size a bit.
