Wednesday, March 28, 2012

Partitioning perils, and loading Really Big MySQL tables

We discovered an interesting problem in our environment. Our schema and partitioning rules were originally designed for the single-customer version of our product. When we moved to a multi-customer hosted SaaS product, the simplest approach was to change our ID generator to bake the customer ID into the lower bytes of all our IDs, and use the high bytes for the sequence part of the ID. This worked well, and allowed our code and schema to move to a multi-customer environment easily.
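
As a concrete (and entirely hypothetical) sketch of this scheme: assume the generator multiplies the sequence by a fixed constant and adds the customer ID into the low bytes. The real generator and its constants aren't shown here; the multiplier value below is made up.

```python
# Hypothetical sketch of the composite-ID scheme described above.
# Assumption: a fixed multiplier reserves the low 16 bits of every ID
# for the customer ID; the high bits hold the sequence.

CUST_MULTIPLIER = 65536  # assumed value, for illustration only

def make_id(sequence, customer_id):
    """Bake the customer ID into the low bytes of a generated ID."""
    assert 0 <= customer_id < CUST_MULTIPLIER
    return sequence * CUST_MULTIPLIER + customer_id

def customer_of(full_id):
    """Recover the customer ID: mod(id, customer_multiplier)."""
    return full_id % CUST_MULTIPLIER

def sequence_of(full_id):
    """Recover the sequence part: id div customer_multiplier."""
    return full_id // CUST_MULTIPLIER
```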

But we missed one interesting point: we had partitioned some of our biggest tables using simple hashes on ID. This gave a nice, even distribution of data across partitions before we "cooked" our IDs. But MySQL uses a simple "mod(value, n_partitions)" as the hash function for hashed partitions, and we also use "mod(id, customer_multiplier)" to grab the customer ID out of a general ID. Since customer_multiplier is greater than n_partitions, we ended up effectively partitioning these tables by customer ID.

This wouldn't be so bad except that our customers vary drastically in size, so this is a poor partition choice (and is exactly why we didn't explicitly use customer ID as a partition rule, even though that was discussed).
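
The degenerate case is easy to demonstrate with made-up constants. If the ID multiplier happens to be a multiple of the partition count (e.g. both powers of two, as assumed below), the sequence part contributes nothing to MySQL's mod-based hash, and every row for a customer lands in the same partition:

```python
# Demonstration, with assumed constants, of how HASH partitioning on a
# composite ID collapses into partitioning by customer ID. MySQL's
# "partition by hash(expr)" puts a row in partition mod(expr, n_partitions).

CUST_MULTIPLIER = 65536   # assumed ID multiplier
N_PARTITIONS = 1024       # assumed partition count; divides the multiplier

def partition_of(full_id):
    # What "partition by hash(id)" computes for a row.
    return full_id % N_PARTITIONS

# Because CUST_MULTIPLIER is a multiple of N_PARTITIONS, the sequence part
# vanishes from the hash: partition sizes end up tracking customer sizes.
for seq in (1, 2, 1000, 123456):
    for cust in (7, 42, 900):
        full_id = seq * CUST_MULTIPLIER + cust
        assert partition_of(full_id) == cust % N_PARTITIONS
```

(The exact collapse above needs the multiplier to be a multiple of the partition count; with a multiplier that is merely larger, the skew is less clean but the customer bits still dominate the hash.)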


We now get to rebuild some huge tables in our archive, using a new partition rule that strips off the customer ID portion of the ID before doing the partition hash. I've been doing experiments, and we'll go with a copy, catch-up, and replace strategy. The copy step can be done with the db and apps live, and takes a couple of days. The catch-up step needs to happen with the apps offline, but is fairly fast. Once that's done, we can rename the copy table over the existing table, and we've effectively done the "alter table" operation.

As for building the copy table, I found I got the fastest loads when doing a Create Table as Select with the Select part ordered by the partition rule itself:

create table copytab (cols)
engine=innodb
partition by hash(id div cust_multiplier) partitions 1024
as select cols
from source_tab
order by mod(id div cust_multiplier, 1024), id;

This "order by" groups all the rows by partition, so each partition is built completely before the next one is started, and the secondary ordering by ID causes each partition's B-tree to be inserted in sort order. As for MySQL configuration parameters, I temporarily set sort_buffer_size to 8GB and lowered the buffer pool size a bit.
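
A small sketch of what that sort key does to the load order (constants are assumed, as before): after sorting by (partition number, id), the partition number is non-decreasing across the whole stream, so partitions fill strictly one at a time, each in ascending key order.

```python
# Model of the ORDER BY in the create-table-as-select above:
#   order by mod(id div cust_multiplier, 1024), id
# Rows sorted this way arrive grouped by target partition, and in
# ascending id order within each partition.

import random

CUST_MULTIPLIER = 65536   # assumed
N_PARTITIONS = 1024

def sort_key(full_id):
    # (partition the new rule assigns, then id) -- the ORDER BY expression.
    return ((full_id // CUST_MULTIPLIER) % N_PARTITIONS, full_id)

random.seed(0)
ids = [random.randrange(1, 10**12) for _ in range(10_000)]
ids.sort(key=sort_key)

# The partition sequence is monotonic: each partition's B-tree is built
# completely, by in-order appends, before the next one begins.
parts = [sort_key(i)[0] for i in ids]
assert parts == sorted(parts)
```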

Saturday, March 17, 2012

Rant: would love some "big data, small cost" discussions

When I read the blogs and articles about Big Data, the subtext is always Big Money. Even though many of the tools themselves are open-source, they all seem to require Big Infrastructure and a horde of lavishly-paid consultants and brainiacs to get deployed. It's hard to find a "big data" story in which people appear to have paid attention to how much things cost.

This is rather frustrating. While it's interesting in a sort of Hollywood-gossip way to read about how Megacorp or a multibillion-dollar "startup" deployed a zillion-rack Hadoop server farm, cooled by Icelandic glaciers, and baby-sat by a thousand people all over the world, in order to mine fascinating new ways to better separate customers from their cash, it doesn't help me much here in reality-land. Here, I'm "the guy", and we have some money to play with, but not an enormous amount - and we watch every penny.

Fortunately, I have a few tricks up my sleeve, and I'd love to learn more. But I wish the database world would lose its fascination with big-data porn and offer some real-life examples of how people are solving big-data problems under real-life budget and personnel constraints.

Thursday, March 1, 2012

Why Functional Languages don't catch on in industry

Much of the programming world, particularly academia, loves functional languages. In my experience, functional languages such as LISP are great for a single excellent programmer who writes all the code and deeply understands it, but they work less well when a team of developers is trying to understand the code.

The problem with functional languages in industry is simple: you can't simply look at the text of the code and figure out what it's doing. Reading printouts of code in a functional language, particularly one with dynamic binding and "LISP eval"-style operations (as nearly all of them have), is pretty much useless. In order to unravel the mysteries of the app, you have to run the code on many types of input, carefully trace execution in some sort of interactive environment (a debugger, a prompt, etc.), and monitor what the code is doing. Given this, the learning curve for a new programmer facing an existing large body of code in a functional language is extremely steep and often insurmountable, especially if the original developer is not around.

In other words, functional languages are extremely poor at self-documentation, and few of the "lone wolf" developers of the sort who shine with functional languages are the sort to write extensive documentation.

I've seen several cases where a big (and well-functioning) app was coded in LISP by a single "rockstar" developer. Then the guy leaves, nobody who understands the code can be hired, the other in-house developers can't maintain it, and the app ends up being completely recoded in a new language (in one case C, in another C++). In fact, the original Postgres top-level was implemented in LISP - it was recoded to C in 1988.

Wednesday, February 29, 2012

The case for a new open-source relational database

I've been thinking about a new open-source relational database for a while, and here are my thoughts on why the world needs one, even though PostgreSQL and MySQL would seem to have a lock on the space, with SQLite doing well in small devices.

First, some strengths and weaknesses of each:

1. PostgreSQL is very good at query optimization, and has a large and powerful set of query answering facilities. Where it isn't so wonderful is its storage engine, where the old "versioned" storage manager, complete with its vacuum cleaner - which sucked in 1990 when I worked on it and is still an operational PITA - still resides. It "works", and is serviceable for many applications (particularly large archival apps, where deletes and updates are done rarely or never), but is a huge problem for always-up OLTP apps.

2. OTOH, MySQL has the opposite problem: its parser and executor are lame (gee, will they ever figure out how to cache constant subquery results? How about hash joins? Type checking? Not making friggin case-insensitive Swedish the default collation?), but its open storage architecture allows lots of interesting storage engines, and InnoDB's index-organized structure kicks butt if you know how to use it. We use MySQL/InnoDB for a very large SaaS database ourselves, with both heavy OLTP components and heavy archival search components. To get around its sucky top-level, I have to carefully police all significant queries, make extensive use of optimizer hints, and work with developers to make certain they aren't doing something problematic, like using a subquery that will always be treated as correlated.

In a dream-world, I could figure out a way to hook up PostgreSQL's top level with the InnoDB storage engine. That would be a powerful relational engine.

That said, there are things that neither engine, nor any open-source relational db engine I'm aware of, does very well:

1. Better handling of in-memory tables and in-memory indexes. One thing I built in DeviceSQL was an efficient in-memory hashed and ordered index on a persistent table; it was very popular with customers and extremely fast.
2. Fast bootup, even after ungraceful shutdown. Given that Serious Databases increasingly exist on small, portable devices, fast bootup is needed. Fast bootup has all sorts of implications for transaction handling, so it has to be "designed in" from the beginning.
3. An open storage architecture is a must, both for base storage and for indexes (and the open-storage architecture should allow index-organized base storage as well).
4. "Vector" index query answering facilities. One of the biggest limitations of relational database execution engines today is the inability to use multiple indexes on the same table. A "vector index" strategy would allow large tables to be searched by multiple parameters without needing to "pick a winner" (and often there aren't any good choices to pick...)
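
To make point 4 concrete, here is a toy sketch (not any existing engine's API; all names are invented) of answering a multi-parameter query by intersecting the row-id sets produced by several single-column indexes, instead of picking one winning index:

```python
# Toy model of a "vector index" strategy: each column has its own index
# mapping value -> set of row ids, and a query over several columns
# intersects the matching sets, smallest first.

from collections import defaultdict

class Index:
    """A single-column index: value -> set of row ids."""
    def __init__(self, rows, col):
        self.postings = defaultdict(set)
        for row_id, row in rows.items():
            self.postings[row[col]].add(row_id)

    def lookup(self, value):
        return self.postings.get(value, set())

def vector_lookup(indexes, criteria):
    """Intersect matches from every applicable index, smallest set first."""
    sets = sorted((indexes[c].lookup(v) for c, v in criteria.items()), key=len)
    result = sets[0].copy() if sets else set()
    for s in sets[1:]:
        result &= s
    return result

# Hypothetical table: row_id -> {column: value}
rows = {
    1: {"cust": 7, "status": "open"},
    2: {"cust": 7, "status": "closed"},
    3: {"cust": 9, "status": "open"},
}
indexes = {c: Index(rows, c) for c in ("cust", "status")}
```

With this, `vector_lookup(indexes, {"cust": 7, "status": "open"})` narrows to row 1 using both indexes, with no need to choose a single access path up front.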

More later...