Wednesday, March 28, 2012

Partitioning perils, and loading Really Big MySQL tables

We discovered an interesting problem in our environment. Our schema and partitioning rules were originally designed for the single-customer version of our product. When we went to a multi-customer hosted SaaS product, the simplest approach was to change our ID generator to bake the customer ID into the lower bytes of all our IDs, and use the high bytes for the sequence part of the ID. This works well, and allowed our code and schema to go to a multi-customer environment easily.

But one point we missed: we had partitioned some of our biggest tables using simple hashes on ID. This gave a nice, even distribution of data across partitions before we "cooked" our IDs. But MySQL uses a simple "mod(value, n_partitions)" as the hash function for hashed partitions, and we also use "mod(id, customer_multiplier)" to pull the customer ID out of a general ID - and since customer_multiplier is a multiple of n_partitions, we ended up effectively partitioning these tables by customer ID.

This wouldn't be so bad except our customers vary drastically in size, so this is a poor partition choice (and is why we didn't choose to explicitly use customer ID as a partition rule, even though it was discussed).
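To make the arithmetic concrete (the numbers here are made up for illustration; assume customer_multiplier = 65536 and 1024 hash partitions): because 1024 divides 65536, the sequence part of the ID drops out of the partition hash entirely:

```sql
-- id = sequence * customer_multiplier + customer_id, so:
--   mod(id, 1024) = mod(seq * 65536 + cust_id, 1024) = mod(cust_id, 1024)
select mod(7 * 65536 + 42, 1024);       -- 42
select mod(900000 * 65536 + 42, 1024);  -- 42: same partition, regardless of sequence
```

Every row for customer 42 lands in partition 42, no matter how big that customer gets.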


We get to rebuild some huge tables in our archive, using a new partition rule that strips off the customer ID portion of the ID before doing the partition hash. I've been doing experiments, and we'll go with a copy, catchup, and replace strategy. The "copy" bit can be done with the db and apps live, and takes a couple of days to do. The catchup bit needs to happen with the apps offline, but is fairly fast. Once this is done, we can rename the copy table over the existing table, and we've effectively done the "alter table" operation.

As for building the copy table, I found I got the fastest loads when doing a Create Table as Select with the Select part ordered by the partition rule itself:

create table copytab (cols)
engine=innodb
partition by hash(id div cust_multiplier) partitions 1024
as select cols
from source_tab order by mod(id div cust_multiplier, 1024), id;

This "order by" groups all the rows by partition, which results in each partition being built completely before the next partition is built, and the secondary ordering by ID causes each partition B-tree to be inserted in sort order. As for MySQL configuration parameters, I temporarily set sort_buffer_size to 8GB and lowered the buffer pool size a bit.
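The catchup and swap steps can be sketched as follows (table names are illustrative, and this assumes IDs are assigned in increasing order, so max(id) works as a high-water mark):

```sql
-- Catchup (apps offline): copy rows created since the initial copy finished.
insert into copytab
select cols from source_tab
where id > (select max(id) from copytab);

-- Swap: both renames happen in one atomic statement.
rename table source_tab to source_tab_old, copytab to source_tab;
```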

Saturday, March 17, 2012

Rant: would love some "big data, small cost" discussions

When I read the blogs and articles about Big Data, the subtext is always Big Money. Even though many of the tools themselves are open-source, they all seem to require Big Infrastructure and a horde of lavishly-paid consultants and brainiacs to get deployed. It's hard to find a "big data" story where anyone appears to have paid attention to how much things cost.

This is rather frustrating. While it's interesting in a sort of Hollywood-gossip way to read about how Megacorp or a multibillion-dollar "startup" deployed a zillion-rack Hadoop server farm, cooled by Icelandic glaciers, and baby-sat by a thousand people all over the world, in order to mine fascinating new ways to better separate customers from their cash, it doesn't help me much here in reality-land. Here, I'm "the guy", and we have some money to play with, but not an enormous amount - and we watch every penny.

Fortunately, I have a few tricks up my sleeve, and I'd love to learn more. But I wish the database world would lose its fascination with big-data porn and have some real life examples of how people are solving big-data problems with real-life budgets and personnel constraints.

Thursday, March 1, 2012

Why Functional Languages don't catch on in industry

Much of the programming world, particularly academia, loves functional languages. In my experience, functional languages such as LISP are great for a single excellent programmer who writes all the code and deeply understands it, but they work less well when a team of developers is trying to understand the code.

The problem with functional languages in industry is simple: you can't just look at the text of the code and figure out what it's doing. Reading printouts of a functional language, particularly one with dynamic binding and "LISP eval"-style operations (as nearly all of them have), is pretty much useless. To unravel the mysteries of the app, you have to run the code on many types of input, carefully trace execution in some sort of interactive environment (a debugger, a prompt, etc.), and monitor what the code is doing. Given this, the learning curve for a new programmer facing an existing large body of code in a functional language is extremely steep and often insurmountable, especially if the original developer is not around.

In other words, functional languages are extremely poor at self-documentation, and few of the "lone wolf" developers of the sort who shine with functional languages are the sort to write extensive documentation.

I've seen several cases where a big (and well-functioning) app was coded in LISP by a single "rockstar" developer. The guy leaves, nobody can be hired who understands the code, other in-house developers can't maintain it, and the app ends up being recoded completely in a new language (C in one case, C++ in another). In fact, the original Postgres top level was implemented in LISP - it was recoded to C in 1988.

Wednesday, February 29, 2012

The case for a new open-source relational database

I've been thinking about a new open-source relational database for a while. Here are my thoughts on why the world needs one, even though PostgreSQL and MySQL would seem to have a lock, with SQLite doing well in small devices.

First, some strengths and weaknesses of each:

1. PostgreSQL is very good at query optimization, and has a large and powerful set of query answering facilities. Where it isn't so wonderful is its storage engine, where the old "versioned" storage manager - complete with its vacuum cleaner, which sucked in 1990 when I worked on it and is still an operational PITA - still resides. It "works", and is serviceable for many applications (particularly large archival apps, where deletes and updates are done rarely or never), but is a huge problem for always-up OLTP apps.

2. OTOH, MySQL has the opposite problem: its parser and executor are lame (gee, will they ever figure out how to cache constant subquery results? How about hash joins? Type checking? Not defaulting to friggin' case-insensitive Swedish collation?), but its open storage architecture allows lots of interesting storage engines, and InnoDB's index-organized structure kicks butt if you know how to use it. We use MySQL/InnoDB ourselves for a very large SaaS database with both heavy OLTP components and heavy archival search components. To get around its sucky top level, I have to carefully police all significant queries, make extensive use of optimizer hints, and work with developers to make certain they aren't doing something problematic, like using a subquery that is always treated as correlated.

In a dream-world, I could figure out a way to hook up PostgreSQL's top level with the InnoDB storage engine. That would be a powerful relational engine.

That said, there are things that neither engine (nor any open-source relational db engine I'm aware of) does very well:

1. Better handling of in-memory tables and in-memory indexes. One thing I built for DeviceSQL was an efficient in-memory hashed and ordered index over a persistent table; it was very popular with customers and extremely fast.
2. Fast bootup, even after ungraceful shutdown. Given that Serious Databases increasingly exist on small, portable devices, fast bootup is needed. Fast bootup has all sorts of implications for transaction handling, so it has to be "designed in" from the beginning.
3. An open storage architecture is a must, both for base storage and for indexes (and the open-storage architecture should allow index-organized base storage as well).
4. "Vector" index query answering facilities. One of the biggest limitations of relational database execution engines today is the inability to use multiple indexes on the same table. A "vector index strategy" would allow large tables to be searched by multiple parameters without needing to "pick a winner" (and often there isn't any good choice to pick...)
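In the absence of such a facility, the user-code version of the idea (the schema here is hypothetical) is to keep one narrow, user-managed "search table" per parameter and join them, rather than forcing the optimizer to pick a single index:

```sql
-- Hypothetical user-managed "search tables", one per parameter:
--   search_by_color(color, id),  search_by_size(size, id)
select c.id
from search_by_color c
join search_by_size s on s.id = c.id
where c.color = 'red'
  and s.size = 42;
```

Both predicates are answered from narrow, index-organized structures; neither one has to be the "winner".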

More later...

Saturday, October 29, 2011

MySQL, Postgres, and InnoDB: Some Critiques of Index Organized Tables

This discussion of InnoDB versus Postgres' storage engine (and index organized table structures generally, as this could also apply to Oracle's IOT's) is worth a careful read. Since our schema makes heavy use of the index organized property of InnoDB - and since I have a bit of a fanboy wish to use Postgres as I worked on it for a few years versus helping to buy Larry another yacht - here's some discussion of where index organized structures add value and some criticisms of my own.

Even though one uses a primary key definition to define the "index organization structure", in my opinion the real power of index organized tables lies in larger, "bulk" searches rather than the single-record lookups one would typically do with a primary key. Assuming very careful schema design and querying, the advantage for "bulk" searches is that a single B-tree ingress is all that's necessary to answer the search, while secondary indexes require at least some indirection into the table, which gets expensive if the working set involves millions of records (as it often does, at least in our case). See the below discussion on telemetry schemas for more.

Index Organized Table Wishlist

  1. Have an explicit "ORGANIZATION KEY" table declaration (or some such) and not rely on the PRIMARY KEY. If the organization key doesn't completely define the storage, allow "..." or something to allow for an incompletely defined organization key. If "ORGANIZATION KEY" is omitted, default to the PRIMARY KEY.
  2. Secondary indexes should have the option of using storage row-ids to "point at" the main rows as organization keys can be relatively long, and require walking two different B-trees to find the recs. Implementing this could be hard and make inserts slow as otherwise unchanged rows can "move around" while the B-tree is having stuff done to it - requiring secondary indexes to have pointers redone - so there's a design trade-off here. In our environment, we mostly avoid secondary indexes in favor of "search tables", which are basically user-managed indexes.
If I get the energy to do an open-source project, I want to do what I call "vector indexes" and a "vectoring execution engine" that we've basically implemented in user code in our app. If we had "vector indexes", we wouldn't be so reliant on index-organized table structures.

Wednesday, October 26, 2011

The joy of MySQL stored procedures

MySQL stored procedures, or stored functions, can be useful, particularly if you need to do something "iterative" with data that can't be easily done with "simple" SQL such as looping through data or (simple) XML or name-value-pair string parsing. First, a few things to know:

1. MySQL stored procedures are dynamically interpreted, so you won't hit syntax or semantic errors in logical blocks until they're actually executed. Make sure your unit tests create situations where all logic paths are executed.

2. If stored procedure A calls stored procedure B, and B gets redefined, A will hang unless it gets redefined after B is redefined.

3. Local variables used with the "cursor FETCH" statement shouldn't have the same names as columns used in FETCH queries. If they do, they'll be NULL and random badness will ensue.
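A minimal sketch of the cursor pattern with safe naming (the table and column are hypothetical; note the v_ prefix keeping the local variable distinct from the column it fetches):

```sql
delimiter //
create procedure scan_foo()
begin
  declare v_id bigint;    -- NOT named "id": that would shadow the fetched column
  declare done int default 0;
  declare cur cursor for select id from foo;
  declare continue handler for not found set done = 1;
  open cur;
  scan_loop: loop
    fetch cur into v_id;
    if done then leave scan_loop; end if;
    -- ... do something with v_id ...
  end loop;
  close cur;
end //
delimiter ;
```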

Debugging stored procedures

As far as I know, there's no source-level debugger available for MySQL stored procedures or functions, so you have to be a bit clever and a bit old-school in debugging.

The easiest way to debug stored procedures is a variation of the "printf" strategy from programming days of yore. Since you can't print an actual message, create a "debug_output" table with a "debug_txt varchar(128)" column, and use

insert into debug_output values (concat('my message', ',', 'var1=', var1, ..., 'varN=', varN));

After your procedure runs (or dies), just select from your debug_output table. I also like to create a similar single-column error_output table for error handling.
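The setup is just a couple of statements. One gotcha worth a comment: concat() returns NULL if any argument is NULL, so a single NULL variable silently wipes out the whole message; wrap anything nullable in ifnull():

```sql
create table debug_output (debug_txt varchar(128));

-- inside the procedure (ifnull keeps a NULL var1 from nulling the whole row):
insert into debug_output
values (concat('my message', ', var1=', ifnull(var1, 'NULL')));

-- after the run (or the crash):
select * from debug_output;
```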

Note that these will greatly impact runtime in tightly coded stored procedures, so you'll want to comment them out. I tried to disable them by using the "BLACKHOLE" storage manager, but this doesn't improve performance significantly.

Saturday, October 8, 2011

MySQL strings: oddness with backslash and LIKE

Our application supports wildcard searches of the data, and the data occasionally includes LIKE "special" characters, such as '%', '_', and '\'. One thing we learned recently is that escaping a backslash (the default LIKE escape character) in a LIKE query is particularly troublesome.

Here are some examples. Consider a simple table foo with a single varchar column a:

> insert into foo (a) values ('\\');
> insert into foo (a) values ('\\%');

> select * from foo \G
a: '\'
a: '\%'

Now, do a LIKE search for a single backslash character, without a wildcard. To match it, you need a total of FOUR backslashes in the pattern: the string parser reduces four to two, and LIKE's escape processing reduces those two to one:

> select * from foo where a like '\\\\' \G
a: '\'

To do a LIKE search for a single backslash followed by a literal percent sign, you need six backslashes: four to match the backslash in the data, and two to escape the percent sign:

> select * from foo where a like '\\\\\\%' \G
a: '\%'

Note that if you use simple equality, you need two backslashes to search on a single backslash.
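In other words, an equality comparison goes through only one level of escape processing (the string parser), while a LIKE pattern goes through two:

```sql
> select * from foo where a = '\\' \G
a: '\'
```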