The Buffer Pool: August 2011

Tuesday, August 30, 2011

MySQL: Fun with MySQL query logs

The MySQL "general query log" is very useful when you're trying to understand, improve, or debug a large, complex app. The problem we often run into is that Hibernate logs, JBoss logs, or other app-level logs miss queries or show placeholders instead of the actual constant values in the queries, or the app itself has a bug that results in queries not being logged correctly or at all.

On the other hand, MySQL's "general query log" captures all queries sent to the server when the general_log system variable is set to 'ON'. A useful way to capture a particular application dialog is to run the app on a lightly-trafficked database instance, preferably an isolated test instance, enable the general log with

mysql> set global general_log = 'on';

quickly do your operation, and then disable the general log with

mysql> set global general_log = 'off';

After this, look at the log. Its entries consist of a time, a connection ID, a command type indicating what happened (such as Connect, Query, or Quit), and the query text. Figure out which connection ID(s) are interesting, use "grep" on the log to get the lines belonging to those connection IDs, and redirect the output into a SQL file. This can then be edited down to the actual queries (remember to put a semicolon at the end of every statement), giving you a simple script that can be run using mysql's shell-level tools.

Scripts like these are very useful for debugging, for benchmarking the database accesses done by web or GUI-driven applications, and for other SQL-based testing.
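The grep-and-edit step described above can also be scripted. Here is a minimal Python sketch of that idea. The entry layout it assumes (an optional timestamp, then the connection ID, a command word, a tab, and the query text) is my assumption about the log format and differs between server versions, so check it against your own log first. The names ENTRY, extract_queries, and SAMPLE_LOG are made up for this example.

```python
import re

# Assumed entry shape: [timestamp] <connection id> <Command>\t<argument>.
# The timestamp is optional because some server versions omit it when it
# repeats. Verify this pattern against your own log before relying on it.
ENTRY = re.compile(
    r"^(?:\d{6}\s+\d{1,2}:\d{2}:\d{2}|\d{4}-\d{2}-\d{2}T\S+)?\s*"
    r"(?P<conn>\d+) (?P<cmd>[A-Za-z][A-Za-z ]*)\t(?P<arg>.*)$"
)


def extract_queries(log_lines, connection_ids):
    """Return the SQL text of Query entries for the given connection IDs."""
    wanted = {int(c) for c in connection_ids}
    queries = []
    keep = False  # True while the current entry is a Query we want
    for line in log_lines:
        line = line.rstrip("\n")
        m = ENTRY.match(line)
        if m:
            keep = int(m.group("conn")) in wanted and m.group("cmd") == "Query"
            if keep:
                queries.append(m.group("arg"))
        elif keep and line.strip():
            # No entry prefix: treat it as a continuation of a multi-line query.
            queries[-1] += "\n" + line
    # End every statement with exactly one semicolon.
    return [q.rstrip().rstrip(";") + ";" for q in queries]


# Hypothetical sample in the assumed layout, not output from a real server.
SAMPLE_LOG = [
    "Time                 Id Command    Argument\n",
    "110830 10:15:01\t   42 Connect\tapp@localhost on test\n",
    "\t\t   42 Query\tselect *\n",
    "from myinfo\n",
    "\t\t   43 Query\tselect 1\n",
    "110830 10:15:02\t   42 Query\tselect 2;\n",
    "\t\t   42 Quit\t\n",
]

if __name__ == "__main__":
    print("\n".join(extract_queries(SAMPLE_LOG, [42])))
```

For a real log, pass an open file object instead of SAMPLE_LOG and write the result to a .sql file. A continuation line that happens to look like a new entry will confuse this sketch, so eyeball the output before running it.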

Sunday, August 28, 2011

PRIMARY KEYs in INNODB: Choose wisely

PRIMARY KEYs in InnoDB are the primary structure used to organize data in a table. This means the choice of the PRIMARY KEY has a direct impact on performance. And for big datasets, the performance impact of this choice can be enormous.

Consider a table with a primary search attribute such as "CITY", a secondary search attribute "RANK", and a third search attribute "DATE".

A simple "traditional" approach to this table would be something like

create table myinfo (
    city      varchar(50),
    rank      float,
    info_date timestamp,
    id        bigint,
    primary key (id)
) engine=innodb;

create index lookup_index
on myinfo (city, rank, info_date);

InnoDB builds the primary table data store as a B-tree organized around "id", since it's the primary key. The index "lookup_index" contains an index record for every row in the table, and each index record stores the row's primary key value, which is then used to look the row up in the primary B-tree.

This may look OK at first glance, and will perform decently with up to a few million records. But consider how lookups on myinfo by a query like

select * from myinfo where city = 'San Jose' and rank between 5 and 10 and info_date > '2011-02-15';

are answered by MySQL:

1. First, the index B-tree is walked to find the matching entries in the index structure itself.
2. Then, for every matching entry, the "primary" B-tree is walked from root to leaf to fetch the actual record values.

This means that N+1 B-tree walks are needed for N result records.

Now consider the following change to the above table:

create table myinfo (
    city      varchar(50),
    rank      float,
    info_date timestamp,
    id        bigint,
    primary key (city, rank, info_date, id)
) engine=innodb;
create index id_lookup on myinfo (id);

The primary key is now a four-column primary key, and since "id" is distinct, it satisfies the uniqueness requirements for primary keys. The above query now only has to walk a single B-tree to be completely answered. Note also that searches against CITY alone or CITY+RANK benefit as well, since those columns form a leading prefix of the primary key.
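To see why a leading prefix of the key is enough, here is a toy Python model. It is not InnoDB, just a sorted list of (city, rank, info_date, id) tuples standing in for rows stored in primary key order. The rows and the prefix_scan helper are invented for illustration.

```python
from bisect import bisect_left

# Toy stand-in for the clustered primary key: rows kept sorted by
# (city, rank, info_date, id). The data is invented for this example.
ROWS = sorted([
    ("Oakland", 3.0, "2011-01-10", 1),
    ("San Jose", 7.0, "2011-03-01", 2),
    ("San Jose", 5.5, "2011-02-20", 3),
    ("San Jose", 7.0, "2011-01-05", 4),
    ("Fresno", 9.0, "2011-04-02", 5),
])


def prefix_scan(rows, prefix):
    """Find all rows whose leading columns equal `prefix`.

    One binary search plays the role of the single B-tree descent; the
    matching rows are then read sequentially because they are adjacent.
    """
    i = bisect_left(rows, prefix)  # first row >= prefix
    matches = []
    while i < len(rows) and rows[i][:len(prefix)] == prefix:
        matches.append(rows[i])
        i += 1
    return matches


if __name__ == "__main__":
    print([r[3] for r in prefix_scan(ROWS, ("San Jose",))])      # CITY alone
    print([r[3] for r in prefix_scan(ROWS, ("San Jose", 7.0))])  # CITY+RANK
```

A search on "id" alone gets no help from this ordering, because matching ids can be anywhere in the list. That is why the second schema adds the separate id_lookup index.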

Let's plug in some numbers, and put 100M records into myinfo. Let's also say that an average search returns 5,000 records.

Schema 1 (index lookup + primary key lookup from index), where Lg is the base-2 logarithm:
Lg(100M) * 1 + 5000 * Lg(100M) ≈ 132,903 B-tree operations.

Schema 2 (primary key lookup only):
Lg(100M) * 1 ≈ 26.6 B-tree operations. (Note that this single B-tree ingress operation will fetch 5K records.)

So, for this query, Schema 2 is over 5,000 times faster than Schema 1. So, if Schema 2 is answered in a second, Schema 1 will take about 83 minutes.
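The arithmetic above is easy to replay with your own row counts. This Python sketch uses the same simple cost model, where one descent costs Lg(rows) operations. Real InnoDB pages hold many keys each, so real trees are much shallower than this model suggests, but the ratio between the two schemas is what matters here. The function name btree_ops is made up for this example.

```python
import math


def btree_ops(total_rows, result_rows, via_secondary_index):
    """Estimate B-tree operations under the post's Lg() cost model."""
    descent = math.log2(total_rows)  # cost of one root-to-leaf walk
    if via_secondary_index:
        # One walk of the index, plus one primary-key walk per result row.
        return descent * (1 + result_rows)
    # One walk of the primary key; the results are read in sequence.
    return descent


schema1 = btree_ops(100_000_000, 5000, via_secondary_index=True)
schema2 = btree_ops(100_000_000, 5000, via_secondary_index=False)

if __name__ == "__main__":
    print(f"Schema 1: {schema1:,.0f} operations")
    print(f"Schema 2: {schema2:.1f} operations")
    print(f"Ratio:    {schema1 / schema2:,.0f}x")
    # If Schema 2 takes one second, Schema 1 takes this many minutes:
    print(f"Minutes:  {schema1 / schema2 / 60:.0f}")
```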

Note that we've played a bit of a trick here: lookups on "ID" are now relatively expensive, since they go through the id_lookup index and then the primary key. But there are many situations where a table identifier is rarely or never looked up, and is used as the primary key only because "InnoDB needs a primary key".

See also Schema Design Matters

Development in a large-data environment

One of the biggest problems with application development in the context of "Big Data" is that the developer gets the database-interacting code to "work" in the developer's "playpen" database, but the code collapses when it's put into production. A related, but even more serious problem - since it won't be found as quickly - is code that works early on, but degrades badly as the production database grows.

One approach to this problem is to give developers two databases:

1. Have the typical "small database" for initial coding and debugging.
2. Have a large developer playpen database. It should be a large fraction of the size of the production database, or if the production database is small, it should contain contrived (but logically consistent) data that is a large fraction of the expected size of the production database.

Developers should unit-test against both databases before committing their changes.
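If the large playpen database has to be filled with contrived data, a small generator is usually enough to get started. This Python sketch emits rows shaped like the myinfo table from the primary key post. The city list, value ranges, and function names are all invented for illustration, and "logically consistent" here only means unique ids and dates within one year.

```python
import csv
import io
import random
from datetime import datetime, timedelta

# Hypothetical values; replace them with ones that resemble production data.
CITIES = ["San Jose", "Oakland", "Fresno", "Sacramento"]


def contrived_rows(count, seed=0):
    """Yield (city, rank, info_date, id) tuples with unique, increasing ids."""
    rng = random.Random(seed)  # fixed seed makes test runs repeatable
    start = datetime(2011, 1, 1)
    for row_id in range(1, count + 1):
        when = start + timedelta(seconds=rng.randrange(365 * 86400))
        yield (
            rng.choice(CITIES),
            round(rng.uniform(0, 100), 2),
            when.strftime("%Y-%m-%d %H:%M:%S"),
            row_id,
        )


def write_csv(rows, out):
    """Write rows as CSV to an open text file, ready for a bulk load."""
    csv.writer(out).writerows(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    write_csv(contrived_rows(3), buf)
    print(buf.getvalue())
```

Because it is a generator, it can stream a very large number of rows to a file without holding them all in memory.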

Schema Design Matters

One of the big "black holes" in software engineering is the lack of education about, and discussion of, good large-scale schema design. Virtually every blog you'll see on databases and performance focuses on tunable parameters and other operational issues, with occasional discussion of particular application-level database features.

This is obviously important, but while "tuning" will improve db performance (and a badly tuned database won't perform well at all), in many applications, the performance home runs are in schema issues.

Good schema design is hard, and in many ways counter-intuitive. To design a good large-scale schema, you'll need knowledge of how the database engine organizes data - so an excessively "abstract" approach is a problem - and you need to know what the applications will do with the schema. Simply representing the data and allowing the composition of "correct" queries is not enough.

And worrying about database normalization is far more trouble than it's worth; a large-scale schema will have to have "search" structures and "fact" or "base storage" structures.

An implication of the popularity of tools like Hadoop has been a tendency to walk away from the problem of general schema design, and focus on purpose-built search approaches that are intended to answer particular inquiries. This is necessary in data mining, but hinders more than helps the design of good, large-scale applications.

Much of the focus on schema design has been on "getting the right answer", which is obviously the zeroth requirement of a useful app, but if the queries effectively never return, are they producing the right answer?

In one of my jobs, I was, in a sense, hired due to this problem. The application was collapsing under the weight of input data, and was imperiling the growth of the company. I was actually hired to create a custom data manager for this application. However, as I looked at the database schemas, I realized that, while they were functional and "correct" schemas that were competently done from the perspective of "good" schema design, they were simply not up to the task of fetching the data that was needed to solve the business problem. In particular, if there were more than a few million entities in the database, the data browser app couldn't answer the sort of pseudo-ad-hoc inquiries users wanted to execute quickly enough to be viable.

Since writing this custom data manager would take at least a year - and the performance issues were hindering the growth of the company - I decided to table the custom data manager and deeply investigate and re-architect the schema design. The schema I came up with borrowed ideas from data warehousing, columnar databases, and Hadoop-style search-specific structures. I also minimized dependence on relational joins in favor of star-schema approaches to searching.

As it turned out, the schema redesign made the application fast enough that the custom data manager is unnecessary.

The redesigned schema currently allows the app to handle over 100,000,000 entities (and about 2 billion database records), can probably handle at least 10x more, and is far faster than the original was even with a couple million entities. And since we didn't have to throw hardware and software at the problem, the company's operational costs are very low given how much data (and business) is involved. And since we used MySQL, software costs are minimal.

A focus of this blog will be discussion of good schema design ideas, as well as operational issues around maintaining a good "live" schema.
