Sunday, August 28, 2011

PRIMARY KEYs in INNODB: Choose wisely

PRIMARY KEYs in InnoDB are the primary structure used to organize data in a table. This means the choice of the PRIMARY KEY has a direct impact on performance. And for big datasets, this performance choice can be enormous.

Consider a table with a primary search attribute such as "CITY", a secondary search attribute "RANK", and a third search attribute "DATE".

A simple "traditional" approach to this table would be something like

create table myinfo (city varchar(50),
rank float,
info_date timestamp,
id bigint,
primary key (id)
) engine=innodb;


create index lookup_index
on myinfo (city, rank, info_date);


InnoDB builds the primary table data store in a B-tree structure around "id", as it's the primary key. The index "index_lookup" contains index records for every record in the table, and the primary key of the record is stored as the "lookup key" for the index.

This may look OK at first glance, and will perform decently with up to a few million records. But consider how lookups on myinfo by a query like

select * from myinfo where city = 'San Jose' and rank between 5 and 10 and date > '2011-02-15';

are answered by MySQL:

1. First, the index B-tree is walked to find the records of interest in the index structure itself.
2. Now, for every record of interest, the entire "primary" B-tree is walked to fetch the actual record values.

This means that N+1 B-trees are walked for N result records.

Now consider the following change to the above table:

create table myinfo (city varchar(50),
rank float,
info_date timestamp,
id bigint,
primary key (city, rank, info_date, id)
) engine=innodb;

create index id_lookup on myinfo (id);

The primary key is now a four-column primary key, and since "id" is distinct, it satisfies the uniqueness requirements for primary keys. The above query now only has to walk a single B-tree to be completely answered. Note also that searches against CITY alone or CITY+RANK also benefit.

Let's plug in some numbers, and put 100M records into myinfo. Let's also say that an average search returns 5,000 records.

Schema 1: (Index lookup + Primary Key lookup from index):
Lg (100M) * 1 + 5000 * Lg (100M) = 132903 B-tree operations.

Schema 2: (Primary Key lookup only):
Lg(100M) * 1 = 26 B-tree operations. (Note that this single B-tree ingress operation will fetch 5K records)

So, for this query, Schema 2 is over 5,000 times faster than Schema 1. So, if Schema 2 is answered in a second, Schema 1 will take nearly two hours.

Note that we've played a bit of a trick here, and now lookups on "ID" are relatively expensive. But there are many situations where a table identifier is rarely or never looked up, but used as the primary key as "InnoDB needs a primary key".

See also Schema Design Matters

No comments: