Wednesday, August 4, 2010

Doing star schemas in MySQL...

One big problem in MySQL is that it can't do a native "star transformation" query. In practice, this means that only one index can be used for a given table in a search.

So, to do a star transformation, you have to jump through some hoops. These hoops may look silly, but they do work well, particularly for large tables (ie, more than 100M recs or so).

The problem is if you have a table that is to be searched in a relatively ad-hoc fashion, particularly if searched columns are ANDed together.

Let's say you have a table like

(id, searchcol1, searchcol2, searchcol3, ... searchcolN, datacol1, datacol2, ... datacolM) primary key (id);

A typical approach would be to create indexes on all the search cols and enumerate searches in the whereclause. The MySQL query optimizer will pick an index for one of them and do simple qualifications on the rest of the searches. This means that you have to hope that your most selective search column gets picked, and that it doesn't visit many rows or your search performance will suck.

An alternate approach is to use "helper tables". These tables are standalone tables with (ID, searchcol[i], ), and use carefully constructed multi-value primary keys with searchcol as the first key column, your other standard searches as the other key components, and the ID as the last key component to guarantee uniqueness.

With helper tables, you do your individual searches into TEMPORARY tables containing only IDs, use "SQL INTERSECT" to implement AND logic and "SQL UNION ALL" to implement OR logic, and put the final result set into a final TEMPORARY table. After this, you can do the lookup on the big table to fetch the final answer.

In practice, this means you're using all indexed searches to "vector in" to the final result set. We've been able to get good "sort of" ad-hoc search performance with up to 500M records with the above approach.