Monday, May 23, 2011

Loud heroes versus boring guys

One thing that I've always found interesting about the software and IT world is the focus on drama and heroics over "getting it right". There is an odd tendency to praise and value "heroes" who put out frequent fires over setting up processes that minimize the number of fires in the first place.

Having often been such a "hero", it definitely feels good to ride in on the white charger and save the day every so often, but even with good practices, you'll have plenty of opportunity for software heroics, particularly in the startup environments I typically inhabit. Changing requirements, new customers, changing business strategies, and other external forces will create plenty of drama and chances for heroism without the engineering team creating its own.

Often, boring is preferable to loud, even if the latter is more fun.

The problem with columnar databases

Columnar databases are an interesting idea. They're databases where the major data storage structure is a column, not a row (as is the case in the vast majority of relational database engines). A "row" in a "table" in a columnar database is materialized from the columns using join-like associations.

Columnar databases offer lots of interesting storage possibilities, such as native bitmap storage, that aren't easily doable with row-major storage, in which rows are stored contiguously on storage pages.

The problem for schema design with columnar databases is that you often want to qualify the column with a couple of other data elements from the table, such as a date.

Another, more practical problem is that few people know enough about columnar databases to design schemas that leverage their capabilities well.

Friday, May 20, 2011

Some simple schema design mistakes

1. Not paying attention when determining the primary key, particularly if using index-organized primary storage structures like InnoDB or Oracle's "ORGANIZATION INDEX" tables.

2. Thinking that defining an index on multiple columns means that each column is "indexed". Multi-column indexes give you a search structure that is organized from the leftmost column to the rightmost column in the CREATE INDEX (or KEY()) statement, so if the leftmost column(s) aren't used in a given query, columns in the "middle" can't be used.