Sunday, August 28, 2011

Development in a large-data environment

One of the biggest problems with application development in the context of "Big Data" is that the developer gets the database-interacting code to "work" in a small "playpen" database, but the code collapses when it is put into production. A related and even more serious problem, since it won't be found as quickly, is code that works at first but degrades badly as the production database grows.

One way to address this problem is to maintain two development databases:

1. Have the typical "small database" for initial coding and debugging.
2. Have a large developer playpen database. It should be a large fraction of the size of the production database or, if the production database is still small, it should contain contrived (but logically consistent) data amounting to a large fraction of the production database's expected size.

Developers should unit-test against both databases before committing their changes.
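
As a rough illustration of the two-database idea, here is a minimal sketch in Python using an in-memory SQLite database. Everything in it is an assumption made for the example: the customers/orders schema, the function names (build_playpen, total_for_customer, does_full_scan, check_database), and the row counts are hypothetical, and a real setup would point at your actual database engine. It generates contrived but logically consistent data at two sizes and runs the same checks against both. Instead of timing the query, which tends to be noisy, it asks SQLite for the query plan and fails if the query would scan the whole table, since that is the kind of code that degrades as the data grows.

```python
import sqlite3

# Hypothetical query under test: total order amount for one customer.
QUERY = "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?"


def build_playpen(n_customers, orders_per_customer=5):
    """Create an in-memory database filled with contrived, consistent data.

    Every order refers to a customer that exists, so the data is
    logically consistent even though it is generated.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id),"
        " amount INTEGER NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        ((i, f"customer-{i}") for i in range(1, n_customers + 1)),
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        (
            (c, 10 * k)
            for c in range(1, n_customers + 1)
            for k in range(1, orders_per_customer + 1)
        ),
    )
    conn.commit()
    return conn


def total_for_customer(conn, customer_id):
    """The database-interacting code being tested."""
    return conn.execute(QUERY, (customer_id,)).fetchone()[0]


def does_full_scan(conn, sql, params):
    """Return True if SQLite plans to scan an entire table for this query."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is a human-readable description;
    # full table scans start with "SCAN", index lookups with "SEARCH".
    return any(row[-1].startswith("SCAN") for row in plan)


def check_database(conn):
    """Run the same checks regardless of database size."""
    # Each customer has orders of 10, 20, 30, 40, 50.
    assert total_for_customer(conn, 1) == 150
    assert not does_full_scan(conn, QUERY, (1,))


if __name__ == "__main__":
    # Sizes are placeholders; pick the large one relative to production.
    for label, size in (("small", 10), ("large", 50_000)):
        db = build_playpen(size)
        check_database(db)
        db.close()
        print(label, "playpen: checks passed")
```

Dropping the CREATE INDEX line makes the plan check fail on both databases, which catches a problem that a correctness-only test on the small database would let through. A timing assertion against the large database could be added as well, though the threshold would have to be tuned to your hardware.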
