Saturday, August 25, 2007

Merits of Defining your DB Schema in your Application?

Late last year I developed and released a Ruby plugin called DrySQL that aims to eliminate any traces of DB schema re-definition in your application. The philosophy behind this effort is that the DB is indeed the place that table structures, constraints, and relationships should be defined, and that it is redundant to re-define any of these artifacts in an application or anywhere else.

While my colleagues in the enterprise space generally shared my philosophy, I have heard some suggest that DrySQL's approach "sounded backwards" and I know of at least one project out there that aims to do exactly the opposite of what DrySQL does: it aims to support defining the DB schema entirely on the application side rather than on the DB.

I don't yet understand the attraction of this approach, but I'll be interested to hear the opinions of those who support it. I have difficulty grasping the logic behind ignoring established best practices, no matter how small your business is, but I suppose that just about anything could make sense given the right (or wrong) set of requirements and restrictions.

In considering my case for the DB schema living on the DB and nowhere else, my goal is to satisfy 2 main concerns.

1) Controlling The Cost of Change

To maximally reduce the cost of change in an environment consisting of one or more applications interacting with one or more databases, each database schema should be defined in one place only. If the schema is defined in more than one place, schema changes become more difficult (and thus more unattractive) to perform, as they must be cascaded across multiple definitions. If synchronizing multiple definitions requires human intervention, this approach is obviously prone to human error, not to mention tedium and annoyance. This situation is described as "inviting a maintenance nightmare" in The Pragmatic Programmer. Minimizing barriers to change is important to me because I want my software infrastructure to support evolution, and even encourage it.

2) Maintaining Data Integrity

Given that the DB schema is to live in one place only, what place can guarantee the integrity of your data? Only the database itself. There are no back doors through the constraints defined on your database (except perhaps a bug in your DBMS if you are incredibly unlucky). Constraints defined elsewhere cannot be guaranteed to be enforced before data is written to your DB. Even in an environment where only one application interacts with your DB, it would be naive to think that your data would be safe without constraints on the DB. All it takes is one developer or tester (maybe even you?) to take a shortcut, execute a few "harmless" SQL queries through your database's command line interface (or any other interface for that matter), and your data is corrupted. The worst part of all is that the corruption might not be obvious immediately. Sure, an adequate backup schedule and a DB with some decent rollback/rollforward functionality will let you re-stabilize your data, but why pass up a frontline data protection mechanism that is so effective and accessible?

If you have an opinion on this subject -- particularly if it differs from mine -- please share it with me.