Post by greg.fentonThe business requirement is that we need a field to hold enough
information without allowing the user to write a book.
That is not a business requirement in my opinion. They have to define
"enough" and "book". Try to get some sample date for "enough" and for
"book" then you might be able to find a real number (in characters).
Post by greg.fentonThink of a table with 80M rows. What is the I/O penalty for choosing a
row definition that causes row-splitting to 30% of the data. Now, when
looking at physical design in a specific instance you have the ability
to look at the complete schema (and hopefully a full test environment)
to determine optimal definition. But when initially designing an
application that is to evolve overtime, one needs to start somewhere.
Row splitting should only occur if the max. size is really used. As long
as the actual length of the entered data is still small enough to fit
the whole record on a single page, it doesn't matter whether the column
was defined as VARCHAR(100) or VARCHAR(5000). Or at least that's what
I'd expect of a decent DBMS.
I would never let such things influence my design up-front. Especially
because you will need to investigate each and every DBMS you want to use
*very* closely to find out how it deals with this kind of things. For
some it might be easy to tune, for some it might be impossible.
If multi-DBMS is one of your concerns you'd have to take limitations of
the DBMS into account as well. In Oracle accessing a CLOB column is a
bit different from accessing a VARCHAR column (at least this is true for
the JDBC driver). This is not true for other DBMS. So your programming
language and DB access layer must be taken into account as well (and the
quirks of each driver for the DBMS that you plan to use).
Then you have a limit on the length of a literal String. Oracle imposes
a limit of 4000 characters. Anything beyound that must be stored in the
database using bind variables.
I think SQL Server has a limit on the total length of the SQL statement
so that would need special treatment as well if you are not using bind
variables.
The next thing is the encoding. Some DBMS allow you to define the length
of a VARCHAR in bytes other use character semantics (and then there are
some where you can choose between both). So you also have to take into
account that with e.g. UTF8 a VARCHAR(1000) column might actually be
able to hold about 300 characters if it was defined in byte.
Thomas