db agnostic - varchar(255) vs. varchar(256)

Discussion:

(too old to reply)

greg.fenton

2007-01-22 22:13:21 UTC

We're having a pseudo-debate over the "correct" size to make fields in
our app. Our framework's persistence layer is mostly RDBMS-agnostic.

Some of us old-timers use varchar(255) [or that newfangled
nvarchar(255)] mostly for historic reasons.

The young-bloods tend to be using varchar(256) [though I even wonder
about the logic of that].

Assuming you are trying to come up with a "rule of thumb" for character
fields for values such as "description", what practice do you tend to
prescribe?

Thanks in advance,
greg.fenton

Thomas Kellerer

2007-01-22 22:22:17 UTC

Permalink

Post by greg.fenton
We're having a pseudo-debate over the "correct" size to make fields in
our app. Our framework's persistence layer is mostly RDBMS-agnostic.
Some of us old-timers use varchar(255) [or that newfangled
nvarchar(255)] mostly for historic reasons.
The young-bloods tend to be using varchar(256) [though I even wonder
about the logic of that].
Assuming you are trying to come up with a "rule of thumb" for character
fields for values such as "description", what practice do you tend to
prescribe?

Neither 255 nor 256!

I have never understood what would be that special in a DBMS about 255 or 256 (I
never worked with something like dBase, where there might have been some limits
because of using a single byte for the length)

255, 256, 274, 301, 545, 42. Where is the difference? The size should be driven
by business requirements, not by some obscure technical "reasons".

If I don't know any restrictions, I choose the max. size the DBMS supports for a
varchar field.
For any decent DBMS this simply shouldn't matter at all. Neither in speed nor in
space requirements.

Thomas

Gene Wirchenko

2007-01-23 02:17:36 UTC

Permalink

Post by Thomas Kellerer

The limit is 254. I think it has been been that since dBASE II.
OTOH, memos have been in xBASE for, what, 20 years? Memos are limited
to disk space.

Post by Thomas Kellerer
255, 256, 274, 301, 545, 42. Where is the difference? The size should be driven
by business requirements, not by some obscure technical "reasons".

Exactly.

Post by Thomas Kellerer
If I don't know any restrictions, I choose the max. size the DBMS supports for a
varchar field.
For any decent DBMS this simply shouldn't matter at all. Neither in speed nor in
space requirements.

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
I have preferences.
You have biases.
He/She has prejudices.

greg.fenton

2007-01-23 14:55:57 UTC

Permalink

Post by Thomas Kellerer
I have never understood what would be that special in a DBMS about 255 or 256 (I
never worked with something like dBase, where there might have been some limits
because of using a single byte for the length)

The business requirement is that we need a field to hold enough
information without allowing the user to write a book.

Post by Thomas Kellerer
255, 256, 274, 301, 545, 42. Where is the difference? The size should be driven
by business requirements, not by some obscure technical "reasons".

In the "old days", paging and block sizes most definitely played a role
in performance tuning. I strongly suspect the same to be true today,
though maybe only at the page-level. Physical design must take into
account the number of reads necessary to satisfy the most common
queries and blindly choosing overly large sizes for fields leads to
excessive reads.

Think of a table with 80M rows. What is the I/O penalty for choosing a
row definition that causes row-splitting to 30% of the data. Now, when
looking at physical design in a specific instance you have the ability
to look at the complete schema (and hopefully a full test environment)
to determine optimal definition. But when initially designing an
application that is to evolve overtime, one needs to start somewhere.

255 vs 256 is a discussion point because that is a common contraint
established by business requirements (enough space to describe but
we're don't want to write a book). Since our system is going to go
against many disparate RDBMSes, I was looking for insight as to known
limitations of some.

Thanks,
greg.fenton

Thomas Kellerer

2007-01-23 15:40:40 UTC

Permalink

Post by greg.fenton
The business requirement is that we need a field to hold enough
information without allowing the user to write a book.

That is not a business requirement in my opinion. They have to define
"enough" and "book". Try to get some sample date for "enough" and for
"book" then you might be able to find a real number (in characters).

Post by greg.fenton
Think of a table with 80M rows. What is the I/O penalty for choosing a
row definition that causes row-splitting to 30% of the data. Now, when
looking at physical design in a specific instance you have the ability
to look at the complete schema (and hopefully a full test environment)
to determine optimal definition. But when initially designing an
application that is to evolve overtime, one needs to start somewhere.

Row splitting should only occur if the max. size is really used. As long
as the actual length of the entered data is still small enough to fit
the whole record on a single page, it doesn't matter whether the column
was defined as VARCHAR(100) or VARCHAR(5000). Or at least that's what
I'd expect of a decent DBMS.

I would never let such things influence my design up-front. Especially
because you will need to investigate each and every DBMS you want to use
*very* closely to find out how it deals with this kind of things. For
some it might be easy to tune, for some it might be impossible.

If multi-DBMS is one of your concerns you'd have to take limitations of
the DBMS into account as well. In Oracle accessing a CLOB column is a
bit different from accessing a VARCHAR column (at least this is true for
the JDBC driver). This is not true for other DBMS. So your programming
language and DB access layer must be taken into account as well (and the
quirks of each driver for the DBMS that you plan to use).

Then you have a limit on the length of a literal String. Oracle imposes
a limit of 4000 characters. Anything beyound that must be stored in the
database using bind variables.
I think SQL Server has a limit on the total length of the SQL statement
so that would need special treatment as well if you are not using bind
variables.

The next thing is the encoding. Some DBMS allow you to define the length
of a VARCHAR in bytes other use character semantics (and then there are
some where you can choose between both). So you also have to take into
account that with e.g. UTF8 a VARCHAR(1000) column might actually be
able to hold about 300 characters if it was defined in byte.

Thomas