Tutorial :Is an overuse of nullable columns in a database a “code smell”?



Question:

I'm just stepping into a project and it has a fairly large database backend. I've started digging through this database and 95% of the fields are nullable.

Is this normal practice in the database world? I'm just a lowly programmer, not a DBA but I would think you would want to keep nullable fields to a minimum, only where they make sense.

Is it a "code smell" if most columns are nullable?


Solution:1

Default values are typically the exception and NULLs are the norm, in my experience.

True, nulls are annoying.

It's also extremely useful because null is the best indicator of "NO VALUE". A concrete default value is very misleading, and you can lose information or introduce confusion down the road.


Solution:2

Anyone who has developed a data entry application knows how common it is for some of the fields to be unknown at the time of entry -- even for columns that are business-critical, to address @Chris McCall's answer.

However, a "code smell" is merely an indicator that something might be coded in a sloppy way. You use smells to identify things that need more investigation, not necessarily things that must be changed.

So yes, if you see nullable columns so consistently, you're right to be suspicious. It might indicate that someone was being lazy, or afraid to declare NOT NULL columns unequivocally. You can justify doing your own analysis.


Solution:3

I'm of the Extreme NO camp: I avoid NULLs all the time. Putting aside fundamental considerations about what they actually mean (because talk to different people, you'll get different answers such as "no value", "unknown value", "missing", "my ginger cat called Null"), the worst problem NULLs cause is that they often ruin your queries in mysterious ways.

I've lost count of the number of times I've had to debug someone's query (okay, maybe 9) and traced the problem to a join against a NULL. If your code needs ISNULL to repair joins then the chances are you've also lost index applicability and performance with it.

If you do have to store a "missing/unknown/null/cat" value (and it's something I prefer to avoid), it is better to be explicit about it.

Those skilled at NULLs may disagree. NULL use tends to split SQL crowds down the middle.

In my experience, heavy NULL use has been positively correlated with database abuse but I wouldn't carve this into stone tablets as some Law of Nature. My experience is just my experience.

EDIT: Additional thought. It is possible that those who are anti-null racists like myself are more excited by normalization than those who are pro-NULL. I don't think rabid normalizers would be too happy with ragged edges on their tables that can take NULLs. Lots of nulls may indicate that the the database developers are not into heavy normalisation. So rather than NULL suggesting code is "bad" it may alternatively suggest the philosophical position of the developers on normalisation. Maybe this is reaching. Just a thought.


Solution:4

Don't know if I consider it always a bad thing, but if the columns are being added because a single record (or maybe a few) need to have values while most don't, then it indicates a pretty flat table structure. If you're seeing column names like "addr1", "addr2", "addr3", then it stinks!

I would bet that most of the columns you have could be removed and represented in other tables. You could find the "non-null" ones through a foreign key relationship. This will increase the joins that you'll be doing, but it could be more preformant that doing a "where not col1 is null".


Solution:5

I think nullable columns should be avoided. Wherever the semantics of the domain make it possible to use a value that clearly indicates missing data, it should be used instead of NULL.

For instance, let's imagine a table that contains a Comment field. Most developers would place a NULL here to indicate that there's no data in the column. (And, hopefully, a check constraint that disallows zero-length strings so that we have a well-known "value" to indicate the lack of a value.) My approach is usually the opposite. The Comment column is NOT NULL and a zero-length string indicates the lack of a value. (I use a check constraint to ensure that the zero-length string is really a zero-length string, and not whitespace.)

So, why would I do this? Two reasons:

  1. NULLs require special logic in SQL, and this technique avoids that.
  2. Many client-side libraries have special values to indicate NULL. For instance, if you use Microsoft's ADO.NET, the constant DBNull.Value indicates a NULL, and you have to test for that. Using a zero-length string on a NOT NULL column obviates the need.

Despite all of this, there are many circumstances in which NULLs are fine. In fact, I have no objection to their use in the scenario above, although it wouldn't be my preferred way.

Whatever you do, be kind to those who will use your tables. Be consistent. Allow them to SELECT with confidence. Let me explain what I mean by this. I recently worked on a project whose database was not designed by me. Nearly every column was nullable and had no constraints. There was no consistency about what represented the absence of a value. It could be NULL, a zero-length string, or even a bunch of spaces, and often was. (How that soup of values got there, I don't know.)

Imagine the ugly code a developer has to write to find all of those records with a missing Comment field in this scenario:

SELECT * FROM Foo WHERE LEN(ISNULL(Comment, '')) = 0  

Amazingly there are developers who regard this as perfectly acceptable, even normal, despite possible performance implications. Better would be:

SELECT * FROM Foo WHERE Comment IS NULL  

Or

SELECT * FROM Foo WHERE Comment = ''  

If your table is properly designed, the above two SQL statements can be relied upon to produce quality data.


Solution:6

In short, I would say yes, this is probably a code smell.

Whether a column is nullable or not is very important and should be determined carefully. The question should be assessed for every column. I am not a believer in a single "best practices" default for NULL. The "best practice" for me is to address the nullability thoroughly during the design and/or refactoring of the table.

To start with, none of your primary key columns are going to be nullable. Then, I strongly lean towards NOT NULL for anything which is a foreign key.

Some other things I consider:

Criteria where NULL should be strongly avoided: money columns - is there really a possibility that this amount will be unknown?

Criteria where NULL can be justified most frequently: datetime columns - there are no reserved dates, so NULL is effectively your best option

Other data types: char/varchar columns - for codes/identifiers - NOT NULL almost exclusively int columns - mostly NOT NULL unless it's something like "number of children" where you want to distinguish an unknown response.


Solution:7

No, whether or not a field should be nullable is a data concept and can't be a code smell. Whether or not NULLs are annoying to code has nothing to do with the usefulness of having nullable data fields.


Solution:8

They are a (very common) smell, I'm afraid. Look up C.J. Date writings on the topic.


Solution:9

As a best practice, if a column shouldn't be nullable, then it should be marked as such. However, I don't believe in going completely insane with things like this.


Solution:10

I think so. If you don't need the data, then it's not important to your business. If it is important to your business, it should be required.


Solution:11

This is all completely dependent on the scope and requirements of the project. I wouldn't use number of nullable fields alone as a metric for poorly written or designed code. Have a look at the business domain, if there are many non nullable fields represented there that are nullable in the database, then you have some issues.


Solution:12

In my experience, it is a problem when Null and Not Null don't match up to the required field /not required field.

It is in the realm of possibility that those really are all optional fields. If you find in the business tier or the UI tier that those fields are required, then I think this means the data model has drifted away from the business object model and is a sign of overly conservative DB change policies, or oversight.

If you run a sample data generator on your data, and then try to load the data that is valid according to SQL, you would find out right away if the rules match up.


Solution:13

That seems like a lot, it probably means you should at least investigate. Note that if this is mature product with a lot of data, convincing anyone to change the structure may be difficult. The earlier in the design phase you catch something like this the easier it is to fix up all the related code to adjust for the change.

Whether it is bad that they used the nulls would depend on whether the columns allowing nulls look as if they should be related tables (home phone, cell phone, business phone etc which should be in aspearate phone table) or if they look like things that might not be applicable to all records (possibly could bea related table with a one-to-one relationship)or might not be known at the time of data entry (probably ok). I would also check to see if they in fact alwAys do have a value (then you might be able to change to not null if the information is genuinely required by the busniess logic). If you have a few records with null


Solution:14

In my experience, a lot nullable field in a large database like you have is very normal. Considering it perhaps is used by a lot of applications written by different people. Making columns nullable is annoying but it is perhaps the best way to keep the application robust.


Solution:15

One of the many ways to map inheritance (e.g. c# objects) to a database is to create a table for the class at the top of the hierarchy, then add the columns for all the other classes. The columns have to be nullable for when an object of a different subclass is stored in the database. This is called Single-table inheritance mapping (or Map Hierarchy To A Single Table) and is a standard design pattern.

A side effect of Single-table inheritance mapping is that most columns are nullable.


Also in Oracle an empty string (0 length) is considered to be null, therefore in some companies all strings columns are made nullable even on SqlServer. (just because the first customer wants the software on SqlServer does not mean the 2nd customer does not have a Oracle DBA that will not let SqlServer onto there network)


Solution:16

To throw the opposite opinion out there. Every single field in a database should nullable. There is nothing more frustrating than working with a database that on every single insert throws an exception about required this or required that. Nothing should be required.

There is one exception to that, keys. Obviously all primary and foreign keys should be enforced to exist.

It should be the application's job to validate data and the database to simply store and retrieve what you give it. Having it process validation logic even as simple as null or not null makes a project way more complex to maintain for having different rules spread over everything.


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »