Data Quality

Understanding `Optional` and `Nullable` properties

Effect of optional vs nullable properties on data quality

Atharva Inamdar

--

Many will know and understand the importance of having good quality data and the principles of garbage in, garbage out. I won’t go into that but will look at some of the examples where quality can be lost and what the semantics of enforcing constraints such as required/optional and nullable fields are.

Photo by Nadir sYzYgY on Unsplash

Required/Optional property

Let’s start with context and definitions. In the context of data schemas each field can be specified as required or optional. This means either the field is required to be present in data record or not (optional). Examples follow. Assume we have a field called id. I’m going to show this as JSON document for simplicity.

// id is required// valid data
{ "id": 1 }
// invalid data
{ }

As you can see, this property relates to the key or field within a data record.

Nullable/Non-Nullable property

In contrast, nullable property relates to the value of the field. null value has many meanings. It can mean one of:

  • value is not set/specified
  • value is missing (data loss)
  • null happens to be a default value used by the system if a column/field doesn’t exist
  • value is unknown

All these have different impact on how null value is interpreted.

Example of nullable and non-nullable

// id is nullable or int type// valid
{ "id": null }
{ "id": 1 }
// invalid
{ "id": "abc" }
{ }
// id is NOT nullable and int type
// valid

{ "id": 1 }
// invalid{ "id": null }
{ "id": "abc" }
{ }

Combination

Examples of valid combinations:

|      .       |    Required    |      Optional      |
|--------------|----------------|--------------------|
| Nullable | { "id": null } | { "id": null }, {} |
| Not Nullable | { "id": 1 } | { "id": 1 }, {} |

For the combination of Optional + Not Nullable, if the field is specified, the value cannot be null.

Required + Nullable

How should consumer interpret this? What does null mean in this case? In my opinion there is little value to make required field nullable. The only benefit would be for backwards compatibility of schemas whereby a business change has caused a required field to be transitioned from not-nullable to nullable.

Usually a business analyst will mean a required field needs to be not nullable and required.

Optional + Nullable

Similarly, what is the benefit of a null value in an optional field? The benefit is mainly for transitioning a required nullable field to optional nullable field.

Conclusion

While this may seem like an obvious description of these concepts, often the requirements specify a field to be mandatory or not. Hopefully, I have highlighted that this is only half the information and we need to specify nullable property for each field.

When designing new schemas, choose the appropriate one to fit your business meaning, not the one that allows most flexibility. When evolving schemas, you may have to choose suboptimal properties to maintain backwards compatibility with your existing schema and data.

--

--