Done a fair bit of BI and OLAP this last decade, and I still feel NULL causes more issues than it solves.
For the counting case, sum(case when ... then 1 else 0 end) generally works just as well, and usually at little to no cost.
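To illustrate the difference, here is a minimal sketch (using Python's sqlite3 with a hypothetical orders table) showing how COUNT(col) silently skips NULLs while the SUM(CASE ...) pattern makes the counting condition explicit:

```python
import sqlite3

# Hypothetical table: two shipped orders, two rows with NULL status.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?)",
                 [("shipped",), (None,), ("shipped",), (None,)])

row = conn.execute("""
    SELECT COUNT(*),                -- counts all rows
           COUNT(status),           -- silently skips NULLs
           SUM(CASE WHEN status = 'shipped' THEN 1 ELSE 0 END)
    FROM orders
""").fetchone()
print(row)  # (4, 2, 2)
```

The point being that the CASE form states its condition outright, instead of relying on the reader remembering that aggregates ignore NULLs.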
Having nullability as a concept in the query language isn't very problematic though; I might even consider it helpful on average, as long as a view or table can never contain nulls.
Nullable columns in tables and queries are the problem. It boils down to how each nullable column effectively describes a schema equivalent to a separate table per nullable column, plus one for the non-nullable columns, where the relationships between these tables are badly under-constrained.
Thus, to ensure data quality and correctness, you have to query all nullable columns of interest to make sure that the various combinations of nulls in those columns don't interfere with what you are trying to report. You also have to track that no component of the system suddenly starts reporting nulls in columns where it didn't before, as that is equivalent to a schema change.
This wouldn't be much of an issue if there were never more than one nullable column per table, or at the very least if only the absolute minimum of columns were nullable. That is only very rarely the case, though; usually there are far too many nullable columns. To build reliable and stable reporting, one then has to solve the under-specification of the source system in the analysis and reporting pipeline, for a class of issues that should for the most part be the responsibility of the source systems.
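The "combinations of nulls" check above can be sketched as a profiling query. This is a minimal illustration (Python's sqlite3, hypothetical events table with two nullable columns) that groups rows by the NULL-ness of each column to reveal which combinations actually occur:

```python
import sqlite3

# Hypothetical table with two nullable columns; every NULL combination occurs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, region TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, "eu"), (2, None), (None, "us"), (None, None)])

# Group by the NULL-ness of each nullable column to profile combinations.
rows = conn.execute("""
    SELECT user_id IS NULL AS uid_null,
           region  IS NULL AS region_null,
           COUNT(*)
    FROM events
    GROUP BY 1, 2
    ORDER BY 1, 2
""").fetchall()
print(rows)  # [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
```

With n nullable columns there are up to 2^n such combinations, which is exactly why each extra nullable column multiplies the surface you have to monitor.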
The best use of nulls I know of is using them to lower the cost of table scans in Google BigQuery.