Summarizing Data Using CUBE - T-SQL

Posted on 3 971 views

There are countless ways to SUM some data in SQL Server database. You can do it by using built-in aggregation functions like SUM and by using GROUP BY. You can do it also by using some subqueries, temp tables and doing some ordering to place your sum or sum's after the right row.

In this article I will show you how to use CUBE operator.

What is CUBE?

"The CUBE operator is specified in the GROUP BY clause of a SELECT statement. The select list contains the dimension columns and aggregate function expressions. The GROUP BY specifies the dimension columns and the keywords WITH CUBE. The result set contains all possible combinations of the values in the dimension columns, together with the aggregate values from the underlying rows that match that combination of dimension values."
(For more information please visit: http://msdn.microsoft.com/en-US/library/ms175939(v=sql.90).aspx)

In simpler words, CUBE is sort of extension of GROUP BY operator.
It works with columns as a dimmensions:

  • one column is SINGLE dimmension CUBE
  • two or more columns is MULTI dimmension CUBE

How to use CUBE?

In our database, we have some data about car model representation by cities. These numbers are here just to simplify the summarization. They don't represent actual number or number in millions. :D

ID   Car            City            Number
---- -------------- --------------- ---------
1    Honda          London          9
2    Toyota         Barcelona       15
3    Mazda          Berlin          7
4    Honda          Barcelona       8
5    Honda          Berlin          5
6    Mazda          London          11
7    Toyota         Berlin          6

    

So we have some cars models like Honda, Toyota and Mazda in cities like London, Barcelona and Berlin. Now, we shall use CUBE operator, and see what will get.

Single dimmension CUBE

The following SQL query...

    SELECT	Car, 
SUM(Number) as Number
FROM dbo.Data
GROUP BY Car WITH CUBE
    

... will give us the following output data:

Car        Number
---------- -----------
Honda      22
Mazda      18
Toyota     21
NULL       61
    

CONCLUSION: well, it's just like the normal grouping, except here we get one final row that represents the total sum of all car models.

But, what is the problem in this data?? The problem is NULL. To fix this, there is one cool function named GROUPING.

The GROUPING function returns 0 if the column value came from the fact data, and 1 if the column value is a NULL generated by the CUBE operation. In a CUBE operation, a generated NULL represents all values.

Use of the GROUPING function is the following:

    SELECT	(CASE WHEN (GROUPING(Car) = 1) THEN 'All cars' ELSE ISNULL(Car, '-') END) AS Car,
		SUM(Number) as Number
FROM dbo.Data
GROUP BY Car WITH CUBE
    

Output data is:

    Car        Number
---------- -----------
Honda      22
Mazda      18
Toyota     21
All cars   61
    

This GROUPING function made distinct from normal grouped row and CUBE generated row where it previous said NULL. All we did here was replacemant of NULL value with 'All cars' text.
And, that's it. This is the most simple example of using GROUPING function.

MULTI dimmension CUBE

As we said, multidimmensional CUBE is using CUBE with multiple columns.

The most simple example is the following:

    SELECT Car, City, SUM(Number) as Number
FROM dbo.Data
GROUP BY City, Car WITH CUBE
    

Note, that I list column "Car" first, but putting "Car" last when using with GROUP BY and CUBE operators. That tells SQL to group data by City and Car, but Car with CUBE!

That will give us the following result:

Car        City           Number
---------- -------------- ---------
Honda      Barcelona      8
Honda      Berlin         5
Honda      London         9
Honda      NULL           22

Mazda      Berlin         7
Mazda      London         11
Mazda      NULL           18

Toyota     Barcelona      15
Toyota     Berlin         6
Toyota     NULL           21

NULL       NULL           61

NULL       Barcelona      23
NULL       Berlin         18
NULL       London         20
    

Pretty cool, huh? :)

With one simple using of CUBE, we got the following:

  • total sum BY CARS (where only city is NULL)
  • total sum of ALL CARS (where car and city are NULL)
  • total sum of all cars BY CITIES (where only car is NULL)

You can also try to play by substitute order of "Car" and "City" in GROUP BY operator by putting "Car" first and "City WITH CUBE" last. Also, place the "City" column in SELECT operation first to get better visualization of grouped data. ;)