Summarizing Data Using CUBE - T-SQL
There are countless ways to SUM some data in SQL Server database.
You can do it by using built-in aggregation functions like SUM and by using GROUP BY.
You can do it also by using some subqueries, temp tables and doing some ordering to place your sum or sum's after the right row.
In this article I will show you how to use CUBE operator.
What is CUBE?
"The CUBE operator is specified in the GROUP BY clause of a SELECT statement.
The select list contains the dimension columns and aggregate function expressions.
The GROUP BY specifies the dimension columns and the keywords WITH CUBE.
The result set contains all possible combinations of the values in the dimension columns,
together with the aggregate values from the underlying rows that match that combination of dimension values."
(For more information please visit: http://msdn.microsoft.com/en-US/library/ms175939(v=sql.90).aspx)
In simpler words, CUBE is sort of extension of GROUP BY operator.
It works with columns as a dimmensions:
- one column is SINGLE dimmension CUBE
- two or more columns is MULTI dimmension CUBE
How to use CUBE?
In our database, we have some data about car model representation by cities. These numbers are here just to simplify the summarization. They don't represent actual number or number in millions. :D
ID Car City Number
---- -------------- --------------- ---------
1 Honda London 9
2 Toyota Barcelona 15
3 Mazda Berlin 7
4 Honda Barcelona 8
5 Honda Berlin 5
6 Mazda London 11
7 Toyota Berlin 6
So we have some cars models like Honda, Toyota and Mazda in cities like London, Barcelona and Berlin. Now, we shall use CUBE operator, and see what will get.
Single dimmension CUBE
The following SQL query...
SELECT Car,
SUM(Number) as Number
FROM dbo.Data
GROUP BY Car WITH CUBE
... will give us the following output data:
Car Number
---------- -----------
Honda 22
Mazda 18
Toyota 21
NULL 61
CONCLUSION: well, it's just like the normal grouping, except here we get one final row that represents the total sum of all car models.
But, what is the problem in this data?? The problem is NULL. To fix this, there is one cool function named GROUPING.
The GROUPING function returns 0 if the column value came from the fact data, and 1 if the column value is a NULL generated by the CUBE operation.
In a CUBE operation, a generated NULL represents all values.
Use of the GROUPING function is the following:
SELECT (CASE WHEN (GROUPING(Car) = 1) THEN 'All cars' ELSE ISNULL(Car, '-') END) AS Car,
SUM(Number) as Number
FROM dbo.Data
GROUP BY Car WITH CUBE
Output data is:
Car Number
---------- -----------
Honda 22
Mazda 18
Toyota 21
All cars 61
This GROUPING function made distinct from normal grouped row and CUBE generated row where it previous said NULL. All we did here was replacemant of NULL value with 'All cars' text.
And, that's it. This is the most simple example of using GROUPING function.
MULTI dimmension CUBE
As we said, multidimmensional CUBE is using CUBE with multiple columns.
The most simple example is the following:
SELECT Car, City, SUM(Number) as Number
FROM dbo.Data
GROUP BY City, Car WITH CUBE
Note, that I list column "Car" first, but putting "Car" last when using with GROUP BY and CUBE operators. That tells SQL to group data by City and Car, but Car with CUBE!
That will give us the following result:
Car City Number
---------- -------------- ---------
Honda Barcelona 8
Honda Berlin 5
Honda London 9
Honda NULL 22
Mazda Berlin 7
Mazda London 11
Mazda NULL 18
Toyota Barcelona 15
Toyota Berlin 6
Toyota NULL 21
NULL NULL 61
NULL Barcelona 23
NULL Berlin 18
NULL London 20
Pretty cool, huh? :)
With one simple using of CUBE, we got the following:
- total sum BY CARS (where only city is NULL)
- total sum of ALL CARS (where car and city are NULL)
- total sum of all cars BY CITIES (where only car is NULL)
You can also try to play by substitute order of "Car" and "City" in GROUP BY operator by putting "Car" first and "City WITH CUBE" last. Also, place the "City" column in SELECT operation first to get better visualization of grouped data. ;)