READ N TRY

Roman's website about chip carving, whittling, and woodturning

SQL trick: UNION between two SELECT statements that don't share the same number of columns and their types

SQL Trick: UNION Any SELECTs

Many applications use SQL to save and retrieve data from relational databases. Often an application needs to retrieve a few sets of data from its database before it could continue processing. Usually, applications send several SELECT statements to their database to retrieve needed data – one SELECT statement per one data set. In this post, I will show one SQL trick that allows retrieving two or more unrelated sets of data from the database by using only one SELECT statement.

Getting unrelated data as one query result set is useful if you deal with special latency or functional requirements. For example, in SSRS (reporting system) each graph, pie chart, table, or matrix is tied to one data source that represents one SELECT statement, so this SQL trick comes in handy to show a summary of unrelated data together in one table.

The idea of this SQL trick is to use UNION operator to combine unrelated data. The combined dataset may be grouped further by a common column(s) if any. By adding NULLs we can work around the UNION requirement that all SELECT statements must have the same number of columns with similar types. The trick works with Oracle, Microsoft SQL Server, MySQL, PostgreSQL, and other relational databases that support SQL-92 standard.

As an example, I included two separate SELECT statements. One shows a number of hired employees grouped by month. The second query returns a number of orders and SubTotal per month. These two queries return unrelated data sets that cannot be retrieved by joining database tables. I used the AdventureWorks database and Microsoft SQL Server to prepare this example.

SELECT Year(hireDate) as Y, MONTH(hireDate) as M
  , COUNT(*) as Hired
FROM HumanResources.Employee
GROUP BY Year(hireDate), MONTH(hireDate)
ORDER BY 1, 2;
SELECT Year(OrderDate) as Y, MONTH(OrderDate) as M
  , COUNT(*) as Orders, SUM(SubTotal) as OrderSubTotal
FROM Sales.SalesOrderHeader
GROUP BY Year(OrderDate), MONTH(OrderDate)
ORDER BY 1, 2;
sql-trick-2013-april-data1

To combine these two queries let’s add the UNION operator. We also need to adjust the queries by adding “fake” or “missing” columns to work around the UNION requirement. We will add the “Orders” and “OrderSubTotal” columns to the first query and the “Hired” column to the second query, so both queries will have the same number of columns. When we run the updated query we get both data sets as one query result.

SELECT Year(hireDate) as Y, MONTH(hireDate) as M
  , COUNT(*) as Hired
  , NULL as Orders, NULL as OrderSubTotal -- fake columns
FROM HumanResources.Employee
GROUP BY Year(hireDate), MONTH(hireDate)

UNION ALL

SELECT Year(OrderDate) as Y, MONTH(OrderDate) as M
  , NULL –- fake the Hired column
  , COUNT(*), SUM(SubTotal) -- Orders, OrderSubTotal
FROM Sales.SalesOrderHeader
GROUP BY Year(OrderDate), MONTH(OrderDate)
ORDER BY 1, 2;
sql-trick-2013-april-data2

We can see that the query sometimes returns two rows for the same month, but it is easy to process the retrieved results in application logic. By analyzing NULL values it is easy to see if a row represents a number of hired employees or a number of orders.

Sometimes we can aggregate the data further by injecting the combined query as a sub-query and grouping the data by common columns in the outer SELECT statement. Here is an example where the records are grouped by year and month. We can group the results by those columns because they appear in both queries:

SELECT Y, M, SUM(Hired) as Hired
  , SUM(Orders) as Orders, SUM(OrderSubTotal) as OrderSubTotal
FROM (
  SELECT Year(hireDate) as Y, MONTH(hireDate) as M
    , COUNT(*) as Hired
    , NULL as Orders, NULL as OrderSubTotal -- fake columns
  FROM HumanResources.Employee
  GROUP BY Year(hireDate), MONTH(hireDate)

  UNION ALL

  SELECT Year(OrderDate) as Y, MONTH(OrderDate) as M
    , NULL -- Hired column
    , COUNT(*), SUM(SubTotal) -- Orders, OrderSubTotal
  FROM Sales.SalesOrderHeader
  GROUP BY Year(OrderDate), MONTH(OrderDate)
) summary
GROUP BY Y, M
ORDER BY 1, 2;
sql-trick-2013-april-data3

Notice that the aggregated query contains only one row for each month when the previous SQL sometimes returns two rows for the same month. This means that less data is transferred between an application and its database, but it also means that the database needs to do more work. If the database performance is a concern avoid unnecessary grouping and sorting in SQL. In most cases, these operations can be done on the application logic side.

If you need an aggregated result set consider using 0 instead of NULL while “faking” the columns. Sometimes this allows showing zero instead of null without breaking aggregation logic; the example above would show 0 hired employees in April, June, and August of 2002.

I hope reading about this trick was useful. Now you know how to combine a few unrelated SELECT statements into one. Try to see if the described SQL trick can help you to meet your specific requirements.


Comments

Leave a Reply

Your email address will not be published. Required fields are marked *