UNION statement in SQL

UNION is a statement or operator that allows you to join 2 or more SELECT by concatenating the results of execution into a single result set. Unlike the JOIN clause, which adds the specific columns of each table to the result set, UNION does not create individual records from the columns of the tables participating in the join, but rather concatenates the set of their results.

This operation is subject to strict rules that cannot be violated:

  • The SELECT that participate in the UNION must have the same number of columns.

  • The columns have to be paired in order, being of the same data type or otherwise, compatible data, for example: FLOAT and INT

The above rules show that a lot of times, UNION is used to concatenate mirror tables or tables that are very similar in their definition, that although they store different information they share a similar data model.

We have to take into account that the name of the columns of the result set will be defined by the columns of the first SELECT statement, so sometimes it may be advisable to put an alias to some of the columns if the data we store is substantially different from the information collected from one table to another.

Also, note that UNION excludes duplicate values. If we want our result set to include duplicates, we need to replace the UNION with UNION ALL.

As an example we are going to create first a new table named Employees table that fits the model of the Clients table, to later perform a UNION of both.

In order for the user to replicate these same examples in his database, the CREATE TABLE scripts and the INSERT INTO of the new table data would be:

CREATE TABLE Employees(

    employeeId       INT IDENTITY(1,1),
    [name]           VARCHAR(255) NOT NULL ,
    [lastName]       VARCHAR(255) NOT NULL ,
    [address]        VARCHAR(255),
    employeeNumber   INT

    CONSTRAINT pk_Employee_employeeId PRIMARY KEY (employeeId)
);
INSERT INTO Employees
    ([name],
     [lastName],
     [address],
     employeeNumber)
VALUES
    ('Levi',
     'Jones',
     '6101 KLUMP AVE',
     1001);
INSERT INTO Employees
    ([name],
     [lastName],
     [address],
     employeeNumber)
VALUES
    ('Daniel',
     'White',
     '7939 BRIGHT AVE',
     1002);

Once created and the data entered in the new Employees table, we proceed to exemplify the use of the UNION operator:

SELECT [name],
       [lastName],
       [address],
       bankAccount as ClientAccountEmployeeNumber
FROM Customers
WHERE bankAccount IS NOT NULL
UNION
SELECT [name],
       [lastName],
       [address], employeeNumber
FROM Employees
ORDER BY ClientAccountEmployeeNumber ASC;
name lastName address ClientAccountEmployeeNumber
Levi Jones 6101 KLUMP AVE 1001
Daniel White 7939 BRIGHT AVE 1002
Janice Haynes 4822 Radio Park Drive 111222333
Mavin Pettitt 2336 Cottonwood Lane 123456789
Peter Davis 3608 Sycamore Lake Road 998344567
Helen Ward 711 Hershell Hollow Road 447824556
Kimberly Lee 4298 Drummond Street 778345112
Jessie Good 4452 Ventura Drive 790763467
Paul Williams 2696 Ocello Street 998344567

We can see that in a single result set we are returning the records from the Customers and Employees tables.

As we indicated before, since it is not a JOIN, we cannot have independent columns to store the values of the bankAccount field of the Customers table, and the employeeNumber field of the Employee table, so all the values have to be stored in the same column. Failing that, we have created an alias: ClientAccountEmployeeNumber, to specify that said column can store both values, and given its nature, differentiate which values belong to a customer and which to an employee.

There are other set operators such as EXCEPT and INTERSECT that we will explain below using the Customers table as an example:

INTERSECT

INTERSECT returns only the records that are common to the union of all the queries:

SELECT customerId,
        [name],
        [lastName],
        [address],
        bankAccount
FROM Customers
WHERE customerId IN (1,2,3,4)
INTERSECT
SELECT customerId,
      [name],
      [lastName],
      [address],
      bankAccount
FROM Customers
WHERE customerId IN (4,5,6,7);
customerId name lastName address bankAccount
4 Helen Ward 711 Hershell Hollow Road 447824556

In this case, we can see that only the record with customerId = 4, is returned by both queries.

EXCEPT

EXCEPT returns only the unique records of the first query, those that do not match the rest:

SELECT customerId ,
       [name],
       [lastName],
       [address],
       bankAccount
FROM Customers
WHERE customerId IN (1,2,3,4)
EXCEPT
SELECT customerId,
       [name],
       [lastName],
       [address],
       bankAccount
FROM Customers
WHERE customerId IN (4,5,6,7);
customerId name lastName address bankAccount
1 Janice Haynes 4822 Radio Park Drive 111222333
2 Mavin Pettitt 2336 Cottonwood Lane 123456789
3 Peter Davis 3608 Sycamore Lake Road 998344567

In this case, only the idCustomers 1,2,3, are returned exclusively by the first query of the union.