Set Theory for SQL Joins
Venn diagrams and set theory
SQL has three main set operations, UNION, INTERSECT, and EXCEPT.
- The Venn diagrams are shown to visualize the differences between them. We can think of each circle as representing a table.
- The green parts represent what is included after the set operation is performed on each pair of tables.
UNION
In SQL, the UNION operator takes two tables as input and returns all records from both tables.
The diagram shows two tables: left and right, and performing a UNION returns all records in each table.
If two records are identical, UNION only returns them once.
To illustrate this, the first two records of the right table have been faded out. Our result has seven records.
UNION ALL diagram
- In SQL, there is a further operator for unions called UNION ALL. In contrast to UNION, given the same two tables, UNION ALL will include duplicate records.
- Therefore, performing UNION ALL on this data will return nine records, whereas UNION only returned seven.
No records have been faded out.
UNION and UNION ALL syntax
The syntax for performing all the set operations is highly similar.
We perform a SELECT statement on our first table, and a SELECT statement on our second table, and specify a set operation (in this example, either UNION or UNION ALL) between them.
Note
Set operations do not require a field to join ON. This is because they do not do quite the same thing as join operations. Rather than comparing and merging tables on the left and right, they stack fields on top of one another.
For all set operations, the number of selected columns and their respective data types must be identical. For instance, we can’t stack a number field on top of a character field.
The result will only use field names (or aliases, if used) of the first SELECT statement in the query.
INTERSECT
INTERSECT takes two tables as input and returns only the records that exist in both tables.
In the diagram shown, we have two tables, left_table and right_table. The result of performing INTERSECT on these tables is only the records common to both tables: the first record. All records that are not of interest to the INTERSECT operation are faded out.
INTERSECT syntax
- The syntax for this set operation is very similar to that of UNION and UNION ALL.
- We perform a SELECT statement on our first table, and a SELECT statement on our second table, and specify our set operator between them.
INTERSECT vs. INNER JOIN on two columns
INTERSECT requires all fields to match, since in set operations we do not specify any fields to match.
This is also why it requires the left and right tables to have the same number of columns in order to compare records. Only records where both columns match are returned.
In INNER JOIN, similar to INTERSECT, only results where both fields match are returned.
👉INNER JOIN will return duplicate values, whereas INTERSECT will only return common records once. 🙉
👉INNER JOIN will add more columns to the result set.
EXCEPT
EXCEPT allows us to identify the records that are present in one table, but not the other. More specifically, it retains only records from the left table that are not present in the right table.
EXCEPT diagram
The diagram shown illustrates the workings of the EXCEPT operation.
All records that are not of interest to the EXCEPT operation are faded out.
Only the last two records of the left_table are returned.
EXCEPT syntax