JOIN
JOIN
lets you combine two or more table expressions into a single table
expression.
Conceptual framework
Much like an RDBMS, Materialize can join together any two table expressions (in our case, either sources or views) into a single table expression.
Materialize has much broader support for JOIN
than most streaming platforms,
i.e. we support all types of SQL joins in all of the conditions you would
expect.
Syntax
join_expr
join_type
table_ref
Field | Use |
---|---|
select_pred | The predicating SELECT clauses you want to use, e.g. SELECT col_ref FROM table_ref... . The table_ref from the select_pred is the left-hand table. |
NATURAL | Join table expressions on all columns with the same names in both tables. This is similar to the USING clause naming all identically named columns in both tables. |
LATERAL | Let the following subquery or table function call refer to columns from join’s left-hand side. See LATERAL subqueries below. |
join_type | The type of JOIN you want to use (INNER is implied default). |
select_stmt | A SELECT statement. |
table_ref | The table expression you want to join, i.e. the right-hand table. |
table_func_call | A call to a table function. |
USING ( col_ref… ) | If the join condition does not require table-level qualification (i.e. joining tables on columns with the same name), the columns to join the tables on. For example, USING (customer_id) . |
ON expression | The condition on which to join the tables. For example ON purchase.customer_id = customer.id . |
select_pred | The remaining SELECT clauses you want to use, e.g. ...WHERE expr GROUP BY col_ref HAVING expr . |
Note: It’s possible to join together table expressions as inner joins without using this clause whatsoever, e.g. SELECT cols... FROM t1, t2 WHERE t1.x = t2.x GROUP BY cols...
Details
Unlike most other streaming platforms, JOIN
s in Materialize have very few, if
any, restrictions. For example, Materialize:
- Does not require time windows when joining streams.
- Does not require any kind of partitioning.
Instead, JOIN
s work over the available history of both streams, which
ultimately provides an experience more similar to an RDBMS than other streaming
platforms.
LATERAL
subqueries
New in v0.4.1.
To permit subqueries on the right-hand side of a JOIN
to access the columns
defined by the left-hand side, declare the subquery as LATERAL
. Normally, a
subquery only has access to the columns within its own context.
Table function invocations always have implicit access to the columns defined by
the left-hand side of the join, so declaring them as LATERAL
is a permitted
no-op.
When a join contains a LATERAL
cross-reference, the right-hand relation is
recomputed for each row in the left-hand relation, then joined to the
left-hand row according to the usual rules of the selected join type.
LATERAL
subqueries can be very expensive to compute. For best results, do not
materialize a view containing a LATERAL
subquery without first inspecting the
plan via the EXPLAIN
statement. In many common patterns
involving LATERAL
joins, Materialize can optimize away the join entirely.
As a simple example, the following query uses LATERAL
to count from 1 to x
for all the values of x
in xs
.
SELECT * FROM
(VALUES (1), (3)) xs (x)
CROSS JOIN LATERAL generate_series(1, x) y;
x | y
---+---
1 | 1
3 | 1
3 | 2
3 | 3
For a real-world example of a LATERAL
subquery, see the Top-K by group
idiom.
Examples
For these examples, we’ll use a small data set:
Employees
id | name
----+--------
1 | Frank
2 | Arjun
3 | Nikhil
4 | Cuong
Managers
id | name | manages
----+-------+---------
1 | Arjun | 4
2 | Cuong | 3
3 | Frank |
In this table:
Arjun
andFrank
do not have managers.Frank
is a manager but has no reports.
Inner join
Inner joins return all tuples from both tables where the join condition is valid.
SELECT
employees."name" AS employee,
managers."name" AS manager
FROM employees
INNER JOIN managers ON employees.id = managers.manages;
employee | manager
----------+---------
Cuong | Arjun
Nikhil | Cuong
Left outer join
Left outer joins (also known as left joins) return all tuples from the
left-hand-side table, and all tuples from the right-hand-side table that match
the join condition. Tuples on from the left-hand table that are not joined with
a tuple from the right-hand table contain NULL
wherever the right-hand table
is referenced.
SELECT
employees."name" AS employee,
managers."name" AS manager
FROM employees
LEFT OUTER JOIN managers ON employees.id = managers.manages;
employee | manager
----------+---------
Cuong | Arjun
Nikhil | Cuong
Arjun |
Frank |
Right outer join
Right outer joins (also known as right joins) are simply the right-hand-side equivalent of left outer joins.
Right outer joins return all tuples from the right-hand-side table, and all
tuples from the left-hand-side table that match the join condition. Tuples on
from the right-hand table that are not joined with a tuple from the left-hand
table contain NULL
wherever the left-hand table is referenced.
SELECT
employees."name" AS employee,
managers."name" AS manager
FROM employees
RIGHT OUTER JOIN managers ON employees.id = managers.manages;
employee | manager
----------+---------
Cuong | Arjun
Nikhil | Cuong
| Frank
Full outer join
Full outer joins perform both a left outer join and a right outer join. They return all tuples from both tables, and join them together where the join conditions are met.
Tuples that are not joined with the other table contain NULL
wherever the
other table is referenced.
SELECT
employees."name" AS employee,
managers."name" AS manager
FROM employees
FULL OUTER JOIN managers ON employees.id = managers.manages;
employee | manager
----------+---------
Cuong | Arjun
Nikhil | Cuong
| Frank
Arjun |
Frank |
Cross join
Cross joins return the Cartesian product of the two tables, i.e. all combinations of tuples from the left-hand table combined with tuples from the right-hand table.
Our example dataset doesn’t have a meaningful cross-join query, but the above diagram shows how cross joins form the Cartesian product.