Doing business with recursive SQL

Let's take a look at a fundamental problem in economics, with applications to doing business: matching up producers and consumers of some abstract resource, in a way that appeals to all of the participants.
Imagine we have a set of producers and a set of consumers, each of whom wants to be matched to one member of the opposite type, and each of them has some (not necessarily shared) preference for the other. The problem was initially presented in the language of "stable marriage", but it applies to any pairings where the participants have opinions about those they might be paired with. The framing has also been applied to matching hospital residents with hospitals, application clients with server capacity, and in this post hungry engineers and their lunching options. You should be able to apply it to a variety of settings, most fruitfully when the matched things come with a rich variety of opinions about each other.
To spill the beans, there already is an algorithm for stable matching, and we're just going to implement it in recursive SQL. You might not have thought of SQL as a language for algorithms, and conventional SQL is certainly very limited in this respect. However, recursive SQL can be a great fit, and when it is there's no reason not to just lean on the existing approaches!
Stable matching in SQL
We will work off of a table prefs that will store the mutual preferences between pairs of producer and consumer. Not every pair needs to be represented here, and any pairs that are absent will just be taken to be non-viable. We'll call producers and consumers by name1 and name2, respectively, which aren't very evocative but are easier to type. Each pair will have integer preferences pref1 and pref2 for each other, where smaller numbers mean higher preference (imaging them as a ranking).
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
Our goal is to pull out a subset of prefs where each name1 and name2 occur at most once. Also, we shouldn't leave behind any pairing in which each prefers the other more than the pair they were assigned. That second part is where the algorithm comes in.
Of course, we'll want some example preferences to work with. Let's start with some hungry engineers and food options. Thematically, let's imagine that each human prefers the foods based on their own unaccountable tastes, and the food options (restaurants) prefer the humans based on their distance (because each's price doesn't vary as a function of the human, but the delivery cost does).
Here's some made up data that will show off what we are trying to do.
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
7 | |
8 | |
If we study the data (and the comments) we will find that one stable matching is
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
Nikhil doesn't get lunch in this story, which is too bad, but is a demonstration of the constraints: not everyone gets what they want. Arjun also doesn't get what he wants, which is ramen, because it isn't stable: the ramen-ya would just hit Frank up and they'd do lunch instead. It turns out there aren't other stable matchings for this data, but in general there can be many.
How do we arrive at a stable matching? Fortunately, way back in 1962, Gale and Shapley proposed an algorithm to do just that. In one variant: each producer proposes to satisfy their favorite consumer, each consumer definitively rejects all but the best proposal, and spurned proposers repeat with their next best options, until the rejections stop or they run out of options.
It's pretty much recursion, isn't it? And moreover, each of the steps are pretty easy SQL. Let's write them down!
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
7 | |
8 | |
9 | |
10 | |
11 | |
12 | |
13 | |
14 | |
15 | |
16 | |
17 | |
18 | |
19 | |
20 | |
21 | |
22 | |
23 | |
24 | |
25 | |
26 | |
27 | |
28 | |
29 | |
30 | |
31 | |
Each of these steps--proposal, tentative acceptance, and rejection--follow the written description up above. The behavior of the WITH MUTUALLY RECURSIVE block is to evaluate each term in order, then repeat from the top, until they stop changing. It's worth a moment reading and maybe re-reading the SQL to convince yourself that there is at least some relationship to the written plan.
If we run the query, we get the result up above.
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
These results are great to see, but we are here to maintain computation, as input data change. We can also SUBSCRIBE to the query, and then modify the input to see some output changes.
Each subscribe starts with a snapshot, and it should be (and is) the answer just up above.
1 | |
2 | |
3 | |
To remind you, or introduce you, SUBSCRIBE produces output whose first column is the timestamp of some update event, followed by a change in count (here 1 for both records), followed by payload columns matching what you'd see from a SELECT query.
At this point, let's introduce the possibility that Frank would happily eat a sandwich instead of ramen.
1 | |
2 | |
As soon as I press enter, a bunch of changes spill out of the subscription:
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
How do we read this? Arjun has a shuffle where he gains a matching with ramen and yields his sushi seat. Frank switches to a sandwich from ramen. And Nikhil gets lunch! Sushi isn't happy about it, mind you, but lunch occurs for all producers and consumers.
Importantly, there is one timestamp (1702997625810), indicating that all five changes happen atomically, at exactly the same moment. Neither producer nor consumer will be over-committed, even for a moment, on account of Materialize doesn't screw around with consistency and correctness.
Generalizing stable matching
Let's imagine that each restaurant can serve more than one person, and instead has an integer "capacity". What do we need to change about our process? Let's introduce tables producer_capacity and consumer_capacity, which each hold a name and an integer capacity.
1 | |
2 | |
3 | |
4 | |
What we need to tweak about the algorithm is that each producer proposes at their top cap opportunities, and each consumer tentatively accepts their top cap proposals.
Where above we have fragments that look like so, to pick the top singular opportunity,
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
7 | |
we'll want to update these to pick the top cap opportunities:
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
7 | |
8 | |
9 | |
10 | |
11 | |
12 | |
This new SQL is a bit more complicated than the old SQL, but the LATERAL join allows us to invoke LIMIT with an argument that depends on cap rather than a limit of exactly one that DISTINCT ON provides.
We'll need to do the same thing for our tentative accepts, using consumer_capacity.
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
7 | |
8 | |
9 | |
10 | |
11 | |
12 | |
With unit capacities we'll see the same results as before. However, let's introduce Nikhil to ramen, which it turns out he likes.
1 | |
2 | |
This has some immediate consequences for our subscription to the matching. I restarted it because we need to pick up the new query with capacities, but the new snapshot put us right back where we were before.
1 | |
2 | |
3 | |
4 | |
5 | |
This dislodges Arjun, who is now back on the sushi plan, because the ramen folks are fully occupied. But only because they are occupied. Let's update their capacity to two, which should give Arjun a seat.
1 | |
2 | |
1 | |
2 | |
3 | |
And, to rattle things a bit more let's imagine the sandwich shop is sold out and their capacity drops down to zero.
1 | |
2 | |
1 | |
2 | |
3 | |
4 | |
5 | |
Poor Arjun is just getting bounced around. He decides he really wants some ramen, and offers a cash incentive which updates their preference for him dramatically. We'll model this by just tweaking their preference directly.
1 | |
2 | |
1 | |
2 | |
3 | |
4 | |
5 | |
And Arjun is back on ramen and Nikhil is back on sushi.
Recursive SQL and doing business
There are lots of changes the input may experience, many of which lead to changed output. Like in life, the world changes around you and you may need to promptly update your plans for the world. Materialize and recursive SQL are here to make sure you are always looking at the correct output, moment by moment.
We've seen an example of using SQL for one problem that is fundamental in economics: stable matching (with capacities). This certainly isn't the only problem in economics, nor even the most significant business problem you'll have, but it does show off a potentially new use of recursive SQL to solve the problem. Other problems, similar and different, have natural solutions with recursive SQL that you might not have imagined, and you wouldn't be able to access with vanilla SQL.
