Superscript's Real-Time Analytics & ML | Materialize

Superscript

Summary

Superscript uses Materialize to serve real-time customer data models powering online ML predictors and operational dashboards that help optimize the insurance buying process.

By taking SQL pipelines prototyped in a warehouse and operationalizing them on Materialize, the data team is directly responsible for generating significant bottom-line revenue.

Every week, we’re closing customers that we wouldn’t even have had the opportunity to contact before using Materialize.
— Tom Cooper

About Superscript

Since its inception, Superscript has redefined insurance for the digital age. The London-based firm specializes in bespoke, flexible policies for tech-savvy SMEs, start-ups and scale-ups.

Superscript distinguishes itself through its innovative use of technology and data analytics, enabling flexible, accurate coverage. This agile approach has attracted venture capital and facilitated rapid expansion across the UK.

In a short span, Superscript has earned accolades and amassed an impressive client base, paving the way for a more dynamic and personalized future in business insurance.

Tom Cooper leads the Data Team responsible for delivering Superscript’s market-leading analytics capabilities. The team serves as a blueprint for future data teams elsewhere: their mandate spans well beyond internal reporting and into closed-loop optimization of critical customer experiences.

The Challenge

The Superscript Data Team needed a way to take a user data model (a feature store) built on static data in a warehouse, run it on live data in production, and wire it up to multiple ML services used to optimize the conversion funnel. And they needed to do it without the help of an engineering team, while retaining the same level of control, consistency and accuracy they have in a batch pipeline.

The team is responsible for taking all available user-centric data and putting it into action, with the goal of creating a better buying process, increasing conversions, and ultimately growing revenue.

Superscript had already seen promising results from proof-of-concept tests of two use cases:

  • Real-Time Scoring - While a prospect is in the buying process on the site, continually funnel behavioral features to an ML model that flags specific prospects who need immediate outreach from the customer success team to convert.
  • Live Operational Dashboards - Give internal teams a live status of operations to create shorter feedback loops.

But they needed end-to-end latency to be on the order of seconds, not hours, in order to run the process in production.

Before Materialize

Like any other business, Superscript has access to valuable streams of data created in real-time:

  • Primary data - Customer-provided information and past interactions with Superscript
  • Behavioral data - Actions the user is taking on the Superscript site, pages they’re visiting, etc.
  • 3rd Party data - Business enrichment information

They were routing live data through a real-time customer data platform, Segment. But their options for transforming and modeling the data into the form they needed came with a set of untenable compromises…

Options for modeling data downstream of Segment:

  1. Warehouse - Limited by frequency of updates. They started here as an accessible and easy (SQL) place to test and prototype, but Segment only syncs data to warehouses twice a day. Enterprise plans allow for hourly syncs, but even five-minute latency means losing the most valuable (recent) behavioral data.
  2. Black-Box SaaS Tools - Limited control. SaaS tools can receive events from Segment to give marketers a point-and-click interface for capabilities like website personalization, but they don’t provide the general-purpose modeling features and controls that the team needed.
  3. Stream Processors - Limited capabilities, added complexity. The team explored using Faust, but was unable to get it to produce results that worked for their architecture. They also looked at ksqlDB, but using it meant rewriting all the SQL they had written in the warehouse, and the cost of handling their joins and multi-step transformations would have been prohibitive.

Materialize Implementation

Upon getting started with Materialize, Tom and his team were able to securely set up a proof of concept using Kafka as a message buffer between Segment and Materialize in one evening: “We decided to look into Materialize, and by that evening we were up and running. And it wasn’t long before we started to see how it could be used in Production.”

The initial architecture has proven reliable and scalable enough to remain unchanged in production:

[Architecture diagram: Superscript Materialize Architecture]

Getting data in

Segment collects and routes customer-centric data from websites, server-side applications and third-party platforms. Superscript uses a basic HTTP Destination in Segment (a webhook) to route the raw firehose of all JSON-formatted events to a managed Kafka provider that’s used purely as a message buffer. Materialize then eagerly consumes new messages from Kafka via a Source. For general guidance on this, see how to connect Segment to Materialize.

Modeling

PostgreSQL JSON syntax is used to extract relevant fields from raw JSON into a structured database schema and transform the data through several layers of SQL materialized views to serve clean, consistent and up-to-date user dimension tables and feature stores. See creating a real-time feature store for generalized guidance on this approach.
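
As a minimal sketch of this layering, the Python snippet below holds two SQL statements and applies them in order through any DB-API cursor, such as one from psycopg2. The source name segment_events, its jsonb column data, both view names and the chosen columns are assumptions made for illustration, not Superscript's actual schema. The JSON keys (userId, event, type, timestamp) follow Segment's common event fields.

```python
# Hypothetical two-layer model: every name below is illustrative,
# not Superscript's real schema.

# Layer 1: pull typed columns out of the raw JSON events.
# Assumes a source `segment_events` exposing each event as a jsonb column `data`.
TRACK_EVENTS_VIEW = """
CREATE MATERIALIZED VIEW track_events AS
SELECT
    data->>'userId' AS user_id,
    data->>'event' AS event_name,
    (data->>'timestamp')::timestamptz AS event_ts
FROM segment_events
WHERE data->>'type' = 'track'
"""

# Layer 2: aggregate the cleaned events into per-user features.
USER_FEATURES_VIEW = """
CREATE MATERIALIZED VIEW user_features AS
SELECT
    user_id,
    count(*) AS event_count,
    max(event_ts) AS last_seen
FROM track_events
GROUP BY user_id
"""

# Order matters: each layer reads from the one before it.
MODEL_LAYERS = [TRACK_EVENTS_VIEW, USER_FEATURES_VIEW]


def create_model(cursor):
    """Run each CREATE statement in order on a DB-API cursor."""
    for statement in MODEL_LAYERS:
        cursor.execute(statement)
    return len(MODEL_LAYERS)
```

With psycopg2, the cursor would typically come from a connection with autocommit enabled, so that each DDL statement runs on its own rather than inside an implicit transaction.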

Serving

Materialize also acts as the serving layer in this architecture. Materialize is Postgres wire-compatible, so Superscript can query Materialize as if it were a Postgres database using stable drivers like psycopg2 for Python and node-postgres for Node.js.

Important tables are maintained in memory (via Indexes) to speed up response times. Superscript can do user id lookups on indexed tables and get responses with ~50ms latency. The team reads data out of Materialize in two ways:

Fetching data with standard SQL queries

Real-time operational dashboards:

  1. Materialized Views are created to incrementally track key conversion stats with up-to-the-second accuracy.
  2. When an internal user loads the admin dashboard, it fetches the latest data via standard SELECT queries.
  3. Responses are fast because no computation is done at query time.
  4. End-to-end latency, from when data originates in the platform to when it is reflected in the dashboard, is 1-5 seconds.
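
The read path described above, an indexed view queried with a plain SELECT, might look roughly like the following sketch. The view user_features, its columns and the index name are hypothetical. The function accepts any DB-API cursor, so it works with psycopg2 but does not depend on it.

```python
# Hypothetical names throughout: `user_features` and its columns are illustrative.

# One-time setup: an index keeps the view's results in memory for fast lookups.
CREATE_INDEX_SQL = "CREATE INDEX user_features_by_user_id ON user_features (user_id)"

# Point lookup served from the index; %s is the DB-API (psycopg2) placeholder.
LOOKUP_SQL = (
    "SELECT user_id, event_count, last_seen "
    "FROM user_features WHERE user_id = %s"
)
COLUMNS = ("user_id", "event_count", "last_seen")


def fetch_user_features(cursor, user_id):
    """Return one user's feature row as a dict, or None if the user is unknown."""
    cursor.execute(LOOKUP_SQL, (user_id,))  # parameterized, never string-formatted
    row = cursor.fetchone()
    return dict(zip(COLUMNS, row)) if row is not None else None
```

In practice the cursor would come from psycopg2.connect(...) pointed at the Materialize environment. Because the result is already maintained incrementally, the query only reads the current row rather than recomputing it.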

Streaming data with SUBSCRIBE

The real-time scoring service subscribes to updates on the prospect’s data from a feature table, and continually recomputes a “risk” variable while the prospect is active on the site. The moment that variable hits a certain threshold, the prospect’s information is added to a priority queue for personal outreach.
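
A rough sketch of that consumer logic follows. SUBSCRIBE emits each change as a row carrying mz_timestamp and mz_diff ahead of the relation's own columns, with an update arriving as a -1 row for the old values and a +1 row for the new ones. Everything else here is an assumption for illustration: the feature columns (pages_viewed, quote_started), the threshold value, and the risk_score function, which is a stand-in for the real ML model.

```python
import heapq
from collections import Counter

RISK_THRESHOLD = 0.7  # hypothetical cut-off, not Superscript's real value


def risk_score(pages_viewed, quote_started):
    """Stand-in for the real ML model (assumption): heavy browsing with no quote scores high."""
    score = min(pages_viewed, 10) / 10
    return score * 0.5 if quote_started else score


def apply_updates(state, rows):
    """Fold one batch of SUBSCRIBE rows into `state`, a multiset of current feature rows.

    Each row is (mz_timestamp, mz_diff, user_id, pages_viewed, quote_started).
    Returns the set of user ids touched by the batch.
    """
    touched = set()
    for _mz_timestamp, mz_diff, user_id, pages_viewed, quote_started in rows:
        key = (user_id, pages_viewed, quote_started)
        state[key] += mz_diff
        if state[key] == 0:
            del state[key]  # row fully retracted
        touched.add(user_id)
    return touched


def enqueue_flagged(state, touched, queue, already_flagged):
    """Push newly over-threshold prospects onto a heap, highest score first."""
    for (user_id, pages_viewed, quote_started), count in list(state.items()):
        if count <= 0 or user_id not in touched or user_id in already_flagged:
            continue
        score = risk_score(pages_viewed, quote_started)
        if score >= RISK_THRESHOLD:
            heapq.heappush(queue, (-score, user_id))  # negate: heapq is a min-heap
            already_flagged.add(user_id)
```

With psycopg2, the batches would typically come from declaring a cursor over the SUBSCRIBE statement and calling FETCH ALL on it in a loop, passing each fetched batch to apply_updates and then enqueue_flagged.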

Results

Ultimately, the most meaningful result of the Superscript team’s new real-time capabilities was a measurable increase in bottom-line revenue. Tom put it best: “Every week, we’re closing customers that we wouldn’t even have had the opportunity to contact before using Materialize.”

Just as important, the entire operational data stack was handled by the Data Team. This empowered the team to build exactly what they needed and iterate quickly. The success of the project wasn’t dependent on implementation or ops work from other engineering teams. With a level of work similar to that required to serve an internal BI tool, they could deliver a production-ready API with a direct and measurable improvement to bottom-line revenue.