What's a live data product?

Data teams have worked with data products for years, but the concept has traditionally meant something static. A quarterly sales report, a customer segmentation analysis, or a monthly dashboard—these are all data products in the conventional sense. They package data in a useful format, but they represent a snapshot in time.

Live data products work differently. They maintain an up-to-date view of your data as changes happen, not after a scheduled batch job runs. When a customer places an order or updates their profile, or when a sensor reading changes, live data products reflect that information within seconds. The concept centers on continuous computation rather than periodic recalculation.

How live data products differ from traditional approaches

Traditional data products follow a batch processing model. Data warehouses pull information from source systems on a schedule, transform it, and make it available for queries. This works for historical analysis and reporting, but creates problems for operational use cases.

Consider a fraud detection system. A batch-processed data product might update every few minutes or hours. During that window, fraudulent transactions can go undetected. A live data product processes each transaction as it occurs, applying the same complex logic and joins you'd use in a warehouse, but maintaining results continuously.

The technical difference comes down to computation models. Batch systems recalculate results from scratch each time they run. Live data products use incremental computation. They determine the minimal work needed to update results when source data changes. This approach originated in academic research on dataflow systems like Timely and Differential Dataflow.
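To make the two models concrete, here is a minimal SQL sketch. The table and column names are invented, and the second statement assumes an engine that maintains materialized views incrementally (Materialize-style DDL is used for illustration; exact syntax varies by engine).

    -- Batch model: rebuild the entire result on a schedule (e.g. a nightly job),
    -- rescanning the orders table every time.
    CREATE TABLE daily_revenue_batch AS
    SELECT order_date, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_date;

    -- Incremental model: declare the same query once; the engine updates only the
    -- groups affected by each new or changed order, so results stay current
    -- without a full recomputation.
    CREATE MATERIALIZED VIEW daily_revenue_live AS
    SELECT order_date, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_date;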

Core characteristics

Live data products share several defining features:

Always current: Results stay synchronized with source systems. When data changes in a database or arrives in an event stream, the data product updates automatically without manual intervention or scheduled jobs.

Strongly consistent: Live data products guarantee correctness across multiple data sources within their ingestion context. This differs from eventually consistent systems where you might read stale or conflicting data. The consistency model matters when joining data from separate databases or combining current and historical information.

Queryable and subscribable: You can pull data from live data products using standard database queries, or subscribe to updates as they happen. This flexibility supports different consumption patterns. A dashboard might query current state, while a microservice subscribes to changes.

Composable: Live data products can be built from other live data products. A "customer lifetime value" data product might combine data from "customer orders" and "customer support interactions" products, each pulling from different source systems.
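A short sketch of the last two characteristics, assuming the "customer orders" and "customer support interactions" products already exist as views named customer_orders and support_interactions (hypothetical names), and that the engine exposes a SUBSCRIBE-style changefeed alongside ordinary SELECT (Materialize does; other systems differ):

    -- Composable: a higher-level product defined over two existing products.
    CREATE MATERIALIZED VIEW customer_lifetime_value AS
    SELECT o.customer_id,
           o.total_revenue,
           COALESCE(s.ticket_count, 0) AS support_tickets
    FROM (SELECT customer_id, SUM(order_total) AS total_revenue
          FROM customer_orders GROUP BY customer_id) o
    LEFT JOIN (SELECT customer_id, COUNT(*) AS ticket_count
               FROM support_interactions GROUP BY customer_id) s
      USING (customer_id);

    -- Queryable: a dashboard pulls current state with a standard query.
    SELECT * FROM customer_lifetime_value WHERE customer_id = 42;

    -- Subscribable: a microservice receives changes as they happen instead of polling.
    SUBSCRIBE TO customer_lifetime_value;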

Building live data products with SQL

One popular approach to building live data products involves defining views using SQL. This makes them accessible to data engineers who already know SQL, without requiring expertise in stream processing frameworks.

Here's what the process looks like:

  • Connect to source systems through change data capture for operational databases, event streams from Kafka, or other data sources
  • Define transformations using SQL queries that join, filter, and aggregate data
  • Materialize views so results update incrementally as source data changes
  • Expose results via queries or push updates downstream

The SQL definitions describe what you want to compute, while the underlying engine handles how to maintain results efficiently. This separation means you can express complex multi-way joins and aggregations without worrying about the mechanics of incremental computation.
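Under stated assumptions (a Postgres source ingested through change data capture, invented table names, and Materialize-flavored DDL that will differ in other engines), the four steps above might look roughly like this:

    -- 1. Connect: ingest an operational database through change data capture.
    CREATE SOURCE orders_db
      FROM POSTGRES CONNECTION pg_conn (PUBLICATION 'orders_pub')
      FOR TABLES (orders, customers, payments);

    -- 2 & 3. Define and materialize: a multi-way join and aggregation whose
    -- results are maintained incrementally as the source tables change.
    CREATE MATERIALIZED VIEW open_order_summary AS
    SELECT c.customer_id,
           c.region,
           COUNT(*)          AS open_orders,
           SUM(p.amount_due) AS outstanding_balance
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    JOIN payments  p ON p.order_id = o.order_id
    WHERE o.status = 'open'
    GROUP BY c.customer_id, c.region;

    -- 4. Expose: serve point-in-time queries, or push changes downstream.
    SELECT * FROM open_order_summary WHERE region = 'EMEA';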

Use cases across different domains

Live data products support several operational patterns that batch processing can't handle effectively.

AI agent context

AI agents need current information about business state to make decisions and take actions. Instead of giving agents direct access to raw database tables where they might run expensive queries, live data products provide semantic representations of business concepts. An agent working with customer data might access a "customer profile" data product that combines information from CRM systems, purchase history, and support tickets. As underlying data changes, the product updates, and agents see the current state.
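As a sketch with hypothetical table names, the team publishes one semantic view over the raw sources, and the agent's tool call becomes a narrow, cheap lookup rather than an open-ended query against operational tables:

    -- A semantic "customer profile" product over CRM, purchase, and support data.
    CREATE MATERIALIZED VIEW customer_profile AS
    SELECT c.customer_id,
           c.name,
           c.plan,
           COALESCE(p.lifetime_spend, 0)    AS lifetime_spend,
           COALESCE(t.open_ticket_count, 0) AS open_ticket_count
    FROM crm_contacts c
    LEFT JOIN (SELECT customer_id, SUM(total) AS lifetime_spend
               FROM purchases GROUP BY customer_id) p USING (customer_id)
    LEFT JOIN (SELECT customer_id, COUNT(*) AS open_ticket_count
               FROM support_tickets WHERE status = 'open'
               GROUP BY customer_id) t USING (customer_id);

    -- What the agent actually runs: a scoped read of a business concept.
    SELECT * FROM customer_profile WHERE customer_id = 1042;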

The Model Context Protocol (MCP) makes live data products particularly useful for AI workflows. Teams can expose data products as MCP endpoints, giving agents discoverable, well-defined interfaces to business data.

Event-driven architectures

Microservices architectures often struggle with data consistency across services. Each service maintains its own database, but services need to react to changes in other services' data. Live data products can transform raw database changes into semantically meaningful business events.

A "customer subscription status" data product might combine data from billing, feature usage, and entitlements services. When the combined view indicates a subscription should be downgraded, the system can emit an event that triggers downstream processes without requiring each service to implement its own coordination logic.

Data-intensive user interfaces

Applications with complex, data-heavy UIs need to show aggregated data from multiple sources with minimal latency. Traditional approaches involve caching layers that require careful invalidation logic, or they accept stale data.

Live data products maintain pre-computed results that applications can query directly. A financial trading platform might use data products to show portfolio positions that aggregate data from multiple accounts and asset types. As trades execute, positions update in sub-second timeframes without the application needing to implement its own aggregation logic.
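A rough sketch with an invented schema: the aggregation lives in the data layer, an index keeps reads fast, and the application issues only simple lookups.

    -- Pre-computed positions across accounts and asset types.
    CREATE MATERIALIZED VIEW portfolio_positions AS
    SELECT t.account_group_id,
           t.asset_type,
           SUM(t.quantity * p.price) AS market_value
    FROM trades t
    JOIN latest_prices p USING (asset_id)
    GROUP BY t.account_group_id, t.asset_type;

    -- An index makes per-account lookups fast enough for interactive UIs.
    CREATE INDEX ON portfolio_positions (account_group_id);

    -- The UI's query: an indexed read, with no aggregation logic in the app.
    SELECT asset_type, market_value
    FROM portfolio_positions
    WHERE account_group_id = 'acct-grp-7';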

Common implementation patterns

Organizations adopt live data products through several architectural patterns, each suited to different situations.

Query offload (CQRS): Complex read queries that strain operational databases move to a live data layer. This scales read workloads separately from write workloads without the complexity of cache invalidation. The read model stays synchronized with source databases through change data capture.
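A minimal sketch of the offload pattern, again with illustrative Materialize-flavored syntax and invented names: the write-side database is ingested via CDC, and the expensive read query is maintained continuously in the live layer instead of being re-run against the primary or cached.

    -- Ingest the operational database through change data capture.
    CREATE SOURCE app_db
      FROM POSTGRES CONNECTION app_pg (PUBLICATION 'app_pub')
      FOR ALL TABLES;

    -- The heavy read query that used to strain the primary now lives here,
    -- kept in sync by CDC rather than by cache invalidation logic.
    CREATE MATERIALIZED VIEW account_activity AS
    SELECT a.account_id,
           COUNT(DISTINCT l.login_id) AS login_count,
           COUNT(DISTINCT o.order_id) AS order_count
    FROM accounts a
    LEFT JOIN logins l USING (account_id)
    LEFT JOIN orders o USING (account_id)
    GROUP BY a.account_id;

    -- Reads hit the live layer; writes continue to go to the primary database.
    SELECT * FROM account_activity WHERE account_id = 7;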

Operational data store: Data from multiple source systems gets combined and transformed into unified views. Unlike traditional ETL that runs on schedules, these views update as source data changes. Teams can query integrated data directly or push it to downstream systems.

Operational data mesh: Different teams build and maintain their own data products while sharing them across the organization. A payments team might publish a "transaction status" data product, while a risk team publishes a "fraud indicators" product. Other teams can compose these products together to build higher-level views without duplicating logic.
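Sketched with hypothetical per-team schemas, composition in the mesh is just another view over the shared products:

    -- The payments team publishes payments.transaction_status; the risk team
    -- publishes risk.fraud_indicators. Another team composes them without
    -- re-implementing either team's logic.
    CREATE MATERIALIZED VIEW blocked_payouts AS
    SELECT t.transaction_id,
           t.merchant_id,
           f.risk_score
    FROM payments.transaction_status t
    JOIN risk.fraud_indicators f USING (transaction_id)
    WHERE t.status = 'pending' AND f.risk_score > 0.9;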

Technical considerations

Implementing live data products requires thinking about several technical factors that don't come up with batch processing.

State management becomes important because incremental computation needs to maintain working state as it processes updates. For large datasets, this state might not fit in memory, requiring careful management of what to keep in memory versus what to persist to disk.

Late-arriving and out-of-order data requires special handling. Events don't always arrive in the order they occurred, and systems need to maintain correctness even when processing events from the past that affect current results.

Failure recovery needs to work differently than batch systems. When a batch job fails, you restart it. When a live data product fails, you need to recover to a consistent state without losing updates that arrived during the outage.

Examples from production

Several companies have deployed live data products for different use cases. Delphi uses them to power agent queries with complex transformations at scale. Vontive reduced loan eligibility calculation time from 27 seconds to half a second. Neo Financial cut infrastructure costs by 80 percent for their fraud detection system. Nanit built an operational data mesh that lets different teams share data products while maintaining loose coupling between services.

These implementations share a pattern: they took workloads that were either too complex for batch processing or too expensive to implement with custom stream processing, and made them practical using SQL-defined live data products.

The shift from batch to continuous

Live data products represent a different way of thinking about data infrastructure. Instead of scheduling transformations to run periodically, you define what you want to compute and let the system maintain results continuously. This aligns with how operational systems work: they process events as they occur rather than waiting for a scheduled run.

The technology has matured to where teams can implement live data products without specialized stream processing expertise. Using familiar SQL and database concepts, engineers can build systems that were previously only possible with significant custom development.

Frequently asked questions

How is a live data product different from a cache?

Caches store query results and require invalidation logic to stay current. When underlying data changes, you need to decide which cache entries to invalidate and when. Live data products eliminate this complexity by automatically updating as source data changes. They use incremental computation to maintain correctness without manual invalidation.

Do I need to learn stream processing to build live data products?

No. Live data products use SQL, the same language you'd use with a traditional database. You write queries that describe what you want to compute, and the system handles the incremental updates. This differs from stream processing frameworks that require you to think about windowing, state management, and low-level dataflow operations.

Can live data products replace my data warehouse?

Live data products serve operational workloads, not the same use cases as data warehouses. Warehouses excel at historical analysis, complex ad-hoc queries, and BI reporting on large datasets. Live data products excel at maintaining current views for applications, AI agents, and operational processes. Many organizations use both, with live data products serving operational needs and warehouses handling historical analysis.

What happens when source data arrives out of order?

Live data products maintain correctness even with late-arriving data. The system processes events according to their logical timestamp rather than arrival time, recalculating affected results when needed. This differs from systems that produce approximate results or require you to define time windows that might miss late data.

How much does it cost to run live data products?

Infrastructure costs depend on data volume and query complexity. Incremental computation performs minimal work to update results, which keeps costs lower than recalculating from scratch. For specific workloads, companies have reported significant cost reductions compared to custom implementations. Neo Financial reduced costs by 80 percent for their fraud detection system.

When should I use live data products instead of batch processing?

Use live data products when you need current data for operational decisions. Fraud detection, inventory management, AI agent context, and user-facing features benefit from continuous updates. Batch processing works fine for historical reporting, compliance documents, and analysis where data from yesterday or last week is acceptable.

What tools and languages work with live data products?

Live data products use the PostgreSQL wire protocol, so any tool that connects to PostgreSQL can work with them. This includes BI tools, ORMs, database clients, and programming language drivers. You can also integrate with dbt for transformation workflows.

How do live data products handle failures?

Systems that support live data products typically provide automatic recovery with strong consistency guarantees. When a failure occurs, the system recovers to a consistent state without losing updates or serving incorrect results. This differs from eventually consistent systems where failures can lead to temporary data inconsistencies.