Search Is How Agents See the World

June 29, 2026

Seth Wiesman

Field CTO

Search maintenance used to be a UX problem. For agents, it is a correctness problem.

Large language models reason in language. So the natural way for an agent to ask the world for information is in words: describe the thing it needs, then search for it.

Keyword search and vector search are both versions of this. Keyword search lets the agent ask by exact terms: this order ID, this SKU, this customer name. Vector search lets the agent ask by meaning: products like this, tickets related to this failure mode, policies relevant to this request.

That makes search the agent’s interface to the world.

If search results are stale, humans can compensate. If a result looks wrong, they can refresh the page, check the source system, ask a coworker, or use judgment. Stale search may be frustrating, but the human still has ways to recover.

An autonomous agent operates in a tighter loop. It uses search as its eyes into the world: it looks up context, decides what to do, takes an action, and then searches again to observe what changed. That final search is important because it enables the agent to reason about the consequences of its action. Is it closer to its goal? Or should it change course? Search is how it finds out.

That loop only works if search reflects the current state of the world. If the agent updates a price, changes an order, opens a ticket, grants a permission, or writes a row, it needs to see the consequence of that action before it can decide what to do next.

If it cannot see the consequence, it has bad options. It can stall because it cannot confirm progress. It can retry and risk doing the same thing twice. Or it can infer that something happened because it expected it to happen, not because it observed it.
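The act-then-observe loop can be sketched in a few lines of Python. This is a minimal illustration, not a real agent framework: `act_and_confirm`, the in-memory `source` and `index` dictionaries, and the helper functions are all hypothetical stand-ins for an operational system and a search index that lags behind it.

```python
from typing import Callable, Optional


def act_and_confirm(
    act: Callable[[], None],
    observe: Callable[[], Optional[dict]],
    confirmed: Callable[[Optional[dict]], bool],
    max_checks: int = 3,
) -> str:
    """Take one action, then search until its effect is visible or we give up."""
    act()  # the action runs exactly once; this sketch never retries it blindly
    for _ in range(max_checks):
        if confirmed(observe()):
            return "confirmed"
    # The effect never showed up in search: report that instead of assuming success.
    return "unconfirmed"


# Hypothetical stand-ins: a source of truth and a search index that lags it.
source = {"product_123": {"price": 42}}
index = {"product_123": {"price": 42}}


def set_price() -> None:
    source["product_123"]["price"] = 39  # note: the index is NOT updated here


def search() -> Optional[dict]:
    return index.get("product_123")


def price_is_39(doc: Optional[dict]) -> bool:
    return doc is not None and doc["price"] == 39


# With a stale index the agent cannot confirm its own action.
stale_result = act_and_confirm(set_price, search, price_is_39)

# Once the index reflects the source, the same check succeeds.
index["product_123"] = dict(source["product_123"])
fresh_result = act_and_confirm(set_price, search, price_is_39)
```

In the stale case the write succeeded, but the agent has no way to know that from search alone. That is the point where it must stall, retry, or guess.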

For a human, stale search is often annoying. For an agent, it is game-breaking.

The agent sees computed entities

Imagine an agent that helps manage a commerce catalog.

It can place orders, adjust pricing based on demand or policy, resolve inventory issues, reorganize products into collections, and take corrective actions without human intervention. To do that work, it searches the catalog by exact terms and by meaning. It might look up a product by SKU, or it might search for “cozy gear for a snowy cabin trip.”

The agent is not searching raw tables. It is searching product-shaped documents.

{ 
 "id": "product_123", 
 "name": "Cast Iron Camp Skillet", 
 "description": "A durable skillet for open-fire cooking.", 
 "brand": "North Trail", 
 "price": 42, 
 "in_stock": true, 
 "avg_rating": 4.7, 
 "review_count": 128, 
 "collections": "Winter Cabin Getaway: warm, rugged gear for snowy trips", 
 "search_blob": "Cast Iron Camp Skillet A durable skillet for open-fire cooking. Winter Cabin Getaway: warm, rugged gear for snowy trips", 
 "embedding": [0.12, -0.08, 0.44, "..."] 
} 

This document is what the agent sees. It may look simple, but it is the product of a lot of computation.

Search documents are usually built around derived core entities: a product, customer, order, ticket, policy, or account in the shape an application or agent needs to retrieve. Each document gathers the fields that describe the entity, but it also resolves the logic that makes those fields meaningful.

For a product, avg_rating is computed from reviews. price may reflect base price, promotions, customer segment rules, region, contract terms, and availability. in_stock may depend on warehouse inventory, reservations, supplier feeds, and fulfillment rules. collections may combine merchandising data with product relationships. search_blob is intentionally composed from the fields that should affect semantic retrieval. The embedding is then derived from that text.

The result is a computed entity: a product-shaped record assembled from operational data and business logic.
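To make "assembled from operational data and business logic" concrete, here is a small Python sketch that builds the document above from raw rows. The function name, the row shapes, and the one-decimal rounding of the rating are assumptions made for illustration. The embedding is left out because it would be derived from `search_blob` in a later step.

```python
from statistics import mean


def build_product_doc(product, brand, inventory, reviews, collections):
    """Assemble one product-shaped search document from raw source rows."""
    stars = [r["stars"] for r in reviews]
    # Roll the collection rows up into a single text field.
    collection_text = " | ".join(
        f'{c["name"]}: {c["description"]}' for c in collections
    )
    # Compose the text that semantic retrieval should see, skipping empty parts.
    parts = [product["name"], product["description"], collection_text]
    return {
        "id": product["product_id"],
        "name": product["name"],
        "description": product["description"],
        "brand": brand["name"],
        "price": inventory["price"],
        "in_stock": inventory["in_stock"],
        # Rounding to one decimal is an assumption for display purposes.
        "avg_rating": round(mean(stars), 1) if stars else None,
        "review_count": len(stars),
        "collections": collection_text,
        "search_blob": " ".join(p for p in parts if p),
    }


doc = build_product_doc(
    product={
        "product_id": "product_123",
        "name": "Cast Iron Camp Skillet",
        "description": "A durable skillet for open-fire cooking.",
    },
    brand={"name": "North Trail"},
    inventory={"price": 42, "in_stock": True},
    reviews=[{"stars": 5}, {"stars": 4}, {"stars": 5}],
    collections=[
        {"name": "Winter Cabin Getaway",
         "description": "warm, rugged gear for snowy trips"}
    ],
)
```

Every field in `doc` depends on at least one source row, and several depend on more than one. That dependency is what has to be tracked when the sources change.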

That shape is right for search. Elastic’s own performance guidance says documents should be modeled to make search-time operations cheap, and that avoiding joins by denormalizing documents can produce significant speedups. Search systems are fast because they search this kind of precomputed shape.

But those documents only look simple because the computation has already happened. The hard part is keeping each computed entity correct as the underlying systems change.

A useful mental model looks like this:

products ───────┐
brands ─────────┤
inventory ──────┤
reviews ────────┤──> product_doc ───────────────> keyword index
collections ────┘        │
                         └──> embedding text ──> vector store


The index is not the source of truth. The source systems are. But the thing search needs is the computed entity: the final product-shaped record after joins, aggregates, rollups, pricing logic, permissions, and embedding text have all been resolved.

That is where search maintenance becomes hard.

The hard part is maintaining the computed entity

Putting a document into a search index is not the hard part. The hard part is keeping the right computed entity current when the data and business logic behind it change. And that's because the thing that changed is not always the thing you index.

Search indexes store product documents, while operational systems update catalog rows, inventory, reviews, collections, permissions, and more. The challenge is translating those source-level changes into updates to the computed entity.

A single change can ripple through the system in different ways. Updating a description affects one product’s text and embedding. Changing a price rule may update prices across many products without touching embeddings. A new review shifts a product’s rating, while a collection edit can update search text for every product it contains. Some changes are localized; others fan out across large portions of the dataset.

These fan-out cases are the hardest. A collection description change can require re-embedding every associated product. A permission update can alter which documents are visible without modifying any product rows. Pricing logic changes may depend on region, customer segment, inventory, or promotions, affecting many entities at once.
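The difference between a localized change and a fan-out change can be sketched as a lookup from a source-level change to the set of product documents it touches. The table names follow the diagram above. The change-event shape, the membership rows, and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical membership rows: (product_id, collection_id).
product_collections = [
    ("product_123", "winter_cabin"),
    ("product_456", "winter_cabin"),
    ("product_789", "summer_trail"),
]


def index_by_collection(rows):
    """Build a collection_id -> {product_id} lookup from membership rows."""
    by_collection = defaultdict(set)
    for product_id, collection_id in rows:
        by_collection[collection_id].add(product_id)
    return by_collection


def affected_products(change, by_collection):
    """Map one source-level change to the product documents it touches."""
    table = change["table"]
    if table in ("products", "inventory", "reviews"):
        # Localized: the row belongs to exactly one product.
        return {change["product_id"]}
    if table == "collections":
        # Fan-out: every product in the collection has new search text.
        return set(by_collection.get(change["collection_id"], set()))
    raise ValueError(f"unhandled source table: {table}")


by_collection = index_by_collection(product_collections)
one = affected_products(
    {"table": "reviews", "product_id": "product_123"}, by_collection
)
many = affected_products(
    {"table": "collections", "collection_id": "winter_cabin"}, by_collection
)
```

Even this toy version needs its own lookup structure that must be kept current as memberships change, and it covers only two of the cases described above.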

What matters is not where the change originates, but how it affects the computed entity. This makes the problem one of view maintenance, not simple synchronization.

CDC tells you what happened to an input: a row updated in collections, a new review, an inventory adjustment. But the search system needs to know the output-level consequence.

Which product documents changed? Which fields changed inside those documents? Did the text used for embedding change, or only a filterable attribute? Should the vector be regenerated, preserved, or deleted?

Those are entity-level questions.

This is where most approaches start to break down. The system has to translate low-level source changes into precise updates to computed entities, and that translation is where complexity accumulates.

Batch and DIY streaming both fall short

One common approach is to periodically recompute the index in batch. Batch jobs reprocess large swaths of data, doing more work than necessary but guaranteeing that anything that may have changed gets processed. They eventually produce the correct index, but they introduce lag between the operational system and what search reflects.

The natural next step is to try to make things more real-time by building a pipeline by hand, whether that’s a streaming system or a set of microservices reacting to changes.

That sounds better. Source changes flow through CDC or APIs. Services pick up updates. Workers patch the index. Embedding jobs refresh vectors. With enough code, the index can become fresher.

But now the application owns the hard part.

It has to join source changes into computed entities. It has to maintain aggregates. It has to apply pricing logic. It has to handle fan-out. It has to decide which document fields changed. It has to know when to call the embedding model and when to preserve the existing vector. It has to handle deletes, retries, ordering, schema changes, backfills, and operational failures.

In other words, the team has built a custom view-maintenance system.

That system is usually brittle because the logic is spread across services. One service knows about source events. Another knows about entity assembly. Another knows about pricing or permissions. Another knows about embeddings. Another writes to the index. The rules that define “what the agent sees” are no longer in one place.

This is why batch and raw CDC both miss the core problem.

Batch is too stale. Raw CDC is too low-level. And building your own pipelines means solving a complex problem from scratch. What you need is a way to define the computed entity once, then have the system keep it current as the source data changes.

Define the computed entity once, as SQL

Materialize lets you define the search document as SQL and keep it current as source systems change.

The important idea is simple: the search document should not be a batch artifact or a pile of application code. It should be a maintained view of a computed entity.

Start with the pieces. Reviews become a product-level aggregate:

CREATE VIEW reviews_agg AS 
SELECT 
 product_id, 
 AVG(stars) AS avg_rating, 
 COUNT(*) AS review_count 
FROM reviews 
GROUP BY product_id; 

Collections become a product-level rollup:

CREATE VIEW collection_text AS 
SELECT 
 pc.product_id, 
 string_agg(c.name || ': ' || c.description, ' | ') AS text 
FROM product_collections pc 
JOIN collections c 
 ON c.collection_id = pc.collection_id 
GROUP BY pc.product_id; 

Then those pieces become the final product document:

CREATE MATERIALIZED VIEW product_doc AS 
SELECT 
 p.product_id AS id, 
 p.name, 
 p.description, 
 b.name AS brand, 
 inv.price, 
 inv.in_stock, 
 rev.avg_rating, 
 rev.review_count, 
 col.text AS collections, 
 CONCAT_WS(' ', p.name, p.description, col.text) AS search_blob 
FROM products p 
JOIN brands b 
 ON b.brand_id = p.brand_id 
JOIN inventory inv 
 ON inv.product_id = p.product_id 
LEFT JOIN reviews_agg rev 
 ON rev.product_id = p.product_id 
LEFT JOIN collection_text col 
 ON col.product_id = p.product_id; 

This view is the product document. It joins catalog, brand, inventory, reviews, and collections into one product-shaped record that the agent can search. In a real system, this view might also include pricing rules, permissions, regional availability, personalization features, or any other business logic needed to produce the entity the agent should see.

The important part is not only that the transformation is written in SQL. It is that Materialize maintains the result.

When inputs change, Materialize updates only the affected product_doc rows. A new review updates one product’s rating. A promotion updates prices for the relevant products. A collection edit updates search text for its products.
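As a rough intuition for what "updates only the affected rows" means, here is a toy Python sketch of an incrementally maintained rating aggregate. It is not how Materialize is implemented. The `RatingView` class and its methods are hypothetical, and the sketch only shows that one review change can be folded into one output row without rescanning every review.

```python
class RatingView:
    """Toy incrementally maintained aggregate: running sum and count per product."""

    def __init__(self):
        self._sum = {}
        self._count = {}

    def apply(self, product_id, stars, diff=1):
        """Apply one review insert (diff=+1) or delete (diff=-1).

        Returns (product_id, new_row), where new_row is None if the product
        no longer has any reviews.
        """
        self._sum[product_id] = self._sum.get(product_id, 0) + diff * stars
        self._count[product_id] = self._count.get(product_id, 0) + diff
        if self._count[product_id] == 0:
            del self._sum[product_id], self._count[product_id]
            return product_id, None
        return product_id, self.row(product_id)

    def row(self, product_id):
        """Current aggregate row for one product."""
        count = self._count[product_id]
        return {"avg_rating": self._sum[product_id] / count, "review_count": count}


view = RatingView()
view.apply("product_123", 4)
changed = view.apply("product_123", 5)  # only product_123's row is recomputed
view.apply("product_456", 3)            # does not touch product_123
```

Each call touches state for a single product, so the cost of a new review does not grow with the size of the catalog.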

Downstream systems don’t need to recompute relationships or figure out what changed. Materialize converts source changes into entity-level updates, so the search document is continuously maintained.

Send entity-level changes downstream

Once Materialize maintains product_doc, the search index does not need to understand the whole operational graph or how the entity is computed. It only needs to consume changes to the computed entity.

If a price changes, the downstream update can look like this:

{ 
 "before": { 
   "id": "product_123", 
   "price": 42, 
   "search_blob": "Cast Iron Camp Skillet ... Winter Cabin Getaway: warm, rugged gear for snowy trips" 
 }, 
 "after": { 
   "id": "product_123", 
   "price": 39, 
   "search_blob": "Cast Iron Camp Skillet ... Winter Cabin Getaway: warm, rugged gear for snowy trips" 
 } 
} 

The price changed. The embedded text did not. The index should update the price, but it should not call the embedding model.

If the description changes, the update looks different:

{ 
 "before": { 
   "id": "product_123", 
   "price": 39, 
   "search_blob": "Cast Iron Camp Skillet A durable skillet for open-fire cooking..." 
 }, 
 "after": { 
   "id": "product_123", 
   "price": 39, 
   "search_blob": "Cast Iron Camp Skillet A lightweight skillet for open-fire cooking..." 
 } 
} 

Now the embedded text changed. The vector has to change too.

Because Materialize produces full before-and-after images of the computed entity, the downstream pipeline can determine what changed with a simple field-by-field comparison. Materialize has already answered the hard question of which entities changed, so the search system only needs to answer the local question of which fields changed inside this entity.

That distinction keeps the architecture simple. The search system does not need to understand joins, aggregates, pricing logic, or fan-out relationships. It only reacts to changes in the final entity shape.

It also enables precise embedding updates. The pipeline can compare the fields that feed the embedding and act accordingly:

If price changed, update the price.
If in_stock changed, update availability.
If avg_rating changed, update the rating.
If search_blob changed, regenerate the embedding.
If nothing relevant changed, do nothing.

At this point, the remaining problem is mechanical: compare before and after, and only re-embed when the text that drives meaning actually changed.
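The rules above fit in one small function. The following Python sketch compares the two images and decides what the index should do. It assumes both images carry the same set of fields and that `search_blob` is the only embedding input. `plan_update` and the shape of its return value are hypothetical, not the interface of any particular tool.

```python
# Assumption: the only field that feeds the embedding model.
EMBEDDING_INPUTS = ("search_blob",)


def plan_update(before, after, embedding_inputs=EMBEDDING_INPUTS):
    """Turn a before/after pair into an index operation.

    `before` is None for a newly created entity; `after` is None for a delete.
    """
    if after is None:
        # The entity is gone: remove the document and its vector.
        return {"op": "delete", "id": before["id"]}
    if before is None:
        # New entity: index every field and compute its first embedding.
        return {"op": "index", "id": after["id"], "fields": dict(after), "reembed": True}
    # Assumes both images have the same keys, so only changed values matter.
    changed = {k: v for k, v in after.items() if before.get(k) != v}
    if not changed:
        return {"op": "noop", "id": after["id"]}
    reembed = any(name in changed for name in embedding_inputs)
    return {"op": "update", "id": after["id"], "fields": changed, "reembed": reembed}


price_change = plan_update(
    {"id": "product_123", "price": 42, "search_blob": "Cast Iron Camp Skillet"},
    {"id": "product_123", "price": 39, "search_blob": "Cast Iron Camp Skillet"},
)
text_change = plan_update(
    {"id": "product_123", "price": 39, "search_blob": "A durable skillet"},
    {"id": "product_123", "price": 39, "search_blob": "A lightweight skillet"},
)
```

For the price change, the plan updates one field and skips the embedding call. For the description change, it updates `search_blob` and flags the vector for regeneration.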

We already ship this as a small, focused piece of infrastructure. Perfect Embedding is a Kafka Connect Single Message Transform (SMT) that sits in the pipeline between Materialize and your search index. It inspects the before-and-after images of each document, checks the fields you’ve configured as embedding inputs, and only calls the embedding service when those fields change.

Out of the box, it integrates with OpenAI-compatible embedding APIs, but the interface is pluggable so you can use whatever embedding provider you prefer.

Build search your agents can trust

Agents do not need a stale copy of operational data. They need a maintained interface to your business.

That interface is the search document: the product, ticket, customer, policy, or order shape the agent retrieves before it acts. If that document lags behind the source system, the agent reasons from a stale model. If the document stays current, the agent can observe the result of its actions and keep going.

Materialize maintains that document as SQL. It turns source-level changes into entity-level changes, including joins, aggregates, rollups, fan-out updates, business logic, and deletes. The search index receives the changes it needs. Perfect Embedding ensures embeddings stay in sync without unnecessary recomputation.

The result is a search layer that moves with the operational system instead of catching up to it later.

Materialize has built a solution around these ideas, making it easy to define, maintain, and serve real-time computed entities for search and beyond. If you're building agents, or any system that depends on fresh, correct search, check out our website to see how it works and get started.