CREATE SOURCE: Load generator

CREATE SOURCE connects Materialize to an external system you want to read data from, and provides details about how to decode and interpret that data.

Load generator sources produce synthetic data for use in demos and performance tests.

Syntax

CREATE SOURCE IF NOT EXISTS src_name IN CLUSTER cluster_name FROM LOAD GENERATOR AUCTION CLOCK COUNTER MARKETING TPCH KEY VALUE ( load_generator_option , ) FOR ALL TABLES EXPOSE PROGRESS AS progress_subsource_name with_options

load_generator_option

TICK INTERVAL interval AS OF UP TO tick SCALE FACTOR scale_factor MAX CARDINALITY max_cardinality KEYS keys SNAPSHOT ROUNDS snapshot_rounds TRANSACTIONAL SNAPSHOT transactional_snapshot VALUE SIZE value_size SEED seed PARTITIONS partitions BATCH SIZE batch_size

with_options

WITH ( RETAIN HISTORY = FOR retention_period )
Field Use
src_name The name for the source.
IN CLUSTER cluster_name The cluster to maintain this source.
COUNTER Use the counter load generator.
AUCTION Use the auction load generator.
MARKETING Use the marketing load generator.
TPCH Use the tpch load generator.
IF NOT EXISTS Do nothing (except issuing a notice) if a source with the same name already exists.
TICK INTERVAL The interval at which the next datum should be emitted. Defaults to one second.
AS OF The tick at which to start producing data. Defaults to 0.
UP TO The tick before which to stop producing data. Defaults to infinite.
SCALE FACTOR The scale factor for the TPCH generator. Defaults to 0.01 (~ 10MB).
MAX CARDINALITY Valid for the COUNTER generator. Causes the generator to delete old values to keep the collection at most a given size. Defaults to unlimited.
KEYS Valid for KEY VALUE generator.
SNAPSHOT ROUNDS Valid for KEY VALUE generator.
TRANSACTIONAL SNAPSHOT Valid for KEY VALUE generator.
VALUE SIZE Valid for KEY VALUE generator.
SEED Valid for KEY VALUE generator.
PARTITIONS Valid for KEY VALUE generator.
BATCH SIZE Valid for KEY VALUE generator.
FOR ALL TABLES Creates subsources for all tables in the load generator.
EXPOSE PROGRESS AS progress_subsource_name The name of the progress subsource for the source. If this is not specified, the subsource will be named <src_name>_progress. For more information, see Monitoring source progress.
RETAIN HISTORY FOR
retention_period
Private preview. This option has known performance or stability issues and is under active development. Duration for which Materialize retains historical data, which is useful to implement durable subscriptions. Accepts positive interval values (e.g. '1hr'). Default: 1s.

Description

Materialize has several built-in load generators, which provide a quick way to get up and running with no external dependencies before plugging in your own data sources. If you would like to see an additional load generator, please submit a feature request.

Counter

The counter load generator produces the sequence 1, 2, 3, …. Each tick interval, the next number in the sequence is emitted.

Auction

The auction load generator simulates an auction house, where users are bidding on an ongoing series of auctions. The auction source will be automatically demuxed into multiple subsources when the CREATE SOURCE command is executed. This will create the following subsources:

  • organizations describes the organizations known to the auction house.

    Field Type Description
    id bigint A unique identifier for the organization.
    name text The organization’s name.
  • users describes the users that belong to each organization.

    Field Type Description
    id bigint A unique identifier for the user.
    org_id bigint The identifier of the organization to which the user belongs. References organizations.id.
    name text The user’s name.
  • accounts describes the account associated with each organization.

    Field Type Description
    id bigint A unique identifier for the account.
    org_id bigint The identifier of the organization to which the account belongs. References organizations.id.
    balance bigint The balance of the account in dollars.
  • auctions describes all past and ongoing auctions.

    Field Type Description
    id bigint A unique identifier for the auction.
    seller bigint The identifier of the user selling the item. References users.id.
    item text The name of the item being sold.
    end_time timestamp with time zone The time at which the auction closes.
  • bids describes the bids placed in each auction.

    Field Type Description
    id bigint A unique identifier for the bid.
    buyer bigint The identifier vof the user placing the bid. References users.id.
    auction_id bigint The identifier of the auction in which the bid is placed. References auctions.id.
    amount bigint The bid amount in dollars.
    bid_time timestamp with time zone The time at which the bid was placed.

The organizations, users, and accounts are fixed at the time the source is created. Each tick interval, either a new auction is started, or a new bid is placed in the currently ongoing auction.

Clock

PREVIEW This feature is in private preview. It is under active development and may have stability or performance issues. It isn't subject to our backwards compatibility guarantees.

To enable this feature in your Materialize region, contact our team.

The clock load generator tracks the system clock time, and can be used in contexts where the now() function cannot. On each tick interval, the source emits the system clock time. For example, configuring this load generator with TICK INTERVAL '1 minute' will cause the source to update every minute.

Field Type Description
time timestamp with time zone The system clock time.

Marketing

The marketing load generator simulates a marketing organization that is using a machine learning model to send coupons to potential leads. The marketing source will be automatically demuxed into multiple subsources when the CREATE SOURCE command is executed. This will create the following subsources:

  • customers describes the customers that the marketing team may target.

    Field Type Description
    id bigint A unique identifier for the customer.
    email text The customer’s email.
    income bigint The customer’s income in pennies.
  • impressions describes online ads that have been seen by a customer.

    Field Type Description
    id bigint A unique identifier for the impression.
    customer_id bigint The identifier of the customer that saw the ad. References customers.id.
    impression_time timestamp with time zone The time at which the ad was seen.
  • clicks describes clicks of ads.

    Field Type Description
    impression_id bigint The identifier of the impression that was clicked. References impressions.id.
    click_time timestamp with time zone The time at which the impression was clicked.
  • leads describes a potential lead for a purchase.

    Field Type Description
    id bigint A unique identifier for the lead.
    customer_id bigint The identifier of the customer we’d like to convert. References customers.id.
    created_at timestamp with time zone The time at which the lead was created.
    converted_at timestamp with time zone The time at which the lead was converted.
    conversion_amount bigint The amount the lead converted for in pennies.
  • coupons describes coupons given to leads.

    Field Type Description
    id bigint A unique identifier for the coupon.
    lead_id bigint The identifier of the lead we’re attempting to convert. References leads.id.
    created_at timestamp with time zone The time at which the coupon was created.
    amount bigint The amount the coupon is for in pennies.
  • conversion_predictions describes the predictions made by a highly sophisticated machine learning model.

    Field Type Description
    lead_id bigint The identifier of the lead we’re attempting to convert. References leads.id.
    experiment_bucket text Whether the lead is a control or experiment.
    created_at timestamp with time zone The time at which the prediction was made.
    score numeric The predicted likelihood the lead will convert.

TPCH

The TPCH load generator implements the TPC-H benchmark specification. The TPCH source must be used with FOR ALL TABLES, which will create the standard TPCH relations. If TICK INTERVAL is specified, after the initial data load, an order and its lineitems will be changed at this interval. If not specified, the dataset will not change over time.

KEY VALUE

PREVIEW This feature is in private preview. It is under active development and may have stability or performance issues. It isn't subject to our backwards compatibility guarantees.

To enable this feature in your Materialize region, contact our team.

The KEY VALUE load generator produces keyed data that is intended to be passed though the UPSERT envelope. Its size and performance can be configured in detailed ways.

The schema of the data is:

Field Type Description
key uint8 The key for the value
partition uint8 The partition this key belongs to
value bytea Random data associated with the key.
offset uint8 The offset of the data (if INCLUDE OFFSET is configured).

The following options are supported:

  • KEYS: The number of keys in the source. For now, this must be divisible by PARTITIONS * BATCH SIZE, though this constraint may be lifted in the future.
  • SNAPSHOT ROUNDS: The number of rounds of data (1 update per key in each round) to produce as the source starts up. Can be used to scale the size of the snapshot without changing the number of keys.
  • TRANSACTIONAL SNAPSHOT: Whether or not to emit the snapshot as a singular transaction.
  • VALUE SIZE: The number of bytes in each value.
  • TICK INTERVAL: The minimum interval (as an interval) to produce batches of data (within each partition) after snapshotting.
  • SEED: A per-source uint8 seed for seeding the random data.
  • PARTITIONS: The number of partitions to spread the keys across. Can be used to scale concurrency independent of the replica size.
  • BATCH SIZE: The number of keys per partition to produce in each update (based on TICK INTERVAL).

Monitoring source progress

By default, load generator sources expose progress metadata as a subsource that you can use to monitor source ingestion progress. The name of the progress subsource can be specified when creating a source using the EXPOSE PROGRESS AS clause; otherwise, it will be named <src_name>_progress.

The following metadata is available for each source as a progress subsource:

Field Type Meaning
offset uint8 The minimum offset for which updates to this sources are still undetermined.

And can be queried using:

SELECT "offset"
FROM <src_name>_progress;

As long as the offset continues increasing, Materialize is generating data. For more details on monitoring source ingestion progress and debugging related issues, see Troubleshooting.

Examples

Creating a counter load generator

To create a load generator source that emits the next number in the sequence every 500 milliseconds:

CREATE SOURCE counter
  FROM LOAD GENERATOR COUNTER
  (TICK INTERVAL '500ms');

To examine the counter:

SELECT * FROM counter;
 counter
---------
       1
       2
       3

Creating an auction load generator

To create a load generator source that simulates an auction house and emits new data every second:

CREATE SOURCE auction_house
  FROM LOAD GENERATOR AUCTION
  (TICK INTERVAL '1s')
  FOR ALL TABLES;

To display the created subsources:

SHOW SOURCES;
          name          |      type
------------------------+----------------
 accounts               | subsource
 auction_house          | load-generator
 auction_house_progress | progress
 auctions               | subsource
 bids                   | subsource
 organizations          | subsource
 users                  | subsource

To examine the simulated bids:

SELECT * from bids;
 id | buyer | auction_id | amount |          bid_time
----+-------+------------+--------+----------------------------
 10 |  3844 |          1 |     59 | 2022-09-16 23:24:07.332+00
 11 |  1861 |          1 |     40 | 2022-09-16 23:24:08.332+00
 12 |  3338 |          1 |     97 | 2022-09-16 23:24:09.332+00

Creating a clock load generator

PREVIEW This feature is in private preview. It is under active development and may have stability or performance issues. It isn't subject to our backwards compatibility guarantees.

To enable this feature in your Materialize region, contact our team.

To create a load generator source that ticks over to a new time every second:

CREATE SOURCE clock
  FROM LOAD GENERATOR CLOCK
  (TICK INTERVAL '1s');

To display the created source:

SHOW SOURCES;
      name      |      type      | size |  cluster
----------------+----------------+------+-----------
 clock          | load-generator | 1    | mz_system
 clock_progress | progress       |      |

To check the current clock time:

SELECT * FROM clock;
          time
------------------------
 2024-07-02 16:25:06+00

Creating a marketing load generator

To create a load generator source that simulates an online marketing campaign:

CREATE SOURCE marketing
  FROM LOAD GENERATOR MARKETING
  FOR ALL TABLES;

To display the created subsources:

SHOW SOURCES;
          name          |      type
------------------------+---------------
 clicks                 | subsource
 conversion_predictions | subsource
 coupons                | subsource
 customers              | subsource
 impressions            | subsource
 leads                  | subsource
 marketing              | load-generator
 marketing_progress     | progress

To find all impressions and clicks associated with a campaign over the last 30 days:

WITH
    click_rollup AS
    (
        SELECT impression_id AS id, count(*) AS clicks
        FROM clicks
        WHERE click_time - INTERVAL '30' DAY <= mz_now()
        GROUP BY impression_id
    ),
    impression_rollup AS
    (
        SELECT id, campaign_id, count(*) AS impressions
        FROM impressions
        WHERE impression_time - INTERVAL '30' DAY <= mz_now()
        GROUP BY id, campaign_id
    )
SELECT campaign_id, sum(impressions) AS impressions, sum(clicks) AS clicks
FROM impression_rollup LEFT JOIN click_rollup USING(id)
GROUP BY campaign_id;
 campaign_id | impressions | clicks
-------------+-------------+--------
           0 |         350 |     33
           1 |         325 |     28
           2 |         319 |     24
           3 |         315 |     38
           4 |         305 |     28
           5 |         354 |     31
           6 |         346 |     25
           7 |         337 |     36
           8 |         329 |     38
           9 |         305 |     24
          10 |         345 |     27
          11 |         323 |     30
          12 |         320 |     29
          13 |         331 |     27
          14 |         310 |     22
          15 |         324 |     28
          16 |         315 |     32
          17 |         329 |     36
          18 |         329 |     28

Creating a TPCH load generator

To create the load generator source and its associated subsources:

CREATE SOURCE tpch
  FROM LOAD GENERATOR TPCH (SCALE FACTOR 1)
  FOR ALL TABLES;

To display the created subsources:

SHOW SOURCES;
      name     |      type
---------------+---------------
 tpch          | load-generator
 tpch_progress | progress
 supplier      | subsource
 region        | subsource
 partsupp      | subsource
 part          | subsource
 orders        | subsource
 nation        | subsource
 lineitem      | subsource
 customer      | subsource

To run the Pricing Summary Report Query (Q1), which reports the amount of billed, shipped, and returned items:

SELECT
    l_returnflag,
    l_linestatus,
    sum(l_quantity) AS sum_qty,
    sum(l_extendedprice) AS sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
    avg(l_quantity) AS avg_qty,
    avg(l_extendedprice) AS avg_price,
    avg(l_discount) AS avg_disc,
    count(*) AS count_order
FROM
    lineitem
WHERE
    l_shipdate <= date '1998-12-01' - interval '90' day
GROUP BY
    l_returnflag,
    l_linestatus
ORDER BY
    l_returnflag,
    l_linestatus;
 l_returnflag | l_linestatus | sum_qty  | sum_base_price | sum_disc_price  |    sum_charge     |      avg_qty       |     avg_price      |      avg_disc       | count_order
--------------+--------------+----------+----------------+-----------------+-------------------+--------------------+--------------------+---------------------+-------------
 A            | F            | 37772997 |    56604341792 |  54338346989.17 |  57053313118.2657 | 25.490380624798817 | 38198.351517998075 | 0.04003729114831228 |     1481853
 N            | F            |   986796 |     1477585066 |   1418531782.89 |   1489171757.0798 | 25.463731840115603 |  38128.27564317601 | 0.04007431682708436 |       38753
 N            | O            | 74281600 |   111337230039 | 106883023012.04 | 112227399730.9018 |  25.49430183051871 | 38212.221432873834 | 0.03999775539657235 |     2913655
 R            | F            | 37770949 |    56610551077 |   54347734573.7 |  57066196254.4557 | 25.496431466814634 |  38213.68205054471 | 0.03997848687172654 |     1481421
Back to top ↑