Amazon Managed Streaming for Apache Kafka (Amazon MSK)

This guide goes through the required steps to connect Materialize to an Amazon MSK cluster.

💡 Tip: For help getting started with your own data, you can schedule a free guided trial.

Before you begin

Before you begin, you must have:

An Amazon MSK cluster running on AWS.
A client machine that can interact with your cluster.

Creating a connection

There are various ways to configure your Kafka network to allow Materialize to connect:

Allow Materialize IPs: If your Kafka cluster is publicly accessible, you can configure your firewall to allow connections from a set of static Materialize IP addresses.
Use AWS PrivateLink: If your Kafka cluster is running in a private network, you can use AWS PrivateLink to connect Materialize to the cluster. For details, see AWS PrivateLink.
Use an SSH tunnel: If your Kafka cluster is running in a private network, you can use an SSH tunnel to connect Materialize to the cluster.

NOTE: Materialize provides a Terraform module that automates the creation and configuration of AWS resources for a PrivateLink connection. For more details, see the Terraform module repositories for Amazon MSK and self-managed Kafka clusters.

This section covers how to create AWS PrivateLink connections and retrieve the AWS principal needed to configure the AWS PrivateLink service.

Create target groups

Create a dedicated target group for each broker with the following details:

a. Target type as IP address.

b. Protocol as TCP.

c. Port as 9092, or the port your brokers listen on if it is not 9092 (e.g. 9094 for TLS or 9096 for SASL).

d. Make sure that the target group is in the same VPC as the Kafka cluster.

e. Click Next, and register the corresponding Kafka broker in each target group using the broker's IP address.

Create a Network Load Balancer (NLB)

Create a Network Load Balancer that is enabled for the same subnets that the Kafka brokers are in.

Create TCP listeners

Create a TCP listener for every Kafka broker that forwards to the corresponding target group you created (e.g. b-1, b-2, b-3).

Each listener port must be unique; you will reference it later in the CREATE CONNECTION statement.

For example, you can create a listener for:

a. Port 9001 → broker b-1....

b. Port 9002 → broker b-2....

c. Port 9003 → broker b-3....

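With more than a few brokers, it can help to generate this mapping rather than write it by hand. The following is a minimal Python sketch. It assumes you already know the broker hostnames and want ports starting at 9001, as in the example above. `plan_listeners` is a hypothetical helper that does not call any AWS API, so you still create the listeners yourself.

```python
from typing import Dict, List


def plan_listeners(brokers: List[str], first_port: int = 9001) -> Dict[str, int]:
    """Assign one unique NLB listener port to each Kafka broker.

    Hypothetical helper: it only builds the broker -> port mapping that you
    then create as TCP listeners on the Network Load Balancer.
    """
    if len(set(brokers)) != len(brokers):
        raise ValueError("broker list contains duplicates")
    if first_port + len(brokers) - 1 > 65535:
        raise ValueError("not enough ports above first_port")
    # Sort so that the same broker always gets the same port across runs.
    return {broker: first_port + i for i, broker in enumerate(sorted(brokers))}


if __name__ == "__main__":
    plan = plan_listeners(["b-2.example", "b-1.example", "b-3.example"])
    for broker, port in plan.items():
        print(f"Port {port} -> broker {broker}")
```

Sorting the hostnames keeps the mapping stable between runs. This matters because the same ports must later appear in the BROKERS clause of CREATE CONNECTION.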
Verify security groups and health checks

Once the TCP listeners have been created, make sure that the health checks for each target group are passing and that the targets are reported as healthy.

If you have set up a security group for your Kafka cluster, you must ensure that it allows traffic on both the listener port and the health check port.

Remarks:

a. Network Load Balancers do not have associated security groups. Therefore, the security groups for your targets must use IP addresses to allow traffic.

b. You can’t use the security groups for the clients as a source in the security groups for the targets. Therefore, the security groups for your targets must use the IP addresses of the clients to allow traffic. For more details, check the AWS documentation.

Create a VPC endpoint service

Create a VPC endpoint service and associate it with the Network Load Balancer that you’ve just created.

Note the service name that is generated for the endpoint service.

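The service name encodes the region of the endpoint service. That region decides whether you need the in-region or the cross-region form of CREATE CONNECTION. The following is a minimal Python sketch that splits the name, assuming the com.amazonaws.vpce.<region_id>.vpce-svc-<endpoint_service_id> format used in this guide. `parse_service_name`, `is_cross_region`, and the `materialize_region` argument are hypothetical, and you supply your Materialize region yourself.

```python
import re
from typing import NamedTuple

# Assumed format, taken from the examples in this guide, e.g.
# com.amazonaws.vpce.us-east-1.vpce-svc-0123456789abcdef0
SERVICE_NAME_RE = re.compile(
    r"^com\.amazonaws\.vpce\.(?P<region>[a-z0-9-]+)\.(?P<service_id>vpce-svc-[0-9a-z]+)$"
)


class EndpointService(NamedTuple):
    region: str
    service_id: str


def parse_service_name(name: str) -> EndpointService:
    """Split a VPC endpoint service name into its region and service ID."""
    match = SERVICE_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected service name: {name!r}")
    return EndpointService(match["region"], match["service_id"])


def is_cross_region(service_name: str, materialize_region: str) -> bool:
    """True if the endpoint service is outside Materialize's AWS region."""
    return parse_service_name(service_name).region != materialize_region
```

For example, with a hypothetical service ID, `is_cross_region("com.amazonaws.vpce.us-west-1.vpce-svc-0123456789abcdef0", "us-east-1")` returns True, which points you to the cross-region form.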
Create an AWS PrivateLink connection

In Materialize, create an AWS PrivateLink connection that references the endpoint service that you created in the previous step.

↕️ In-region connections

To connect to an AWS PrivateLink endpoint service in the same region as your Materialize environment:
```
CREATE CONNECTION privatelink_svc TO AWS PRIVATELINK (
  SERVICE NAME 'com.amazonaws.vpce.<region_id>.vpce-svc-<endpoint_service_id>',
  AVAILABILITY ZONES ('use1-az1', 'use1-az2', 'use1-az4')
);
```
- Replace the SERVICE NAME value with the service name you noted earlier.
- Replace the AVAILABILITY ZONES list with the IDs of the availability zones in your AWS account. For in-region connections, the availability zones of the NLB and the consumer VPC must match.
  
  To find your availability zone IDs, open the VPC console and select the subnets that your NLB uses (the same subnets as your Kafka brokers). For each subnet, look for Availability Zone ID (e.g., use1-az6), not Availability Zone (e.g., us-east-1d).

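AZ names and AZ IDs appear next to each other in the console, which makes them easy to mix up. The following is a minimal Python pre-flight check. It assumes you have copied the AZ IDs of the NLB's subnets by hand. `check_az_ids` is a hypothetical helper, and its regular expressions only approximate the AWS naming patterns.

```python
import re
from typing import Iterable, List

# AZ IDs look like "use1-az1"; AZ names look like "us-east-1a".
AZ_ID_RE = re.compile(r"^[a-z]{2,5}\d+-az\d+$")
AZ_NAME_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+[a-z]$")


def check_az_ids(listed: Iterable[str], nlb_zones: Iterable[str]) -> List[str]:
    """Return human-readable problems with an AVAILABILITY ZONES list.

    `listed` is what you plan to put in CREATE CONNECTION; `nlb_zones` are the
    AZ IDs of the subnets the NLB is enabled in. An empty result means no
    problems were found. This sketch treats "must match" as set equality.
    """
    listed = list(listed)
    problems = []
    for az in listed:
        if AZ_NAME_RE.match(az):
            problems.append(f"{az!r} is an AZ name; use its AZ ID instead")
        elif not AZ_ID_RE.match(az):
            problems.append(f"{az!r} does not look like an AZ ID")
    nlb = set(nlb_zones)
    for az in sorted(set(listed) - nlb):
        problems.append(f"{az!r} is not one of the NLB's zones")
    for az in sorted(nlb - set(listed)):
        problems.append(f"NLB zone {az!r} is missing from the list")
    return problems
```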
↔️ Cross-region connections

To connect to an AWS PrivateLink endpoint service in a different region from the one where your Materialize environment is deployed:
```
CREATE CONNECTION privatelink_svc TO AWS PRIVATELINK (
  SERVICE NAME 'com.amazonaws.vpce.us-west-1.vpce-svc-<endpoint_service_id>',
  -- For now, the AVAILABILITY ZONES clause **is** required, but will be
  -- made optional in a future release.
  AVAILABILITY ZONES ()
);
```
- Replace the SERVICE NAME value with the service name you noted earlier.
- The region in the service name is the region where the endpoint service was created. You do not need to list availability zones yourself: when the AVAILABILITY ZONES list is empty, Materialize assigns them automatically.
- For Kafka connections over cross-region PrivateLink, cross-zone load balancing must be enabled on the NLB behind the VPC endpoint service.

Configure the AWS PrivateLink service

Retrieve the AWS principal for the AWS PrivateLink connection you just created:

```
SELECT principal
FROM mz_aws_privatelink_connections plc
JOIN mz_connections c ON plc.id = c.id
WHERE c.name = 'privatelink_svc';
```

```
                                 principal
---------------------------------------------------------------------------
 arn:aws:iam::664411391173:role/mz_20273b7c-2bbe-42b8-8c36-8cc179e9bbc3_u1
```

Follow the instructions in the AWS PrivateLink documentation to configure your VPC endpoint service to accept connections from the provided AWS principal.

If your AWS PrivateLink service is configured to require acceptance of connection requests, you must manually approve the connection request from Materialize after executing CREATE CONNECTION. For more details, check the AWS PrivateLink documentation.
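If you prefer the AWS CLI over the console, the two steps above correspond to `aws ec2 modify-vpc-endpoint-service-permissions` (allow the principal) and `aws ec2 accept-vpc-endpoint-connections` (approve a pending request). The following Python sketch only builds the argument lists so that you can review them before running anything. The helper names are hypothetical, the sketch assumes the AWS CLI is installed and configured, and the endpoint service ID and VPC endpoint ID come from your VPC console.

```python
import shlex
from typing import List


def allow_principal_cmd(service_id: str, principal_arn: str) -> List[str]:
    """Build the AWS CLI call that allows the Materialize principal."""
    return [
        "aws", "ec2", "modify-vpc-endpoint-service-permissions",
        "--service-id", service_id,
        "--add-allowed-principals", principal_arn,
    ]


def accept_connection_cmd(service_id: str, vpc_endpoint_id: str) -> List[str]:
    """Build the AWS CLI call that approves a pending endpoint connection."""
    return [
        "aws", "ec2", "accept-vpc-endpoint-connections",
        "--service-id", service_id,
        "--vpc-endpoint-ids", vpc_endpoint_id,
    ]


if __name__ == "__main__":
    # Hypothetical service ID; the principal is the example output above.
    cmd = allow_principal_cmd(
        "vpce-svc-0123456789abcdef0",
        "arn:aws:iam::664411391173:role/mz_20273b7c-2bbe-42b8-8c36-8cc179e9bbc3_u1",
    )
    # Only prints the command; nothing is sent to AWS.
    print(shlex.join(cmd))
```

Passing a list to `subprocess.run(cmd, check=True)` runs the command without going through a shell, which avoids quoting issues with the ARN.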

Note: It can take some time for the endpoint connection to appear. Wait until it is ready before you create a source.

Validate the AWS PrivateLink connection

Validate the AWS PrivateLink connection you created using the VALIDATE CONNECTION command:

```
VALIDATE CONNECTION privatelink_svc;
```

If no validation error is returned, move to the next step.

Create a source connection

In Materialize, create a source connection that uses the AWS PrivateLink connection you just configured:

```
CREATE SECRET kafka_password AS '<your-password>';

CREATE CONNECTION kafka_connection TO KAFKA (
    BROKERS (
        -- The port **must exactly match** the port assigned to the broker in
        -- the TCP listener of the NLB.
        'b-1.hostname-1:9096' USING AWS PRIVATELINK privatelink_svc (PORT 9001, AVAILABILITY ZONE 'use1-az2'),
        'b-2.hostname-2:9096' USING AWS PRIVATELINK privatelink_svc (PORT 9002, AVAILABILITY ZONE 'use1-az1'),
        'b-3.hostname-3:9096' USING AWS PRIVATELINK privatelink_svc (PORT 9003, AVAILABILITY ZONE 'use1-az4')
    ),
    -- Authentication details: adjust these to match the
    -- authentication method that your Kafka cluster uses.
    SASL MECHANISMS = 'SCRAM-SHA-512',
    SASL USERNAME = 'foo',
    SASL PASSWORD = SECRET kafka_password
);
```

Troubleshooting

If you run into connectivity issues during source creation, make sure that:

The (PORT <port_number>) value exactly matches the port assigned to the corresponding broker in the TCP listener of the Network Load Balancer. Misalignment between ports and broker addresses is the most common cause of connectivity issues.
For in-region connections, the correct availability zone is specified for each broker.
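To catch the port misalignment described above before you create a source, you can compare your BROKERS entries against your listener plan. The following is a minimal Python sketch. `BrokerEntry` and `find_mismatches` are hypothetical names, and the listener mapping is one you maintain yourself; the sketch does not read it from AWS.

```python
from typing import Dict, List, NamedTuple, Optional


class BrokerEntry(NamedTuple):
    """One entry of the BROKERS (...) clause in CREATE CONNECTION."""

    address: str                              # e.g. "b-1.hostname-1:9096"
    port: int                                 # the (PORT ...) value
    availability_zone: Optional[str] = None   # the AVAILABILITY ZONE value, if any


def find_mismatches(
    entries: List[BrokerEntry], listeners: Dict[str, int], in_region: bool
) -> List[str]:
    """Compare BROKERS entries with the broker -> NLB listener port mapping.

    `listeners` maps broker hostnames (without the Kafka port) to the listener
    port that forwards to that broker's target group. Returns a list of
    problems; an empty list means nothing obvious is wrong.
    """
    problems = []
    for entry in entries:
        host = entry.address.rsplit(":", 1)[0]
        expected = listeners.get(host)
        if expected is None:
            problems.append(f"{host}: no NLB listener forwards to this broker")
        elif entry.port != expected:
            problems.append(
                f"{host}: PORT {entry.port} does not match listener port {expected}"
            )
        if in_region and not entry.availability_zone:
            problems.append(f"{host}: missing AVAILABILITY ZONE")
    return problems
```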

Materialize can connect to a Kafka broker, a Confluent Schema Registry server, a PostgreSQL database, or a MySQL database through an SSH tunnel connection. In this guide, you will create an SSH tunnel connection, configure your Materialize authentication settings, and create a source connection.

Before you begin

Before you begin, make sure you have access to a bastion host. You will need:

The bastion host IP address and port number
The bastion host username

Create an SSH tunnel connection

In Materialize, create an SSH tunnel connection to the bastion host:

```
CREATE CONNECTION ssh_connection TO SSH TUNNEL (
    HOST '<SSH_BASTION_HOST>',
    USER '<SSH_BASTION_USER>',
    PORT <SSH_BASTION_PORT>
);
```

Configure the SSH bastion host

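Materialize authenticates to the bastion host with key pairs that it generates for the SSH tunnel connection. The bastion host must therefore trust the public keys of the connection you created in the previous step.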

Materialize stores public keys for SSH tunnels in the system catalog. Query mz_ssh_tunnel_connections to retrieve the public keys for the SSH tunnel connection you just created:

```
SELECT
    mz_connections.name,
    mz_ssh_tunnel_connections.*
FROM
    mz_connections JOIN
    mz_ssh_tunnel_connections USING(id)
WHERE
    mz_connections.name = 'ssh_connection';
```

| name           | id    | public_key_1                          | public_key_2                          |
|----------------|-------|---------------------------------------|---------------------------------------|
| ssh_connection | u75   | ssh-ed25519 AAAA...76RH materialize   | ssh-ed25519 AAAA...hLYV materialize   |

Materialize provides two public keys to allow you to rotate keys without connection downtime. Review the ALTER CONNECTION documentation for more information on how to rotate your keys.

On the bastion host, add both public keys to the authorized_keys file of the SSH user:

```
# Command for Linux
echo "ssh-ed25519 AAAA...76RH materialize" >> ~/.ssh/authorized_keys
echo "ssh-ed25519 AAAA...hLYV materialize" >> ~/.ssh/authorized_keys
```
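Running the echo commands twice appends duplicate keys. The following is a minimal Python sketch, assuming Python 3 is available on the bastion host. It appends only keys that are not already present and restricts the file permissions. `add_authorized_keys` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Iterable


def add_authorized_keys(keys: Iterable[str], path: Path) -> int:
    """Append SSH public keys to an authorized_keys file, skipping duplicates.

    Returns the number of keys that were added.
    """
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    text = path.read_text() if path.exists() else ""
    existing = {line.strip() for line in text.splitlines()}
    new_keys = []
    for key in keys:
        key = key.strip()
        if key and key not in existing and key not in new_keys:
            new_keys.append(key)
    if new_keys:
        with path.open("a") as fh:
            # Avoid gluing the first new key onto an unterminated last line.
            if text and not text.endswith("\n"):
                fh.write("\n")
            fh.write("".join(k + "\n" for k in new_keys))
    if path.exists():
        # With StrictModes (the default), sshd rejects key files that are
        # writable by group or others.
        os.chmod(path, 0o600)
    return len(new_keys)
```

Run it as the SSH user with the two keys from the query above, for example `add_authorized_keys([key_1, key_2], Path.home() / ".ssh" / "authorized_keys")`.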

Configure your internal firewall to allow the SSH bastion host to connect to your Kafka cluster or PostgreSQL instance.

If you are using a cloud provider like AWS or GCP, update the security group or firewall rules for your PostgreSQL instance or Kafka brokers.

Allow incoming traffic from the SSH bastion host IP address on the necessary ports.

For example, use port 5432 for PostgreSQL and ports 9092, 9094, and 9096 for Kafka.

Test the connection from the bastion host to the Kafka cluster or PostgreSQL instance.
```
telnet <KAFKA_BROKER_HOST> <KAFKA_BROKER_PORT>
telnet <POSTGRES_HOST> <POSTGRES_PORT>
```
If the command hangs, double-check your security group and firewall settings. If the connection is successful, you can proceed to the next step.
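If telnet is not installed on the bastion host, a short Python check can stand in for it. This sketch assumes Python 3 on the bastion host, and `can_connect` is a hypothetical helper. Unlike an interactive telnet session, it returns after a timeout. A security group that silently drops traffic therefore shows up as False rather than a hang.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Refused connections, DNS failures, and timeouts (typical of traffic that a
    firewall silently drops) all surface as OSError and return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_connect("<KAFKA_BROKER_HOST>", 9096)` or `can_connect("<POSTGRES_HOST>", 5432)`.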
From your local machine, verify that you can open an SSH tunnel through the bastion host:
```
# Command for Linux
ssh -L 9092:kafka-broker:9092 <SSH_BASTION_USER>@<SSH_BASTION_HOST>
```
In a separate terminal, while the tunnel is open, verify that you can reach the Kafka broker through the tunnel (for PostgreSQL, forward and test port 5432 instead):
```
telnet localhost 9092
```
If you are unable to connect using the telnet command, enable AllowTcpForwarding and PermitTunnel in the SSH daemon configuration on your bastion host.

On your SSH bastion host, open the SSH config file (usually located at /etc/ssh/sshd_config) using a text editor:
```
sudo nano /etc/ssh/sshd_config
```
Add or uncomment the following lines:
```
AllowTcpForwarding yes
PermitTunnel yes
```
Save the changes and restart the SSH service:
```
sudo systemctl restart sshd
```
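If you manage the bastion host with scripts, you can make the same edit programmatically. The following is a minimal Python sketch, assuming a standard OpenSSH sshd_config. `set_sshd_options` is a hypothetical helper. It rewrites existing or commented-out lines for each option and inserts missing options before the first Match block, because options after a Match line apply only to matching connections.

```python
import re
from typing import Dict


def set_sshd_options(config: str, options: Dict[str, str]) -> str:
    """Return sshd_config text with each global option set to the given value."""
    lines = config.splitlines()
    pending = {key.lower(): (key, value) for key, value in options.items()}
    # Global options must appear before the first Match block.
    match_idx = next(
        (i for i, line in enumerate(lines) if re.match(r"\s*Match\s", line, re.I)),
        len(lines),
    )
    for i in range(match_idx):
        # Matches "Keyword value" and commented-out "#Keyword value" lines.
        m = re.match(r"\s*#?\s*([A-Za-z]+)\s+\S", lines[i])
        if m and m.group(1).lower() in pending:
            key, value = pending.pop(m.group(1).lower())
            # sshd uses the first value it reads for a keyword, so rewriting
            # the first occurrence is enough.
            lines[i] = f"{key} {value}"
    lines[match_idx:match_idx] = [f"{k} {v}" for k, v in pending.values()]
    return "\n".join(lines) + "\n"
```

Write the result back to /etc/ssh/sshd_config with root privileges, check the syntax with `sudo sshd -t`, and then restart the SSH service as shown above.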
Retrieve the static egress IPs from Materialize and configure the firewall rules (e.g. AWS Security Groups) for your bastion host to allow SSH traffic for those IP addresses only.
```
SELECT * FROM mz_catalog.mz_egress_ips;
```
```
XXX.140.90.33
XXX.198.159.213
XXX.100.27.23
```
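Security group rules take CIDR blocks rather than bare addresses, so each egress IP becomes a single-address range (/32 for IPv4). The following is a minimal Python sketch. `egress_cidrs` is a hypothetical helper that also drops blanks and duplicates from the pasted query output.

```python
import ipaddress
from typing import Iterable, List


def egress_cidrs(ips: Iterable[str]) -> List[str]:
    """Turn Materialize egress IPs into single-address CIDR blocks."""
    cidrs = set()
    for raw in ips:
        raw = raw.strip()
        if not raw:
            continue
        addr = ipaddress.ip_address(raw)  # raises ValueError on malformed input
        # /32 for IPv4, /128 for IPv6.
        cidrs.add(str(ipaddress.ip_network(f"{addr}/{addr.max_prefixlen}")))
    return sorted(cidrs)
```

Allow inbound SSH (on <SSH_BASTION_PORT>) from each resulting CIDR.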

Validate the SSH tunnel connection

To confirm that the SSH tunnel connection is correctly configured, use the VALIDATE CONNECTION command:

```
VALIDATE CONNECTION ssh_connection;
```

If no validation errors are returned, the connection can be used to create a source connection.

Create a source connection

In Materialize, create a source connection that uses the SSH tunnel connection you configured in the previous section:

```
CREATE CONNECTION kafka_connection TO KAFKA (
  BROKER 'broker1:9092',
  SSH TUNNEL ssh_connection
);
```

This section goes through the required steps to connect Materialize to an Amazon MSK cluster, including some of the more complicated bits around configuring security settings in Amazon MSK.

If you already have an Amazon MSK cluster, you can skip step 1 and directly move on to Make the cluster public and enable SASL. You can also skip steps 3 and 4 if you already have Apache Kafka installed and running, and have created a topic that you want to create a source for.

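The process to connect Materialize to Amazon MSK consists of the following steps:

1. Create an Amazon MSK cluster
2. Make the cluster public and enable SASL
3. Create a client machine
4. Install Apache Kafka and create a topic
5. Create ACLs
6. Create a source in Materialize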

Create an Amazon MSK cluster

If you already have an Amazon MSK cluster set up, then you can skip this step.

a. Sign in to the AWS Management Console and open the Amazon MSK console

b. Choose Create cluster

c. Enter a cluster name, and leave all other settings unchanged

d. From the table under All cluster settings, copy the values of the following settings and save them because you need them later in this tutorial: VPC, Subnets, Security groups associated with VPC

e. Choose Create cluster

Note: Cluster creation can take about 15 minutes.

Make the cluster public and enable SASL

Turn on SASL

a. Navigate to the Amazon MSK console

b. Choose the MSK cluster you just created in Step 1

c. Click on the Properties tab

d. In the Security settings section, choose Edit

e. Check the checkbox next to SASL/SCRAM authentication

f. Click Save changes

For more details, see the Amazon MSK documentation on updating a cluster’s security configuration.

Create a symmetric key

a. Now go to the AWS Key Management Service (AWS KMS) console

b. Click Create Key

c. Choose Symmetric and click Next

d. Give the key an alias and click Next

e. Under Administrative permissions, check the checkbox next to the AWSServiceRoleForKafka and click Next

f. Under Key usage permissions, again check the checkbox next to the AWSServiceRoleForKafka and click Next

g. Review the details and click Finish

For more details, see the AWS KMS documentation on creating symmetric keys.

Store a new Secret

a. Go to the AWS Secrets Manager console

b. Click Store a new secret

c. Choose Other type of secret (e.g. API key) for the secret type

d. Under Key/value pairs, click Plaintext

e. Paste the following in the space below it and replace <your-username> and <your-password> with the username and password you want to set for the cluster
```
  {
    "username": "<your-username>",
    "password": "<your-password>"
  }
```
f. On the next page, give a Secret name that starts with AmazonMSK_

g. Under Encryption Key, select the symmetric key you just created in the previous sub-section from the dropdown

h. Continue through the remaining steps and finish creating the secret. Once it is created, record the ARN (Amazon Resource Name) of your secret

For more details, see the AWS Secrets Manager documentation on creating a secret.
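Typing the JSON by hand makes it easy to break the document with a stray quote in the password. The following is a minimal Python sketch that builds the secret value with the standard json module and enforces the AmazonMSK_ prefix. `msk_scram_secret` is a hypothetical helper.

```python
import json


def msk_scram_secret(name: str, username: str, password: str) -> str:
    """Build the secret value for an Amazon MSK SASL/SCRAM secret.

    Raises ValueError if the secret name lacks the AmazonMSK_ prefix.
    """
    if not name.startswith("AmazonMSK_"):
        raise ValueError("Amazon MSK SCRAM secret names must start with 'AmazonMSK_'")
    if not username or not password:
        raise ValueError("username and password must be non-empty")
    # json.dumps escapes quotes and backslashes that would break hand-written JSON.
    return json.dumps({"username": username, "password": password})
```

Instead of pasting the result into the console, you could pass it to `aws secretsmanager create-secret` with its `--name`, `--kms-key-id`, and `--secret-string` options.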

Associate secret with MSK cluster

a. Navigate back to the Amazon MSK console and click on the cluster you created in Step 1

b. Click on the Properties tab

c. In the Security settings section, under SASL/SCRAM authentication, click on Associate secrets

d. Paste the ARN you recorded in the previous subsection and click Associate secrets

Create the cluster’s configuration

a. Go to the AWS CloudShell console

b. Create a file (e.g. msk-config.txt) with the following line:
```
  allow.everyone.if.no.acl.found = false
```
c. Run the following AWS CLI command, replacing <config-file-path> with the directory that contains the file you saved in the previous step
```
  # Set --kafka-versions to the Apache Kafka version that your cluster runs.
  aws kafka create-configuration --name "MakePublic" \
  --description "Set allow.everyone.if.no.acl.found = false" \
  --kafka-versions "2.6.2" \
  --server-properties fileb://<config-file-path>/msk-config.txt
```
For more information about making your cluster public, see the Amazon MSK documentation.

Create a client machine

If you already have a client machine set up that can interact with your cluster, then you can skip this step.

If not, you can create an EC2 client machine and then add the client's security group to the inbound rules of the cluster’s security group in the VPC console. For details, see the Amazon MSK documentation on creating a client machine.

Install Apache Kafka and create a topic

To start using Materialize with Apache Kafka, you need to create a Materialize source over an Apache Kafka topic. If you already have Apache Kafka installed and a topic created, you can skip this step.

Otherwise, you can install Apache Kafka on the client machine from the previous step and create a topic. For details, see the Amazon MSK documentation on creating a topic.

Create ACLs

As allow.everyone.if.no.acl.found is set to false, you must create ACLs for the cluster and topics configured in the previous step to set appropriate access permissions. For more information, see the Amazon MSK documentation.

Create a source in Materialize

a. Open the Amazon MSK console and select your cluster

b. Click on View client information

c. Copy the URL listed under Private endpoint for SASL/SCRAM. This will be your <broker-url> going forward.

d. Connect to Materialize using the SQL Shell, or your preferred SQL client.

e. Create a connection using the command below. Replace <broker-url> with the URL you copied in step c of this subsection, and <your-username> and <your-password> with the credentials from Store a new secret under Step 2. You will use the topic you created in Step 4 when you create a source.
```
CREATE SECRET msk_password AS '<your-password>';

CREATE CONNECTION kafka_connection TO KAFKA (
    BROKER '<broker-url>',
    SASL MECHANISMS = 'SCRAM-SHA-512',
    SASL USERNAME = '<your-username>',
    SASL PASSWORD = SECRET msk_password
  );
```
f. If the command executes without an error and outputs CREATE CONNECTION, you have successfully connected Materialize to your cluster.
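If the password contains a single quote, pasting it verbatim into CREATE SECRET breaks the string literal. The following is a minimal Python sketch, assuming standard SQL quoting (an embedded ' is written as ''), which Materialize's PostgreSQL-compatible dialect accepts. `sql_literal` and `msk_connection_sql` are hypothetical helpers.

```python
def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def msk_connection_sql(broker_url: str, username: str, password: str) -> str:
    """Render the CREATE SECRET and CREATE CONNECTION statements from step e."""
    return (
        f"CREATE SECRET msk_password AS {sql_literal(password)};\n\n"
        "CREATE CONNECTION kafka_connection TO KAFKA (\n"
        f"    BROKER {sql_literal(broker_url)},\n"
        "    SASL MECHANISMS = 'SCRAM-SHA-512',\n"
        f"    SASL USERNAME = {sql_literal(username)},\n"
        "    SASL PASSWORD = SECRET msk_password\n"
        ");"
    )
```

Treat the generated text as sensitive, because it contains the password in plain text.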

Note: The example above created a connection, which stores the details Materialize needs to reach an external system such as your Amazon MSK cluster. The connection uses SASL authentication, with the password stored as a secret in Materialize’s secret management system. When you create a source from this connection, you also choose an input format. Materialize supports formats such as text, JSON, Avro, and Protobuf. For details on the supported formats and possible configurations, see the Materialize documentation on source formats.

Creating a source

The Kafka connection created in the previous section can then be reused across multiple CREATE SOURCE statements. By default, the source will be created in the active cluster; to use a different cluster, use the IN CLUSTER clause.

```
CREATE SOURCE json_source
  FROM KAFKA CONNECTION kafka_connection (TOPIC 'test_topic')
  FORMAT JSON;
```
