Changelog

Unicode string normalization

Oct 28, 2025

Materialize now supports the normalize() function for Unicode string normalization, matching PostgreSQL’s implementation. The function converts text to one of the four standard Unicode normalization forms (NFC, NFD, NFKC, or NFKD), which is essential for consistent comparison and processing of international text.

Syntax

sql
-- Default normalization (NFC)
SELECT normalize('café');

-- Explicit normalization form
SELECT normalize('café', NFC);   -- Canonical composition (default)
SELECT normalize('café', NFD);   -- Canonical decomposition
SELECT normalize('ﬁle', NFKC);  -- Compatibility composition
SELECT normalize('ﬁle', NFKD);  -- Compatibility decomposition

For more details on Unicode normalization forms, see Unicode Standard Annex #15, "Unicode Normalization Forms".
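To make the difference between the four forms concrete, the sketch below prints the code points each form produces for two sample strings. It uses Python's standard `unicodedata.normalize` as a stand-in for the same Unicode forms and is not Materialize's implementation. The helper name `code_points` and the sample strings are assumptions chosen for illustration.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """Return the code points of text as U+XXXX labels (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


cafe = "caf\u00e9"  # 'café' with a precomposed é (U+00E9)
lig = "\ufb01le"    # 'ﬁle' starting with the ligature ﬁ (U+FB01)

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    # Canonical forms (NFC, NFD) keep the ligature; compatibility
    # forms (NFKC, NFKD) replace it with the letters 'f' and 'i'.
    print(form,
          code_points(unicodedata.normalize(form, cafe)),
          code_points(unicodedata.normalize(form, lig)))
```

The decomposed forms split é into `e` plus a combining acute accent (U+0301), while only the compatibility forms rewrite the ligature.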

Use cases

  • Text comparison: Ensure strings with different Unicode representations of the same characters compare as equal
  • Data cleaning: Standardize text input from various sources that may represent the same characters with different code point sequences
  • Compatibility: Convert special characters like ligatures (ﬁ → fi) to their compatible forms

This feature is particularly useful when working with international data where the same visual character can have multiple Unicode representations.
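The comparison and data-cleaning use cases can be sketched outside the database as well. The example below again relies on Python's standard `unicodedata` module as an assumption-laden stand-in, not on Materialize itself. The helper `same_text` and the sample values are hypothetical.

```python
import unicodedata


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = "caf\u00e9"     # é as a single code point (U+00E9)
decomposed = "cafe\u0301"  # e followed by a combining acute accent (U+0301)

print(composed == decomposed)           # raw comparison of code point sequences
print(same_text(composed, decomposed))  # comparison after NFC normalization

# Data cleaning: collapse values that differ only in representation.
names = [composed, decomposed, "\ufb01le", "file"]
unique_nfc = {unicodedata.normalize("NFC", n) for n in names}
unique_nfkc = {unicodedata.normalize("NFKC", n) for n in names}
print(len(unique_nfc), len(unique_nfkc))
```

NFC is enough to merge the two spellings of "café", but merging the ligature spelling with plain "file" requires a compatibility form. Compatibility normalization discards formatting distinctions, so it suits search and deduplication keys better than stored display text.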

For more information, see our official documentation.
