Materialize now supports the normalize() function for Unicode string normalization, matching PostgreSQL’s implementation. This function transforms Unicode text into a standard form, which is essential for consistent string comparison and processing of international text.
Syntax
-- Default normalization (NFC)
SELECT normalize('café');
-- Explicit normalization form
SELECT normalize('café', NFC); -- Canonical composition (default)
SELECT normalize('café', NFD); -- Canonical decomposition
SELECT normalize('ﬁle', NFKC); -- Compatibility composition
SELECT normalize('ﬁle', NFKD); -- Compatibility decomposition
For more details on Unicode normalization forms, see the Unicode Standard documentation.
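As a rough illustration of what each form changes, the query below counts code points after normalization. It is a sketch, not taken from the release itself: it assumes `chr()` and `length()` behave as they do in PostgreSQL (code point in, code point count out). Here `chr(233)` is the precomposed é (U+00E9) and `chr(64257)` is the ﬁ ligature (U+FB01).

```sql
-- Sketch: assumes chr() and length() follow PostgreSQL semantics.
SELECT
    length(normalize(chr(233), NFC))    AS nfc_len,   -- stays one code point: é
    length(normalize(chr(233), NFD))    AS nfd_len,   -- splits into e + U+0301
    length(normalize(chr(64257), NFKC)) AS nfkc_len;  -- ligature becomes f + i
```

Under those assumptions the three columns should come back as 1, 2, and 2.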
Use cases
- Text comparison: Ensure strings with different Unicode representations of the same characters compare as equal
- Data cleaning: Standardize text input from various sources that may use different Unicode normalization forms
- Compatibility: Convert special characters like ligatures (ﬁ → fi) to their compatible forms
This feature is particularly useful when working with international data where the same visual character can have multiple Unicode representations.
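The comparison and compatibility cases can be sketched in a single query. The strings are built with `chr()` so the exact code points are unambiguous; this assumes `chr()` is available and follows PostgreSQL semantics, and the column names are purely illustrative.

```sql
-- Sketch: é as one code point (U+00E9) versus e + combining acute (U+0301).
SELECT
    chr(233) = ('e' || chr(769))                        AS raw_equal,        -- different code points
    normalize(chr(233)) = normalize('e' || chr(769))    AS normalized_equal, -- both become NFC
    normalize(chr(64257) || 'le', NFKC) = 'file'        AS ligature_folded;  -- ﬁ (U+FB01) folds to fi
```

With those assumptions, `raw_equal` should be false while `normalized_equal` and `ligature_folded` should be true. Normalizing both sides of a comparison (or normalizing once at ingestion) is what makes the visually identical strings match.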
For more information, see our official documentation.