
Unicode Normalizer

Normalize Unicode text to ensure consistency. Convert between NFC, NFD, NFKC, and NFKD forms for reliable text comparison and storage.

Normalization Forms

NFC (Canonical Decomposition, followed by Canonical Composition)
Most common form. Combines characters where possible. Recommended by the W3C for web content.
é (U+00E9) stays as é
NFD (Canonical Decomposition)
Decomposes characters. Used for searching and comparison; the macOS HFS+ filesystem stores filenames in a variant of NFD.
é becomes e + ́ (U+0065 U+0301)
NFKC (Compatibility Decomposition, followed by Canonical Composition)
Also normalizes compatibility characters. Recommended for identifiers.
the ligature ﬁ (U+FB01) becomes fi, ² becomes 2
NFKD (Compatibility Decomposition)
Full decomposition including compatibility characters.
the ligature ﬁ (U+FB01) becomes f + i, ² becomes 2

How it works

This tool normalizes Unicode text to standard forms (NFC, NFD, NFKC, NFKD). Unicode allows multiple ways to represent the same character—either as a single code point or as a base character plus combining marks.

Normalization converts text to a consistent representation. NFC composes characters (single code point where possible). NFD decomposes them (base + combining marks). NFKC and NFKD also apply compatibility mappings.

Normalization forms:

NFC: Composed form (default for most uses)
NFD: Decomposed form (useful for searching)
NFKC/D: Compatibility forms (normalizes ligatures, etc.)

Paste text and select a normalization form. The tool shows the before and after code points so you can see exactly what changed.
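The before/after code-point view can be sketched in a few lines of Python. The `codepoints` helper below is illustrative, not the tool's actual code:

```python
import unicodedata

def codepoints(s):
    """Return each character as a U+XXXX label, e.g. ['U+0065', 'U+0301']."""
    return [f"U+{ord(ch):04X}" for ch in s]

decomposed = "e\u0301"  # e followed by a combining acute accent
composed = unicodedata.normalize("NFC", decomposed)

print(codepoints(decomposed))  # ['U+0065', 'U+0301']
print(codepoints(composed))    # ['U+00E9']
```

Both strings render as "é"; only the code-point listing reveals the difference.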

When you'd actually use this

Fixing string comparison failures

A developer's code says "é" doesn't equal "é". One is U+00E9 (composed), the other is e + U+0301 (decomposed). Normalizing both to NFC makes them match.
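A minimal Python reproduction of the bug and the fix:

```python
import unicodedata

composed = "\u00e9"     # é as a single code point
decomposed = "e\u0301"  # e followed by a combining acute accent

# Raw comparison fails even though both render as "é".
assert composed != decomposed

# Normalizing both sides to NFC makes them compare equal.
assert unicodedata.normalize("NFC", composed) == \
       unicodedata.normalize("NFC", decomposed)
```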

Cleaning user input for storage

A database stores user names inconsistently—some with composed accents, some decomposed. Normalizing all input to NFC ensures consistent storage and reliable lookups.
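A sketch of the idea in Python; the `raw_names` list is made-up sample data:

```python
import unicodedata

# Names collected from different clients arrive in mixed forms:
# the first two are the same name, composed vs. decomposed.
raw_names = ["Jose\u0301", "Jos\u00e9", "Ren\u00e9e"]

# Normalizing to NFC before storage collapses equivalent spellings.
stored = {unicodedata.normalize("NFC", name) for name in raw_names}
print(len(stored))  # 2 — the two spellings of José deduplicate
```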

Implementing search functionality

A search feature should find "café" whether users type composed or decomposed. The developer normalizes both the index and queries to NFD for consistent matching.
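A minimal sketch in Python; the `search_key` helper and the casefold step are additions for case-insensitive matching, not something the article prescribes:

```python
import unicodedata

def search_key(text):
    """Decompose to NFD and casefold, so composed/decomposed input and
    case differences all map to the same key."""
    return unicodedata.normalize("NFD", text).casefold()

index = {search_key("caf\u00e9"): "café"}  # indexed with composed é
query = "cafe\u0301"                       # user typed decomposed é
print(search_key(query) in index)          # True
```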

Processing text from multiple sources

A data pipeline ingests text from various systems—some use NFC, some NFD. Normalizing everything to one form prevents downstream comparison and sorting issues.

Handling compatibility characters

Text contains ligatures like "ﬁ" (U+FB01) that should match the two-letter sequence "fi". NFKC normalization converts compatibility characters to their canonical equivalents for consistent processing.
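For example, in Python (the sample string is illustrative):

```python
import unicodedata

# Starts with the "fi" ligature (U+FB01) and ends with superscript ² (U+00B2).
text = "\ufb01le size: 10\u00b2 bytes"
folded = unicodedata.normalize("NFKC", text)
print(folded)  # file size: 102 bytes
```

Note how the superscript meaning is lost: "10²" becomes "102", which is why NFKC is described as lossy below.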

Debugging Unicode edge cases

A QA engineer investigates why two visually identical strings compare differently. They normalize and compare code points to find the hidden difference.

What to know before using it

NFC is the recommended default. The W3C and Unicode recommend NFC for web content. It's the most compact form and what most systems expect. Use NFC unless you have a specific reason for another form.

Normalization can change string length. NFD expands characters—é (1 code point) becomes e + combining accent (2 code points). This affects length calculations and buffer allocations.
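Illustrated in Python, where `len` counts code points:

```python
import unicodedata

s = "caf\u00e9"  # composed: 4 code points
nfd = unicodedata.normalize("NFD", s)

print(len(s))    # 4
print(len(nfd))  # 5 — the é expanded to e + U+0301
```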

NFKC/D loses information. Compatibility normalization converts the ligature "ﬁ" to "fi" and superscript ² to regular 2. This is irreversible. Only use NFKC/D when you want to lose compatibility distinctions.

Some strings can't be normalized to match. Different characters that look similar (homoglyphs) won't normalize to the same form. Latin 'A' (U+0041) and Cyrillic 'А' (U+0410) remain different after normalization.
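A quick Python check, using the Latin/Cyrillic pair as the example:

```python
import unicodedata

latin_a = "\u0041"     # LATIN CAPITAL LETTER A
cyrillic_a = "\u0410"  # CYRILLIC CAPITAL LETTER A

# Neither character has a decomposition, so no form unifies them.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, latin_a) != \
           unicodedata.normalize(form, cyrillic_a)
```

Detecting homoglyphs requires confusable-character data (such as the Unicode confusables table), not normalization.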

Pro tip: Always normalize before comparing strings for equality. But normalize both strings the same way. Comparing NFC to NFD will still fail even though they represent the same text.

Common questions

What's the difference between NFC and NFD?

NFC composes characters where possible (é as U+00E9). NFD decomposes them (e + U+0301 combining acute). Both display identically but have different code points.

When should I use NFKC or NFKD?

Use NFKC/D when you want to normalize compatibility variants: ligatures, full-width characters, superscripts. But be aware it's lossy. Don't use it for text that needs to preserve exact formatting.

Does normalization affect emoji?

Most emoji aren't affected: emoji code points, skin tone modifiers, and the zero-width joiner (ZWJ) have no decompositions, so all four forms leave them unchanged. Flag emoji (regional indicator pairs) stay as two code points. A few emoji-capable symbols, such as ℹ (U+2139), do have compatibility decompositions and change under NFKC/D.
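A quick Python check that a ZWJ family sequence and a flag pass through all four forms unchanged (the specific emoji are illustrative):

```python
import unicodedata

family = "\U0001F469\u200D\U0001F469\u200D\U0001F467"  # woman-woman-girl ZWJ sequence
flag = "\U0001F1FA\U0001F1F8"                           # regional indicators U + S

# None of these code points have decompositions, so every form is a no-op.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, family) == family
    assert unicodedata.normalize(form, flag) == flag
```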

How do I normalize in code?

JavaScript: str.normalize('NFC'). Python: unicodedata.normalize('NFC', s). Java: Normalizer.normalize(str, Normalizer.Form.NFC). Most languages have built-in normalization support.

Can normalization break text?

NFC and NFD are reversible and won't break text. NFKC/D can change meaning by converting compatibility characters. Use NFKC/D carefully and only when you understand the implications.

Why do I need Unicode normalization?

Without normalization, visually identical text can compare as different. This causes bugs in search, sorting, and data deduplication. Normalization ensures consistent representation.

Is normalization slow?

Modern normalization is fast. For most applications, the overhead is negligible and well worth it to avoid Unicode comparison bugs. For best performance, normalize once on input rather than on every comparison.