Unicode Character Encoder and Decoder

Convert characters to Unicode code points (like U+0041 for 'A') or decode Unicode code points back to characters. This tool includes character names and supports various Unicode encodings.

About Unicode

Unicode is a universal character encoding standard that assigns a unique number (code point) to every character, regardless of platform, program, or language. It supports over 149,000 characters from more than 160 scripts.

Code points are typically written as U+XXXX where XXXX is a hexadecimal number. The standard includes characters from virtually all writing systems in use today.
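As a minimal sketch of the U+XXXX notation described above, the following Python converts between a character and its code point. The helper names `to_u_notation` and `from_u_notation` are hypothetical, chosen only for illustration; they are not part of this tool.

```python
def to_u_notation(ch: str) -> str:
    """Format a single character as U+XXXX (at least four hex digits)."""
    return f"U+{ord(ch):04X}"


def from_u_notation(notation: str) -> str:
    """Turn a string such as 'U+0041' back into the character it names."""
    if not notation.upper().startswith("U+"):
        raise ValueError(f"not in U+XXXX form: {notation!r}")
    return chr(int(notation[2:], 16))


print(to_u_notation("A"))           # U+0041
print(to_u_notation("\u20ac"))      # U+20AC (euro sign)
print(to_u_notation("\U0001F600"))  # U+1F600 (five hex digits)
print(from_u_notation("U+0041"))    # A
```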

How the Unicode Character Encoder/Decoder Works

Enter text to encode Unicode characters to their code point representations, or paste encoded sequences to decode back to readable text. Choose your preferred output format.

Encoding options include: U+XXXX format, HTML entities (&#xXXXX;), CSS escapes (\XXXX), JavaScript escapes (\uXXXX), and Python escapes (\uXXXX or \UXXXXXXXX).
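The formats listed above could be produced along these lines. This is a sketch, not the tool's actual implementation: the function name `encode_char` and the format keys are assumptions made for the example. The CSS branch pads to six hex digits, since a CSS escape reads at most six and so cannot absorb a hex digit that follows it.

```python
def encode_char(ch: str, fmt: str) -> str:
    """Encode one character in the requested escape format (hypothetical helper)."""
    cp = ord(ch)
    if fmt == "u+":
        return f"U+{cp:04X}"
    if fmt == "html":
        return f"&#x{cp:X};"
    if fmt == "css":
        # Six digits: the escape ends unambiguously without a trailing space.
        return f"\\{cp:06X}"
    if fmt == "js":
        # \uXXXX covers the BMP; \u{...} (ES6+) covers everything above it.
        return f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\u{{{cp:X}}}"
    if fmt == "python":
        return f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\U{cp:08X}"
    raise ValueError(f"unknown format: {fmt!r}")


for fmt in ("u+", "html", "css", "js", "python"):
    print(fmt, encode_char("\U0001F600", fmt))
```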

The decoder automatically detects the format and converts back to Unicode text. It handles both BMP characters (4 hex digits) and supplementary characters (5 or 6 hex digits).
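One way such format detection might work is a single regular expression with one alternative per escape style. The sketch below is an assumption about the approach, not the tool's code; `decode` and `_ESCAPE` are hypothetical names. CSS escapes are left out because a bare backslash followed by hex digits is ambiguous outside a stylesheet.

```python
import re

# One alternative per supported format; exactly one group matches each time.
_ESCAPE = re.compile(
    r"U\+([0-9A-Fa-f]{4,6})"        # U+0041
    r"|&#x([0-9A-Fa-f]{1,6});"      # &#x41;
    r"|\\u\{([0-9A-Fa-f]{1,6})\}"   # \u{1F600}
    r"|\\U([0-9A-Fa-f]{8})"         # \U0001F600
    r"|\\u([0-9A-Fa-f]{4})"         # \u0041
)


def decode(text: str) -> str:
    """Replace every recognised escape in text with the character it names."""
    def repl(match: re.Match) -> str:
        hex_digits = next(g for g in match.groups() if g is not None)
        cp = int(hex_digits, 16)
        # Leave out-of-range values untouched rather than raising.
        return chr(cp) if cp <= 0x10FFFF else match.group(0)

    decoded = _ESCAPE.sub(repl, text)
    # Join surrogate halves (e.g. \uD83D\uDE00) into one supplementary character.
    return decoded.encode("utf-16-le", "surrogatepass").decode(
        "utf-16-le", "surrogatepass"
    )


print(decode(r"U+0041 \u0041 &#x41;"))  # A A A
```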

When You'd Actually Use This

Writing cross-platform code

Source files containing raw non-ASCII characters may not display correctly in every editor or terminal. Encode special characters as escape sequences so the code behaves the same regardless of file encoding.

Creating CSS content

CSS uses \XXXX escapes for icons and special characters. Encode Unicode to CSS format for content properties. Works reliably across browsers.

Debugging encoding issues

Garbled text often indicates encoding problems. Encode to code points to see what characters are actually there. Identify the root cause.
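A short sketch of this kind of inspection, using Python's standard `unicodedata` module. The `inspect_text` helper is a hypothetical name. The sample string reproduces a common failure: the UTF-8 bytes of "é" read as Latin-1.

```python
import unicodedata


def inspect_text(text: str) -> list[str]:
    """List each character's code point and Unicode name."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


# "é" encoded as UTF-8 (bytes C3 A9) but decoded as Latin-1 becomes two characters.
garbled = "\u00e9".encode("utf-8").decode("latin-1")

for line in inspect_text(garbled):
    print(line)
# U+00C3 LATIN CAPITAL LETTER A WITH TILDE
# U+00A9 COPYRIGHT SIGN
```

Seeing U+00C3 followed by another character in the U+0080 to U+00BF range is a strong hint that UTF-8 bytes were decoded with a single-byte encoding.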

Working with legacy systems

Old systems may not handle Unicode well. Encode to ASCII-safe escape sequences. Decode when displaying to users.
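The round trip described above can be sketched with Python's built-in `unicode_escape` codec. The helper names are hypothetical. Note that this codec writes characters in the U+0080 to U+00FF range as `\xNN` rather than `\u00NN`, so the output is Python-flavoured and may need adjusting for other consumers.

```python
def to_ascii_safe(text: str) -> str:
    """Escape every non-ASCII character so the result is pure ASCII."""
    return text.encode("unicode_escape").decode("ascii")


def from_ascii_safe(escaped: str) -> str:
    """Reverse to_ascii_safe before showing the text to users."""
    return escaped.encode("ascii").decode("unicode_escape")


original = "caf\u00e9 \u2713"
stored = to_ascii_safe(original)
print(stored)                   # caf\xe9 \u2713
print(from_ascii_safe(stored))  # café ✓
```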

Documenting character usage

Technical documentation often needs code points. Reference characters by U+XXXX format. Precise and unambiguous identification.

Processing JSON with special chars

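JSON supports \uXXXX escapes; characters above U+FFFF are written as a surrogate pair of two escapes. Encode non-ASCII characters for ASCII-safe JSON transmission. Any conforming JSON parser decodes them back to the original text.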
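For illustration, Python's standard `json` module escapes non-ASCII characters by default (`ensure_ascii=True`). The payload below is made-up sample data.

```python
import json

payload = {"greeting": "h\u00e9llo", "emoji": "\U0001F600"}

# ensure_ascii=True is the default: every non-ASCII character becomes \uXXXX,
# and characters above U+FFFF become a surrogate pair.
encoded = json.dumps(payload)
print(encoded)
# {"greeting": "h\u00e9llo", "emoji": "\ud83d\ude00"}

# Parsing restores the original characters.
assert json.loads(encoded) == payload
```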

What to Know Before Using

Different formats for different contexts. U+XXXX for documentation. \uXXXX for JavaScript. &#xXXXX; for HTML. \XXXX for CSS. Choose the format matching your use case.

Surrogate pairs for emoji. Emoji and other characters above U+FFFF need two \u escapes in some formats. JavaScript uses \uD83D\uDE00 for 😀. Newer syntax supports \u{1F600}.

Case doesn't matter for hex. U+00E9 and U+00e9 are identical. Lowercase is common in code. Uppercase is traditional for documentation. Both decode the same.

Some characters need escaping. Inside string literals, the backslash, the enclosing quote character, and control characters always need escaping. Escaping other Unicode characters is optional, but it ensures portability.

Pro tip: For JavaScript, prefer \u{XXXXX} syntax for characters above U+FFFF. It's cleaner than surrogate pairs. Requires ES6+ but much more readable.
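The surrogate pair for a supplementary character follows from simple arithmetic defined by UTF-16: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. A sketch, with `to_surrogate_pair` as a hypothetical helper name:

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use surrogate pairs")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(f"\\u{high:04X}\\u{low:04X}")  # \uD83D\uDE00
```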

Common Questions

What's U+ notation?

U+XXXX is the standard Unicode notation. U means Unicode. XXXX is the hex code point. U+0041 is Latin capital A. Universal standard format.

How do I encode emoji?

Emoji are supplementary characters. In JavaScript: \u{1F600} or \uD83D\uDE00. In HTML: &#x1F600; or &#128512;. In CSS: \01F600.

Can I decode mixed formats?

Yes, the decoder handles mixed input. U+0041, \u0041, and &#x41; all decode to 'A'. It automatically detects and processes each format.

What about combining characters?

Combining characters have their own code points. Encode and decode them separately. The sequence base + combining mark creates the visual character.
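To illustrate, here is "é" written as a base letter plus a combining mark, next to its precomposed form. The sketch uses Python's standard `unicodedata.normalize`; normalization is shown only as background and is not claimed to be a feature of this tool.

```python
import unicodedata

decomposed = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

print([f"U+{ord(ch):04X}" for ch in decomposed])  # ['U+0065', 'U+0301']
print([f"U+{ord(ch):04X}" for ch in composed])    # ['U+00E9']

# Both render as the same visual character but compare unequal until normalized.
print(decomposed == composed)                                # False
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```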

Why use escapes instead of raw Unicode?

Escapes work in any ASCII-compatible file encoding. Raw Unicode characters require the file to be saved and read in a matching Unicode encoding such as UTF-8. Escapes are ASCII-safe and widely compatible.

How do I find a character's code point?

Encode the character to see its code point. Or use a character map tool. Many operating systems include character viewers showing code points.

What's the maximum code point?

Unicode goes up to U+10FFFF. That's 1,114,112 possible code points. Currently about 150,000 are assigned to characters. Room for future expansion.
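The total comes from Unicode's 17 planes of 65,536 code points each. A quick check in Python, where `chr` rejects anything past the limit:

```python
import sys

PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

total = PLANES * CODE_POINTS_PER_PLANE
print(f"{total:,}")              # 1,114,112
print(hex(sys.maxunicode))       # 0x10ffff

try:
    chr(0x110000)                # one past the maximum
except ValueError as exc:
    print("rejected:", exc)
```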