Unicode Text Converter

Convert text between Unicode encodings like UTF-8, UTF-16, and HTML entities. Easily encode or decode text for web development and data processing.

Convert plain text to various Unicode formats and vice versa

Format Examples

UTF-8 Hex

48 65 6C 6C 6F

UTF-16

U+0048 U+0065 U+006C U+006C U+006F

HTML Entities

&#72;&#101;&#108;&#108;&#111;

Escape Sequence

\u0048\u0065\u006C\u006C\u006F

How the Unicode Text Converter Works

Enter your text and select the encoding format you need. The converter transforms your text into UTF-8 hex bytes, UTF-16 code units, UTF-32 code points, HTML entities, or escape sequences. Click Convert to see the result.
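As a rough illustration of the conversions described above, here is a minimal Python sketch. The function name `to_formats` and the dictionary keys are made up for this example and are not the tool's actual code. It assumes decimal numeric entities and JavaScript-style `\uXXXX` escapes built from UTF-16 code units.

```python
def to_formats(text):
    """Return several encoded views of `text` (hypothetical helper)."""
    utf16 = text.encode("utf-16-be")  # big-endian, no BOM
    # Split the UTF-16 bytes into 2-byte code units, shown as 4 hex digits each.
    units = [utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)]
    return {
        "utf8_hex": text.encode("utf-8").hex(" ").upper(),
        "utf16_units": " ".join(units),
        "utf32_code_points": " ".join(f"U+{ord(ch):04X}" for ch in text),
        "html_entities": "".join(f"&#{ord(ch)};" for ch in text),
        "escape_sequence": "".join(f"\\u{unit}" for unit in units),
    }


for name, value in to_formats("Hello").items():
    print(f"{name}: {value}")
```

For plain ASCII input such as "Hello", the UTF-16 code units and the code points carry the same numbers; they only diverge for characters above U+FFFF.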

Each encoding serves different purposes. UTF-8 is used for file storage and web transmission. UTF-16 is common in Windows and JavaScript. UTF-32 gives direct code point access. HTML entities escape special characters for web pages. Escape sequences are used in programming languages.

The converter also works in reverse. Paste encoded text and it will decode back to readable characters. This is useful for debugging encoding issues or understanding what encoded strings actually contain.
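Decoding can be sketched the same way. The three helper names below are hypothetical, and the escape decoder leans on the fact that JSON string literals use the same `\uXXXX` syntax; it assumes the input contains no unescaped double quotes or stray backslashes.

```python
import html
import json


def decode_utf8_hex(hex_text):
    """Turn space-separated hex bytes back into text."""
    # bytes.fromhex ignores the spaces between byte pairs.
    return bytes.fromhex(hex_text).decode("utf-8")


def decode_entities(text):
    """Resolve numeric and named HTML entities."""
    return html.unescape(text)


def decode_escapes(text):
    """Decode \\uXXXX sequences by parsing the text as a JSON string."""
    return json.loads(f'"{text}"')


print(decode_utf8_hex("48 65 6C 6C 6F"))
print(decode_entities("&#72;&#101;&#108;&#108;&#111;"))
print(decode_escapes("\\u0048\\u0065\\u006C\\u006C\\u006F"))
```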

When You'd Actually Use This

Debugging character encoding issues

Your API returns garbled text. Convert it to see the actual bytes. "Mü" showing up as "MÃ¼" makes sense once you see that "ü" is the two UTF-8 bytes C3 BC, which were read as the two Latin-1 characters "Ã" and "¼".
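This kind of mix-up is easy to reproduce. The sketch below takes "Mü" as sample text, encodes it as UTF-8, and then decodes the bytes as Latin-1, which is one common way such garbling arises. The variable names are illustrative only.

```python
original = "M\u00fc"  # "Mü"

utf8_bytes = original.encode("utf-8")
print(utf8_bytes.hex(" ").upper())  # the bytes actually sent

# Reading UTF-8 bytes as Latin-1 turns each byte into its own character.
garbled = utf8_bytes.decode("latin-1")
print(garbled)

# If nothing was lost along the way, reversing the wrong step repairs the text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)
```

The repair step only works when the garbled text still contains every original byte; once characters have been replaced or dropped, the original cannot be recovered this way.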

Writing HTML with special characters

Need to display code examples with angle brackets on a webpage? Convert to HTML entities so < and > show as text instead of being interpreted as HTML tags.
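In code, the same escaping is available from Python's standard `html` module. This is a small sketch, not a description of how the tool itself is built; the sample markup is invented.

```python
import html

snippet = '<a href="/docs">Docs & guides</a>'

# html.escape replaces &, <, > and (by default) quote characters.
escaped = html.escape(snippet)
print(escaped)

# html.unescape reverses the conversion.
round_trip = html.unescape(escaped)
print(round_trip)
```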

Creating string literals in code

Your source file encoding doesn't support certain characters. Use escape sequences like \u00E9 for "é" to include Unicode in ASCII-only source files.
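A possible way to generate such literals is sketched below. The helper name `to_ascii_escapes` is hypothetical, and the `\UXXXXXXXX` form used for characters above U+FFFF is the Python and C convention; other languages (JavaScript, Java) expect a surrogate pair or a different syntax instead.

```python
def to_ascii_escapes(text):
    """Leave ASCII as-is and escape everything else (hypothetical helper)."""
    parts = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x80:
            parts.append(ch)
        elif code_point <= 0xFFFF:
            parts.append(f"\\u{code_point:04X}")
        else:
            parts.append(f"\\U{code_point:08X}")  # Python/C-style 8-digit escape
    return "".join(parts)


print(to_ascii_escapes("caf\u00e9"))
print(to_ascii_escapes("\U0001F600"))
```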

Analyzing binary file formats

Reverse engineering a file format? Convert text to UTF-8 hex to compare against the binary data. Match byte patterns to understand the file structure.

Preparing data for legacy systems

Old databases may only accept ASCII. Convert Unicode text to escape sequences or entities that the system can store, then decode when reading back.
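One way to do this round trip in Python is the built-in `xmlcharrefreplace` error handler, sketched here with an invented sample string. It assumes the stored text contains no entity-like sequences of its own, since `html.unescape` would decode those too.

```python
import html

text = "Zo\u00eb's caf\u00e9"  # "Zoë's café"

# Non-ASCII characters become numeric character references; ASCII is untouched.
stored = text.encode("ascii", errors="xmlcharrefreplace")
print(stored)

# When reading back, decode the ASCII bytes and resolve the references.
restored = html.unescape(stored.decode("ascii"))
print(restored)
```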

Learning about Unicode encodings

Students can see how the same text looks in different encodings. Compare UTF-8's variable-length byte sequences with UTF-16's 2-byte code units to see how each encoding trades off size and simplicity.

What to Know Before Using

UTF-8 uses variable-length encoding. ASCII characters are 1 byte, but emoji can be 4 bytes. "A" is 41 in hex, but "😀" is F0 9F 98 80. This is why UTF-8 is efficient for mostly-ASCII text.
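The variable length is easy to observe. The sample characters below ("A", "é", "€", "😀") are chosen for this illustration to cover the 1, 2, 3, and 4 byte cases.

```python
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, é, €, 😀

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s): {encoded.hex(' ').upper()}")

byte_counts = [len(ch.encode("utf-8")) for ch in samples]
```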

UTF-16 may use surrogate pairs. Characters above U+FFFF (like emoji) need two 16-bit code units in UTF-16. JavaScript's charCodeAt() returns these separately, not the full code point.
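The arithmetic behind a surrogate pair is short enough to write out. The helper name `surrogate_pair` is made up for this sketch; the result is checked against Python's own UTF-16 encoder.

```python
def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its two UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use surrogate pairs")
    offset = code_point - 0x10000        # 20 bits remain
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


high, low = surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")

# Cross-check against the built-in encoder (big-endian, no BOM).
builtin = "\U0001F600".encode("utf-16-be").hex().upper()
print(builtin)
```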

HTML entities have named alternatives. While this tool generates numeric entities like &#233;, HTML also supports named entities like &eacute;. Both produce the same character.

BOM markers may appear in files. UTF-8 files sometimes start with EF BB BF (the BOM). This is optional and often unnecessary. Most modern systems handle UTF-8 without BOM correctly.
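In Python, for instance, the difference shows up in which codec is used for decoding. This sketch builds a BOM-prefixed byte string by hand and decodes it both ways.

```python
import codecs

data = codecs.BOM_UTF8 + "Hello".encode("utf-8")
print(data[:3].hex(" ").upper())  # the three BOM bytes

# Plain "utf-8" keeps the BOM as an invisible U+FEFF character.
with_bom = data.decode("utf-8")

# "utf-8-sig" strips a leading BOM if one is present.
without_bom = data.decode("utf-8-sig")

print(repr(with_bom), repr(without_bom))
```

A leftover U+FEFF at the start of a string is a frequent cause of comparisons or header lookups that fail for no visible reason.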

Pro tip: When debugging encoding issues, always check what encoding the source and destination expect. Mismatched encodings cause most "garbled text" problems, not actual data corruption.

Common Questions

What's the difference between UTF-8 and UTF-16?

UTF-8 uses 1-4 bytes per character and is backward compatible with ASCII. UTF-16 uses 2 or 4 bytes per character and is often more compact for East Asian text, where most characters take 2 bytes in UTF-16 but 3 in UTF-8. UTF-8 dominates on the web.

Why does emoji take 4 bytes in UTF-8?

Most emoji have code points above U+FFFF. In UTF-8, anything above U+FFFF requires 4 bytes. The "😀" emoji is U+1F600, which encodes as F0 9F 98 80.
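The reason is the bit budget: the 3-byte form carries 16 payload bits (4 + 6 + 6), enough for U+FFFF at most, while the 4-byte form carries 21 (3 + 6 + 6 + 6). A hand-rolled sketch of the 4-byte layout follows; the function name is invented, and real code should simply call `str.encode`.

```python
def encode_utf8_4byte(code_point):
    """Encode a code point in U+10000..U+10FFFF as 4 UTF-8 bytes (illustrative)."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("the 4-byte form only covers U+10000..U+10FFFF")
    return bytes([
        0xF0 | (code_point >> 18),           # 11110xxx: top 3 bits
        0x80 | ((code_point >> 12) & 0x3F),  # 10xxxxxx: next 6 bits
        0x80 | ((code_point >> 6) & 0x3F),   # 10xxxxxx: next 6 bits
        0x80 | (code_point & 0x3F),          # 10xxxxxx: last 6 bits
    ])


manual = encode_utf8_4byte(0x1F600)
print(manual.hex(" ").upper())
```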

Can I convert entire files?

This tool works on text you paste in. For file conversion, use dedicated tools or programming libraries. Python's codecs module handles file encoding conversion well.
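A minimal file conversion might look like the sketch below. It uses Python's built-in `open()` with an explicit `encoding` rather than the `codecs` module directly, reads the whole file into memory, and assumes you already know the source encoding. The function name and the file names in the usage comment are placeholders.

```python
def convert_file(src_path, dst_path, src_encoding, dst_encoding="utf-8"):
    """Re-encode a text file from src_encoding to dst_encoding (hypothetical helper)."""
    # newline="" keeps line endings exactly as they appear in the source file.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)


# Example call (placeholder file names):
# convert_file("legacy.txt", "converted.txt", "latin-1")
```

If the source encoding is wrong, reading may raise `UnicodeDecodeError` or, for single-byte encodings like Latin-1, silently produce garbled text, so it is worth spot-checking the output.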

What are HTML entities used for?

HTML entities escape characters that have special meaning in HTML. & becomes &amp;, < becomes &lt;. This prevents them from being interpreted as markup.

Is UTF-32 ever used in practice?

Rarely. It's simple (one 32-bit value per code point) but wasteful of space. Some internal systems use it for easy indexing, but storage and transmission favor UTF-8.

How do I know what encoding a file uses?

There's no reliable way without metadata. UTF-8 with BOM has a signature. Otherwise, you need to know from the file's source or try detection heuristics.
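A very simple heuristic, sketched below under the assumption that only BOMs and strict UTF-8 validity are checked, can still narrow things down. The function name `guess_encoding` is hypothetical, and a `None` result means "unknown", not "binary".

```python
import codecs


def guess_encoding(data):
    """Guess an encoding from raw bytes using BOMs and a strict UTF-8 check."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Check UTF-32 before UTF-16: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        data.decode("utf-8")  # strict: raises on any invalid byte sequence
        return "utf-8"
    except UnicodeDecodeError:
        return None


print(guess_encoding("M\u00fc".encode("utf-8")))
print(guess_encoding("M\u00fc".encode("latin-1")))
```

Pure ASCII data passes the UTF-8 check as well, since ASCII is a subset of UTF-8. Telling single-byte legacy encodings apart needs statistical detection, which this sketch does not attempt.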

Why do some characters show as question marks?

The target encoding doesn't support that character. ASCII can't represent "". When converting, unsupported characters often become ? or replacement character .
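Both substitutions can be reproduced with Python's `errors="replace"` handler. The sample string is invented for this sketch.

```python
text = "caf\u00e9"  # "café"

# Encoding to a charset that lacks the character substitutes "?".
as_ascii = text.encode("ascii", errors="replace")
print(as_ascii)

# Decoding bytes that are invalid in the chosen encoding yields U+FFFD.
latin1_bytes = text.encode("latin-1")
as_text = latin1_bytes.decode("utf-8", errors="replace")
print(as_text)
```

Either way the original character is gone after the substitution, so it is safer to fix the encoding mismatch than to convert with replacement.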
