
UTF-16 Encoder & Decoder - BE & LE

Convert text to UTF-16 encoding in either Big Endian or Little Endian byte order, or decode UTF-16 byte sequences to readable text. Includes BOM handling for proper encoding identification.

UTF-16 Encoder/Decoder

Encode text to UTF-16 hex or decode UTF-16 hex back to text

How the UTF-16 Encoder/Decoder Works

Enter your text to encode, or paste UTF-16 hex bytes to decode. Select the byte order: Big Endian (BE) or Little Endian (LE). The conversion happens instantly as you type.

UTF-16 represents each character as one or two 16-bit code units. Common characters use one unit. Rare characters outside the Basic Multilingual Plane use surrogate pairs - two 16-bit values.
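The conversion described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation; the function names `utf16_hex` and `utf16_text` are made up for this example, while the `utf-16-be` and `utf-16-le` codecs are part of the standard library.

```python
def utf16_hex(text: str, byte_order: str = "BE") -> str:
    """Return the UTF-16 bytes of `text` as space-separated hex pairs."""
    # The explicit "-be" / "-le" codecs never add a BOM.
    codec = "utf-16-be" if byte_order.upper() == "BE" else "utf-16-le"
    return text.encode(codec).hex(" ").upper()


def utf16_text(hex_bytes: str, byte_order: str = "BE") -> str:
    """Decode hex digits (spaces between bytes are optional) back to text."""
    codec = "utf-16-be" if byte_order.upper() == "BE" else "utf-16-le"
    return bytes.fromhex(hex_bytes).decode(codec)


print(utf16_hex("Hi", "BE"))          # 00 48 00 69
print(utf16_hex("Hi", "LE"))          # 48 00 69 00
print(utf16_hex("\U0001F600", "BE"))  # D8 3D DE 00 (one surrogate pair)
print(utf16_text("48 00 69 00", "LE"))  # Hi
```

The emoji U+1F600 lies outside the Basic Multilingual Plane, so it comes out as two 16-bit units (D83D and DE00) rather than one.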

BOM (Byte Order Mark) handling is automatic. The encoder can prepend a BOM to indicate byte order, and you choose whether to include it in the output. The decoder detects a leading BOM and strips it before decoding.
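A minimal Python sketch of that BOM behavior follows. The helper names and the `default_order` fallback are assumptions made for illustration; `codecs.BOM_UTF16_BE` and `codecs.BOM_UTF16_LE` are standard-library constants. Note that a UTF-32 LE BOM also begins with FF FE, which this sketch does not try to distinguish.

```python
import codecs


def encode_with_bom(text: str, byte_order: str = "LE") -> bytes:
    """Encode `text` as UTF-16 and prepend the matching BOM."""
    if byte_order.upper() == "BE":
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def decode_detecting_bom(data: bytes, default_order: str = "LE") -> str:
    """Strip a leading BOM if present; otherwise fall back to `default_order`."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # No BOM: the byte order has to come from somewhere else (assumed here).
    codec = "utf-16-be" if default_order.upper() == "BE" else "utf-16-le"
    return data.decode(codec)


print(encode_with_bom("A", "BE").hex(" "))           # fe ff 00 41
print(decode_detecting_bom(bytes.fromhex("FFFE4100")))  # A
```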

When You'd Actually Use This

Debugging Windows API calls

Windows uses UTF-16 internally for strings. When debugging API calls, you'll see UTF-16 byte sequences. Decode them to understand what strings are being passed.

Analyzing binary file formats

Many file formats store strings as UTF-16. Extract string data from binaries by decoding UTF-16 sequences. Identify file metadata and embedded text.
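As a rough illustration, a `strings`-style scan for UTF-16 LE text can be written with a regular expression. The sketch below assumes the embedded strings are mostly printable ASCII (so each character is one printable byte followed by 0x00); the function name and the `min_chars` threshold are hypothetical choices, and non-ASCII strings would need a broader pattern.

```python
import re


def find_utf16le_strings(data: bytes, min_chars: int = 4) -> list[str]:
    """Return runs of printable ASCII characters stored as UTF-16 LE."""
    # One printable ASCII byte followed by a null byte, repeated min_chars+ times.
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_chars)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]


blob = (
    b"\x7fELF\x01\x02"
    + "Config".encode("utf-16-le")
    + b"\xff\xff"
    + "Version 1.0".encode("utf-16-le")
)
print(find_utf16le_strings(blob))  # ['Config', 'Version 1.0']
```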

Working with Java strings

Java uses UTF-16 internally for String objects. When examining Java memory or serialized data, strings appear as UTF-16. Decode to read the actual text.

Processing Windows registry exports

Registry files store strings in UTF-16 LE. When parsing registry hives programmatically, decode UTF-16 LE to read key names and values.
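For example, a multi-string registry value (REG_MULTI_SZ) is a sequence of null-terminated UTF-16 LE strings ending with an extra empty string. The sketch below assumes the raw value bytes have already been extracted from the hive or export by some other means; `parse_reg_multi_sz` is a hypothetical helper name, not part of any registry library.

```python
def parse_reg_multi_sz(raw: bytes) -> list[str]:
    """Split REG_MULTI_SZ-style data into its individual strings."""
    text = raw.decode("utf-16-le")
    items = []
    for part in text.split("\x00"):
        if not part:
            # The first empty string marks the end of the list.
            break
        items.append(part)
    return items


raw = "alpha\x00beta\x00\x00".encode("utf-16-le")
print(parse_reg_multi_sz(raw))  # ['alpha', 'beta']
```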

Handling XML with BOM

XML files may start with a UTF-16 BOM. Detect the BOM to determine the encoding and byte order, then decode the XML content for parsing.

Forensic text extraction

Digital forensics often involves extracting text from raw data. Identify UTF-16 encoded strings in memory dumps or disk images. Decode for analysis.

What to Know Before Using

Byte order matters. Big Endian stores the high byte first. Little Endian stores the low byte first. Windows uses LE. Network protocols typically use BE. Mismatched order produces garbage.

BOM indicates byte order. The BOM is the code point U+FEFF, which appears as the bytes FE FF in BE and FF FE in LE. It's optional but helpful. Some systems require a BOM, others reject it. Know your target system's expectations.

Surrogate pairs represent rare characters. Characters above U+FFFF use two 16-bit values. Most emoji and the rarer CJK ideographs need surrogate pairs; everyday CJK characters fit in a single unit. Decoding handles these automatically.

UTF-16 isn't always 2 bytes per character. Common characters use 2 bytes. Rare characters use 4 bytes (surrogate pairs). Average is slightly over 2 bytes per character for mixed text.

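Pro tip: When decoding unknown UTF-16, try both byte orders. One will produce readable text; the other will typically come out as a run of unrelated CJK or other unexpected characters, because the high and low bytes of every code unit are swapped. The readable one is correct.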
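Trying both byte orders is easy to sketch in Python. The helper name `decode_both_orders` is made up for this example, and `errors="replace"` is one possible choice so that stray bytes do not abort the comparison.

```python
def decode_both_orders(data: bytes) -> dict[str, str]:
    """Decode the same bytes as UTF-16 BE and LE so the results can be compared."""
    # errors="replace" substitutes U+FFFD for unpaired surrogates or a trailing odd byte.
    return {
        "BE": data.decode("utf-16-be", errors="replace"),
        "LE": data.decode("utf-16-le", errors="replace"),
    }


sample = bytes.fromhex("48 00 65 00 6C 00 6C 00 6F 00")
for order, text in decode_both_orders(sample).items():
    print(order, ascii(text))
# BE '\u4800\u6500\u6c00\u6c00\u6f00'
# LE 'Hello'
```

Here the LE result is plain English, while the BE result is five unrelated CJK ideographs, which is a strong hint that the data is little endian.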

Common Questions

What's the difference between UTF-16 BE and LE?

Byte order. BE stores high byte first (network order). LE stores low byte first (Intel/Windows order). Same data, different byte arrangement.

When should I include BOM?

Include BOM when the reader might not know the byte order. Omit BOM when byte order is specified elsewhere or for protocols that don't expect BOM.

How do I identify UTF-16 in a file?

Look for a BOM at the start: FE FF (BE) or FF FE (LE). Without a BOM, look for patterns: ASCII text in UTF-16 has a null byte in every other position, before each character in BE and after it in LE.
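Both checks can be combined into a small heuristic. This is a sketch only: the function name and the 60% threshold are arbitrary assumptions, and the null-byte test only works for ASCII-heavy text.

```python
import codecs
from typing import Optional


def guess_utf16_order(data: bytes, threshold: float = 0.6) -> Optional[str]:
    """Return "BE", "LE", or None if the data does not look like UTF-16."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return "BE"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "LE"
    pairs = len(data) // 2
    if pairs == 0:
        return None
    even_nulls = data[0::2].count(0)  # high bytes if the data is BE
    odd_nulls = data[1::2].count(0)   # high bytes if the data is LE
    if even_nulls >= threshold * pairs and even_nulls > odd_nulls:
        return "BE"
    if odd_nulls >= threshold * pairs and odd_nulls > even_nulls:
        return "LE"
    return None


print(guess_utf16_order("Hello".encode("utf-16-le")))  # LE
print(guess_utf16_order("Hello".encode("utf-16-be")))  # BE
print(guess_utf16_order(b"Hello world"))               # None
```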

Why use UTF-16 over UTF-8?

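UTF-16 uses a single 16-bit unit for every character in the Basic Multilingual Plane, and it is more compact than UTF-8 for most CJK text (2 bytes per character instead of 3). Windows and Java use it internally. UTF-8 is more space-efficient for ASCII-heavy text.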
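The size trade-off is easy to measure. The snippet below is a small illustration with arbitrarily chosen sample strings; `encoded_sizes` is a hypothetical helper, and sizes are counted without a BOM.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Byte counts for the same text in UTF-8 and UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


print(encoded_sizes("hello"))        # {'utf-8': 5, 'utf-16': 10}
print(encoded_sizes("日本語"))        # {'utf-8': 9, 'utf-16': 6}
print(encoded_sizes("\U0001F600"))   # {'utf-8': 4, 'utf-16': 4}
```

ASCII doubles in size under UTF-16, common CJK text shrinks by a third, and characters outside the Basic Multilingual Plane take 4 bytes in both encodings.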

Can UTF-16 represent all Unicode?

Yes. UTF-16 can encode every Unicode character from U+0000 to U+10FFFF. Characters above U+FFFF use surrogate pairs. The code points U+D800 to U+DFFF are reserved for that mechanism and do not represent characters on their own.

What are surrogate pairs?

Two 16-bit values that together represent one character above U+FFFF. A high surrogate (D800-DBFF) is followed by a low surrogate (DC00-DFFF).
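The arithmetic behind a surrogate pair is short enough to show directly. In this sketch the function names are made up for illustration; the constants follow the UTF-16 definition, where the high surrogate falls in D800-DBFF and the low surrogate in DC00-DFFF.

```python
def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use surrogate pairs")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")                 # D83D DE00
print(f"U+{from_surrogates(high, low):X}")     # U+1F600
```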

Is UTF-16 compatible with ASCII?

Not directly. Each ASCII character in UTF-16 is padded with a null byte. "A" is 00 41 in UTF-16 BE (41 00 in LE), not the single byte 41 as in ASCII. Conversion is needed.