Encoding Explained: UTF-8, ASCII, Base64, and URL Encoding
Understand character encodings, binary-to-text encoding, and URL encoding to prevent data corruption and bugs.
Text and Data Encoding
Encoding confusion causes garbled text, broken URLs, corrupted data, and security vulnerabilities. Understanding the purpose of each encoding type prevents these issues.
Character Encoding: ASCII and UTF-8
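ASCII maps 128 characters (English letters, digits, punctuation, and control codes) to the numbers 0-127, so each character fits in 7 bits; in practice it is stored one character per 8-bit byte. UTF-8 extends this to cover every Unicode character (about 150,000 assigned in recent Unicode versions) using 1 to 4 bytes each. Any ASCII text is valid UTF-8, byte for byte. The reverse is not true: UTF-8 text containing non-ASCII characters is not valid ASCII. Default to UTF-8 for new projects unless an external system requires something else.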
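The relationship between ASCII and UTF-8 can be checked directly. The following is a minimal sketch using only Python's built-in str and bytes methods; the sample strings and variable names are illustrative assumptions, not part of the original guide.

```python
# Illustrative sample strings (assumptions for this sketch).
ascii_text = "cafe"
unicode_text = "café"

# Pure ASCII text produces identical bytes under both encodings.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# 'é' (U+00E9) needs two bytes in UTF-8, so 4 characters become 5 bytes.
utf8_bytes = unicode_text.encode("utf-8")

# Decoding those bytes as ASCII fails, because 0xC3 and 0xA9 are above 127.
try:
    utf8_bytes.decode("ascii")
    decodes_as_ascii = True
except UnicodeDecodeError:
    decodes_as_ascii = False

print(same_bytes, utf8_bytes, decodes_as_ascii)
```

The failure in the last step is the "reverse is not true" case: a strict ASCII decoder rejects any byte above 127 rather than guessing.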
UTF-8 vs UTF-16 vs UTF-32
UTF-8 is variable-width (1-4 bytes per code point): compact for ASCII-heavy text such as English, less so for CJK text, where most characters take 3 bytes. UTF-16 uses 2 or 4 bytes per code point: most CJK characters take 2 bytes, but ASCII text doubles in size. UTF-32 uses exactly 4 bytes per code point: simplest to index but wasteful of space. Typical usage: UTF-8 on the web and in files; UTF-16 inside Windows APIs, Java, and JavaScript strings; UTF-32 mostly as an in-memory representation, rarely for storage or interchange.
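To make the size trade-offs concrete, this sketch counts the bytes each encoding needs for a few sample characters. The helper name encoded_sizes and the sample characters are hypothetical choices for illustration; the little-endian codec names are used so that Python does not prepend a byte order mark to the output.

```python
# Hypothetical helper: byte length of one string in each encoding.
# The "-le" codecs avoid the byte order mark that "utf-16"/"utf-32" would add.
def encoded_sizes(text: str) -> dict[str, int]:
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

# Sample characters (assumptions): ASCII, Latin-1 range, CJK, and an emoji
# outside the Basic Multilingual Plane.
for label, ch in [("ASCII", "A"), ("Latin", "é"), ("CJK", "中"), ("emoji", "😀")]:
    print(label, encoded_sizes(ch))
```

For these samples, UTF-8 is smaller than UTF-16 only for the ASCII character, the two are equal for "é" and for the emoji, and UTF-16 is smaller for the CJK character.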
Base64 Encoding
Base64 converts binary data to ASCII text using a 64-character alphabet (A-Z, a-z, 0-9, +, /), with = as padding. It is used to embed binary data in text-only contexts: email attachments (MIME), data URIs in HTML/CSS, and JWT segments (which use the Base64url variant). Every 3 input bytes become 4 output characters, so Base64 increases data size by approximately 33%. The Base64url variant replaces + with - and / with _ for URL safety, and often omits the padding.
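A short sketch with Python's standard base64 module shows the two alphabets, the padding, and the size overhead. The input bytes are arbitrary values chosen so that the output contains + and /; they are assumptions for illustration only.

```python
import base64

# Three bytes chosen (as an assumption) so the output uses '+' and '/'.
raw = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(raw)          # b'+//+'
urlsafe = base64.urlsafe_b64encode(raw)   # b'-__-'

# Inputs that are not a multiple of 3 bytes are padded with '='.
one_byte = base64.b64encode(b"a")         # b'YQ=='
two_bytes = base64.b64encode(b"ab")       # b'YWI='

# Every 3 input bytes become 4 output characters: 300 bytes -> 400 characters.
encoded_len = len(base64.b64encode(bytes(300)))

print(standard, urlsafe, one_byte, two_bytes, encoded_len)
```

Decoding with base64.b64decode or base64.urlsafe_b64decode reverses the process, provided the decoder matches the alphabet that was used to encode.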
URL Encoding (Percent Encoding)
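Special characters in URLs are encoded as %XX, where XX is the hexadecimal value of a byte: space becomes %20, & becomes %26. Non-ASCII characters are first encoded as UTF-8, then each byte is percent-encoded (é becomes %C3%A9). This prevents data from being interpreted as URL syntax. Over-encoding unreserved characters (letters, digits, -, ., _, ~) is usually harmless but makes URLs ugly; encoding a reserved character that was meant as a delimiter, such as / or &, changes the meaning of the URL. Under-encoding causes parsing errors and potential security issues.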
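Percent encoding can be sketched with Python's standard urllib.parse module. The sample strings are illustrative assumptions. Note that urlencode follows the HTML form convention and writes a space as + rather than %20.

```python
from urllib.parse import quote, unquote, urlencode

# Space and '&' are escaped so they are not read as URL syntax.
escaped = quote("a b&c")                    # 'a%20b%26c'

# Non-ASCII text is encoded as UTF-8 first, then each byte is escaped.
escaped_unicode = quote("café")             # 'caf%C3%A9'

# By default quote() leaves '/' alone; safe="" escapes it too, which is
# what a single path segment or query value usually needs.
escaped_segment = quote("/to file", safe="")  # '%2Fto%20file'

# Query strings: urlencode uses '+' for spaces (form encoding).
query = urlencode({"q": "a b&c"})           # 'q=a+b%26c'

# Decoding reverses the escaping.
roundtrip = unquote(escaped)                # 'a b&c'

print(escaped, escaped_unicode, escaped_segment, query, roundtrip)
```

Encoding each component separately, then assembling the URL, avoids both under-encoding and accidentally escaping the delimiters that give the URL its structure.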
Common Encoding Bugs
Mojibake (garbled text) means the bytes were decoded with the wrong encoding, for example UTF-8 bytes interpreted as Latin-1, or vice versa. Double encoding (%2520 instead of %20) means the data was URL-encoded twice. Base64 padding errors (invalid length, missing = signs) usually indicate that the encoded data was truncated in transit or that the padding was stripped, as Base64url often does.
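Each of these three bugs can be reproduced in a few lines, which helps when recognizing them in logs. This is a minimal sketch using only the Python standard library; the helper pad_base64 is a hypothetical name introduced here, and restoring padding this way only helps when the padding was stripped, not when the data itself was truncated.

```python
import base64
import binascii
from urllib.parse import quote

# Mojibake: UTF-8 bytes for 'é' wrongly decoded as Latin-1 give 'Ã©'.
mojibake = "é".encode("utf-8").decode("latin-1")
# If no bytes were lost, reversing the wrong step recovers the text.
repaired = mojibake.encode("latin-1").decode("utf-8")

# Double encoding: the '%' from the first pass is itself escaped as %25.
double_encoded = quote(quote("a b"))        # 'a%2520b'

# Hypothetical helper: re-add '=' so the length is a multiple of 4.
def pad_base64(s: str) -> str:
    return s + "=" * (-len(s) % 4)

# 'YQ' is 'YQ==' with the padding stripped; the strict decoder rejects it.
try:
    base64.b64decode("YQ")
    padding_error = False
except binascii.Error:
    padding_error = True

recovered = base64.b64decode(pad_base64("YQ"))  # b'a'

print(mojibake, repaired, double_encoded, padding_error, recovered)
```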
Related Guides
JSON vs YAML vs TOML: Choosing a Configuration Format
Configuration files are the backbone of modern applications. JSON, YAML, and TOML each offer different trade-offs between readability, complexity, and tooling support that affect your development workflow.
How to Format and Validate JSON Data
Malformed JSON causes silent failures in APIs and configuration files. Learn how to format, validate, and debug JSON documents to prevent integration errors and improve readability.
Base64 Encoding: How It Works and When to Use It
Base64 converts binary data into ASCII text, making it safe for transmission through text-based systems. Learn when Base64 is the right choice and when alternatives like hex encoding or URL encoding are more appropriate.
Best Practices for Working with Unix Timestamps
Unix timestamps provide a language-agnostic way to represent points in time, but they come with pitfalls around time zones, precision, and the 2038 problem. This guide covers best practices for storing and converting timestamps.
Troubleshooting JWT Token Issues
JSON Web Tokens are widely used for authentication but can be frustrating to debug. This guide covers common JWT problems including expiration errors, signature mismatches, and payload decoding issues.