What is Base64?
Base64 is an encoding that represents arbitrary bytes using only 64
ASCII characters: A-Z, a-z, 0-9, +, and /, with = as
padding. The result is plain text that survives any system designed for
ASCII — HTTP headers, email, JSON strings, URL query parameters
(usually with the URL-safe variant).
It’s not encryption. It’s not compression — it actually adds 33% overhead. It’s just a way to put binary data into text-only contexts.
The 64 characters
The standard alphabet from RFC 4648:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
That’s 26 + 26 + 10 + 2 = 64. Each character represents 6 bits of input data, so 4 base64 characters encode 3 input bytes (24 bits).
How it works
Three input bytes (24 bits) split into four 6-bit groups, each mapped to a character:
Input bytes:  01001000 01101001 00100001     "Hi!" (3 bytes)
              =72      =105     =33
Re-grouped:   010010 000110 100100 100001    (four 6-bit groups)
              =18    =6     =36    =33
Lookup:       S      G      k      h         (h is alphabet position 33)
Output:       "SGkh"                         (4 ASCII characters)
If the input length isn’t a multiple of 3, padding fills the gap:
| Input bytes | Output | Notes |
|---|---|---|
| 0 | "" | empty |
| 1 (H) | SA== | one byte → two output chars + two = |
| 2 (Hi) | SGk= | two bytes → three output chars + one = |
| 3 (Hi!) | SGkh | full block, no padding |
| 4 (Hi!H) | SGkhSA== | three bytes + one byte + padding |
The = characters aren’t part of the alphabet — they’re a structural
hint that says “the last group only had 1 or 2 bytes, not 3.”
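The regrouping and padding rules above can be sketched in a few lines of Python. This is a minimal illustration rather than a production encoder; the function name `encode_base64` is made up for this example, and the result is cross-checked against the standard library's `base64.b64encode`.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_base64(data: bytes) -> str:
    """Encode bytes as padded standard base64, three input bytes at a time."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack the chunk into a 24-bit integer, zero-filling any missing bytes.
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Split into four 6-bit indices, most significant group first.
        indices = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        # A chunk of k bytes yields k + 1 real characters; the rest is padding.
        keep = len(chunk) + 1
        out.append("".join(ALPHABET[j] for j in indices[:keep]) + "=" * (4 - keep))
    return "".join(out)

for sample in (b"", b"H", b"Hi", b"Hi!", b"Hi!H"):
    # Cross-check the hand-rolled version against the standard library.
    assert encode_base64(sample) == base64.b64encode(sample).decode("ascii")
    print(sample, "->", encode_base64(sample))
```

In real code, prefer the library function; the loop is only here to make the bit arithmetic visible.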
Where it came from
Base64 dates back to early Internet email. RFC 989 (1987) defined a printable encoding for Privacy Enhanced Mail, and MIME later adopted the same scheme for sending binary attachments through strictly 7-bit mail transport. Mail servers couldn’t be relied on to pass bytes above 127, so binary data had to be encoded as ASCII. That requirement is mostly historical now, but base64 stayed because:
- It’s well-defined and trivial to implement.
- The 33% overhead is small enough to not matter for most use cases.
- Almost every system speaks ASCII fluently.
The current spec is RFC 4648 (October 2006), which standardized both the original (“base64”) and URL-safe (“base64url”) variants.
Where it’s used today
Headers and credentials
- HTTP Basic Auth. Authorization: Basic <base64(user:pass)>.
- API tokens in some systems (though most modern APIs prefer JWT or opaque random strings).
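As a rough sketch of how a Basic Auth header value is assembled, the helper below joins the user name and password with a colon, encodes the bytes, and prefixes the scheme name. The function name `basic_auth_header` is hypothetical, and the credentials are the well-known sample pair from the HTTP Basic Auth specification.

```python
import base64

def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an Authorization header for HTTP Basic Auth."""
    credentials = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")

print(basic_auth_header("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

Anyone who sees the header can reverse it with a single decode call, which is why Basic Auth is only reasonable over TLS.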
Embedding binary in text
- Data URIs. data:image/png;base64,... inlines an image directly in HTML or CSS. See the file converter.
- Email MIME attachments. Every email with an attached image, PDF, or document uses base64.
- JSON payloads with binary data. JSON has no binary type, so APIs that need to embed binary content in a JSON body typically carry it as a base64-encoded string.
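A data URI is just a MIME type, the literal marker `;base64,`, and the encoded bytes. The sketch below assumes a helper named `to_data_uri` (not part of any library) and uses a tiny text payload so the output is easy to check by hand.

```python
import base64

def to_data_uri(data: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

print(to_data_uri(b"Hi!", "text/plain"))  # data:text/plain;base64,SGkh
```

For an image, the same function would be called with the file's bytes and a type such as `image/png`.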
Tokens
- JWT (JSON Web Tokens). A JWT has three dot-separated parts: the header and payload are base64url-encoded JSON, and the signature is base64url-encoded raw bytes. See our JWT decoder.
- OAuth state, CSRF tokens, session IDs sometimes use base64 of random bytes.
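To make the JWT structure concrete, here is a minimal sketch that splits a token on dots and decodes each segment. It does not verify the signature, and the token is built inline with placeholder signature bytes rather than a real HMAC; the helper names are assumptions for this example.

```python
import base64
import json

def b64url_decode(segment: str) -> bytes:
    # JWT segments omit padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def inspect_jwt(token: str) -> tuple[dict, dict, bytes]:
    """Return (header, payload, signature bytes) without verifying anything."""
    header_seg, payload_seg, signature_seg = token.split(".")
    header = json.loads(b64url_decode(header_seg))
    payload = json.loads(b64url_decode(payload_seg))
    return header, payload, b64url_decode(signature_seg)

# Illustrative token only: the signature is three placeholder bytes.
token = ".".join([
    b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode()),
    b64url_encode(json.dumps({"sub": "demo"}).encode()),
    b64url_encode(b"\x00\x01\x02"),
])
print(inspect_jwt(token))
```

Because the payload is only encoded, anything placed in a JWT is readable by whoever holds the token.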
Configuration
- Kubernetes secrets are stored as base64 in YAML. Note: this is encoding, not encryption — anyone with read access to the secret can decode it.
- Environment variables containing certificates, keys, or other binary data are often base64-encoded to fit on a single line.
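The point about Kubernetes secrets is easy to demonstrate: decoding needs no key. The value below is a hypothetical example of what might appear under `data:` in a Secret manifest, and the second half shows why base64 suits single-line environment variables.

```python
import base64

# Hypothetical encoded value, as it might appear in a Secret manifest.
encoded_value = "cGFzc3dvcmQ="
print(base64.b64decode(encoded_value).decode("utf-8"))  # password

# Multi-line binary or text content collapses to one line of ASCII.
multi_line = b"line one\nline two\n"
single_line = base64.b64encode(multi_line).decode("ascii")
assert "\n" not in single_line
```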
What it is NOT
- Not encryption. Anyone can decode base64. Don’t use it to hide data from a determined attacker: they will run atob() and see your content.
- Not compression. It makes data ~33% larger, not smaller. If you need both compression and ASCII-safe output, gzip first, then base64.
- Not a hash. Hashes are one-way; base64 roundtrips perfectly.
- Not a checksum. Errors during transmission won’t be caught — the receiver will just decode different bytes.
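The "gzip first, then base64" advice can be sketched as a pair of helpers. The names `pack` and `unpack` are made up for this example, and the repetitive payload is chosen so that compression has something to work with.

```python
import base64
import gzip

def pack(data: bytes) -> str:
    """Compress, then encode: the result is ASCII-safe text."""
    return base64.b64encode(gzip.compress(data)).decode("ascii")

def unpack(text: str) -> bytes:
    """Reverse of pack: decode, then decompress."""
    return gzip.decompress(base64.b64decode(text))

payload = b"hello base64 " * 200
packed = pack(payload)
assert unpack(packed) == payload
print(len(payload), "bytes ->", len(packed), "chars")
```

The order matters: base64 output compresses poorly, so encoding first and compressing second would give up most of the savings and produce binary output again.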
The 33% overhead in detail
For input of n bytes, output is ceil(n / 3) * 4 characters. Concrete
examples:
| Input size | Base64 output | Padding | Overhead |
|---|---|---|---|
| 1 byte | 4 chars | == | +300% |
| 2 bytes | 4 chars | = | +100% |
| 3 bytes | 4 chars | none | +33% |
| 1 KB | ~1.33 KB | varies | +33% |
| 1 MB image | ~1.33 MB string | varies | +33% |
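The length formula can be checked against a real encoder. This sketch uses integer arithmetic for the ceiling; `base64_length` is a name chosen for this example.

```python
import base64

def base64_length(n: int) -> int:
    """Length in characters of padded base64 for n input bytes."""
    # (n + 2) // 3 is ceil(n / 3) without floating point.
    return (n + 2) // 3 * 4

for n in (1, 2, 3, 1024, 1_000_000):
    actual = len(base64.b64encode(bytes(n)))
    assert actual == base64_length(n)
    overhead = (actual - n) / n * 100
    print(f"{n} bytes -> {actual} chars (+{overhead:.0f}%)")
```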
Whenever bandwidth or storage matters, prefer binary transport. Use base64 only when binary isn’t an option (headers, JSON strings, email).
Common pitfalls
1. btoa doesn’t handle Unicode
In JavaScript:
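btoa("café")
// → "Y2Fm6Q==" (é, U+00E9, becomes the single Latin-1 byte 0xE9, not UTF-8)
btoa("日本語")
// → throws InvalidCharacterError
btoa only accepts strings in which every code point is ≤ 255, and it treats each
code point as one byte. Anything above that range throws, and characters like é
silently come out as Latin-1 rather than UTF-8. To handle arbitrary Unicode,
encode to UTF-8 bytes first: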
const bytes = new TextEncoder().encode("café");
const b64 = btoa(String.fromCharCode(...bytes));
// → "Y2Fmw6k="
The main encoder does this for you.
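The underlying issue is not specific to JavaScript: base64 encodes bytes, so the output depends on which character encoding turned the text into bytes. A short Python sketch of the same string under two encodings:

```python
import base64

text = "caf\u00e9"  # "café" with a precomposed é (U+00E9)

# Latin-1 maps é to the single byte 0xE9; UTF-8 maps it to 0xC3 0xA9.
latin1_b64 = base64.b64encode(text.encode("latin-1")).decode("ascii")
utf8_b64 = base64.b64encode(text.encode("utf-8")).decode("ascii")

print(latin1_b64)  # Y2Fm6Q==
print(utf8_b64)    # Y2Fmw6k=

# Decoding must use the same charset that was used for encoding.
assert base64.b64decode(utf8_b64).decode("utf-8") == text
```

When both sides agree on UTF-8, which is the usual choice, the round trip is lossless for any Unicode text.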
2. URL-safe vs standard
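Standard base64 uses + and /, both of which have special meaning in URLs: + is
commonly decoded as a space in query strings, and / is the path separator. Pasting
standard base64 into a URL parameter without percent-encoding gives ambiguous
results. Use URL-safe base64, which substitutes - and _, for any URL context.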
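A small Python sketch makes the difference between the two alphabets visible. The input bytes are picked deliberately so that the standard encoding contains both + and /.

```python
import base64

raw = b"\xfb\xff"  # chosen so the output contains both + and /

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=

# The URL-safe alphabet differs only in these two characters.
assert url_safe == standard.replace("+", "-").replace("/", "_")
assert base64.urlsafe_b64decode(url_safe) == raw
```

Note that Python's `urlsafe_b64encode` keeps the = padding, which may still need percent-encoding or stripping depending on where the value goes.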
3. Padding inconsistency
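RFC 4648 requires padding by default but allows a specification that builds on it to omit padding, which is common with URL-safe base64 (JWT, for example, drops it). Some decoders are strict, some lenient. When you control both sides, pick a convention and stick to it; when you don’t, be lenient on input. The decoder here accepts both.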
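One way to be lenient on input is to normalize before decoding. The sketch below, with the hypothetical name `decode_lenient`, accepts either alphabet and works with or without trailing padding; it is one possible approach, not the only one.

```python
import base64

def decode_lenient(text: str) -> bytes:
    """Decode standard or URL-safe base64, with or without trailing padding."""
    # Map the URL-safe alphabet onto the standard one.
    normalized = text.strip().replace("-", "+").replace("_", "/")
    # Drop any padding, then restore exactly as much as the length requires.
    normalized = normalized.rstrip("=")
    normalized += "=" * (-len(normalized) % 4)
    # validate=True rejects characters outside the alphabet instead of skipping them.
    return base64.b64decode(normalized, validate=True)

print(decode_lenient("SGk"))   # b'Hi'
print(decode_lenient("SGk="))  # b'Hi'
```

Input whose length cannot correspond to whole bytes (for example, a single character) still raises an error, which is usually the desired behavior.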
4. Base64 isn’t a transport security boundary
A base64-encoded password in a YAML file or HTTP header is as exposed as plain text to anyone who can read it. Don’t conflate encoding with security.
In code
// JavaScript / Node
btoa("hello") // "aGVsbG8="
atob("aGVsbG8=") // "hello"
Buffer.from("hello").toString("base64") // Node-only, cleaner
# Python
import base64
base64.b64encode(b"hello").decode()     # "aGVsbG8="
base64.b64decode("aGVsbG8=").decode()   # "hello"
# Bash
echo -n "hello" | base64                # "aGVsbG8="
echo -n "aGVsbG8=" | base64 -d          # "hello"
// Go
import "encoding/base64"
base64.StdEncoding.EncodeToString([]byte("hello")) // "aGVsbG8="
Try the tools
- Encoder/decoder — paste either side, the other auto-updates
- Base64url — URL-safe variant
- File converter — file → data URI
- Cheat sheet of formats — when each variant is right
Related across the network
- jwt.tooljo.com — JWT segments are base64url-encoded JSON; the decoder shows the raw bytes you’d be encoding here.
- hash.tooljo.com — hash output formats (hex / base64 / base64url) — compare side-by-side.
- base64.tooljo.com/blog/base64-is-not-encryption — base64 is encoding, not encryption. Five places engineers still confuse them.