Base64 Encoding - How Does It Work?

What is Base64 encoding?

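Base64 is a binary-to-text encoding scheme. It represents arbitrary binary data using only 64 printable ASCII characters, and the resulting text can be decoded back into the original bytes.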
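As a concrete illustration, Python's standard-library `base64` module implements both directions of the scheme. The short sketch below encodes five bytes, some of them non-printable, into Base64 text and decodes that text back into the same bytes. The sample bytes are arbitrary and chosen only for illustration.

```python
import base64

# Arbitrary binary data, including bytes that are not printable ASCII.
data = bytes([0x00, 0xFF, 0x10, 0x80, 0x41])

encoded = base64.b64encode(data)     # bytes made only of Base64 alphabet characters
decoded = base64.b64decode(encoded)  # back to the original bytes

print(encoded.decode("ascii"))  # AP8QgEE=
assert decoded == data          # the round trip is lossless
```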

Base64 encoding is used when binary data, such as images or video, needs to be transmitted over systems that were designed to carry plain text (ASCII). One example is email: SMTP, the protocol that carries email, was traditionally designed to work only with plain-text data in the ASCII character set.

Why use Base64 instead of a hexadecimal string? Hexadecimal encoding needs 2 characters for each byte of the original data, so the encoded data is twice the original size. Base64 encodes every 3 bytes of input as 4 characters of output, so the encoded data is about 4/3 of the original size (roughly 33% larger), which is clearly more compact than hexadecimal encoding.
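The size difference is easy to check. This sketch assumes Python's standard `base64` module and the built-in `bytes.hex()` method, and uses 300 random bytes as sample input (a multiple of 3, so no padding is involved):

```python
import base64
import os

# 300 random bytes: a multiple of 3, so the Base64 output needs no padding.
data = os.urandom(300)

hex_len = len(data.hex())              # 2 characters per input byte
b64_len = len(base64.b64encode(data))  # 4 characters per 3 input bytes

print(len(data), hex_len, b64_len)  # 300 600 400
```

For inputs whose length is not a multiple of 3, padding rounds the Base64 output up to the next multiple of 4 characters, so the ratio is slightly above 4/3 for short inputs.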

How does it work?

The process is:

  • The Base64 encoding algorithm receives an input stream of 8-bit bytes.
  • It processes the input from left to right, concatenating three 8-bit bytes at a time into 24-bit groups.
  • Each 24-bit group is then treated as four concatenated 6-bit groups.
  • Finally, each 6-bit group (a value from 0 to 63) is used as an index into the Base64 table to select a single character.
  • The Base64 table has 64 characters, in this order: A-Z (26 characters, values 0-25), a-z (26 characters, values 26-51), 0-9 (10 characters, values 52-61), and two more characters, + (62) and / (63).
  • The = character is used for padding when the last 24-bit group does not contain enough input bits.
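The steps above can be sketched as a small from-scratch encoder. This is an illustrative Python sketch, not a production implementation; `ALPHABET` and `encode_base64` are hypothetical names chosen for this example. The last group is zero-filled to 24 bits, and one output character is replaced with = for each zero byte that was added.

```python
# Base64 table in index order: value 0 -> 'A', ..., 63 -> '/'.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_base64(data: bytes) -> str:
    """Hypothetical from-scratch Base64 encoder following the steps above."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        n_pad = 3 - len(chunk)                # 0, 1 or 2 bytes missing in this group
        chunk += b"\x00" * n_pad              # zero-fill the group up to 24 bits
        group = int.from_bytes(chunk, "big")  # concatenate three 8-bit bytes
        # Split the 24-bit group into four 6-bit values, most significant first,
        # and look each one up in the Base64 table.
        chars = [ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Replace one trailing character with '=' per zero byte added.
        if n_pad:
            chars[-n_pad:] = "=" * n_pad
        out.extend(chars)
    return "".join(out)


print(encode_base64(b"ABC"))  # QUJD
```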

What is padding?

During Base64 encoding, the last 24-bit group may not contain enough input bits. There are two cases:

  • If the group has only 8 bits of input data, pad it with 16 zero bits. After encoding it as a normal group, replace the last 2 characters with 2 equal signs (==), so the decoder knows that 2 bytes (16 bits) of zeros were padded.
  • If the group has only 16 bits of input data, pad it with 8 zero bits. After encoding it as a normal group, replace the last character with 1 equal sign (=), so the decoder knows that 1 byte (8 bits) of zeros was padded.
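To see how the = characters help the decoder, here is a minimal decoding sketch. It assumes well-formed input whose length is a multiple of 4 and performs no further validation (it does not reject an = in the middle of the text, for example); `decode_base64` is a hypothetical helper name. For each 4-character block it rebuilds the 24-bit group, then drops one byte per = sign, because those bytes are the zeros added during encoding.

```python
# Base64 table in index order: value 0 -> 'A', ..., 63 -> '/'.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_base64(text: str) -> bytes:
    """Hypothetical minimal Base64 decoder (assumes well-formed input)."""
    if len(text) % 4 != 0:
        raise ValueError("Base64 input length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(text), 4):
        block = text[i:i + 4]
        n_pad = block.count("=")  # each '=' marks one zero byte added by the encoder
        group = 0
        for ch in block:
            value = 0 if ch == "=" else ALPHABET.index(ch)
            group = (group << 6) | value  # append 6 bits to the 24-bit group
        # Split the group back into three bytes and drop the padded zero bytes.
        out += group.to_bytes(3, "big")[:3 - n_pad]
    return bytes(out)


print(decode_base64("QUI="))  # b'AB'
```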

Examples

Example 1: input: 'A' -> output: 'QQ=='

Input Data          A
Input Bits   01000001
Padding      01000001 00000000 00000000
Bit Groups   010000 010000 000000 000000
Output data       Q      Q      =      =

Example 2: input: 'AB' -> output: 'QUI='

Input Data          A        B
Input Bits   01000001 01000010
Padding      01000001 01000010 00000000
Bit Groups   010000 010100 001000 000000
Output data       Q      U      I      =

Example 3: input: 'ABC' -> output: 'QUJD'

Input Data          A        B        C
Input Bits   01000001 01000010 01000011
Padding      No padding
Bit Groups   010000 010100 001001 000011
Output data       Q      U      J      D
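The three tables can be reproduced with a short sketch that prints the 6-bit groups and the resulting output for each input. `bit_groups` is a hypothetical helper; it assumes ASCII input and the standard Base64 table (A-Z, a-z, 0-9, +, /).

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def bit_groups(text: str) -> tuple[list[str], str]:
    """Return the 6-bit groups and the Base64 output for an ASCII string."""
    data = text.encode("ascii")
    n_pad = (3 - len(data) % 3) % 3  # zero bytes needed to fill the last group
    bits = "".join(f"{b:08b}" for b in data + b"\x00" * n_pad)
    groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    chars = [ALPHABET[int(g, 2)] for g in groups]
    if n_pad:
        chars[-n_pad:] = "=" * n_pad  # padded positions become '='
    return groups, "".join(chars)


for text in ("A", "AB", "ABC"):
    groups, output = bit_groups(text)
    print(f"{text!r:6} {' '.join(groups)}  ->  {output}")
```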