Differences between Encoding, Encryption & Hashing

Scott To

Encoding, encryption, and hashing are terms we run into almost every working day. Yet they are still widely misunderstood, and are often used interchangeably or incorrectly. Knowing how they differ, and when and why to use each, is important. In this post I will clarify each concept, which may help anyone whose understanding of them is still fuzzy.

1. Encoding

Encoding is the process of transforming data from one format into another so that it can be used by different systems. If data from one system needs to be transferred to a second system that can't consume its format, the data needs to be encoded into a format the second system can consume.

Encoding is a reversible process: anyone who knows which algorithm was used can decode the encoded data and get the original value back. No "key" is involved, so encoding provides no security and must not be used for that purpose.

Some popular encoding methods are: Base64, URL encoding, ASCII,...
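As a small illustration of the reversibility described above, here is a minimal Python sketch using only the standard library. The sample string is made up for the example; the point is that anyone can decode the result without a key.

```python
import base64
from urllib.parse import quote, unquote

# Hypothetical sample data, chosen only for illustration.
original = "a=1&b=hello world"

# Base64: represent arbitrary bytes as ASCII-safe text. No key is involved.
b64_encoded = base64.b64encode(original.encode("utf-8")).decode("ascii")
b64_decoded = base64.b64decode(b64_encoded).decode("utf-8")

# URL encoding: escape characters that are not safe inside a URL.
url_encoded = quote(original, safe="")
url_decoded = unquote(url_encoded)

print(b64_encoded)
print(url_encoded)
```

Both round trips give back the original string, which is exactly why encoding offers no protection.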

2. Hashing

Hashing is the process of converting input data of any size into a fixed-length output using a hash function. In contrast to encoding, hashing is irreversible (one-way): even if we know the hashed output and the hash function, there is no practical way to compute the input from it, other than guessing candidate inputs and hashing them.
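The fixed-length property is easy to observe. The sketch below uses SHA-256 from Python's hashlib as one example of a hash function; the helper name sha256_hex is just a convenience for this example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# One byte of input and one million bytes of input...
short_digest = sha256_hex(b"a")
long_digest = sha256_hex(b"a" * 1_000_000)

# ...both produce 256 bits of output, i.e. 64 hexadecimal characters.
print(len(short_digest), len(long_digest))
```

The same input always gives the same digest, but nothing in the digest lets us read the input back.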

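The most important property of a cryptographic hash function is collision resistance. Because inputs can be arbitrarily long while the output has a fixed length, different inputs with the same hash value must exist in theory, but it should be practically impossible to find them. When two different inputs give the same output, it is called a hash collision, and a hash function for which collisions can be found in practice (such as MD5) should no longer be used for security purposes.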

How a hash function works

In many widely used hash functions (MD5, SHA-1 and SHA-2 are built this way), the input message is divided into fixed-size blocks and processed one block at a time. The first block is fed into the function together with a fixed initial value; the output of that step is then combined with the second block to form the input of the next step. This repeats until the end of the message, and the output of the last step is the final hash value, so it depends on every block. A good hash function is also designed so that if just a single bit of the message changes, the hash value changes completely. This property is called the avalanche effect.
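The two ideas in this section can be sketched in a few lines of Python. toy_chained_hash is a hypothetical helper that only mimics the block-by-block chaining (it reuses SHA-256 as the step function and is not how any real hash works internally), while the second part measures the avalanche effect on real SHA-256 output.

```python
import hashlib


def toy_chained_hash(message: bytes, block_size: int = 8) -> bytes:
    """Toy illustration of chaining: each step mixes the previous
    output with the next block. Not a real hash design."""
    state = b"\x00" * 32  # fixed initial value (an assumption of this sketch)
    for start in range(0, len(message), block_size):
        block = message[start:start + block_size]
        state = hashlib.sha256(state + block).digest()
    return state


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


message = bytearray(b"The quick brown fox")
flipped = bytearray(message)
flipped[0] ^= 0x01  # flip a single bit of the first byte

digest_a = hashlib.sha256(message).digest()
digest_b = hashlib.sha256(flipped).digest()
changed = differing_bits(digest_a, digest_b)
print(f"{changed} of 256 output bits changed")
```

With a well-designed hash, roughly half of the output bits are expected to change, even though only one input bit did.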

Uses of hashing

  • Hashing helps verify the integrity of information sent over the network. By comparing the hash value of the received data with the original hash value, we can detect any change in the data.
  • Hashing can also be used to verify a user's password.
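An integrity check along these lines might look like the following sketch. The function names and the sample payload are made up for the example, and it assumes the expected checksum reached the receiver through a channel the attacker cannot modify.

```python
import hashlib
import hmac


def checksum(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the data."""
    return hashlib.sha256(data).hexdigest()


def is_intact(received: bytes, expected_checksum: str) -> bool:
    """True if the received data hashes to the expected checksum."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(checksum(received), expected_checksum)


# Sender side: publish the data together with its checksum.
payload = b"amount=100&to=alice"
published = checksum(payload)

# Receiver side: recompute and compare.
print(is_intact(payload, published))                  # unchanged data
print(is_intact(b"amount=900&to=alice", published))   # tampered data
```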

Some popular hash functions are:

  • Message Digest 5 (MD5): old and weak, collisions can be found easily
  • Secure Hash Algorithm (SHA): including SHA-1 (also considered broken), SHA-256, SHA-512,...

Password hashing

No system should store users' passwords in plain text. Instead, the password should be transformed and stored as a string from which the password cannot be recovered. Hashing can do that.

But hashing the plain-text password alone is not enough for a highly secure system. Imagine an attacker who has gained access to our database. He can mount a dictionary attack: precompute the hash value of each likely password in his dictionary, then scan the whole user table for matching hashes to find every user who has one of those passwords. With modern hardware this does not take long.

Using a salt along with each password mitigates this attack. A salt is a randomly generated value, unique per user, that is stored next to the hash. The stored password hash is now the output of the hash function applied to the combination of the salt and the plain-text password. This creates a big bottleneck for the attacker: precomputed hashes are useless, and to test the passwords in his dictionary he has to recompute the hashes for every salt in the database, which takes much longer.
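The effect of a salt can be seen in a short sketch. salted_hash is a hypothetical helper, and it uses plain SHA-256 only to keep the salt idea visible; it is not a recommendation for storing real passwords.

```python
import hashlib
import os


def salted_hash(password: str, salt: bytes) -> str:
    """Hash the salt combined with the password (illustration only)."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


# Two users who happen to choose the same weak password.
salt_alice = os.urandom(16)
salt_bob = os.urandom(16)

hash_alice = salted_hash("123456", salt_alice)
hash_bob = salted_hash("123456", salt_bob)

# Different salts give different stored hashes for the same password,
# so one precomputed dictionary entry no longer matches both rows.
print(hash_alice == hash_bob)
```

The salt is not secret; it is stored with the hash so that the same computation can be repeated at login.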

Hashes used for passwords also need to be slow. A function that is fast to compute makes brute-force attacks more feasible, especially with the rapidly growing power of modern hardware. Slow to hash means slow to crack. Some well-known algorithms designed for password hashing, which are deliberately slow, are: bcrypt, PBKDF2, and scrypt.
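Of these, PBKDF2 is available in Python's standard library as hashlib.pbkdf2_hmac. The following is a minimal sketch of storing and verifying a password with it. The function names are made up for the example, and the iteration count is an illustrative assumption that should be tuned to current guidance and to the hardware in use.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Assumption for this sketch: higher means slower to hash and slower to crack.
ITERATIONS = 200_000


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) to store for a new password."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the derived key with the stored salt and compare."""
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(derived, expected)


stored_salt, stored_key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", stored_salt, stored_key))
print(verify_password("wrong guess", stored_salt, stored_key))
```

Each verification deliberately costs many hash computations, which a legitimate login barely notices but which multiplies the cost of every guess for an attacker.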

3. Encryption

Encryption is the process of transforming data in such a way that only an authorized party holding the right key can decrypt it to get the original data. Encryption is a reversible process: with the decryption key, we can read the encrypted data. Encryption is used for security purposes, to keep data confidential from eavesdroppers such as a man-in-the-middle.

There are two kinds of encryption algorithms:

  • Symmetric encryption: uses a single shared key, called the secret key, for both encryption and decryption. Some popular algorithms are: AES (Advanced Encryption Standard), DES (Data Encryption Standard, now obsolete), IDEA (International Data Encryption Algorithm), Blowfish,...
  • Asymmetric encryption: uses 2 keys, a public key to encrypt data and a private key to decrypt it. Some popular algorithms are: RSA (Rivest Shamir Adleman), ElGamal,... DSS (Digital Signature Standard) is also based on key pairs, but it is used for signing data, not for encrypting it.
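To make the two-key idea concrete, here is a toy "textbook RSA" sketch with tiny primes. It is for illustration only: real RSA uses keys of thousands of bits plus a padding scheme, and should come from a vetted library. The numbers and function names are chosen for the example.

```python
# Toy textbook RSA with tiny primes. Never use this for real data.
p, q = 61, 53
n = p * q                  # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent, coprime to phi
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)        # can be shared with anyone
private_key = (d, n)       # must be kept secret by the owner


def encrypt(message: int, key: tuple) -> int:
    """Encrypt an integer smaller than n with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, key: tuple) -> int:
    """Decrypt with the private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


message = 65
ciphertext = encrypt(message, public_key)
recovered = decrypt(ciphertext, private_key)
print(ciphertext, recovered)
```

Anyone holding the public key can produce the ciphertext, but only the holder of the private key can turn it back into the message.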

Asymmetric encryption/decryption is slower than symmetric encryption. Its advantage is key distribution: the private key is never shared, so two parties can communicate securely without first agreeing on a secret over a safe channel. It is not stronger by itself. Asymmetric algorithms need much longer keys than symmetric ones to reach a comparable level of security (for example, a 3072-bit RSA key is usually rated about as strong as a 128-bit AES key), and this contributes to the slower speed. Besides that, the mathematical complexity of the algorithms is also a factor.

We should choose the type of encryption to suit the purpose. If the parties have no safe way to share a secret key in advance, asymmetric encryption comes first. If we care about performance, for example when encrypting large amounts of data, we use symmetric encryption.

In many cases we can combine symmetric and asymmetric algorithms. One big example is an SSL/TLS connection. TLS uses asymmetric cryptography in the handshake step to establish a session key between client and server. Then the session key (a shared secret key, not a public key) is used with symmetric encryption for the rest of the communication to exchange the real data.
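The hybrid pattern can be sketched with toy building blocks. Everything here is an assumption made for illustration: the RSA key is built from two small Mersenne primes, and keystream_xor is a made-up XOR cipher standing in for a real symmetric algorithm such as AES. Real TLS is considerably more involved, and TLS 1.3 agrees on the session key with ephemeral Diffie-Hellman rather than by encrypting it with RSA.

```python
import hashlib
import secrets

# Toy RSA key pair for the server (far too small for real use).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent, kept by the server


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR the data with a SHA-256 based keystream.
    The same call both encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


# Handshake (asymmetric): the client picks a random session key and
# sends it encrypted under the server's public key (E, N).
session_key = secrets.token_bytes(16)
wrapped_key = pow(int.from_bytes(session_key, "big"), E, N)

# The server recovers the session key with its private key (D, N).
server_key = pow(wrapped_key, D, N).to_bytes(16, "big")

# Data exchange (symmetric): both sides now use the same secret key.
request = b"GET /account HTTP/1.1"
ciphertext = keystream_xor(session_key, request)
plaintext = keystream_xor(server_key, ciphertext)
print(plaintext)
```

The slow asymmetric step runs once on 16 bytes, and all further traffic uses the fast symmetric step.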