A hash function is a mini computer program that takes data, scrambles it, and gives you a fixed-length result that is, for all practical purposes, unique.
The cool thing about hash functions is that:
- You can put as much data as you want into the hash function, but it will always return a result of the same length.
- The result is effectively unique, so you can use it as a way to identify that data.
In other words, a hash function allows you to create a digital fingerprint for whatever data you put into it.
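To see the fixed-length property in practice, here is a small Ruby sketch using the standard digest library. It hashes inputs of very different sizes with SHA-256; the inputs themselves are arbitrary examples chosen for illustration.

```ruby
require 'digest'

# Example inputs of very different sizes (1 byte, 15 bytes, 1,000,000 bytes).
inputs = ["a", "learnmeabitcoin", "x" * 1_000_000]

inputs.each do |data|
  digest = Digest::SHA256.hexdigest(data)
  # SHA-256 always produces 32 bytes, i.e. 64 hexadecimal characters.
  puts "input: #{data.bytesize} bytes, digest: #{digest.length} hex characters"
end
```

Whatever the size of the input, each digest comes out as 64 hexadecimal characters (32 bytes).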
Hash function properties
A good hash function has 3 important properties that make it useful.
Note: The SHA-256 hash function is the main one used in Bitcoin, so I’ll use that in my upcoming examples.
1. You cannot work out the original data from the result.
A cryptographic hash function produces a random-looking result (with no patterns), so there is no way of “going backwards” through the hash function to figure out what the original data was.
This property is what makes a hash function cryptographic. You may be able to reconstruct the original data from the result of a “basic” hash function, but a cryptographic hash function’s job is to make this as difficult as possible.
2. The same data always returns the same result.
A hash function scrambles data systematically, so that the same input will always produce the same result.
data             sha256(data)
---------------  ----------------------------------------------------------------
learnmeabitcoin  ef235aacf90d9f4aadd8c92e4b2562e1d9eb97f0df9ba3b508258739cb013db2
learnmeabitcoin  ef235aacf90d9f4aadd8c92e4b2562e1d9eb97f0df9ba3b508258739cb013db2
learnmeabitcoin  ef235aacf90d9f4aadd8c92e4b2562e1d9eb97f0df9ba3b508258739cb013db2
3. Different data produces different results.
If you put unique data into the hash function, it will give you a unique result.
data              sha256(data)
----------------  ----------------------------------------------------------------
learnmeabitcoin   ef235aacf90d9f4aadd8c92e4b2562e1d9eb97f0df9ba3b508258739cb013db2
learnmeabitcoin1  f94a840f1e1a901843a75dd07ffcc5c84478dc4f987797474c9393ac53ab55e6
learnmeabitcoin2  b9638ef00b064055b5d0b408414be02f3ab66cce752c7ac3b7595b0fffaa6567
learnmeabitcoin3  c6fd80741e150fb7ee71453fb0a2a391261f6a0d4d60759b843639e6cbae7b91
learnmeabitcoin4  255da46dc8699fffd841b7c66a31eeb4f8eda8e1ca6850c7356376518f52d3c1
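If two different pieces of data returned the same result, it would be called a “collision”. Because there are unlimited possible inputs but only a fixed number of possible results, collisions must exist in theory; however, if anyone could actually find one, it would mean the hash function was broken.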
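One way to see properties 2 and 3 side by side is to compare two SHA-256 digests bit by bit. The helper name differing_bits is just an illustrative choice. For identical inputs no bits differ; for inputs that differ by a single character, a good hash function is expected to flip roughly half of the 256 output bits (often called the avalanche effect), although the exact count varies from input to input.

```ruby
require 'digest'

# Count how many of the 256 output bits differ between the SHA-256 digests of a and b.
def differing_bits(a, b)
  bits_a = Digest::SHA256.digest(a).unpack1("B*") # 256 "0"/"1" characters
  bits_b = Digest::SHA256.digest(b).unpack1("B*")
  bits_a.chars.zip(bits_b.chars).count { |x, y| x != y }
end

same = differing_bits("learnmeabitcoin", "learnmeabitcoin")
near = differing_bits("learnmeabitcoin", "learnmeabitcoin1")

puts "identical inputs: #{same} bits differ"   # always 0 (same data, same result)
puts "one extra character: #{near} bits differ" # typically somewhere around 128
```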
Where are hash functions used in Bitcoin?
1. Transaction Hashes
- The ability to hash a long string of transaction data into a short, unique string allows you to create a unique identifier for each transaction.
2. Block Hashes (and Mining)
- Similarly, you can create a unique ID for each block.
- The fact that each hash result is unpredictable allows for the mechanism of mining.
3. Addresses
- The fact that you cannot work backwards from a hash result potentially helps with the security of public keys when they are placed inside locking scripts.
- RIPEMD-160 produces a digest that’s shorter than the length of the public key, which reduces the length of the resulting address.
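Because hash results are unpredictable, the only practical way to find a hash that meets some condition is trial and error, which is the idea behind mining. The toy sketch below assumes a simplified rule: the Hash256 of some data plus a nonce must start with a few hexadecimal zeros. The names hash256_hex and toy_mine and the sample data are hypothetical. Real Bitcoin mining hashes an 80-byte block header and compares the result against a numeric target instead.

```ruby
require 'digest'

# Hash256 (double SHA-256) of a byte string, returned as hex.
def hash256_hex(bytes)
  Digest::SHA256.hexdigest(Digest::SHA256.digest(bytes))
end

# Toy proof-of-work (hypothetical, simplified): try nonces 0, 1, 2, ... until
# Hash256(data + nonce) starts with `zeros` hexadecimal zeros.
def toy_mine(data, zeros)
  prefix = "0" * zeros
  nonce = 0
  loop do
    # Append the nonce as 4 little-endian bytes to a binary copy of the data.
    candidate = hash256_hex(data.b + [nonce].pack("V"))
    return [nonce, candidate] if candidate.start_with?(prefix)
    nonce += 1
  end
end

nonce, block_hash = toy_mine("hypothetical block data", 3)
puts "nonce #{nonce} -> #{block_hash}"
```

Since there is no way to predict which nonce will work, each extra required zero multiplies the expected number of attempts by 16, which is a rough analogue of how mining difficulty works.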
How do you hash data in Bitcoin?
There are two main methods for hashing data in Bitcoin: Hash256 and Hash160.
Hash256
This involves putting data through the SHA-256 hash function, then putting the result through SHA-256 again. In other words, it’s just “double SHA-256”. We call it Hash256 for short.
require 'digest'
def hash256(hex)
  # convert hexadecimal string to byte sequence first
  binary = [hex].pack("H*") # H = hex string (high nibble first), * = use the whole string
  # SHA-256 (first round)
  hash1 = Digest::SHA256.digest(binary)
  # SHA-256 (second round)
  hash2 = Digest::SHA256.digest(hash1)
  # convert from byte sequence back to a hexadecimal string
  hash256 = hash2.unpack("H*").join
  return hash256
end
puts hash256("aa") #=> e51600d48d2f72eb428e78733e01fbd6081b349528335bf21269362edfae185d
Hash160
This involves putting data through the SHA-256 hash function, then putting the result through the RIPEMD-160 hash function. We call it Hash160 for short.
RIPEMD-160 produces a shorter hash digest (160 bits / 20 bytes) compared to SHA-256 (256 bits / 32 bytes), so it’s typically used when you want to produce a shorter hash than what you’d get from using Hash256.
It’s mainly used for shortening public keys and scripts in the process of creating legacy addresses (e.g. addresses beginning with a 1 or a 3). It has not been used in any recent developments that require hashing of data in Bitcoin.
require 'digest'
def hash160(data)
  # convert hexadecimal string to byte sequence first
  binary = [data].pack("H*") # H = hex string (high nibble first), * = use the whole string
  # SHA-256
  sha256 = Digest::SHA256.digest(binary)
  # RIPEMD-160
  ripemd160 = Digest::RMD160.digest(sha256)
  # convert from byte sequence back to a hexadecimal string
  hash160 = ripemd160.unpack("H*").join
  return hash160
end
puts hash160("aa") #=> 58d179ca6112752d00dc9b89ea4f55a585195e26
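To check the digest sizes mentioned above, this short sketch reuses the single byte 0xaa from the examples and computes both hashes with Ruby's digest library, then prints their lengths. The variable names are arbitrary.

```ruby
require 'digest'

# The single byte 0xaa, the same input used in the hash256/hash160 examples.
data = ["aa"].pack("H*")

first_round = Digest::SHA256.digest(data)
hash256_bytes = Digest::SHA256.digest(first_round) # SHA-256 then SHA-256
hash160_bytes = Digest::RMD160.digest(first_round) # SHA-256 then RIPEMD-160

puts "Hash256: #{hash256_bytes.bytesize} bytes (#{hash256_bytes.unpack1('H*').length} hex characters)"
puts "Hash160: #{hash160_bytes.bytesize} bytes (#{hash160_bytes.unpack1('H*').length} hex characters)"
```

This prints 32 bytes (64 hex characters) for Hash256 and 20 bytes (40 hex characters) for Hash160.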
A common mistake when hashing
A common mistake when hashing data in Bitcoin is to insert strings into the hash function, and not the underlying byte sequences those strings actually represent.
For example, let’s say we have the hexadecimal string "ab".
If we insert this string directly into the hash function, the programming language will actually send the ASCII encoding of each of these characters into the hash function, which looks like this in binary:
"ab" = 01100001 01100010
sha256(0110000101100010) = fb8e20fc2e4c3f248c60c39bd652f3c1347298bb977b8b4d5903b85055620603
But what we actually want to send into the hash function is the byte this hexadecimal string represents, which looks like this in binary:
0xab = 10101011
sha256(10101011) = 087d80f7f182dd44f184aa86ca34488853ebcc04f0c60d5294919a466b463831
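The two cases above can be reproduced with a few lines of Ruby. This is a minimal sketch using the standard digest library; the variable names are arbitrary.

```ruby
require 'digest'

hex = "ab"

# Mistake: hashing the two ASCII characters "a" (0x61) and "b" (0x62).
wrong = Digest::SHA256.hexdigest(hex)

# Correct: first pack the hex string into the single byte 0xab, then hash that byte.
bytes = [hex].pack("H*")
right = Digest::SHA256.hexdigest(bytes)

puts "hash of the string \"ab\": #{wrong}"
puts "hash of the byte 0xab:    #{right}"
```

The only difference between the two calls is the pack("H*") step, yet the digests are completely different.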
This is why we usually need to “pack” our hexadecimal strings into bytes before hashing.
Most programming languages have functions that allow you to do this. In Ruby, for example, it’s the pack("H*") method used in the code above.
You will probably see a bunch of gibberish if you print out these converted binary values. This makes sense, because your programming language tries to display each byte as a text character when printing, and many byte values don’t correspond to any printable character.
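A more reliable way to check what you actually have is to look at the byte values rather than printing the raw string. This Ruby sketch converts the hex string "ab" to bytes with pack("H*"), inspects the result, and converts it back with unpack1("H*") (available since Ruby 2.4).

```ruby
# Convert a hex string to raw bytes ("H*" = hex string, high nibble first).
binary = ["ab"].pack("H*")

p binary.bytes            # => [171]   the raw byte value as an integer (0xab)
p binary                  # => "\xAB"  inspect escapes the non-printable byte

# Convert the bytes back into a hex string for display.
puts binary.unpack1("H*") # => ab
```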
Remember that hash functions take in binary data as the input, so we need to be specific about the binary data we want to insert.
All Bitcoin data is just a bunch of bytes at the end of the day. We just use their hexadecimal string representation for convenience from time to time.
If you forget to convert your hexadecimal strings to their corresponding bytes beforehand, your programming language will assume you want to send the binary representation of the characters in the string, and this will produce a completely different hash result than expected.
This is by far the most common issue people have when hashing data in Bitcoin for the first time. So if you’re not getting the right hash results, this is probably where you’re going wrong.