Hash Function
A small program that scrambles data.
A hash function is a small computer program that takes data, scrambles it, and returns a fixed-length result that is effectively unique to that data.
The cool thing about hash functions is that:
- You can put as much data as you want into the hash function, but it will always return a result of the same length.
- The result is effectively unique, so you can use it as a way to identify that data.
So in other words, a hash function allows you to create a digital fingerprint for whatever data you put into it.
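As a rough illustration of the fixed-length property, this Ruby sketch hashes inputs of very different sizes with SHA-256 from the standard `digest` library. The sample inputs are arbitrary choices made for this example.

```ruby
require 'digest'

# Arbitrary sample inputs of very different sizes (chosen for illustration only)
inputs = ["", "a", "learnmeabitcoin", "x" * 1_000_000]

# Hash each input and record the length of the hexadecimal result
lengths = inputs.map { |data| Digest::SHA256.hexdigest(data).length }

inputs.zip(lengths).each do |data, length|
  puts "#{data.bytesize} bytes in -> #{length} hex characters out"
end
```

Every line should report 64 hex characters (32 bytes), whether the input is empty or a million bytes long.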
Hash function properties
A good hash function has 3 important properties that make it useful.
Note: The SHA256 hash function is the main one used in Bitcoin, so I’ll use that in my upcoming examples.
- You cannot work out the original data from the result.
A cryptographic hash function produces a result that looks random (with no patterns), so there is no practical way of “going backwards” through the hash function to figure out what the original data was.
This property is what makes a hash function cryptographic. You may be able to reconstruct the original data from the result of a “basic” hash function, but a cryptographic hash function’s job is to make this as difficult as possible.
- The same data always returns the same result.
A hash function scrambles data systematically, so that the same input will always produce the same result.
For example:
data            sha256(data)
--------------- ----------------------------------------------------------------
learnmeabitcoin ef235aacf90d9f4aadd8c92e4b2562e1d9eb97f0df9ba3b508258739cb013db2
learnmeabitcoin ef235aacf90d9f4aadd8c92e4b2562e1d9eb97f0df9ba3b508258739cb013db2
learnmeabitcoin ef235aacf90d9f4aadd8c92e4b2562e1d9eb97f0df9ba3b508258739cb013db2
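A minimal sketch of this property in Ruby, assuming the data in the table is the plain ASCII string `learnmeabitcoin`:

```ruby
require 'digest'

data = "learnmeabitcoin"

# Hash the same data three separate times
results = 3.times.map { Digest::SHA256.hexdigest(data) }

puts results
puts results.uniq.length == 1 # true when all three results are identical
```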
- Different data produces different results.
If you put unique data into the hash function, the hash function will give you a unique result.
For example:
data             sha256(data)
---------------- ----------------------------------------------------------------
learnmeabitcoin  ef235aacf90d9f4aadd8c92e4b2562e1d9eb97f0df9ba3b508258739cb013db2
learnmeabitcoin1 f94a840f1e1a901843a75dd07ffcc5c84478dc4f987797474c9393ac53ab55e6
learnmeabitcoin2 b9638ef00b064055b5d0b408414be02f3ab66cce752c7ac3b7595b0fffaa6567
learnmeabitcoin3 c6fd80741e150fb7ee71453fb0a2a391261f6a0d4d60759b843639e6cbae7b91
learnmeabitcoin4 255da46dc8699fffd841b7c66a31eeb4f8eda8e1ca6850c7356376518f52d3c1
If different data returned the same result it would be called a “collision”, and it would mean the hash function was broken.
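The following Ruby sketch checks that the four similar inputs from the table produce four different results, and counts how many of the 256 output bits differ between two of them. The helper name `differing_bits` is made up for this example. With a good hash function, a small change in the input typically changes a large share of the output bits.

```ruby
require 'digest'

# Count how many of the 256 bits differ between the SHA-256 digests of a and b
# (differing_bits is a hypothetical helper for this illustration)
def differing_bits(a, b)
  x = Digest::SHA256.digest(a).bytes
  y = Digest::SHA256.digest(b).bytes
  x.zip(y).sum { |p, q| (p ^ q).to_s(2).count("1") }
end

inputs = (1..4).map { |n| "learnmeabitcoin#{n}" }
hashes = inputs.map { |data| Digest::SHA256.hexdigest(data) }

puts hashes.uniq.length == hashes.length # true when every result is different
puts differing_bits("learnmeabitcoin1", "learnmeabitcoin2")
```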
Where are hash functions used in Bitcoin?
1. Transaction Hashes
You hash transaction data to get a TXID (Transaction ID, Transaction Hash).
- The ability to hash a long string of transaction data into a short, unique string allows you to create a unique identifier for each transaction.
2. Block Hashes (and Mining)
You hash block headers to get a block hash.
- So you can also create a unique ID for each block.
- The fact that each hash result is unpredictable allows for the mechanism of mining.
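To give a feel for why unpredictable results make mining possible, here is a toy sketch in Ruby. It is not real Bitcoin mining: the input string, the `toy_mine` and `hash256_bytes` helpers, and the "starts with 00" rule are all simplifications invented for this example. Real mining hashes an 80-byte block header and compares the result, as a number, against a target. The idea is the same though: because you cannot predict a hash result, the only way to find one that meets the rule is to keep trying different inputs.

```ruby
require 'digest'

# Double SHA-256 over raw bytes, returned as a hexadecimal string
def hash256_bytes(bytes)
  Digest::SHA256.hexdigest(Digest::SHA256.digest(bytes))
end

# Toy search: keep increasing the nonce until the hash starts with the prefix.
# (A hypothetical simplification; real mining compares against a numeric target.)
def toy_mine(data, prefix)
  nonce = 0
  loop do
    hash = hash256_bytes("#{data}#{nonce}")
    return [nonce, hash] if hash.start_with?(prefix)
    nonce += 1
  end
end

nonce, hash = toy_mine("toy block header", "00")
puts "nonce=#{nonce} hash=#{hash}"
```

A longer required prefix makes the search take more attempts on average, which is roughly how difficulty works in this toy version.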
3. Addresses
A public key is hashed (using both SHA-256 and RIPEMD-160) in the process of creating a bitcoin address.
- The fact that you cannot work backwards from a hash result potentially helps with the security of public keys when they are placed inside locking scripts.
- RIPEMD-160 produces a digest that’s shorter than the length of the public key, which reduces the length of the resulting address.
How do you hash data in Bitcoin?
There are two main methods for hashing data in Bitcoin, and they have the following names:
1. Hash256
This involves putting data through the SHA-256 hash function, then putting the result through SHA-256 again. In other words, it’s just “double SHA-256”. We call it Hash256 for short.
This is the most common method for hashing data in Bitcoin. It’s used when hashing transaction data to create TXIDs, and when hashing block headers during mining.
require 'digest'

def hash256(hex)
  # convert hexadecimal string to byte sequence first
  binary = [hex].pack("H*") # H = hex string (high nibble first), * = multiple bytes

  # SHA-256 (first round)
  hash1 = Digest::SHA256.digest(binary)

  # SHA-256 (second round)
  hash2 = Digest::SHA256.digest(hash1)

  # convert from byte sequence back to hexadecimal
  hash256 = hash2.unpack("H*")[0]

  return hash256
end

puts hash256("aa") #=> e51600d48d2f72eb428e78733e01fbd6081b349528335bf21269362edfae185d
2. Hash160
This involves putting data through the SHA-256 hash function, then putting the result through the RIPEMD-160 hash function. We call it Hash160 for short.
RIPEMD-160 produces a shorter hash digest (160 bits / 20 bytes) compared to SHA-256 (256 bits / 32 bytes), so it’s typically used when you want to produce a shorter hash than what you’d get from using Hash256.
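The difference in digest size can be checked with a short Ruby sketch. The input string is an arbitrary sample, and this assumes your Ruby installation provides `Digest::RMD160` through the standard `digest` library.

```ruby
require 'digest'

data = "learnmeabitcoin" # arbitrary sample input

sha256    = Digest::SHA256.digest(data)
ripemd160 = Digest::RMD160.digest(data)

puts sha256.bytesize    # 32 bytes (256 bits)
puts ripemd160.bytesize # 20 bytes (160 bits)
```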
It’s only used when shortening public keys and scripts in the process of creating legacy addresses (e.g. addresses beginning with a 1 or a 3). It has not been used in any recent developments that require hashing of data in Bitcoin.
require 'digest'

def hash160(data)
  # convert hexadecimal string to byte sequence first
  binary = [data].pack("H*") # H = hex string (high nibble first), * = multiple bytes

  # SHA-256
  sha256 = Digest::SHA256.digest(binary)

  # RIPEMD-160
  ripemd160 = Digest::RMD160.digest(sha256)

  # convert from byte sequence back to hexadecimal
  hash160 = ripemd160.unpack("H*").join

  return hash160
end

puts hash160("aa") #=> 58d179ca6112752d00dc9b89ea4f55a585195e26
A common mistake when hashing
A common mistake when hashing data in Bitcoin is to insert strings into the hash function, and not the underlying byte sequences those strings actually represent.
For example, let’s say we have the hexadecimal string ab.
If you insert this string directly into the hash function, your programming language will actually send the ASCII encoding of each of these characters into the hash function, which looks like this in binary:
"ab" = 01100001 01100010
sha256(0110000101100010) = fb8e20fc2e4c3f248c60c39bd652f3c1347298bb977b8b4d5903b85055620603
But what we actually want to send into the hash function is the byte this hexadecimal string represents, which looks like this in binary:
0xab = 10101011
sha256(10101011) = 087d80f7f182dd44f184aa86ca34488853ebcc04f0c60d5294919a466b463831
This is why we usually need to “pack” our hexadecimal strings into bytes first before hashing.
Most programming languages will have functions that allow you to do this:
# Ruby
hex = "ab"
binary = [hex].pack("H*") # H = hex string (high nibble first), * = multiple bytes

// PHP
$hex = "ab";
$binary = pack("H*", $hex);
You will probably see garbled text if you print out these converted binary values. This makes sense, because your programming language tries to display the raw bytes as characters when printing, and a byte like 0xab does not correspond to a printable ASCII character.
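If you want to check what is actually inside a packed value, it can help to inspect the bytes instead of printing them as text. A small Ruby sketch:

```ruby
binary = ["ab"].pack("H*")

p binary                  # escaped form of the raw byte: "\xAB"
p binary.bytes            # the byte as an integer: [171]
p binary.bytesize         # 1 (one byte, not two characters)
puts binary.unpack1("H*") # back to hexadecimal: ab
```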
Remember that hash functions take in binary data as the input, so we need to be specific about the binary data we want to insert.
All bitcoin data is just a bunch of bytes at the end of the day. We just use their hexadecimal string representation for convenience from time to time.
If you forget to convert your hexadecimal strings to their corresponding bytes beforehand, your programming language will assume you want to send the binary representation of the characters in the string, and this will produce a completely different hash result than expected.
This is by far the most common issue people have when hashing data in Bitcoin for the first time. So if you’re not getting the right hash results, this is probably where you’re going wrong.
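To see the difference for yourself, this short Ruby sketch hashes the hexadecimal string ab both ways. The variable names `wrong` and `right` are just labels for this example.

```ruby
require 'digest'

hex = "ab"

# Mistake: hashes the two ASCII characters "a" and "b" (bytes 0x61 0x62)
wrong = Digest::SHA256.hexdigest(hex)

# Intended: hashes the single byte 0xab
right = Digest::SHA256.hexdigest([hex].pack("H*"))

puts wrong
puts right
```

The two results should match the two different sha256 values shown in the example earlier in this section.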