Checksum
A simple method for error-checking data.
A checksum is a small piece of data that allows you to check whether another piece of data is the same as expected.
For example, in Bitcoin, addresses include checksums so they can be checked to see if they have been typed in correctly.
How do they work?
In Bitcoin, checksums are created by hashing data through SHA256 twice, and then taking the first 4 bytes of the result.
You would then keep the data and the checksum together, so that you can check that the whole thing has been typed in correctly the next time you use it.
If you make one small mistake (in any part), the data will no longer match the checksum.
So basically, a checksum is a handy little error-checking tool.
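To make this concrete, here is a minimal Ruby sketch that stores a checksum for a piece of data and then checks a correct copy and a mistyped copy against it. The checksum helper follows the double-SHA256 description above; the sample data and variable names are only illustrative.

```ruby
require 'digest'

# Double-SHA256 checksum of hex-encoded data (first 4 bytes, returned as hex)
def checksum(data)
  binary = [data].pack("H*")                                  # hex string -> raw bytes
  hash = Digest::SHA256.digest(Digest::SHA256.digest(binary)) # hash twice
  hash[0, 4].unpack("H*")[0]                                  # first 4 bytes -> hex
end

original = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
typo     = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab" # last character mistyped

stored = checksum(original) # kept together with the data

puts checksum(original) == stored #=> true  (data is unchanged)
puts checksum(typo) == stored     #=> false (data contains a mistake)
```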
Where are checksums used in Bitcoin?
Checksums are included in:
These two keys are commonly transcribed (copied, pasted, typed, written down, etc.), so it’s useful for them to contain checksums.
The presence of a checksum enables software to validate these types of keys when they are typed in. The software won’t be able to tell you what the key should be, but at least it will be able to save you from sending money to the wrong address due to a typo.
Creating a checksum.
As mentioned, checksums in Bitcoin are created by hashing data through SHA256 twice and taking the first 4 bytes.
You could call a checksum in Bitcoin a “truncated double-SHA256 hash”.
Example.
data = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
sha256(sha256(data)) = 05c4de7c1069e9de703efd172e58c1919f48ae03910277a49c9afd7ded58bbeb
checksum = 05c4de7c
1 byte = 2 hexadecimal characters
Don’t forget to convert the data into a byte sequence before performing sha256(sha256(data)). In other words, you’re not hashing the string representation of the data but the bytes it represents.
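The difference between hashing the string and hashing the bytes can be seen with a short Ruby sketch. The double_sha256 helper is a name chosen here for illustration; it is not part of any library.

```ruby
require 'digest'

# SHA256 twice, returned as a hexadecimal string
def double_sha256(input)
  Digest::SHA256.hexdigest(Digest::SHA256.digest(input))
end

data  = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
bytes = [data].pack("H*") # every 2 hex characters become 1 byte

puts data.bytesize  #=> 40 (the string is 40 characters long)
puts bytes.bytesize #=> 20 (but it represents 20 bytes)

# Hashing the string and hashing the bytes give different results
puts double_sha256(data) == double_sha256(bytes) #=> false

# The checksum comes from the hash of the bytes
puts double_sha256(bytes)[0, 8] # first 4 bytes (8 hex characters)
```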
Code.
This is how you might calculate a checksum in Ruby:
require 'digest'

def checksum(data)
  # 1. Convert data to binary before hashing it.
  binary = [data].pack("H*")

  # 2. Hash the data twice
  hash1 = Digest::SHA256.digest(binary)
  hash2 = Digest::SHA256.digest(hash1)

  # 3. Take the first 4 bytes
  checksum = hash2[0,4]

  # 4. Convert binary back to hexadecimal and return result
  hex = checksum.unpack("H*")[0] # unpack returns an array, so [0] just grabs the first result
  return hex
end

puts checksum("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa") #=> 05c4de7c
Checking a checksum.
You can verify a checksum by calculating the expected checksum for a piece of data, and comparing it with the one given.
Example: checking if an address is valid.
A common situation is checking that a given address is valid (all addresses come with a checksum inside).
To do this, you first of all need to decode the address from base58. Then you separate the data part from the checksum, and verify that the checksum you calculate from the data matches the one given.
address = "1AKDDsfTh8uY4X3ppy1m7jw1fVMBSMkzjP" # typical P2PKH address
base58_decoded = "00662ad25db00e7bb38bc04831ae48b4b446d1269817d515b6" # (base58 decoding not shown here)
data = "00662ad25db00e7bb38bc04831ae48b4b446d12698" # 1-byte prefix + 20-byte public key hash
checksum = "17d515b6" # 4-byte checksum
data_checksum = checksum("00662ad25db00e7bb38bc04831ae48b4b446d12698") # calculate the checksum from the data part only
data_checksum == "17d515b6" # check it matches the one given
A base58-decoded address contains a prefix, a hash160 (e.g. a public key hash), and a checksum. But all you really need to know here is that the checksum is the last 4 bytes.
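The base58 decoding step is not shown in this article, so here is a minimal sketch of how it might be done in Ruby. The base58_decode function and the BASE58_CHARS constant are names chosen for this illustration, not part of a standard library, and a production implementation would want more input checking.

```ruby
# Base58 alphabet used by Bitcoin (no 0, O, I, or l)
BASE58_CHARS = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Decode a base58 string to a hexadecimal string
def base58_decode(string)
  # 1. Treat the string as a base58 number and convert it to an integer
  number = 0
  string.each_char do |char|
    index = BASE58_CHARS.index(char)
    raise ArgumentError, "invalid base58 character: #{char}" if index.nil?
    number = number * 58 + index
  end

  # 2. Convert the integer to hexadecimal (whole bytes only)
  hex = number.zero? ? "" : number.to_s(16)
  hex = "0" + hex if hex.length.odd?

  # 3. Each leading "1" in the string stands for a leading zero byte
  leading_ones = string[/\A1*/].length
  ("00" * leading_ones) + hex
end

address = "1AKDDsfTh8uY4X3ppy1m7jw1fVMBSMkzjP"
decoded = base58_decode(address)

puts decoded         # prefix + public key hash + checksum
puts decoded[-8..-1] # the last 4 bytes (the checksum)
```

For the example address, the decoded result should correspond to the base58_decoded value used in the example above.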
Code.
This Ruby code uses the same checksum() function as above.
require 'digest'

# Checksum function
def checksum(data)
  binary = [data].pack("H*")
  hash1 = Digest::SHA256.digest(binary)
  hash2 = Digest::SHA256.digest(hash1)
  checksum = hash2[0,4]
  hex = checksum.unpack("H*")[0]
  return hex
end

# Get an address and decode it from base58
address = "1AKDDsfTh8uY4X3ppy1m7jw1fVMBSMkzjP" # example address
base58_decoded = "00662ad25db00e7bb38bc04831ae48b4b446d1269817d515b6" # (base58 decoding not shown here)

# Separate the data part from the checksum
data = base58_decoded[0...-8] # everything apart from the last 8 characters
checksum = base58_decoded[-8..-1] # the last 8 characters (4 bytes)

# Calculate the checksum for the data
data_checksum = checksum(data)

# Check to see if it matches the checksum given
verify = data_checksum == checksum

# Results
puts data_checksum
puts checksum
puts verify #=> true
As long as you can get to the data and the checksum, the verification part is pretty straightforward.
FAQ
Why only the first 4 bytes?
It would be safer and more reliable to use the full hash result as a checksum. However, this would make addresses much longer, as the entire 32-byte hash would have to be included inside.
Taking the first 4 bytes gives you enough “uniqueness” to be pretty sure the original data is correct, without making the final address inconveniently long. It’s really just a balance between reliability and convenience.
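As a rough illustration of the length trade-off, the sketch below estimates how many base58 characters are needed for a given number of bytes. The approx_base58_length helper is hypothetical and only an approximation: real addresses vary slightly in length, because leading zero bytes are each encoded as a single character.

```ruby
# Approximate number of base58 characters needed to encode a number of bytes.
# Each base58 character carries log2(58) bits, which is less than one byte.
def approx_base58_length(bytes)
  (bytes * Math.log(256) / Math.log(58)).round
end

payload = 21 # 1-byte prefix + 20-byte public key hash

puts approx_base58_length(payload + 4)  #=> 34 (with a 4-byte checksum)
puts approx_base58_length(payload + 32) #=> 72 (with the full 32-byte hash as the checksum)
```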
What are the chances you make a mistake, but still get the same checksum result?
The checksum is effectively a random 4-byte number, so there are 0x100000000 possible values and a 1 in 0x100000000 chance of that happening. In decimal, that’s 1 in 4,294,967,296.
So pretty slim.
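The arithmetic can be checked with a few lines of Ruby. This sketch assumes the checksum behaves like a uniformly random value, and the possible_checksums helper is a name made up for this example.

```ruby
# Number of different values a checksum of the given size can take.
# Each byte has 256 possible values.
def possible_checksums(bytes)
  256 ** bytes
end

total = possible_checksums(4)

puts total          #=> 4294967296
puts total.to_s(16) #=> 100000000
puts 1.0 / total    # chance that a random change still matches (roughly 2.3e-10)
```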
Thanks
- Gregory Maxwell, for the quick computer science lesson on (and the history of) checksums.