A checksum is a small piece of data that allows you to check if another piece of data is the same as expected.
For example, in Bitcoin, addresses include checksums so they can be checked to see if they have been typed in correctly.
How do they work?
To create a checksum, you put the data through a hash function and take the first few bytes of the result. You then keep the data and the checksum together, so that you can check that the whole thing has been typed in correctly the next time you use it.
If you make one small mistake (in any part), the data will almost certainly no longer match the checksum.
So basically, a checksum is a handy little error-checking tool.
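As a minimal sketch of this idea in Ruby (not taken from the original article), the code below computes a checksum the same way as in "Creating a checksum" (double SHA256, first 4 bytes of the result), stores it alongside the data, and checks the stored string again later. The `intact?` helper and the example typos are made up for illustration.

```ruby
require 'digest'

# Double SHA256, keep the first 4 bytes (data is a hexadecimal string)
def checksum(data)
  binary = [data].pack("H*")
  Digest::SHA256.digest(Digest::SHA256.digest(binary))[0, 4].unpack1("H*")
end

# Hypothetical helper: split off the last 8 hex characters (4 bytes) and re-check them
def intact?(stored)
  data  = stored[0...-8]
  given = stored[-8..-1]
  checksum(data) == given
end

# Keep the data and its checksum together
data   = "00662ad25db00e7bb38bc04831ae48b4b446d12698"
stored = data + checksum(data)

# A one-character mistake in the data part...
typo_in_data = stored.dup
typo_in_data[10] = "0" # was "b"

# ...or in the checksum part
typo_in_checksum = stored.dup
typo_in_checksum[-1] = (stored[-1] == "0" ? "1" : "0")

puts intact?(stored)           # => true
puts intact?(typo_in_data)     # false (barring a 1 in 2**32 coincidence)
puts intact?(typo_in_checksum) # => false
```

A mistake in the checksum part is always caught, because the data still produces the original checksum. A mistake in the data part is caught unless the altered data happens to produce the same 4-byte checksum, which is very unlikely.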
Where are checksums used in Bitcoin?
Checksums are included in:
- Addresses
- WIF private keys
These two keys are commonly transcribed (copied, pasted, typed, written down, etc.), so it’s useful for them to contain checksums.
The presence of a checksum enables software to validate these types of keys when they are typed in. The software won’t be able to tell you what the key should be, but at least it will be able to save you from sending money to the wrong address due to a typo.
Creating a checksum
As mentioned, checksums in Bitcoin are created by hashing data through SHA256 twice and taking the first 4 bytes.
You could call a checksum in bitcoin a “truncated SHA256 hash”.
```
data = learnmeabitcoin
sha256(sha256(data)) = 52bbde771cbf39f8a7db44372ba3ed2336276f95e3a8723388a28943cc95df57
checksum = 52bbde77
```
In hexadecimal, 1 byte = 2 characters, so the first 4 bytes of the hash are its first 8 characters.
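To see the "1 byte = 2 characters" rule in code, here is a small Ruby sketch (not from the original article) that computes the double SHA256 of some example hex data twice, once as raw bytes and once as a hex string, and confirms that the first 4 bytes and the first 8 hex characters give the same checksum. The example data is the data part of the address used later in this article.

```ruby
require 'digest'

data   = "00662ad25db00e7bb38bc04831ae48b4b446d12698" # example hex data
binary = [data].pack("H*")

# Double SHA256, once as raw bytes and once as a hex string
raw = Digest::SHA256.digest(Digest::SHA256.digest(binary))
hex = Digest::SHA256.hexdigest(Digest::SHA256.digest(binary))

first_4_bytes = raw[0, 4].unpack1("H*") # the first 4 bytes, shown as hex...
first_8_chars = hex[0, 8]               # ...are the first 8 hex characters

puts raw.bytesize                   # => 32
puts hex.length                     # => 64
puts first_4_bytes == first_8_chars # => true
```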
This is how you might calculate a checksum in Ruby:
```ruby
require 'digest'

def checksum(data)
  # 1. Convert data (expected to be a hexadecimal string) to binary before hashing it
  binary = [data].pack("H*")

  # 2. Hash the data twice
  hash1 = Digest::SHA256.digest(binary)
  hash2 = Digest::SHA256.digest(hash1)

  # 3. Take the first 4 bytes
  checksum = hash2[0,4]

  # 4. Convert binary back to hexadecimal and return result
  hex = checksum.unpack("H*")[0] # unpack returns an array, so [0] just grabs the first result
  return hex
end

puts checksum("learnmeabitcoin")
```
Checking a checksum
You can verify a checksum by calculating the expected checksum for a piece of data, and comparing it with the one given.
Example: checking if an address is valid
A common situation is checking that a given address is valid (all addresses come with a checksum inside).
To do this, you first of all need to decode the address from base58. Then you separate the data part from the checksum, and verify that the checksum you calculate from the data matches the one given.
```
address        = "1AKDDsfTh8uY4X3ppy1m7jw1fVMBSMkzjP"                 # typical P2PKH address
base58_decoded = "00662ad25db00e7bb38bc04831ae48b4b446d1269817d515b6" # (base58 decoding not shown here)

data     = "00662ad25db00e7bb38bc04831ae48b4b446d12698" # 1-byte prefix + 20-byte public key hash
checksum = "17d515b6"                                   # 4-byte checksum

data_checksum = checksum("00662ad25db00e7bb38bc04831ae48b4b446d12698") # calculate the checksum of the data part
              # => "17d515b6" (check it matches the one given)
```
A base58-decoded address contains a 1-byte prefix, a 20-byte hash160 of something (e.g. a public key hash), and a 4-byte checksum. But all you really need to know here is that the checksum is the last 4 bytes.
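For the curious, this short Ruby sketch (not part of the original code) splits the decoded example address into those three parts. Because 1 byte is 2 hex characters, every byte offset is doubled when slicing the hex string.

```ruby
# Decoded example address (hex), as given in this article
base58_decoded = "00662ad25db00e7bb38bc04831ae48b4b446d1269817d515b6"

# Byte offsets doubled, since 1 byte = 2 hex characters
prefix   = base58_decoded[0, 2]  # 1-byte prefix
hash160  = base58_decoded[2, 40] # 20-byte hash160 (here, a public key hash)
checksum = base58_decoded[-8, 8] # last 4 bytes

puts prefix   # => 00
puts hash160  # => 662ad25db00e7bb38bc04831ae48b4b446d12698
puts checksum # => 17d515b6
```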
This Ruby code uses the same checksum() function as above.
```ruby
require 'digest'

# Checksum function (expects a hexadecimal string)
def checksum(data)
  binary = [data].pack("H*")
  hash1 = Digest::SHA256.digest(binary)
  hash2 = Digest::SHA256.digest(hash1)
  checksum = hash2[0,4]
  hex = checksum.unpack("H*")[0] # unpack returns an array, so [0] grabs the first result
  return hex
end

# Get an address and decode it from base58
address = "1AKDDsfTh8uY4X3ppy1m7jw1fVMBSMkzjP" # example address
base58_decoded = "00662ad25db00e7bb38bc04831ae48b4b446d1269817d515b6" # (base58 decoding not shown here)

# Separate the data part from the checksum
data = base58_decoded[0...-8]     # everything apart from the last 8 characters
checksum = base58_decoded[-8..-1] # the last 8 characters (4 bytes)

# Calculate the checksum for the data
data_checksum = checksum(data)

# Check to see if it matches the checksum given
verify = data_checksum == checksum

# Results
puts data_checksum
puts checksum
puts verify #=> true
```
As long as you can get to the data and the checksum, the verification part is pretty straightforward.
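The examples above leave out the base58 decoding step. Here is a minimal sketch of one way to do it in Ruby, assuming the standard Bitcoin base58 alphabet; `base58_decode` and `BASE58_ALPHABET` are names made up for this example, not part of any library. It reads the string as one big base-58 number, converts that number to hex, then puts back one `00` byte for every leading `1`, since those leading zeros would otherwise vanish when the string is treated as a number.

```ruby
# Bitcoin's base58 alphabet (no 0, O, I or l)
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Hypothetical helper: decode a base58 string into a hexadecimal string
def base58_decode(string)
  # 1. Read the base58 digits into one big integer
  number = 0
  string.each_char do |char|
    index = BASE58_ALPHABET.index(char)
    raise ArgumentError, "invalid base58 character: #{char}" if index.nil?
    number = number * 58 + index
  end

  # 2. Convert the integer to hex, padded to a whole number of bytes
  hex = number.zero? ? "" : number.to_s(16)
  hex = "0" + hex if hex.length.odd?

  # 3. Each leading "1" stands for a leading zero byte, which the integer loses
  leading_zero_bytes = string[/\A1*/].length
  ("00" * leading_zero_bytes) + hex
end

puts base58_decode("1AKDDsfTh8uY4X3ppy1m7jw1fVMBSMkzjP")
```

The hex string it returns can then be split into the data part and the last 8 characters, exactly as in the verification code above.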
Why only the first 4 bytes?
It would be safer and more reliable to use the full hash result as a checksum. However, this would make addresses much longer, as the entire 32-byte hash would have to be included inside instead of just 4 bytes (28 extra bytes, or roughly 38 extra base58 characters).
Taking just the first 4 bytes gives you enough “uniqueness” to be pretty sure the original data is correct, while also not making the final address inconveniently long. It’s really just a balance between reliability and convenience.
What are the chances you make a mistake, but still get the same checksum result?
The checksum is effectively a random 4-byte number, so there are 2^32 possible values (0x00000000 up to 0xFFFFFFFF), and roughly a 1 in 2^32 chance of that happening. In decimal, that’s 1 in 4,294,967,296.
So pretty slim.
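As a quick arithmetic check in Ruby (assuming the double-SHA256 output is effectively random, so every checksum value is equally likely): 4 bytes is 32 bits, so the number of possible checksums is 2^32. Note that 0xFFFFFFFF is the largest possible value, which is one less than the number of possible values.

```ruby
checksum_bytes     = 4
possible_checksums = 2**(8 * checksum_bytes) # 4 bytes = 32 bits

puts possible_checksums       # => 4294967296
puts 0xFFFFFFFF               # => 4294967295 (the largest value, not the count)
puts 1.0 / possible_checksums # chance a wrong string still matches its checksum
```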
Thanks to Gregory Maxwell for the quick computer science lesson on (and the history of) checksums.