A checksum is a small piece of data that allows you to check whether another piece of data is the same as expected.
For example, in Bitcoin, addresses include checksums so that software can check whether they have been typed in correctly.
How do they work?
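You create a checksum by putting the data through a hash function (Bitcoin uses SHA256 twice) and keeping just the first few bytes of the result. You then keep the data and the checksum together, so that the next time you use it you can check that the whole thing has been typed in correctly.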
If you make even one small mistake in any part, the data will almost certainly no longer match the checksum.
So basically, a checksum is a handy little error-checking tool.
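To see the idea in action, here is a minimal Ruby sketch. It assumes the data is a plain text string and uses the double-SHA256, first-4-bytes recipe that Bitcoin uses; the helper names `checksum_of` and `matches?` are hypothetical, chosen just for this example.

```ruby
require 'digest'

# Hypothetical helper: double-SHA256 the raw bytes of a string and keep the
# first 4 bytes, returned as 8 hexadecimal characters.
def checksum_of(data)
  Digest::SHA256.digest(Digest::SHA256.digest(data))[0, 4].unpack1("H*")
end

# Hypothetical helper: does the data still match the checksum stored with it?
def matches?(data, stored_checksum)
  checksum_of(data) == stored_checksum
end

original = "learnmeabitcoin"
stored   = checksum_of(original) # kept alongside the data

puts matches?("learnmeabitcoin", stored) # typed correctly
puts matches?("learnmeabitcoim", stored) # one-character typo
```

The first call recomputes exactly the same checksum, while the typo changes the hash completely, so the stored checksum no longer matches.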
Where are checksums used in Bitcoin?
Checksums are included in:
- Addresses
- WIF Private Keys
Both of these are commonly transcribed (copied, pasted, typed, written down, etc.), so it's useful for them to contain checksums.
The presence of a checksum lets software validate an address or key when it is typed in. The software can't tell you what the correct value should be, but it can at least save you from sending money to the wrong address because of a typo.
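Validating a transcribed value comes down to splitting off the last 4 bytes and recomputing them from the rest. The sketch below works on plain hexadecimal strings and assumes the checksum is simply appended to the end of the data; it deliberately skips the extra encoding step that turns the bytes into the final address or key, and the method names (`checksum_hex`, `with_checksum`, `valid?`) are hypothetical.

```ruby
require 'digest'

# Hypothetical helper: first 4 bytes of SHA256(SHA256(data)), as 8 hex characters.
# Assumes the data is given as an even-length hexadecimal string.
def checksum_hex(hex)
  binary = [hex].pack("H*")
  Digest::SHA256.digest(Digest::SHA256.digest(binary))[0, 4].unpack1("H*")
end

# Hypothetical helper: keep the data and its checksum together by appending it.
def with_checksum(hex)
  hex + checksum_hex(hex)
end

# Hypothetical helper: split off the last 4 bytes (8 hex characters) and
# check they still match a freshly computed checksum of the rest.
def valid?(hex_with_checksum)
  # Reject anything that isn't whole bytes of hex, or is too short to hold a checksum.
  return false unless hex_with_checksum.match?(/\A(?:[0-9a-fA-F]{2})*\z/)
  return false if hex_with_checksum.length < 8

  data  = hex_with_checksum[0...-8]
  given = hex_with_checksum[-8, 8]
  checksum_hex(data) == given.downcase
end

payload = "0011223344"                 # arbitrary example bytes
stored  = with_checksum(payload)
typo    = "0011223345" + stored[-8, 8] # last data byte mistyped, checksum unchanged

puts valid?(stored)
puts valid?(typo)
```

As the text says, the check only tells you that something is wrong; it gives no hint about which character was mistyped.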
Creating a checksum
As mentioned, checksums in Bitcoin are created by putting the data through SHA256 twice and taking the first 4 bytes of the result.
You could call a checksum in Bitcoin a "truncated double-SHA256 hash".
```
data = learnmeabitcoin
sha256(sha256(data)) = 52bbde771cbf39f8a7db44372ba3ed2336276f95e3a8723388a28943cc95df57
checksum = 52bbde77
```
Each byte is written as 2 hexadecimal characters, so the 4-byte checksum is 8 characters long.
This is how you might calculate a checksum in Ruby:
```ruby
require 'digest'

def checksum(data)
  # 1. Convert the hexadecimal data to raw bytes before hashing it.
  binary = [data].pack("H*")

  # 2. Hash the data twice with SHA256.
  hash1 = Digest::SHA256.digest(binary)
  hash2 = Digest::SHA256.digest(hash1)

  # 3. Take the first 4 bytes.
  checksum = hash2[0, 4]

  # 4. Convert the bytes back to hexadecimal and return the result
  #    (unpack1 returns a String, whereas unpack would return an Array).
  checksum.unpack1("H*")
end

# checksum expects hexadecimal input, so hex-encode the text first.
puts checksum("learnmeabitcoin".unpack1("H*"))
```
Why only the first 4 bytes?
It would be safer and more reliable to use the full hash result as a checksum. However, this would make addresses much longer, as the entire 32-byte hash would have to be included.
Taking just the first 4 bytes gives you enough "uniqueness" to be pretty sure the original data is correct, whilst not making the final address inconveniently long. It's really just a balance between reliability and convenience.
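The length trade-off is easy to see in hexadecimal, where each byte takes 2 characters. This small sketch uses an arbitrary example input and compares the full double-SHA256 result with the 4-byte truncation.

```ruby
require 'digest'

data = "learnmeabitcoin" # arbitrary example input
full = Digest::SHA256.digest(Digest::SHA256.digest(data))

puts full.bytesize                   # 32 bytes
puts full.unpack1("H*").length       # 64 hex characters
puts full[0, 4].unpack1("H*").length # 8 hex characters
```

Including the full hash would add 64 characters (in hex) to every value, compared with 8 for the truncated checksum.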
What are the chances you make a mistake, but still get the same checksum result?
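The checksum is effectively a random 4-byte number, so there are 2^32 possible checksums (0x00000000 to 0xFFFFFFFF). The chance that a mistake still produces the same checksum is therefore about 1 in 2^32, which in decimal is 1 in 4,294,967,296.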
So pretty slim.
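The count of possible 4-byte checksums can be checked directly in Ruby. Note that 0x00000000 is a valid value too, so there are 0xFFFFFFFF + 1 possibilities. The probability line assumes checksums are spread evenly across all values, which is what a good hash function approximates.

```ruby
possible = 2**(4 * 8)              # 4 bytes = 32 bits
puts possible                      # 4294967296
puts possible == 0xFFFFFFFF + 1    # true: values run from 0x00000000 to 0xFFFFFFFF
puts 1.0 / possible                # chance a mistake still matches (uniform assumption)
```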
Thanks to Gregory Maxwell for the quick computer science lesson on (and the history of) checksums.