Hash Function

Fingerprints for data

A hash function is a programming tool that creates fingerprints for data.

It takes in any amount of data, scrambles it, and returns a short and unique result for that data.

SHA-256 (Text)

Hash a string of text using the SHA-256 hash function.

This is just a quick example of the SHA-256 hash function. It hashes text (ASCII characters) rather than bytes written in hexadecimal. To hash actual raw data in Bitcoin, use the SHA-256 and HASH256 tools further down the page instead.

SHA = Secure Hash Algorithm, 256 = 256 bits (the size of the hash result).

The result of the hash function is just a bunch of bytes. So the letters and numbers you're seeing are just bytes of data represented by hexadecimal characters, which is typically how the outputs of hash functions are displayed.
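To see the difference between the raw bytes and their hexadecimal representation, here is a minimal sketch using Ruby's standard digest library. The input "hello" is just an arbitrary example string.

```ruby
require 'digest'

text = "hello" # arbitrary example input

raw = Digest::SHA256.digest(text)    # the raw bytes of the hash (a binary string)
hex = Digest::SHA256.hexdigest(text) # the same bytes shown as hexadecimal characters

puts raw.bytesize             #=> 32 (32 bytes = 256 bits)
puts hex.length               #=> 64 (two hex characters per byte)
puts hex == raw.unpack1("H*") #=> true (the hex string is just another view of the bytes)
```

So the 64 characters you see on screen are only a display format for the 32 bytes underneath.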

Technical terms

Here are some technical terms that crop up from time to time:

The data you insert into the hash function is referred to as the "message" or "preimage".
The result of the hash function is sometimes referred to as the "digest" or simply the "hash".

"Preimage" is probably the most awkward technical term you'll come across, but it just refers to some specific data you insert into the hash function.

I prefer to use the terms "data" and "hash", but don't be surprised if you run into the other terms now and then.

Properties

What makes a hash function a hash function?

There are a few different properties that separate a basic hash function from a strong hash function:

Basic Hash Function

There are a few basic properties of hash functions that you've probably already noticed:

Deterministic: You always get the same result for the same data. If you and someone else insert the same data into the same hash function, you'll both get the same result. This is known as being deterministic.

Fixed-length result: You get a fixed-length result no matter the size of the data. A hash function can take in any amount of data, and it will scramble and compress it to produce a (usually) shorter result. The size of the hash varies between hash functions, but for SHA-256 it's 256 bits (32 bytes) in size.

Avalanche effect: You get wildly different results with small changes to the data. What comes out of the hash function appears to be random. Even the smallest change to the data you insert into the hash function will produce a completely different result. This is known as the avalanche effect.

Those are the obvious features you'd expect from a good "fingerprinting machine".
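These three properties are easy to check for yourself. The following is a rough sketch using SHA-256; the helper name bit_difference and the example inputs are my own and only there for illustration.

```ruby
require 'digest'

# Count how many bits differ between two equal-length byte strings.
def bit_difference(a, b)
  a.bytes.zip(b.bytes).sum { |x, y| (x ^ y).to_s(2).count("1") }
end

h1 = Digest::SHA256.digest("bitcoin")
h2 = Digest::SHA256.digest("bitcoin")        # same data again
h3 = Digest::SHA256.digest("bitcoio")        # last letter changed
h4 = Digest::SHA256.digest("bitcoin" * 1000) # much larger input

puts h1 == h2                             # deterministic: same data, same hash
puts [h1, h3, h4].map(&:bytesize).inspect # fixed length: always 32 bytes
puts bit_difference(h1, h3)               # avalanche: expect roughly half of the 256 bits to differ
```

The exact number of differing bits depends on the inputs, but it should land somewhere around 128.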

Strong Hash Function

(aka Cryptographic Hash Function)

There are some properties of strong hash functions (e.g. SHA-256) that you may not have noticed:

One-wayness: A strong hash function is irreversible. If I gave you a hash result, you wouldn't be able to work out the original data that I used to create it. The only way you could hope to find it would be to try different inputs in a brute-force search, which would be computationally infeasible given the massive range of possible outputs. Anyway, this property of not being able to work backwards is known as one-wayness.

Collision resistance: Every piece of data should have its own unique result. Obviously this is technically impossible, as there are infinitely many possible inputs but only a fixed number of possible results. However, a secure hash function should make it computationally infeasible to find two different pieces of data that produce the same hash digest. This property is known as collision resistance.

Preimage resistance: You can't control the result of a strong hash function. There's no way you can figure out how to construct an input to give you a specific result from a hash function. If you want a specific result, you just have to keep hashing different pieces of data to get the kind of result you want. This property is known as preimage resistance.

A hash function is referred to as a "cryptographic hash function" if it achieves these 3 strong properties.

This means that it's usually slower than a basic hash function (although still pretty fast overall), but it can be relied upon to be unpredictable and to produce unique results for different pieces of data, which is an important feature when it comes to cryptography.

So as you can guess, Bitcoin uses cryptographic hash functions.

Technical terms

This is just for my reference if nothing else, as I keep forgetting what each of these terms means when reading about secure hash functions in textbooks.

Anyway, in technical terms, a "cryptographic hash function" should possess the following 3 key properties:

Preimage Resistance: It's difficult to construct an input that produces a specific output.

Second Preimage Resistance (Weak Collision Resistance): Given some data and its hash result, it's hard to find another piece of data that will hash to the same result. This is occasionally referred to as weak collision resistance.

Collision Resistance: It's hard to find any two pieces of data that hash to the same result. This is similar to the last property, but whereas in second preimage resistance you are stuck with some starting data and have to find another piece of data that produces the same result, with collision resistance you are free to choose any two pieces of data that produce the same hash result. This is therefore sometimes referred to as strong collision resistance.

These three terms seem to get mixed up from time to time (especially because second preimage resistance is a form of collision resistance).
To put it another way, you have preimage resistance, and two types of collision resistance.
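To get a feel for why collisions are hard to find, it helps to cheat and shrink the hash. The sketch below searches for a collision on only the first 2 bytes of SHA-256, which takes a few hundred attempts. The helpers truncated_hash and find_collision are hypothetical names of my own, and this says nothing about attacking the full 256-bit result, where the same search would be computationally infeasible.

```ruby
require 'digest'

# Hypothetical helper: keep only the first `bytes` bytes of SHA-256 (as hex).
def truncated_hash(data, bytes)
  Digest::SHA256.hexdigest(data)[0, bytes * 2]
end

# Hash the strings "0", "1", "2", ... until two of them share a truncated hash.
def find_collision(bytes)
  seen = {} # truncated hash => input that produced it
  i = 0
  loop do
    input = i.to_s
    short = truncated_hash(input, bytes)
    return [seen[short], input, short] if seen.key?(short)
    seen[short] = input
    i += 1
  end
end

a, b, short = find_collision(2)
puts "#{a.inspect} and #{b.inspect} both hash to something starting with #{short}"
```

Note that here we were free to pick any two inputs (collision resistance). Finding a second input that matches one fixed input (second preimage resistance) would take noticeably more attempts, even at this tiny size.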

And what is a preimage exactly?

Preimage (mathematics) - The set of arguments of a function corresponding to a particular subset of the range.

thefreedictionary.com

So a "preimage" is something that you put into a function that maps to a specific result.

Bitcoin

What hash functions does Bitcoin use?

There are five methods for hashing data in Bitcoin:

HASH256 – Double SHA-256 (most common)
HASH160 – SHA-256 + RIPEMD-160
SHA-256 – Single SHA-256
HMAC-SHA512 – HMAC with SHA-512
PBKDF2 – Password Based Key Derivation Function 2

In Bitcoin we hash bytes of data. So in the tools below, you need to represent bytes by using hexadecimal characters (where every byte is made from two hex characters). See "A common mistake when hashing" for more details.

HASH256

SHA-256(SHA-256(data))

If you're hashing something in Bitcoin, you're almost always using HASH256.

This works by putting the data through SHA-256, then taking the result and putting it through SHA-256 again.

HASH256

Double SHA-256. Used for hashing block headers, transaction data, and mostly anything that needs to be hashed in Bitcoin.

You may notice that the hashes you get for block data and transaction data appear to be backwards. This is because block hashes and transaction hashes are actually in reverse byte order when searching for them using bitcoin-cli and on blockchain explorers.

This is the primary method for hashing data in Bitcoin. It's sometimes referred to as "double-SHA256" or "SHA-256d", but in code it's most commonly called hash256 for short.

You'll find HASH256 being used for block hashes, TXIDs, merkle roots, checksums, and when signing transactions (see the Usage section for details).

Code


require 'digest'

def hash256(hex)
  # convert hexadecimal string to byte sequence first
  binary = [hex].pack("H*") # H = hex string (high nibble first), * = multiple bytes

  # SHA-256 (first round)
  hash1 = Digest::SHA256.digest(binary)

  # SHA-256 (second round)
  hash2 = Digest::SHA256.digest(hash1)

  # convert from byte sequence back to hexadecimal
  hash256 = hash2.unpack("H*")[0]

  return hash256
end

puts hash256("aa") #=> e51600d48d2f72eb428e78733e01fbd6081b349528335bf21269362edfae185d
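As mentioned above, block hashes and transaction hashes are displayed in reverse byte order in bitcoin-cli and on blockchain explorers. A minimal sketch of how you might flip the byte order of a hex string is shown below; reverse_bytes is my own helper name, not something from a library.

```ruby
# Reverse the byte order of a hexadecimal string (two hex characters per byte).
def reverse_bytes(hex)
  [hex].pack("H*").reverse.unpack1("H*")
end

puts reverse_bytes("0a0b0c0d") #=> 0d0c0b0a
```

So if you hash a block header or some transaction data with HASH256 and the result looks backwards compared to what an explorer shows, running it through something like this should make the two line up.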

Why do we hash twice?

Satoshi never mentioned why they chose double-SHA256 when designing Bitcoin.

Satoshi was probably concerned about something called a length extension attack, and it has been recommended in some literature (e.g. Cryptography Engineering) to use double-SHA256 to protect against it.

These kinds of attacks are not a concern for Bitcoin though, so either Satoshi was misguided in their choice of using double-SHA256, or they just wanted to be extra cautious.

Satoshi standardized on using double-SHA256 for 32-byte hashes, and SHA256+RIPEMD160 (each once) for 20-byte hashes, presumably because of (likely misguided) concern about certain attacks (like length extension attacks, which only apply when hashing secret data), and then used those everywhere.

Pieter Wuille, bitcoin.stackexchange.com

Either way, hashing data twice is now just a quirk of Bitcoin.

If you designed Bitcoin from scratch today there would be no benefit to using double-SHA256. In fact, recent upgrades to Bitcoin now favor using single-SHA256 where possible (e.g. script hashes in P2WSH).

HASH160

RIPEMD-160(SHA-256(data))

HASH160 is used infrequently in Bitcoin.

This works by putting the data through SHA-256, then taking the result and putting it through another hash function called RIPEMD-160.

HASH160

SHA-256 + RIPEMD-160. Used for shortening a public key or script before converting to an address.

HASH160 is only used when constructing legacy addresses.

Upgrades to Bitcoin over the years have not made further use of HASH160 when hashing data, and so now it's only used when constructing addresses for legacy locking scripts.

Code


require 'digest'

def hash160(data)
  # convert hexadecimal string to byte sequence first
  binary = [data].pack("H*") # H = hex string (high nibble first), * = multiple bytes

  # SHA-256
  sha256 = Digest::SHA256.digest(binary)

  # RIPEMD-160
  ripemd160 = Digest::RMD160.digest(sha256)

  # convert from byte sequence back to hexadecimal
  hash160 = ripemd160.unpack("H*").join

  return hash160
end

puts hash160("aa") #=> 58d179ca6112752d00dc9b89ea4f55a585195e26

Why RIPEMD-160?

RIPEMD-160 produces a shorter digest than SHA-256:

Hash Function	Digest Size	Example
SHA-256	32 bytes (256 bits)	`e51600d48d2f72eb428e78733e01fbd6081b349528335bf21269362edfae185d`
RIPEMD-160	20 bytes (160 bits)	`58d179ca6112752d00dc9b89ea4f55a585195e26`

So RIPEMD-160 is ideal for creating shorter (yet still secure) fingerprints for public keys and scripts.

As for why Satoshi chose RIPEMD-160 over something like SHA-1 (which also produces 160-bit digests), I'm not sure. It could have been because RIPEMD-160 was known to be more collision resistant at the time (SHA-1 has had collisions since, so a wise choice in hindsight), or because Satoshi simply preferred to use a hash function designed by a separate organization.

Unlike all SHA-1 and SHA-2 algorithms, RIPEMD-160 is the only one that was not designed by NIST and NSA, but rather by a team of European researchers. Even though there is no indication that any of the SHA algorithms are artificially weakened or contain backdoors (introduced by the US government, that is), RIPEMD-160 might appeal to some people who heavily distrust governments.

Christof Paar, Understanding Cryptography

It is worth noting that Satoshi could've used SHA2-256 twice and truncated the second digest to 160 bits as this is equally secure. The fact that he didn't is some evidence to show that his decision was a conscious decision to use RIPEMD-160 over the NSA suit of algorithms.

liamzebedee, bitcoin.stackexchange.com

Either way, RIPEMD-160 is a fine choice for use as a 160-bit hash function, even if we don't really use it much any more in Bitcoin.

The use of SHA-256 + RIPEMD-160 helps to protect against length extension attacks too (even though this is once again unnecessary).

SHA-256

SHA-256(data)

This is where you just put the data through SHA-256 once. Nothing special this time.

SHA-256

Single SHA-256. Hash bytes of data using the SHA-256 hash function.

This tool only accepts bytes of data in the form of hexadecimal characters. This is different to the SHA-256 (Text) tool at the top of the page, which accepts any text data, but that's just an example tool and is not the way data is hashed in Bitcoin.

More recent changes to Bitcoin have started to use a single round of SHA-256 (instead of HASH256):

Script Hashes (P2WSH)

However, it's nowhere near as prevalent as HASH256. So as a general rule, if you're hashing something in Bitcoin, it's most likely to be HASH256 and not a single SHA-256.

Code


require 'digest'

def sha256(hex)
  # convert hexadecimal string to byte sequence first
  binary = [hex].pack("H*") # H = hex string (high nibble first), * = multiple bytes

  # SHA-256 (single)
  hash = Digest::SHA256.digest(binary)

  # convert from byte sequence back to hexadecimal
  sha256 = hash.unpack("H*")[0]

  return sha256
end

puts sha256("aa") #=> bceef655b5a034911f1c3718ce056531b45ef03b4c7b1f15629e867294011a7d

HMAC-SHA512

HMAC with SHA-512

We use HMAC-SHA512 when we want to hash some data with an additional secret key.

HMAC - The method for hashing some data with an additional key.
SHA-512 - The hash algorithm used within the HMAC.

So SHA-512 is the actual hash algorithm, and HMAC (Hash-based Message Authentication Code) is the method for combining the two pieces of data together using that hash algorithm.

HMAC-SHA512

Used when deriving extended keys.

Data (Hex): seed, or (private/public key + 4-byte index)
Key (Hex): "Bitcoin seed" (ASCII), or chain code
Result: HMAC-SHA512(data, key)

HMAC-SHA512 is used in Bitcoin when creating extended keys in HD Wallets:

Extended Keys

SHA-512 is used within the HMAC when creating extended keys because it produces a 64 byte hash result, which means you can chop up the result to get a new private key (the first 32 bytes) and a new chain code (the last 32 bytes) to form the child extended key.

In general cryptography, an HMAC is used when you want to hash some data, but also want to hash it with an additional secret key so that you can only get the same result if you have both the original data and the secret key.
Using an HMAC protects against length-extension attacks, which means it's a safer method than simply appending the secret key to the data and hashing it that way.

Code


require 'openssl'

# Example data
data = "67f93560761e20617de26e0cb84f7234aaf373ed2e66295c3d7397e6d7ebe882ea396d5d293808b0defd7edd2babd4c091ad942e6a9351e6d075a29d4df872af"
key = "Bitcoin seed"

# HMAC-SHA512
hmac = OpenSSL::HMAC.hexdigest(OpenSSL::Digest::SHA512.new, key, [data].pack("H*"))
puts hmac #=> f79bb0d317b310b261a55a8ab393b4c8a1aba6fa4d08aef379caba502d5d67f9463223aac10fb13f291a1bc76bc26003d98da661cb76df61e750c139826dea8b
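To illustrate the "chopping up" of the 64-byte result described above, here is a small sketch that reuses the example seed and the "Bitcoin seed" key from the code above and splits the HMAC result into two 32-byte halves. The variable names left and right are my own, and this only shows the split itself, not the full extended key derivation.

```ruby
require 'openssl'

# Example seed (hex) and key, same as in the example above
seed = "67f93560761e20617de26e0cb84f7234aaf373ed2e66295c3d7397e6d7ebe882ea396d5d293808b0defd7edd2babd4c091ad942e6a9351e6d075a29d4df872af"
key  = "Bitcoin seed"

# HMAC-SHA512 returning raw bytes (64 bytes)
result = OpenSSL::HMAC.digest(OpenSSL::Digest::SHA512.new, key, [seed].pack("H*"))

left  = result[0, 32].unpack1("H*")  # first 32 bytes (private key)
right = result[32, 32].unpack1("H*") # last 32 bytes (chain code)

puts left
puts right
```

Joining the two halves back together should give you the same hexadecimal string as the full HMAC-SHA512 result.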

PBKDF2

Password Based Key Derivation Function 2

PBKDF2 is used when you want to hash data multiple times.

PBKDF2

Create a seed for a HD Wallet from a mnemonic sentence (and optional passphrase).

Password (String): mnemonic sentence
Salt (String): passphrase (the salt must start with the prefix "mnemonic")
Iterations: 2048
Algorithm: HMAC-SHA512
Length: 64 bytes
Result: PBKDF2(password, salt, iterations, algorithm, length)

Never enter your mnemonic sentence into a website, or use a mnemonic sentence generated by a website. Websites can easily save the seed and use it to steal all your bitcoins.

The fact that PBKDF2 uses multiple iterations for hashing means that it is intentionally slow, which makes it more difficult to perform brute-force attacks.

So PBKDF2 is not actually a hash algorithm itself, but instead uses an existing hash algorithm (e.g. HMAC-SHA512) with multiple repetitions (in a specific way) before producing the final result.

In Bitcoin, PBKDF2 is used on the mnemonic sentence to create the initial seed for use in HD Wallets.

Bitcoin uses PBKDF2 with 2,048 iterations of HMAC-SHA512 to convert a mnemonic sentence (and optional passphrase) to a seed. This makes it more time-consuming to try and crack someone else's mnemonic sentence and/or passphrase.

Code


require 'openssl'

# Example data
mnemonic = "punch shock entire north file identify"
passphrase = ""

# Prepare data for PBKDF2
password = mnemonic
salt = "mnemonic#{passphrase}" # "mnemonic" is always used in the salt with optional passphrase appended to it
iterations = 2048
keylength = 64
digest = OpenSSL::Digest::SHA512.new

# PBKDF2
result = OpenSSL::PKCS5.pbkdf2_hmac(password, salt, iterations, keylength, digest)
seed = result.unpack("H*")[0] # convert to hexadecimal string
puts seed #=> e1ca8d8539fb054eda16c35dcff74c5f88202b88cb03f2824193f4e6c5e87dd2e24a0edb218901c3e71e900d95e9573d9ffbf870b242e927682e381d109ae882
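The optional passphrase is worth a quick demonstration, because adding one gives you a completely different seed from the same mnemonic sentence. The sketch below wraps the PBKDF2 call above in a hypothetical helper called mnemonic_to_seed. The passphrase "my passphrase" is made up, and I'm skipping the Unicode normalization that a real wallet would apply to the mnemonic and passphrase first.

```ruby
require 'openssl'

# Hypothetical helper wrapping the PBKDF2 call shown above.
# Assumes plain ASCII input (no Unicode normalization is performed).
def mnemonic_to_seed(mnemonic, passphrase = "")
  salt = "mnemonic#{passphrase}" # "mnemonic" prefix + optional passphrase
  bytes = OpenSSL::PKCS5.pbkdf2_hmac(mnemonic, salt, 2048, 64, OpenSSL::Digest::SHA512.new)
  bytes.unpack1("H*") # convert to hexadecimal string
end

mnemonic = "punch shock entire north file identify"

seed_a = mnemonic_to_seed(mnemonic)                  # no passphrase
seed_b = mnemonic_to_seed(mnemonic, "my passphrase") # made-up passphrase

puts seed_a
puts seed_b
puts seed_a == seed_b #=> false
```

Both seeds are 64 bytes long, but they have nothing in common, so a passphrase effectively gives you a separate wallet.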

Usage

Where is hashing used in Bitcoin?

Hash functions are a useful general-purpose tool in programming, and they're used liberally throughout Bitcoin.

Here are the most important examples:

Mining

Relevant Properties: Preimage Resistance, Avalanche Effect
Hashing Method: HASH256

Diagram showing a block hash needing to get below a target value for it to be added on to the blockchain.

This is the most famous use of the hash function in Bitcoin.

A block header gets hashed, and the resulting block hash is interpreted as an integer. This integer must be below a certain target value for the block to be considered "valid" or "mined".

The fact that the result of the hash function is uncontrollable (preimage resistance) and wildly different for each nonce value (avalanche effect) creates a network-wide lottery, where nobody is in control of when the next block is mined.
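Here is a toy version of that lottery. This is only a sketch under my own assumptions: the "header" is a made-up string rather than a real 80-byte block header, the target is far easier than anything on the real network, and the names hash256_bytes and mine are my own.

```ruby
require 'digest'

# Double SHA-256 over raw bytes, returning raw bytes.
def hash256_bytes(bytes)
  Digest::SHA256.digest(Digest::SHA256.digest(bytes))
end

# Toy "mining": append a 4-byte nonce to some data and keep hashing until
# the result, interpreted as an integer, falls below the target.
def mine(data, target)
  nonce = 0
  loop do
    hash  = hash256_bytes(data.b + [nonce].pack("V")) # V = 32-bit little-endian integer
    value = hash.reverse.unpack1("H*").to_i(16)       # read the hash bytes in reverse order as an integer
    return [nonce, value] if value < target
    nonce += 1
  end
end

target = 2**244 # made-up, very easy target: roughly 1 in 4,096 hashes will be below it
nonce, value = mine("toy block header", target)

puts "nonce: #{nonce}"
puts "hash:  #{value.to_s(16).rjust(64, '0')}"
```

There's no shortcut for picking the right nonce. All you can do is increment it and hash again, which is why lowering the target makes mining take longer.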

Blockchain

Relevant Properties: Deterministic, Collision Resistance
Hashing Method: HASH256

Diagram showing how block hashes are used to create a chain of blocks.

Each block in the blockchain references the hash of the previous block. This connects all the blocks in the blockchain together, and prevents anyone from changing the contents of a block anywhere in the chain.

Any change to a block lower down in the chain will change its hash, and therefore the blocks above it will no longer be connected to it and will no longer be part of the longest chain.
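To show the idea, here is a heavily simplified sketch of a chain where each block commits to the hash of the one before it. The block structure (just a previous hash and a data string) and all of the function names are my own invention for illustration; real block headers contain more fields.

```ruby
require 'digest'

# Double SHA-256 of a string, returned as hex.
def hash256_hex(string)
  Digest::SHA256.hexdigest(Digest::SHA256.digest(string))
end

# The hash of a toy block covers the previous block's hash and its own data.
def block_hash(block)
  hash256_hex(block[:prev] + block[:data])
end

# Build a chain of toy blocks from a list of data strings.
def build_chain(items)
  prev = "00" * 32 # the first block has no previous block
  items.map do |data|
    block = { prev: prev, data: data }
    prev = block_hash(block)
    block
  end
end

# The chain is intact if every block references the hash of the block before it.
def intact?(chain)
  chain.each_cons(2).all? { |a, b| b[:prev] == block_hash(a) }
end

chain = build_chain(["a", "b", "c"])
puts intact?(chain) #=> true

chain[0][:data] = "changed" # tamper with the first block
puts intact?(chain) #=> false
```

Changing the first block changes its hash, so the second block no longer points to it and the link is broken.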

TXID

Relevant Properties: Deterministic, Collision Resistance
Hashing Method: HASH256

The data for each individual transaction is hashed to create a TXID (Transaction ID). This creates a unique reference number for every transaction (deterministic).

This allows you to reference coins created in previous transactions as inputs for spending in future transactions, as well as being able to search for transactions in a blockchain explorer.

The fact that it's hard to find two pieces of data that hash to the same result (collision resistance) means that every transaction can have its own short and unique reference number.

Merkle Root

Relevant Properties: Deterministic, Collision Resistance
Hashing Method: HASH256

Every block header includes a fingerprint for all of the transaction data included in the block.

This fingerprint is called the merkle root, and it's basically all of the TXIDs hashed together in a tree-like structure.

Hashing allows you to "commit" all the transaction data to the block header (deterministic). Therefore, if anyone changes transaction data in the block, it will no longer match the fingerprint in the header (collision resistance), and the modified block will be invalid.
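The tree-like hashing can be sketched in a few lines. This assumes the TXIDs are given as hex in their natural byte order (not the reversed order shown on blockchain explorers) and that the list is not empty. The function names are my own, and the TXIDs in the usage example are placeholders rather than real transactions.

```ruby
require 'digest'

# Double SHA-256 over raw bytes, returning raw bytes.
def hash256_bytes(bytes)
  Digest::SHA256.digest(Digest::SHA256.digest(bytes))
end

# Hash pairs of TXIDs together, level by level, until one hash remains.
def merkle_root(txids_hex)
  level = txids_hex.map { |hex| [hex].pack("H*") } # hex strings to bytes
  while level.size > 1
    level << level.last if level.size.odd? # odd number of hashes: pair the last one with itself
    level = level.each_slice(2).map { |left, right| hash256_bytes(left + right) }
  end
  level.first.unpack1("H*")
end

# Placeholder TXIDs (not real transactions)
txids = ["aa" * 32, "bb" * 32, "cc" * 32]
puts merkle_root(txids)
```

If you change any one of the TXIDs, the final hash at the top of the tree changes too, which is what makes it work as a fingerprint for all of the transactions.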

Checksum

Relevant Properties: Deterministic, Avalanche Effect
Hashing Method: HASH256

Some checksums are just the truncated hash of some data.

These checksums are bundled with data to allow you to check if the data has been input correctly. For example, a checksum is included at the end of a legacy bitcoin address, so if you type in one part of the address incorrectly, the data will not match the checksum (or vice versa) (avalanche effect), and the error can be detected before you make the transaction.

Checksums are also used in networking to help make sure the contents of a message have not been lost during transit (deterministic).
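As a rough sketch, a checksum like the one in a legacy address is the first 4 bytes of the HASH256 of the data. The helper names and the payload below are placeholders of my own, not a real address.

```ruby
require 'digest'

# First 4 bytes of HASH256(data), returned as hex.
def checksum(hex)
  bytes = [hex].pack("H*")
  Digest::SHA256.digest(Digest::SHA256.digest(bytes))[0, 4].unpack1("H*")
end

# Check that the last 4 bytes of `hex` are the checksum of everything before them.
def valid_checksum?(hex)
  data  = hex[0...-8]
  check = hex[-8, 8]
  checksum(data) == check
end

payload = "00" + "11" * 20 # placeholder data (21 bytes)
with_checksum = payload + checksum(payload)

puts valid_checksum?(with_checksum) #=> true

typo = with_checksum.sub("11", "12") # simulate a typing mistake in the data
puts valid_checksum?(typo)
```

Thanks to the avalanche effect, even a one-character mistake produces a very different checksum, so the second check should fail.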

Public Key Hash

Relevant Properties: Fixed-Length
Hashing Method: HASH160

Diagram showing a public key being shortened by hashing it.

A public key is either 33 or 65 bytes in size. However, before it gets converted to an address, it gets put through a hash function to shorten it to a 20-byte public key hash.

This allows you to create slightly shorter addresses than if you had not hashed the public key beforehand (fixed-length result).
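A quick sketch of the fixed-length result in action is shown below. The two "public keys" are placeholder bytes of the right length, not valid keys, and hash160 is the same kind of helper as in the HASH160 section.

```ruby
require 'digest'

# RIPEMD-160(SHA-256(data)), taking and returning hex.
def hash160(hex)
  bytes = [hex].pack("H*")
  Digest::RMD160.hexdigest(Digest::SHA256.digest(bytes))
end

compressed   = "02" + "11" * 32 # placeholder 33 bytes (not a real public key)
uncompressed = "04" + "11" * 64 # placeholder 65 bytes (not a real public key)

puts hash160(compressed).length / 2   #=> 20
puts hash160(uncompressed).length / 2 #=> 20
```

Whichever size of public key goes in, you get 20 bytes out.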

Extended Keys

Relevant Properties: Preimage Resistance, Collision Resistance
Hashing Method: HMAC (SHA-512)

Diagram showing child keys being derived by hashing a previous extended key.

Hierarchical Deterministic Wallets allow you to create multiple private keys from a single seed.

Each extended private key is created by hashing the previous extended private key, which gives you a completely new, unique, and independent private key to use (collision resistance).

This illustrates the security of hash functions, as each new result of the hash function is reliable enough to use as a private key, because you cannot work backwards (preimage resistance) to work out a previous private key from another.

Signing Transactions

Relevant Properties: Fixed Length, Collision Resistance
Hashing Method: HASH256

When you sign a transaction, you actually sign a hash of the transaction data.

Hashing data creates a shortened fingerprint for it (fixed-length result), and it's more efficient to sign the hash of a transaction (i.e. 32 bytes) than the full transaction data itself (e.g. 250+ bytes). The hash you sign is also unique for each piece of transaction data (collision resistance), so the resulting signature cannot be reused within a different transaction.

One of the original motivations for cryptographic hash functions was to improve the efficiency of signing long messages.

Notes

How do hash functions work?

Excellent question, I'm glad you asked.

It's at this point I'd usually say that this is "outside the scope of this article" and then distract you with lots of technical terminology and hand waving.

So I made this video on how SHA-256 works instead.

As I say though, a hash function just scrambles and compresses the underlying bits (the 1s and 0s) of computer data. And that's all you really need to know.

A common mistake when hashing

A common mistake when hashing data in Bitcoin is to insert strings into the hash function, and not the underlying byte sequences those strings actually represent.

For example, let's say we have the hexadecimal string ab.

If we insert this directly into the hash function as a string, your programming language will actually send the ASCII encoding of each of these characters into the hash function, which looks like this in binary:

"ab" = 01100001 01100010
sha256(0110000101100010) = fb8e20fc2e4c3f248c60c39bd652f3c1347298bb977b8b4d5903b85055620603

Byte (ASCII)

Convert between a byte and an ASCII character.

ASCII Table

The following characters are from Code Page 437, which is a popular 8-bit character set that extends ASCII.

Basically, all ASCII character sets contain the same standard letters and numbers (between 0x20 and 0x7e). These are historically known as the printable characters. (In standard ASCII, 0x7f is the delete control character; Code Page 437 displays it as ⌂.)

Code Page 437 extended this with an additional 128 characters (between 0x80 and 0xff) to include international, box drawing, and mathematical characters too. This additional set of characters is commonly referred to as "extended ASCII".

Code Page 437 also replaced the obsolete control characters (between 0x01 and 0x1f) from the original ASCII standard (e.g. ISO 646) with decorative characters instead.

There is no specific ASCII character set used in Bitcoin, but this is a popular one, and it's good for demonstrating how bytes can be assigned to characters.

Standard ASCII

Decorative Characters
Binary	Hexadecimal	Decimal	Character
00000000	00	0	(null)
00000001	01	1	☺
00000010	02	2	☻
00000011	03	3	♥
00000100	04	4	♦
00000101	05	5	♣
00000110	06	6	♠
00000111	07	7	•
00001000	08	8	◘
00001001	09	9	○
00001010	0a	10	◙
00001011	0b	11	♂
00001100	0c	12	♀
00001101	0d	13	♪
00001110	0e	14	♫
00001111	0f	15	☼
00010000	10	16	►
00010001	11	17	◄
00010010	12	18	↕
00010011	13	19	‼
00010100	14	20	¶
00010101	15	21	§
00010110	16	22	▬
00010111	17	23	↨
00011000	18	24	↑
00011001	19	25	↓
00011010	1a	26	→
00011011	1b	27	←
00011100	1c	28	∟
00011101	1d	29	↔
00011110	1e	30	▲
00011111	1f	31	▼

Printable Characters
Binary	Hexadecimal	Decimal	Character
00100000	20	32	(space)
00100001	21	33	!
00100010	22	34	"
00100011	23	35	#
00100100	24	36	$
00100101	25	37	%
00100110	26	38	&
00100111	27	39	'
00101000	28	40	(
00101001	29	41	)
00101010	2a	42	*
00101011	2b	43	+
00101100	2c	44	,
00101101	2d	45	-
00101110	2e	46	.
00101111	2f	47	/
00110000	30	48	0
00110001	31	49	1
00110010	32	50	2
00110011	33	51	3
00110100	34	52	4
00110101	35	53	5
00110110	36	54	6
00110111	37	55	7
00111000	38	56	8
00111001	39	57	9
00111010	3a	58	:
00111011	3b	59	;
00111100	3c	60	<
00111101	3d	61	=
00111110	3e	62	>
00111111	3f	63	?
01000000	40	64	@
01000001	41	65	A
01000010	42	66	B
01000011	43	67	C
01000100	44	68	D
01000101	45	69	E
01000110	46	70	F
01000111	47	71	G
01001000	48	72	H
01001001	49	73	I
01001010	4a	74	J
01001011	4b	75	K
01001100	4c	76	L
01001101	4d	77	M
01001110	4e	78	N
01001111	4f	79	O
01010000	50	80	P
01010001	51	81	Q
01010010	52	82	R
01010011	53	83	S
01010100	54	84	T
01010101	55	85	U
01010110	56	86	V
01010111	57	87	W
01011000	58	88	X
01011001	59	89	Y
01011010	5a	90	Z
01011011	5b	91	[
01011100	5c	92	\
01011101	5d	93	]
01011110	5e	94	^
01011111	5f	95	_
01100000	60	96	`
01100001	61	97	a
01100010	62	98	b
01100011	63	99	c
01100100	64	100	d
01100101	65	101	e
01100110	66	102	f
01100111	67	103	g
01101000	68	104	h
01101001	69	105	i
01101010	6a	106	j
01101011	6b	107	k
01101100	6c	108	l
01101101	6d	109	m
01101110	6e	110	n
01101111	6f	111	o
01110000	70	112	p
01110001	71	113	q
01110010	72	114	r
01110011	73	115	s
01110100	74	116	t
01110101	75	117	u
01110110	76	118	v
01110111	77	119	w
01111000	78	120	x
01111001	79	121	y
01111010	7a	122	z
01111011	7b	123	{
01111100	7c	124	|
01111101	7d	125	}
01111110	7e	126	~
01111111	7f	127	⌂

Extended ASCII

International Characters
Binary	Hexadecimal	Decimal	Character
10000000	80	128	Ç
10000001	81	129	ü
10000010	82	130	é
10000011	83	131	â
10000100	84	132	ä
10000101	85	133	à
10000110	86	134	å
10000111	87	135	ç
10001000	88	136	ê
10001001	89	137	ë
10001010	8a	138	è
10001011	8b	139	ï
10001100	8c	140	î
10001101	8d	141	ì
10001110	8e	142	Ä
10001111	8f	143	Å
10010000	90	144	É
10010001	91	145	æ
10010010	92	146	Æ
10010011	93	147	ô
10010100	94	148	ö
10010101	95	149	ò
10010110	96	150	û
10010111	97	151	ù
10011000	98	152	ÿ
10011001	99	153	Ö
10011010	9a	154	Ü
10011011	9b	155	¢
10011100	9c	156	£
10011101	9d	157	¥
10011110	9e	158	₧
10011111	9f	159	ƒ
10100000	a0	160	á
10100001	a1	161	í
10100010	a2	162	ó
10100011	a3	163	ú
10100100	a4	164	ñ
10100101	a5	165	Ñ
10100110	a6	166	ª
10100111	a7	167	º
10101000	a8	168	¿
10101001	a9	169	⌐
10101010	aa	170	¬
10101011	ab	171	½
10101100	ac	172	¼
10101101	ad	173	¡
10101110	ae	174	«
10101111	af	175	»

Box Drawing Characters
Binary	Hexadecimal	Decimal	Character
10110000	b0	176	░
10110001	b1	177	▒
10110010	b2	178	▓
10110011	b3	179	│
10110100	b4	180	┤
10110101	b5	181	╡
10110110	b6	182	╢
10110111	b7	183	╖
10111000	b8	184	╕
10111001	b9	185	╣
10111010	ba	186	║
10111011	bb	187	╗
10111100	bc	188	╝
10111101	bd	189	╜
10111110	be	190	╛
10111111	bf	191	┐
11000000	c0	192	└
11000001	c1	193	┴
11000010	c2	194	┬
11000011	c3	195	├
11000100	c4	196	─
11000101	c5	197	┼
11000110	c6	198	╞
11000111	c7	199	╟
11001000	c8	200	╚
11001001	c9	201	╔
11001010	ca	202	╩
11001011	cb	203	╦
11001100	cc	204	╠
11001101	cd	205	═
11001110	ce	206	╬
11001111	cf	207	╧
11010000	d0	208	╨
11010001	d1	209	╤
11010010	d2	210	╥
11010011	d3	211	╙
11010100	d4	212	╘
11010101	d5	213	╒
11010110	d6	214	╓
11010111	d7	215	╫
11011000	d8	216	╪
11011001	d9	217	┘
11011010	da	218	┌
11011011	db	219	█
11011100	dc	220	▄
11011101	dd	221	▌
11011110	de	222	▐
11011111	df	223	▀

Mathematical Symbols
Binary	Hexadecimal	Decimal	Character
11100000	e0	224	α
11100001	e1	225	ß
11100010	e2	226	Γ
11100011	e3	227	π
11100100	e4	228	Σ
11100101	e5	229	σ
11100110	e6	230	µ
11100111	e7	231	τ
11101000	e8	232	Φ
11101001	e9	233	Θ
11101010	ea	234	Ω
11101011	eb	235	δ
11101100	ec	236	∞
11101101	ed	237	φ
11101110	ee	238	ε
11101111	ef	239	∩
11110000	f0	240	≡
11110001	f1	241	±
11110010	f2	242	≥
11110011	f3	243	≤
11110100	f4	244	⌠
11110101	f5	245	⌡
11110110	f6	246	÷
11110111	f7	247	≈
11111000	f8	248	°
11111001	f9	249	∙
11111010	fa	250	·
11111011	fb	251	√
11111100	fc	252	ⁿ
11111101	fd	253	²
11111110	fe	254	■
11111111	ff	255	(non-breaking space)

Hash functions work on the underlying 1s and 0s of computer data, which is what I'm referring to here with the word "binary".

But what we actually want to send into the hash function is the byte this hexadecimal string represents, which looks like this in binary:

0xab = 10101011
sha256(10101011) = 087d80f7f182dd44f184aa86ca34488853ebcc04f0c60d5294919a466b463831

This is why we usually need to "pack" our hexadecimal strings into bytes first before hashing. Most programming languages will have functions that allow you to do this. For example:


# Ruby
hex = "ab"
binary = [hex].pack("H*") # H = hex string (high nibble first), * = multiple bytes


// PHP
$hex = "ab";
$binary = pack("H*", $hex);

You will probably see a bunch of garbled text if you print out these converted binary values directly. This makes sense, because your programming language tries to convert this binary data back to characters when printing it out, and these bytes probably refer to weird characters (code points) in the ASCII table.

In short, remember that hash functions take in binary data as the input, so we need to be specific about the binary data we want to insert.

If you forget to convert your hexadecimal strings to their corresponding bytes beforehand, your programming language will assume you want to send the binary representation of the characters in the string, and this will produce a completely different hash result than expected.
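Here is the mistake and the fix side by side, as a minimal sketch using the same hexadecimal string ab from the example above. The variable names wrong and right are just mine.

```ruby
require 'digest'

hex = "ab"

# Mistake: hashing the string itself (the characters "a" and "b")
wrong = Digest::SHA256.hexdigest(hex)

# Fix: convert the hexadecimal string to the byte it represents first
right = Digest::SHA256.hexdigest([hex].pack("H*"))

puts hex.bytes.inspect              #=> [97, 98] (0x61 0x62, two bytes)
puts [hex].pack("H*").bytes.inspect #=> [171]    (0xab, one byte)

puts wrong
puts right
```

The two results should correspond to the two different hashes shown earlier in this section, one for the string and one for the byte.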

This is by far the most common issue people have when hashing data in Bitcoin for the first time. So if you're not getting the right hash results, this is probably where you're going wrong.

And I should know, I've ~~done~~ do it myself.

All bitcoin data is just a bunch of bytes at the end of the day. We just use their hexadecimal string representation for convenience when sharing and displaying them on websites.

Summary

A hash function is the Swiss Army knife in the programmer's toolbox.

You'll be hashing things frequently when working with Bitcoin, so it's worth getting the hang of them in whatever programming language you're using.

Satoshi very much understood their properties, and utilized them for various purposes when developing Bitcoin. But their most ingenious decision was to leverage the unpredictability of hash functions to create a network-wide lottery, which is what underpins the revolutionary mechanism of mining and blockchain technology.

So it just goes to show that if you understand the fundamental properties of a tool, you can find new and interesting ways to use it.

Lastly, if you want a proper technical definition of a hash function:

Cryptographic hash functions map input strings of arbitrary (or very large) length to short fixed length output strings.

Bart Preneel, The First 30 Years of Cryptographic Hash Functions

But I just think of them as fingerprinting machines for data.