This is a quick guide to explain what bytes are, and how they're used in bitcoin. This will be useful if you're working with transactions and other raw data in bitcoin.
You're going to be doing some low-level programming when working with bitcoin data, so it's good to know what you're looking at.
I'm not a computer scientist either, but bytes are pretty straightforward.
Intro
When using bitcoin day-to-day, you will frequently see raw bytes of data:
Private Key. This is 32 bytes of data representing a very large number.
e.g. 0a7c7d76b42cee7a85d6e30cc38682f5b0d9c41cbbdf7d4c5c0bd81d8a1e93a1 (don't use this, it's just an example)
TXID. This is 32 bytes of data representing a unique identifier for a transaction.
e.g. a1075db55d416d3ca199f55b6084e2115b9345e16c5cf302fc80e9d5fbf5d48d (TXID for the 10,000 BTC pizza transaction)
When working with bitcoin, you'll soon find that all the core data is made up of raw bytes too:
Transaction. This is a bunch of bytes describing the movement of bitcoins from one place to another.
Bitcoin is a computer program, and computers read and communicate using bytes. So if you plan on working with bitcoin on a technical level, it's useful to know what bytes are, how they're displayed, and the types of data they store.
Basics
Before we get on to bytes, we need to know what a "bit" is.
Bit
A bit is the smallest unit of data that a computer can hold. This can either be a 1 or a 0 (i.e. a single transistor being "on" or "off").
These bits are the building blocks for storing data inside computers.
Ben Eater has some great videos on how computers work.
If you ever get stuck when programming, just remember that it's all just a bunch of ones and zeros at the end of the day.
Byte
A byte is just a group of 8 individual bits.
A byte is a convenient unit of storage for data on a computer. Just as it's sometimes more practical to measure weight in kilograms instead of grams, it's sometimes more practical to measure data in bytes instead of bits.
Bytes are actually the default measurement for data, and most of the data you'll work with in bitcoin is measured in bytes.
A byte is 8 individual bits. So a 32-byte private key is 256 bits (32 x 8 = 256).
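As a quick sanity check of that arithmetic, here is a minimal Python sketch that counts the bytes and bits in the example private key from the intro. The variable names are my own and only there for illustration.

```python
# The example private key from the intro, written as a hexadecimal string.
private_key_hex = "0a7c7d76b42cee7a85d6e30cc38682f5b0d9c41cbbdf7d4c5c0bd81d8a1e93a1"

# Convert the hexadecimal string into actual bytes.
private_key = bytes.fromhex(private_key_hex)

BITS_PER_BYTE = 8
size_in_bytes = len(private_key)
size_in_bits = size_in_bytes * BITS_PER_BYTE

print(size_in_bytes)  # 32
print(size_in_bits)   # 256
```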
Representing Bytes
We can represent a byte in two ways:
Using eight binary digits, one for each bit (e.g. 0b11111111)
Using two hexadecimal characters, one for each group of four bits (e.g. 0xFF)
This works because if you split a byte into two 4-bit halves, each of those 4-bit halves can handle 16 different combinations of 1s and 0s. So if you give each of those halves its own hexadecimal character, you can represent a byte using just two characters instead of eight.
This shorter representation is called a hexadecimal byte.
We often display bytes in hexadecimal format to save space, which is much more convenient when you're representing multiple bytes of data:
61 dc 9f f8 b1 54 50 21 29 70 c6 fa 99 73 38 bb 20 5d d4 8f fa ca 3a 05 6b 09 a3 b4 4a 24 4d 76
When I first started working with Bitcoin I made the mistake of interpreting things like private keys as strings of individual letters and numbers. In reality, a private key is a sequence of bytes, where every two hexadecimal characters represent one byte.
It doesn't matter if the hexadecimal characters are uppercase or lowercase. For example, 6b is the same as 6B.
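To make the "two 4-bit halves" idea concrete, here is a small Python sketch that splits the byte 6b into its halves and shows the hexadecimal character for each. The variable names are illustrative only.

```python
byte_value = 0x6B  # one byte, written in hexadecimal

# Split the byte into its two 4-bit halves.
high_half = byte_value >> 4    # top four bits
low_half = byte_value & 0x0F   # bottom four bits

print(format(byte_value, "08b"))                         # 01101011
print(format(high_half, "04b"), format(high_half, "x"))  # 0110 6
print(format(low_half, "04b"), format(low_half, "x"))    # 1011 b

# Uppercase and lowercase hexadecimal decode to the same byte.
print(bytes.fromhex("6b") == bytes.fromhex("6B"))  # True
```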
Storing Data in Bytes
So, how can we store useful things like numbers, text, and other things inside these bytes?
It honestly just depends on how you interpret the bits.
A byte can hold different types of data depending on how you interpret that byte. For example, the byte 01100111 could represent the letter "g" or the number 103. It all depends on what kind of data structure you're working with.
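Here is a minimal Python sketch of that idea: the same byte read once as a number and once as an ASCII character. Nothing about the byte changes, only how we choose to read it.

```python
raw = bytes([0b01100111])  # a single byte: 01100111

as_number = raw[0]             # interpret the byte as an unsigned integer
as_text = raw.decode("ascii")  # interpret the same byte as an ASCII character

print(as_number)  # 103
print(as_text)    # g
```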
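Numbers are the most common thing you'll find stored in bytes in bitcoin. For example:
VOUT. Each output from a transaction is numbered, and this number is stored within 4 bytes of data.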
Nonce. The field in the block header that miners use to help with mining is a 4-byte number.
Target. The threshold that miners have to try and get their block hashes below is a 32-byte number.
Locktime. A time or a block height from when a transaction can be mined. Stored as 4 bytes at the end of a transaction.
Time (Block). This is stored as 4 bytes in the block header in Unix Time.
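A minimal sketch of reading one of these 4-byte numbers in Python. It assumes the field is serialized in little-endian byte order, which as far as I know is how these fixed-size integers appear in raw transactions and block headers. The helper name read_uint32 and the sample values are my own (the time value should be the one from the genesis block header, but treat it as an example).

```python
def read_uint32(field: bytes) -> int:
    """Interpret exactly 4 bytes as an unsigned little-endian integer."""
    if len(field) != 4:
        raise ValueError("expected exactly 4 bytes")
    return int.from_bytes(field, byteorder="little")

# Assumed sample values, for illustration only.
vout = read_uint32(bytes.fromhex("01000000"))        # an output number
block_time = read_uint32(bytes.fromhex("29ab5f49"))  # Unix time from a block header

print(vout)        # 1
print(block_time)  # 1231006505
```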
Text
You can also use bytes to store text. You can do this by assigning each character its own byte.
There is no official standard for encoding text inside bitcoin data, but the most commonly used mapping of characters to bytes is ASCII.
In ASCII, each byte value represents a different character. You just need an ASCII table at hand to see which byte corresponds to which character.
Examples
Text is not frequently stored inside Bitcoin data, but it does show up in a few places:
OP_RETURN. You can store arbitrary data inside bitcoin transactions by using OP_RETURN inside the scriptpubkey of an output. The bytes following this opcode are typically ASCII-encoded text (e.g. "charley loves heidi").
Coinbase. Miners typically place their own custom text inside the scriptsig of their coinbase transaction using ASCII (e.g. "/ViaBTC/Mined by mw001/").
Type (Message Header). The type or "command" in each message header is 12 bytes of ASCII-encoded text (e.g. "version", "block", "tx", etc.).
User-Agent (Version Message Payload). The User-Agent field inside the "version" message is ASCII-encoded text (e.g. "Satoshi:24.0.1").
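The examples above can be sketched in Python like this. The first part converts text to ASCII bytes and back. The second part builds a 12-byte "command" field, assuming short names are padded on the right with zero bytes (that padding rule is my understanding of the protocol, not something stated above).

```python
message = "charley loves heidi"

# Text -> bytes: each ASCII character becomes one byte.
encoded = message.encode("ascii")
print(len(encoded))   # 19
print(encoded.hex())  # two hexadecimal characters per letter

# Bytes -> text: look each byte up in the ASCII table again.
print(encoded.decode("ascii"))  # charley loves heidi

# Assumption: the 12-byte command is padded on the right with zero bytes.
command = "version".encode("ascii").ljust(12, b"\x00")
print(command.hex())                            # 76657273696f6e0000000000
print(command.rstrip(b"\x00").decode("ascii"))  # version
```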
Settings (Bit Field)
Sometimes you can use the underlying bits inside bytes of data to represent multiple on/off settings.
This is known as a bit field, and it's an efficient way to store multiple settings in the smallest space possible.
Examples
Bit fields are used in a handful of places in bitcoin:
Services (Version Message). The 8-byte (64-bit) field inside a "version" message is used by a node to indicate the optional services it offers during the initial connection to another node on the network. For example, this might be whether it is a "full node" (and can supply the entire blockchain to another node), or if it is compatible with the latest upgrades to the software (e.g. Segregated Witness).
Version (Block). This 4-byte (32-bit) field in a block header was previously used to represent a version number, but it's now used as a bit field so that miners can signal readiness for upgrades to the software. For example, a miner can indicate that they are ready for the changes by setting a specific bit to 1 in this field as designated by a particular upgrade. The changes will then come into effect when enough miners signal readiness over a period of time.
Sequence (Relative Locktime). The 4-byte (32-bit) sequence field in a transaction is interpreted as a bit field when setting the relative locktime for an input.
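Here is a rough Python sketch of reading a bit field, using the services field as the example. The flag positions (NODE_NETWORK at bit 0, NODE_WITNESS at bit 3), the little-endian byte order, and the sample bytes are assumptions on my part, so check them against the protocol documentation before relying on them. The helper has_flag is hypothetical.

```python
# Assumed bit positions for two service flags.
NODE_NETWORK = 1 << 0  # can serve the full blockchain
NODE_WITNESS = 1 << 3  # supports Segregated Witness

# An 8-byte services field as it might appear in a "version" message
# (assumed to be little-endian).
services_bytes = bytes.fromhex("0900000000000000")
services = int.from_bytes(services_bytes, byteorder="little")

def has_flag(field: int, flag: int) -> bool:
    """Return True if every bit of `flag` is set in `field`."""
    return (field & flag) == flag

print(format(services, "064b")[-8:])     # 00001001 (lowest 8 bits)
print(has_flag(services, NODE_NETWORK))  # True
print(has_flag(services, NODE_WITNESS))  # True
print(has_flag(services, 1 << 2))        # False
```

Each setting costs a single bit, so this one 8-byte field has room for 64 independent on/off flags.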
Just Bytes
Sometimes bytes don't have to represent anything in particular. They can just be useful enough as a unique piece of data.
Examples
The output from a hash function is just a unique series of bytes, and this makes them useful as "fingerprints" for data:
TXID. This is a unique identifier for a transaction. It's used when you want to look up a transaction, or when spending bitcoins created in a previous transaction (i.e. as an input to a transaction).
Block Hash. This is a unique identifier for a block. It's used when you want to look up a block, or when referencing a previous block when building on top of the longest chain (i.e. in the block header).
Public Key Hash. This is a short but unique fingerprint for a public key. It's used as a way to shorten the public key before converting to an address.
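As a small illustration of a hash being "just bytes", here is a Python sketch that runs some placeholder data through SHA-256 twice. Double SHA-256 is, as far as I know, what bitcoin uses for TXIDs and block hashes, but the function name hash256 and the input data are my own and not real transaction data.

```python
import hashlib

def hash256(data: bytes) -> bytes:
    """Apply SHA-256 twice and return the 32 raw bytes of the result."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

# Placeholder input; a real TXID is computed from serialized transaction data.
data = b"example data"
fingerprint = hash256(data)

print(len(fingerprint))   # 32
print(fingerprint.hex())  # 64 hexadecimal characters

# Changing the input even slightly gives a different fingerprint.
print(hash256(b"example datb") == fingerprint)  # False
```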
And sometimes we might just use a random-looking set of bytes with no meaning at all:
Magic Bytes. This is a specifically chosen set of 4 bytes (f9beb4d9) that is used as a marker when receiving streams of bytes from other nodes on the network. This helps us to figure out when one message ends and a new one starts.
So you don't have to interpret bytes as anything at all. Having a unique set of bytes to use as a "fingerprint" for data can be useful enough on its own.
A block hash actually gets interpreted as both a unique identifier and a number. It's usually just a unique identifier for the block, but it also gets interpreted as a number during the mining process to see if the block hash is below the current target value.
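Here is a minimal Python sketch of reading the same block hash both ways. The hash below is what I believe to be the genesis block hash as shown on block explorers, and the target is the one I understand applied to that block, so treat both values as assumptions for illustration.

```python
# Assumed: genesis block hash as displayed on block explorers.
block_hash_hex = "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f"

# Assumed target for that block: 0xffff followed by 26 zero bytes.
target = 0xFFFF << (8 * 26)

# As an identifier, the hash is just 32 bytes.
block_hash = bytes.fromhex(block_hash_hex)
print(len(block_hash))  # 32

# As a number, the same hexadecimal digits are read as one large integer.
hash_as_number = int(block_hash_hex, 16)
print(hash_as_number < target)  # True
```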
Custom Data
Ultimately you can use bytes to represent anything you like. It doesn't have to just be numbers and text (although they're the most common). It all depends on how you choose to interpret the combinations of 1s and 0s inside those bytes.
Examples
These are some encodings that are unique to bitcoin:
Script. A byte inside a scriptpubkey or scriptsig can refer to a specific operation known as an "OP_CODE". Similar to how each byte can map to a specific character in the ASCII table, each byte inside a script can map to a specific operation in the table of OP_CODES.
Compact Size. This is a compact data structure used throughout bitcoin data to indicate a variable number of upcoming bytes.
Bits (Block Header). The "bits" field in the block header is used to store the current target in a compact form.
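To give a feel for one of these custom encodings, here is a rough Python sketch of reading a compact size field. It assumes the usual rules as I understand them: a first byte below 0xfd is the value itself, while 0xfd, 0xfe and 0xff mean the value is in the next 2, 4 or 8 bytes (little-endian). The function name read_compact_size and the sample inputs are my own.

```python
from typing import Tuple

def read_compact_size(data: bytes) -> Tuple[int, int]:
    """Return (value, number of bytes used by the compact size field)."""
    if not data:
        raise ValueError("no bytes to read")
    first = data[0]
    if first < 0xFD:
        return first, 1  # the first byte is the value itself
    # Otherwise the first byte tells us how many bytes hold the value.
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    if len(data) < 1 + width:
        raise ValueError("not enough bytes")
    value = int.from_bytes(data[1:1 + width], byteorder="little")
    return value, 1 + width

print(read_compact_size(bytes.fromhex("6a")))          # (106, 1)
print(read_compact_size(bytes.fromhex("fd2602")))      # (550, 3)
print(read_compact_size(bytes.fromhex("fe703a0f00")))  # (998000, 5)
```

Returning the number of bytes consumed alongside the value makes it easy to know where the next field starts when walking through raw data.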
Working with Bytes
Being able to work with raw bytes in bitcoin is especially important when it comes to networking or when you're hashing data.
The good news is you can work with raw bytes of data in any decent programming language. This varies between languages, but it usually involves writing out each individual byte value in some sort of special string or array, like so:
"\xF9\xBE\xB4\xD9"
{0xF9, 0xBE, 0xB4, 0xD9}
But this kind of format isn't easy to pass around, so you'll often find yourself converting these bytes into hexadecimal strings for display purposes:
"f9beb4d9"
This is how I typically display bytes on this website, and how you'll usually see bytes being displayed on blockchain explorers.
Sometimes you'll want to get the integer value that these bytes represent too:
4190024921
So when working with Bitcoin data, you want to be able to convert back and forth between actual bytes, hexadecimal strings, and integers in your programming language of choice.
Here are some quick examples on how to do this in a few common programming languages:
# Work with bytes directly using hexadecimal characters to represent each byte.
bytes = "\xF9\xBE\xB4\xD9" #=> (jargon - tries to display character encoding for each byte value)

# Bytes -> Hexadecimal String
hex_string = bytes.unpack("H*")[0] #=> "f9beb4d9"

# Hexadecimal String -> Bytes
bytes = [hex_string].pack("H*") #=> (jargon - tries to display a character encoding for each byte value)

# Hexadecimal String -> Integer
integer = hex_string.to_i(16) #=> 4190024921
Ruby can be a bit awkward when displaying bytes. If you try to print out bytes directly, Ruby will try to display them using each byte's character encoding (instead of their hexadecimal representation), so you need to convert to hexadecimal to display them when debugging.
The "pack" and "unpack" functions are the most useful when it comes to working with raw bytes of data in languages like Ruby and Python, so it's worth getting to know them.
# NOTE: Python 3.5+

# Create bytes
raw_bytes = b'\xf9\xbe\xb4\xd9'

# Convert bytes to hex string
hex_string = raw_bytes.hex()
print(hex_string) #=> f9beb4d9

# Convert hex string to bytes
raw_bytes = bytes.fromhex('f9beb4d9')
print(raw_bytes) #=> b'\xf9\xbe\xb4\xd9'

# Convert bytes to integer
integer = int.from_bytes(raw_bytes, byteorder="big") # second argument is endianness
print(integer) #=> 4190024921
See little-endian for details about endianness in bitcoin.
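Since the byte order changes the result, here is a short Python sketch that reads the same four bytes in both orders, and also goes the other way from an integer back to bytes. The variable names are just for illustration.

```python
raw_bytes = bytes.fromhex("f9beb4d9")

# The same four bytes give different integers depending on byte order.
big = int.from_bytes(raw_bytes, byteorder="big")
little = int.from_bytes(raw_bytes, byteorder="little")
print(big)     # 4190024921
print(little)  # 3652501241

# Integer -> bytes: you must say how many bytes to use, and in which order.
print(big.to_bytes(4, byteorder="big").hex())        # f9beb4d9
print(little.to_bytes(4, byteorder="little").hex())  # f9beb4d9

# Reversing the order of a sequence of bytes.
print(raw_bytes[::-1].hex())  # d9b4bef9
```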
package main

import (
	"encoding/hex" // converting between byte array and hexadecimal string
	"fmt"
	"math/big" // converting between byte array and integer
)

func main() {
	// Create a byte array
	bytes := []byte{0xF9, 0xBE, 0xB4, 0xD9}
	fmt.Println(bytes) //=> [249 190 180 217]

	// Convert byte array to hexadecimal string
	hexString := hex.EncodeToString(bytes)
	fmt.Println(hexString) //=> f9beb4d9

	// Convert hexadecimal string to byte array
	byteArray, _ := hex.DecodeString(hexString)
	fmt.Println(byteArray) //=> [249 190 180 217]

	// Convert byte array to integer
	integer := new(big.Int).SetBytes(byteArray)
	fmt.Println(integer) //=> 4190024921

	// Convert integer to byte array
	byteArrayFromInteger := integer.Bytes()
	fmt.Println(byteArrayFromInteger) //=> [249 190 180 217]
}
To save on space, we usually represent each byte using 2 hexadecimal characters instead of using 8 individual bits:
byte (hexadecimal):
┌─┬─┐
│6│B│
└─┴─┘
Bytes are most commonly used in bitcoin to represent numbers (e.g. private keys and output amounts), but they're also often used as unique fingerprints for larger pieces of data (e.g. TXIDs and block hashes).
But at the end of the day, the combination of 1s and 0s inside bytes can be used to store any kind of data.