blk.dat

Raw blockchain data files

The blk.dat files in the ~/.bitcoin/blocks/ directory contain the raw block data received by your Bitcoin Core node.

Together, these blk.dat files store every block your node has downloaded, which is basically the entire blockchain.

Location

Where is the blockchain stored on your computer?

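The location of the raw blockchain files on your disk depends on what operating system you're using. These are the default locations:

  Linux: ~/.bitcoin/blocks/
  macOS: ~/Library/Application Support/Bitcoin/blocks/
  Windows: %APPDATA%\Bitcoin\blocks\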

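You can change the location of the whole data directory (including the blocks/ folder) by setting the datadir=<dir> option in the bitcoin.conf configuration file. If you only want to move the block files, you can use the blocksdir=<dir> option instead.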

Filenames

How are the blockchain files organized?

Every block that your node receives gets appended to a blk.dat file. But instead of storing the entire blockchain in one massive file, the blocks are split across multiple blk*.dat files.

Your node first adds blocks to blk00000.dat. When that file fills up, it moves on to blk00001.dat, then blk00002.dat, and so on. If you're on Linux, you can navigate to the data directory and list all the raw block files with:

$ cd ~/.bitcoin/blocks/
$ ls blk*

blk00000.dat
blk00001.dat
blk00002.dat
blk00003.dat
blk00004.dat
blk00005.dat
blk00006.dat
...

The maximum blk.dat file size is 128 MiB (134,217,728 bytes). This limit is set by the MAX_BLOCKFILE_SIZE constant in the Bitcoin Core source code.
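The naming scheme and size limit above only take a few lines of Python to express. This is a minimal sketch. blk_filename and list_blk_files are hypothetical helpers, and the constant just mirrors the 128 MiB value quoted above. Because the file numbers are zero-padded to five digits, a plain string sort puts the files in the same order as ls blk*.

```python
from pathlib import Path
from typing import List

# Mirrors the 128 MiB limit described above: 128 * 1024 * 1024 = 134,217,728 bytes.
MAX_BLOCKFILE_SIZE = 0x8000000


def blk_filename(file_number: int) -> str:
    """Name of a block file: 'blk' + five-digit zero-padded number + '.dat'."""
    return f"blk{file_number:05d}.dat"


def list_blk_files(blocks_dir: str) -> List[str]:
    """Rough Python equivalent of `ls blk*`: block files in the order they were filled."""
    # Zero-padding means a plain string sort matches numeric order.
    return sorted(p.name for p in Path(blocks_dir).glob("blk*.dat"))


# Usage (hypothetical path): list_blk_files("/home/user/.bitcoin/blocks")
```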

Example

What does a raw block look like?

The data in blk.dat files is stored in binary, which is basically a bunch of 1s and 0s and not human-readable text.

Nonetheless, we can look at the genesis block by reading the first 293 bytes of blk00000.dat. I've split up the individual fields so you can see them more clearly:

f9beb4d9 1d010000 01000000 0000000000000000000000000000000000000000000000000000000000000000 3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a 29ab5f49 ffff001d 1dac2b7c 01 01000000010000000000000000000000000000000000000000000000000000000000000000ffffffff4d04ffff001d0104455468652054696d65732030332f4a616e2f32303039204368616e63656c6c6f72206f6e206272696e6b206f66207365636f6e64206261696c6f757420666f722062616e6b73ffffffff0100f2052a01000000434104678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5fac00000000

See the od command below for displaying the hex bytes from a binary file.
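If you'd rather read the bytes from a script, a minimal Python sketch does the same job. It seeks to an offset in the file and returns the bytes as hex. read_hex is a hypothetical helper for illustration and is not part of any library.

```python
def read_hex(path: str, offset: int, length: int) -> str:
    """Return up to `length` bytes of `path`, starting at byte `offset`, as a hex string."""
    with open(path, "rb") as f:  # binary mode: no text decoding or newline translation
        f.seek(offset)
        return f.read(length).hex()


# read_hex("blk00000.dat", 0, 293)  # the whole genesis record (magic bytes + size + block)
# read_hex("blk00000.dat", 8, 285)  # skip the 8-byte prefix and return just the block
```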

Structure

What is the structure of a raw block?

Diagram showing structure of the raw block data inside the blk.dat files.

The data above can be split into five parts:

  1. The magic bytes (4 bytes) are a message delimiter indicating the start of a block.
  2. The size (4 bytes) indicates the size of the upcoming block in bytes.
  3. The block header (80 bytes) is the summary of the block data.
  4. The tx count (compact size) indicates how many transactions are in the block.
  5. The transaction data (variable) is all of the transactions in the block concatenated one after the other.
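These five parts map onto a short parser. The following is a minimal Python sketch, and the helper names are hypothetical. It splits the start of a record into the magic bytes, the size, the individual block header fields, and the tx count, but it doesn't decode the transactions themselves. Multi-byte integers are little-endian. The tx count is decoded as a CompactSize: a single byte for values below 0xfd, otherwise a 0xfd, 0xfe or 0xff prefix followed by 2, 4 or 8 bytes.

```python
import struct
from typing import Tuple


def read_compact_size(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode the CompactSize integer at data[pos]; return (value, position after it)."""
    first = data[pos]
    if first < 0xFD:
        return first, pos + 1                    # value fits in the single byte
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]   # prefix says how many bytes follow
    value = int.from_bytes(data[pos + 1:pos + 1 + width], "little")
    return value, pos + 1 + width


def parse_record(data: bytes) -> dict:
    """Split the start of a blk.dat record into its parts (transactions not decoded)."""
    magic = data[0:4].hex()
    size = int.from_bytes(data[4:8], "little")
    header = data[8:88]                              # the 80-byte block header
    (version,) = struct.unpack_from("<i", header, 0)
    prev_block = header[4:36].hex()                  # file byte order (explorers show it reversed)
    merkle_root = header[36:68].hex()                # file byte order
    timestamp, bits, nonce = struct.unpack_from("<III", header, 68)
    tx_count, _ = read_compact_size(data, 88)        # CompactSize right after the header
    return {
        "magic": magic, "size": size, "version": version,
        "prev_block": prev_block, "merkle_root": merkle_root,
        "time": timestamp, "bits": f"{bits:08x}", "nonce": nonce,
        "tx_count": tx_count,
    }
```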

The size field is what allowed me to figure out that I needed to read 293 bytes to get the whole block in the example above. The size of the block is stored as 1d010000, so to convert it to a decimal number:

  1. Convert 1d010000 from little-endian to big-endian to get 0000011d
  2. Convert 0000011d from hexadecimal to decimal to get 285

So the actual block itself is only 285 bytes. But there are an extra 8 bytes at the start for the magic bytes and the size field, so I needed to read 293 bytes (8 + 285) from the start of the raw blockchain file to get the full block of data.
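The two conversion steps are easy to check in Python, where int.from_bytes does the little-endian decoding in a single call:

```python
size_field = bytes.fromhex("1d010000")       # the 4 size bytes as stored in blk00000.dat

print(size_field[::-1].hex())                # step 1, reverse the byte order: 0000011d
size = int.from_bytes(size_field, "little")  # steps 1 and 2 in one call
print(size)                                  # 285
print(8 + size)                              # 293: magic bytes (4) + size (4) + block
```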

Linux Tools

How can you read raw blockchain data?

As mentioned, the data inside a blk.dat file is binary, so you won't see anything useful if you open one up in a regular text editor. But no matter, because binary data can easily be displayed as hexadecimal bytes, and there are a few commands that can help:

1. od

This is a simple one. It dumps out the contents of files in your format of choice.

$ od -x --endian=big -N 293 -An blk00000.dat

# -x           <- show hexadecimal
# --endian=big <- treat each 2-byte group as big-endian, so bytes are shown in the order they appear in the file
# -N 293       <- number of bytes to read
# -An          <- do not show file offset

"od" stands for octal dump, but you can dump out data in other formats than just octal.

2. hexdump

This is similar to od, but it also gives you the option of displaying ASCII text from the data (which is also handy for looking at messages contained inside transaction data).

$ hexdump -C -s 8 -n 285 blk00000.dat

00000008  01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000018  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000028  00 00 00 00 3b a3 ed fd  7a 7b 12 b2 7a c7 2c 3e  |....;...z{..z.,>|
00000038  67 76 8f 61 7f c8 1b c3  88 8a 51 32 3a 9f b8 aa  |gv.a......Q2:...|
00000048  4b 1e 5e 4a 29 ab 5f 49  ff ff 00 1d 1d ac 2b 7c  |K.^J)._I......+||
00000058  01 01 00 00 00 01 00 00  00 00 00 00 00 00 00 00  |................|
00000068  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000078  00 00 00 00 00 00 ff ff  ff ff 4d 04 ff ff 00 1d  |..........M.....|
00000088  01 04 45 54 68 65 20 54  69 6d 65 73 20 30 33 2f  |..EThe Times 03/|
00000098  4a 61 6e 2f 32 30 30 39  20 43 68 61 6e 63 65 6c  |Jan/2009 Chancel|
000000a8  6c 6f 72 20 6f 6e 20 62  72 69 6e 6b 20 6f 66 20  |lor on brink of |
000000b8  73 65 63 6f 6e 64 20 62  61 69 6c 6f 75 74 20 66  |second bailout f|
000000c8  6f 72 20 62 61 6e 6b 73  ff ff ff ff 01 00 f2 05  |or banks........|
000000d8  2a 01 00 00 00 43 41 04  67 8a fd b0 fe 55 48 27  |*....CA.g....UH'|
000000e8  19 67 f1 a6 71 30 b7 10  5c d6 a8 28 e0 39 09 a6  |.g..q0..\..(.9..|
000000f8  79 62 e0 ea 1f 61 de b6  49 f6 bc 3f 4c ef 38 c4  |yb...a..I..?L.8.|
00000108  f3 55 04 e5 1e c1 12 de  5c 38 4d f7 ba 0b 8d 57  |.U......\8M....W|
00000118  8a 4c 70 2b 6b f1 1d 5f  ac 00 00 00 00           |.Lp+k.._.....|
00000125

# -C <- canonical display: show each byte in the order it appears in the file (the same byte order Bitcoin uses), plus the ASCII text
# -s <- start point (offset in bytes)
# -n <- number of bytes to read

This is a popular way to display the genesis block, and you'll see it floating around the Internet in various places.
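If you want the same layout from a script, the following is a minimal Python approximation of the -C format. hexdump_c is a hypothetical helper. Unlike hexdump's default behaviour, it never collapses repeated lines into a "*", so its output is closer to hexdump -v -C.

```python
def hexdump_c(data: bytes, start: int = 0) -> str:
    """Approximate `hexdump -v -C` output for `data`, numbering lines from file offset `start`."""
    lines = []
    for i in range(0, len(data), 16):
        chunk = data[i:i + 16]
        # Two groups of 8 bytes, each padded so a short final line stays aligned.
        left = " ".join(f"{b:02x}" for b in chunk[:8])
        right = " ".join(f"{b:02x}" for b in chunk[8:])
        # Printable ASCII (0x20-0x7e) is shown as-is; everything else becomes ".".
        text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"{start + i:08x}  {left:<23}  {right:<23}  |{text}|")
    # Like hexdump, finish with the offset just past the last byte.
    lines.append(f"{start + len(data):08x}")
    return "\n".join(lines)


# Usage, mirroring `hexdump -C -s 8 -n 285 blk00000.dat`:
# with open("blk00000.dat", "rb") as f:
#     f.seek(8)
#     print(hexdump_c(f.read(285), start=8))
```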

Anyway, you can chain some commands together so that you just get the raw hexadecimal bytes without any formatting if you prefer:

$ hexdump -C -s 8 -n 285 blk00000.dat | cut -c 11-58 | tr '\n' ' ' | tr -d ' '

0100000000000000000000000000000000000000000000000000000000000000000000003ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a29ab5f49ffff001d1dac2b7c0101000000010000000000000000000000000000000000000000000000000000000000000000ffffffff4d04ffff001d0104455468652054696d65732030332f4a616e2f32303039204368616e63656c6c6f72206f6e206272696e6b206f66207365636f6e64206261696c6f757420666f722062616e6b73ffffffff0100f2052a01000000434104678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5fac00000000

# cut -c 11-58 <- cuts out anything outside the columns from characters 11 to 58 (on each line)
# tr '\n' ' ' <- translate newlines into spaces
# tr -d ' ' <- deletes all spaces

But if you're going to go to the effort of doing that, you might as well extract raw block data directly from Bitcoin Core by using:

$ bitcoin-cli getblock <hash> 0
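Whichever way you get the raw block, the first 80 bytes are the block header, and hashing them twice with SHA-256 gives the block hash. This makes a quick sanity check that you've sliced the data correctly. For the genesis block, the result should match the hash returned by bitcoin-cli getblockhash 0. Here is a minimal Python sketch using only hashlib (block_hash is a hypothetical helper):

```python
import hashlib


def block_hash(header: bytes) -> str:
    """Double-SHA256 an 80-byte block header and return it in the usual display order."""
    if len(header) != 80:
        raise ValueError("a block header is exactly 80 bytes")
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()  # RPC output and explorers show the hash byte-reversed


# The genesis block header, as it appears in the raw data above.
GENESIS_HEADER = bytes.fromhex(
    "01000000" + "00" * 32
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"
    + "29ab5f49" + "ffff001d" + "1dac2b7c"
)
print(block_hash(GENESIS_HEADER))
# 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f

# With hex from `getblock <hash> 0`, the header is the first 160 hex characters:
# block_hash(bytes.fromhex(raw_hex[:160]))
```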

3. bitcoin-iterate

bitcoin-iterate is an excellent tool for extracting data from raw blockchain files. It's surprisingly fast too. Here are some simple examples:

# Usage
bitcoin-iterate -h

# return the block headers for the first 100 blocks
bitcoin-iterate -q --block='%bH' --end=100 > headers.txt

# return all the raw transactions in block 123,456
bitcoin-iterate -q --tx='%tX' --start=123456 --end=123456 > transactions.txt

# return every single scriptpubkey in the blockchain along with the txid for the transaction they were included in
bitcoin-iterate -q --output='%th %os' > scriptpubkeys.txt

I use it all the time to look for interesting blocks and transactions in the blockchain.
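If you'd rather write your own script than install a tool, the record layout described in the Structure section is enough to walk through a whole file. The following is a minimal Python sketch built on two assumptions: the file uses the mainnet magic bytes (f9beb4d9), and any zero bytes after the last block are unused padding. iter_blk_records is a hypothetical helper.

```python
from typing import Iterator, Tuple

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")


def iter_blk_records(path: str) -> Iterator[Tuple[int, bytes]]:
    """Yield (record_offset, raw_block) for each block record in a blk*.dat file."""
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            prefix = f.read(8)  # 4 magic bytes + 4-byte little-endian size
            if len(prefix) < 8:
                return          # end of file
            magic, size = prefix[:4], int.from_bytes(prefix[4:], "little")
            if magic == b"\x00\x00\x00\x00":
                return          # assumption: zero padding after the last block
            if magic != MAINNET_MAGIC:
                raise ValueError(f"unexpected magic bytes {magic.hex()} at offset {offset}")
            block = f.read(size)
            if len(block) < size:
                raise ValueError(f"truncated block at offset {offset}")
            yield offset, block


# Usage:
# for offset, block in iter_blk_records("blk00000.dat"):
#     print(offset, len(block))
```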

Notes

Block Order

If you are parsing the blk.dat files with your own script, be aware that the blocks will not necessarily be in order. For example, you may encounter blocks in this order as you run through the file:

A B C E F D G

This is because your node downloads blocks in parallel so that it can download the blockchain as quickly as possible. So instead of having to wait to receive each block in order, your node will download blocks further ahead of the current one as it goes.

The maximum distance ahead that your node will fetch blocks (the "maximum out-of-orderness") is controlled by BLOCK_DOWNLOAD_WINDOW in the Bitcoin Core source code, which is set to 1024 blocks.
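So if your script needs blocks in height order, it has to rebuild the chain itself. One simple approach is to index every block by its previous-block hash and then walk forward from the genesis block, whose previous-block field is all zeros. This is a sketch, not how Bitcoin Core does it. order_blocks is a hypothetical helper, and it assumes two things: all hashes are given in one consistent byte order, and there are no forks (stale blocks) in the data.

```python
from typing import Dict, Iterable, List, Set, Tuple

ZERO_HASH = "00" * 32  # the genesis block's "previous block" field


def order_blocks(blocks: Iterable[Tuple[str, str]]) -> List[str]:
    """Given (block_hash, prev_block_hash) pairs in file order, return hashes in chain order."""
    children: Dict[str, List[str]] = {}
    for block_hash, prev_hash in blocks:
        children.setdefault(prev_hash, []).append(block_hash)

    ordered: List[str] = []
    seen: Set[str] = set()
    current = ZERO_HASH
    while current in children:
        next_blocks = children[current]
        if len(next_blocks) > 1:  # assumption violated: a fork needs most-work chain selection
            raise ValueError(f"fork after {current}: {next_blocks}")
        current = next_blocks[0]
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        ordered.append(current)
    return ordered


# With the example order above (A B C E F D G), using letters in place of real hashes:
# order_blocks([("A", ZERO_HASH), ("B", "A"), ("C", "B"), ("E", "D"),
#               ("F", "E"), ("D", "C"), ("G", "F")])
```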
