blk.dat
Raw blockchain data files.
The blk.dat files in ~/.bitcoin/blocks/ contain raw block data received by a Bitcoin Core node.
These blk.dat files basically store “the blockchain”.
How do they work?
Every block that your node receives gets appended to a blk.dat file.
Also, instead of the entire blockchain being stored in one massive file, it is split into multiple blk*.dat files.
~/.bitcoin/blocks
blk00000.dat
blk00001.dat
blk00002.dat
- …
Your node first adds blocks to blk00000.dat, then when it fills up it moves on to blk00001.dat, then blk00002.dat, and so on.
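The naming scheme above can be sketched in Python. The helper names `blk_filename` and `list_blk_files` are hypothetical, and the five-digit zero padding is inferred from the file names listed above.

```python
from pathlib import Path


def blk_filename(index):
    """Return the file name for a given file index, e.g. 2 -> 'blk00002.dat'."""
    return f"blk{index:05d}.dat"


def list_blk_files(blocks_dir):
    """Return the blk*.dat files in a directory, sorted by file name."""
    # Zero-padded names sort correctly as plain strings,
    # so name order matches the order in which the files were filled.
    return sorted(Path(blocks_dir).glob("blk*.dat"))
```

Calling `list_blk_files` on your blocks directory should give you the files in the order your node wrote them.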
Example
The data in blk.dat files is stored in binary, and each new block gets appended to the end of the file.
We can look at the genesis block by reading the first 293 bytes of blk00000.dat:
f9beb4d91d0100000100000000000000000000000000000000000000000000000000000000000000000000003ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a29ab5f49ffff001d1dac2b7c0101000000010000000000000000000000000000000000000000000000000000000000000000ffffffff4d04ffff001d0104455468652054696d65732030332f4a616e2f32303039204368616e63656c6c6f72206f6e206272696e6b206f66207365636f6e64206261696c6f757420666f722062616e6b73ffffffff0100f2052a01000000434104678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5fac00000000
(See the od command below for getting hexadecimal data from a binary file.)
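If you would rather do this from a script, here is a minimal Python sketch that reads a range of bytes and returns it as hexadecimal. The function name `read_hex` is hypothetical, and the path assumes the default ~/.bitcoin/blocks/ location.

```python
import os


def read_hex(path, length, offset=0):
    """Read `length` bytes starting at `offset` and return them as a hex string."""
    with open(path, "rb") as f:  # "rb" because the file is binary, not text
        f.seek(offset)
        return f.read(length).hex()


if __name__ == "__main__":
    # Assumed location; adjust this to your own data directory.
    path = os.path.expanduser("~/.bitcoin/blocks/blk00000.dat")
    if os.path.exists(path):
        # 293 bytes = magic bytes + size + the genesis block itself
        print(read_hex(path, 293))
```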
Structure
The data above can be split into five pieces:
- The magic bytes and size allow you to figure out where the data for each block starts and ends.
- The block header.
- The tx count (variable integer), followed by the transaction data for each one.
Data
[ magic bytes ][ size ][ block header ][ tx count ][ transaction data ]
<- 4 bytes -> <- 4 bytes -> <- 80 bytes -> <- varint -> <- remainder ->
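The layout above could be parsed along these lines. This is a sketch rather than a full parser: `read_varint` and `read_record` are hypothetical names, the stream is any binary file object, and the transaction data is returned as raw bytes without being decoded.

```python
import io
import struct


def read_varint(stream):
    """Decode a Bitcoin variable-length integer (1, 3, 5 or 9 bytes long)."""
    first = stream.read(1)[0]
    if first < 0xFD:
        return first
    # 0xfd, 0xfe and 0xff announce a 2, 4 or 8 byte little-endian number.
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    return int.from_bytes(stream.read(width), "little")


def read_record(stream):
    """Read one [magic][size][header][tx count][tx data] record, or None at end of file."""
    prefix = stream.read(8)
    if len(prefix) < 8:
        return None
    magic, size = struct.unpack("<4sI", prefix)  # size is a little-endian uint32
    block = io.BytesIO(stream.read(size))
    header = block.read(80)
    tx_count = read_varint(block)
    tx_data = block.read()  # remainder: the serialized transactions
    return {"magic": magic, "size": size, "header": header,
            "tx_count": tx_count, "tx_data": tx_data}
```

Calling `read_record` repeatedly on an open blk.dat file should walk through it one block at a time, because the size field tells you exactly where the next record begins.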
The size field is what allowed me to figure out that I needed to read 293 bytes to get the whole block.
The size is given as 1d010000, so to get this in human-readable format:
- Swap the endianness to get 0000011d
- Convert to decimal to get 285
So in addition to the initial 8 bytes for the magic bytes + size, I know the size of the upcoming block data is going to be 285 bytes (8 + 285 = 293 bytes in total).
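The same conversion in Python, as a small worked example. The variable names are only for illustration.

```python
size_field = bytes.fromhex("1d010000")

# Reverse the byte order (little-endian -> big-endian), then read it as a number.
swapped = size_field[::-1].hex()
size = int(swapped, 16)

# int.from_bytes does both steps in one go.
assert size == int.from_bytes(size_field, "little")

total = 8 + size  # 4 magic bytes + 4 size bytes + the block data
print(swapped, size, total)  # prints: 0000011d 285 293
```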
Notes
1. Blocks are not downloaded in order.
If you are parsing the blk.dat files, be aware that blocks are not going to be in order. For example, you may encounter blocks in this order as you run through the file:
A B C E F D
This is because your bitcoin node will download blocks in parallel to download the blockchain as quickly as possible. Your node will download blocks further ahead of the current one as it goes, instead of waiting to receive each block in order.
The maximum distance ahead your node will fetch from (or the “maximum out-of-orderness”) is controlled by BLOCK_DOWNLOAD_WINDOW in the bitcoin source code.
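One way to cope with this when parsing is to link blocks together using the previous block hash stored in each header (bytes 4 to 36). The sketch below is a simplification under a few assumptions: the function names are hypothetical, every header fits in memory, and the set contains a single chain with no stale blocks.

```python
import hashlib
import io
import struct


def block_hash(header):
    """Double SHA-256 of the 80-byte header, in the byte order used inside blocks."""
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()


def iter_headers(stream):
    """Yield the 80-byte header of every record in a blk.dat-style stream."""
    while True:
        prefix = stream.read(8)
        if len(prefix) < 8:
            return
        _magic, size = struct.unpack("<4sI", prefix)
        block = stream.read(size)
        if len(block) < size:
            return  # truncated record: stop rather than guess
        yield block[:80]


def order_by_chain(headers):
    """Arrange headers so each one follows the header it names as its previous block."""
    by_prev = {h[4:36]: h for h in headers}  # previous block hash -> header
    known = {block_hash(h) for h in headers}
    # The chain starts at the header whose parent is not part of this set.
    current = next((h for h in headers if h[4:36] not in known), None)
    ordered = []
    while current is not None:
        ordered.append(current)
        current = by_prev.get(block_hash(current))
    return ordered
```

Block hashes are usually displayed with the bytes reversed, so use `block_hash(header)[::-1].hex()` if you want to compare against a block explorer.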
2. The maximum blk.dat file size is 128MiB (134,217,728 bytes)
This limit is set by MAX_BLOCKFILE_SIZE
Linux Tools
As mentioned, the data inside a blk.dat file is binary, so you probably won’t be able to make much sense of it if you open one up in a text editor. But no matter, because binary data can be easily converted to hexadecimal, and there are two commands for the job:
1. od
This is a simple one. It dumps out the contents of files in your format of choice.
od -x --endian=big -N 293 -An blk00000.dat
-x <- show hexadecimal
--endian=big <- display bytes in big endian
-N 293 <- number of bytes to read
-An <- do not show file offset
“od” stands for octal dump, but you can dump out data in formats other than octal.
2. hexdump
This is similar to od, but it also gives you the option of displaying ascii text from the data (which is handy for looking at messages contained inside transaction data).
$ hexdump -C -s 8 -n 285 blk00000.dat
00000008 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000018 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000028 00 00 00 00 3b a3 ed fd 7a 7b 12 b2 7a c7 2c 3e |....;...z{..z.,>|
00000038 67 76 8f 61 7f c8 1b c3 88 8a 51 32 3a 9f b8 aa |gv.a......Q2:...|
00000048 4b 1e 5e 4a 29 ab 5f 49 ff ff 00 1d 1d ac 2b 7c |K.^J)._I......+||
00000058 01 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 |................|
00000068 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000078 00 00 00 00 00 00 ff ff ff ff 4d 04 ff ff 00 1d |..........M.....|
00000088 01 04 45 54 68 65 20 54 69 6d 65 73 20 30 33 2f |..EThe Times 03/|
00000098 4a 61 6e 2f 32 30 30 39 20 43 68 61 6e 63 65 6c |Jan/2009 Chancel|
000000a8 6c 6f 72 20 6f 6e 20 62 72 69 6e 6b 20 6f 66 20 |lor on brink of |
000000b8 73 65 63 6f 6e 64 20 62 61 69 6c 6f 75 74 20 66 |second bailout f|
000000c8 6f 72 20 62 61 6e 6b 73 ff ff ff ff 01 00 f2 05 |or banks........|
000000d8 2a 01 00 00 00 43 41 04 67 8a fd b0 fe 55 48 27 |*....CA.g....UH'|
000000e8 19 67 f1 a6 71 30 b7 10 5c d6 a8 28 e0 39 09 a6 |.g..q0..\..(.9..|
000000f8 79 62 e0 ea 1f 61 de b6 49 f6 bc 3f 4c ef 38 c4 |yb...a..I..?L.8.|
00000108 f3 55 04 e5 1e c1 12 de 5c 38 4d f7 ba 0b 8d 57 |.U......\8M....W|
00000118 8a 4c 70 2b 6b f1 1d 5f ac 00 00 00 00 |.Lp+k.._.....|
00000125
-C <- display data in the same byte-order that is used in bitcoin, and also ascii text
-s <- start point (offset in bytes)
-n <- number of bytes to read
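To show how this output is put together, here is a rough Python imitation of the `hexdump -C` layout: an 8-digit offset, sixteen bytes of hexadecimal in two groups of eight, then the printable ASCII characters. The function name `hexdump_c` is hypothetical, and the real tool has features this sketch leaves out (such as collapsing repeated lines).

```python
def hexdump_c(data, start=0):
    """Format bytes roughly like `hexdump -C`: offset, 16 hex bytes, printable ASCII."""
    lines = []
    for i in range(0, len(data), 16):
        chunk = data[i:i + 16]
        left = " ".join(f"{b:02x}" for b in chunk[:8])
        right = " ".join(f"{b:02x}" for b in chunk[8:])
        # Pad each group to 23 characters so a short final line stays aligned.
        hex_part = f"{left:<23}  {right:<23}"
        # Bytes outside the printable ASCII range are shown as "."
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{start + i:08x}  {hex_part}  |{text}|")
    lines.append(f"{start + len(data):08x}")  # final line: offset after the last byte
    return "\n".join(lines)
```

In this layout the hexadecimal part of each line sits in character columns 11 to 58.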
Show hexadecimal data only.
You can chain some commands together so that you only get raw hexadecimal data output:
hexdump -C -s 8 -n 285 blk00000.dat | cut -c 11-58 | tr '\n' ' ' | tr -d ' '
cut -c 11-58 <- cuts out anything outside the columns from characters 11 to 58 (on each line)
tr '\n' ' ' <- translates new lines into spaces
tr -d ' ' <- deletes all spaces