Redis RDB File Format
Redis’ RDB file is a binary representation of the in-memory store. This binary file is sufficient to completely restore Redis’ state.
The RDB file format is optimized for fast read and writes. Where possible LZF compression is used to reduce the file size. In general, objects are prefixed with their lengths, so before reading the object you know exactly how much memory to allocate.
Optimizing for fast read/writes means the on-disk format should be as close as possible to the in-memory representation. This is the approach taken by the RDB file. As a consequence, you cannot parse the RDB file without some understanding of Redis’ in-memory representation of data structures.
High Level Algorithm to parse RDB
At a high level, the RDB file has the following structure
----------------------------#
52 45 44 49 53 # Magic String "REDIS"
30 30 30 33 # RDB Version Number as ASCII string. "0003" = 3
----------------------------
FA # Auxiliary field
$string-encoded-key # May contain arbitrary metadata
$string-encoded-value # such as Redis version, creation time, used memory, ...
----------------------------
FE 00 # Indicates database selector. db number = 00
FB # Indicates a resizedb field
$length-encoded-int # Size of the corresponding hash table
$length-encoded-int # Size of the corresponding expire hash table
----------------------------# Key-Value pair starts
FD $unsigned-int # "expiry time in seconds", followed by 4 byte unsigned int
$value-type # 1 byte flag indicating the type of value
$string-encoded-key # The key, encoded as a redis string
$encoded-value # The value, encoding depends on $value-type
----------------------------
FC $unsigned long # "expiry time in ms", followed by 8 byte unsigned long
$value-type # 1 byte flag indicating the type of value
$string-encoded-key # The key, encoded as a redis string
$encoded-value # The value, encoding depends on $value-type
----------------------------
$value-type # key-value pair without expiry
$string-encoded-key
$encoded-value
----------------------------
FE $length-encoding # Previous db ends, next db starts.
----------------------------
... # Additional key-value pairs, databases, ...
FF ## End of RDB file indicator
8-byte-checksum ## CRC64 checksum of the entire file.
Magic Number
The file starts off with the magic string “REDIS”. This is a quick sanity check to know we are dealing with a redis rdb file.
52 45 44 49 53 # "REDIS"
RDB Version Number
The next 4 bytes store the version number of the rdb format. The 4 bytes are interpreted as ASCII characters and then converted to an integer using string to integer conversion.
30 30 30 33 # "0003" => Version 3
Op Codes
Each part after the initial header is introduced by a special op code. The available op codes are:
Byte | Name | Description |
---|---|---|
0xFF | EOF | End of the RDB file |
0xFE | SELECTDB | Database Selector |
0xFD | EXPIRETIME | Expire time in seconds, see Key Expiry Timestamp |
0xFC | EXPIRETIMEMS | Expire time in milliseconds, see Key Expiry Timestamp |
0xFB | RESIZEDB | Hash table sizes for the main keyspace and expires, see Resizedb information |
0xFA | AUX | Auxiliary fields. Arbitrary key-value settings, see Auxiliary fields |
Database Selector
A Redis instance can have multiple databases.
A single byte 0xFE
flags the start of the database selector. After this byte, a variable length field indicates the database number. See the section Length Encoding to understand how to read this database number.
Resizedb information
This op code was introduced in RDB version 7.
It encodes two values to speed up RDB loading by avoiding additional resizes and rehashing. The op code is followed by two length-encoded integers indicating:
- Database hash table size
- Expiry hash table size
Auxiliary fields
This op code was introduced in RDB version 7.
The op code is followed by two Redis Strings, representing the key and value of a setting. Unknown fields should be ignored by a parser.
Currently the following settings are implemented:
redis-ver
: The Redis Version that wrote the RDBredis-bits
: Bit architecture of the system that wrote the RDB, either 32 or 64ctime
: Creation time of the RDBused-mem
: Used memory of the instance that wrote the RDB
Key Value Pairs
After the database selector, the file contains a sequence of key value pairs.
Each key value pair has 4 parts:
- Key Expiry Timestamp. This is optional.
- 1 byte flag indicating the value type.
- The key, encoded as a Redis String. See String Encoding.
- The value, encoded according to the value type. See Value Encoding.
Key Expiry Timestamp
This section starts with a one byte flag. This flag is either:
0xFD
: The following expire value is specified in seconds. The following 4 bytes represent the Unix timestamp as an unsigned integer.0xFC
: The following expire value is specified in milliseconds. The following 8 bytes represent the Unix timestamp as an unsigned long.
During the import process, keys that have expired must be discarded.
Value Type
A one byte flag indicates encoding used to save the Value.
0
= String Encoding1
= List Encoding2
= Set Encoding3
= Sorted Set Encoding4
= Hash Encoding9
= Zipmap Encoding10
= Ziplist Encoding11
= Intset Encoding12
= Sorted Set in Ziplist Encoding13
= Hashmap in Ziplist Encoding (Introduced in RDB version 4)14
= List in Quicklist encoding (Introduced in RDB version 7)
Key
The key is simply encoded as a Redis string. See the section String Encoding to learn how the key is encoded.
Value
The value is parsed according to the previously read Value Type
Encodings
Length Encoding
Length encoding is used to store the length of the next object in the stream. Length encoding is a variable byte encoding designed to use as few bytes as possible.
This is how length encoding works : Read one byte from the stream, compare the two most significant bits:
Bits | How to parse |
---|---|
00 |
The next 6 bits represent the length |
01 |
Read one additional byte. The combined 14 bits represent the length |
10 |
Discard the remaining 6 bits. The next 4 bytes from the stream represent the length |
11 |
The next object is encoded in a special format. The remaining 6 bits indicate the format. May be used to store numbers or Strings, see String Encoding |
As a result of this encoding:
- Numbers up to and including 63 can be stored in 1 byte
- Numbers up to and including 16383 can be stored in 2 bytes
- Numbers up to 2^32 -1 can be stored in 4 bytes
String Encoding
Redis Strings are binary safe - which means you can store anything in them. They do not have any special end-of-string token. It is best to think of Redis Strings as a byte array.
There are three types of Strings in Redis:
- Length prefixed strings
- An 8, 16 or 32 bit integer
- A LZF compressed string
Length Prefixed String
Length prefixed strings are quite simple. The length of the string in bytes is first encoded using Length Encoding. After this, the raw bytes of the string are stored.
Integers as String
First read the section Length Encoding, specifically the part when the first two bits are 11
. In this case, the remaining 6 bits are read.
If the value of those 6 bits is:
0
indicates that an 8 bit integer follows1
indicates that a 16 bit integer follows2
indicates that a 32 bit integer follows
Compressed Strings
First read the section Length Encoding, specifically the part when the first two bits are 11
. In this case, the remaining 6 bits are read. If the value of those 6 bits is 3, it indicates that a compressed string follows.
The compressed string is read as follows:
- The compressed length
clen
is read from the stream using Length Encoding - The uncompressed length is read from the stream using Length Encoding
- The next
clen
bytes are read from the stream - Finally, these bytes are decompressed using LZF algorithm
List Encoding
A Redis list is represented as a sequence of strings.
- First, the size of the list
size
is read from the stream using Length Encoding - Next,
size
strings are read from the stream using String Encoding - The list is then re-constructed using these Strings
Set Encoding
Sets are encoded exactly like lists.
Sorted Set Encoding
- First, the size of the sorted set
size
is read from the stream using Length Encoding - Next,
size
pairs of the value and its score are read from the stream. - The value is string encoded
- The next byte specifies the length of the encoded score (as an unsigned integer). This byte has 3 special meanings:
253
: not a number. Don’t read additional bytes254
: positive infinity. Don’t read additional bytes255
: negative infinity. Don’t read additional bytes
- Read that many bytes from the stream, representing the score as an ASCII-encoded float. Use String-to-float conversion to get the actual double value.
Note: Depending on the value you may loose precision. Redis saves the score as a double.
Note: It’s not guaranteed that the set is sorted already.
Example:
04
01 63 12 34 2E 30 31 39 39 39 39 39 39 39 39 39 39 39 39 39 36
01 64 FE
01 61 12 33 2E 31 38 39 39 39 39 39 39 39 39 39 39 39 39 39 39
01 65 FF
- First read the size of the sorted set:
04
=4
(decimal) - Next, read the first member, string encoded. It has a length of
01
=1
(decimal). Read one byte:63
=c
(ASCII). - Then read the next byte:
12
=18
(decimal). This is the lenght of the following ASCII-encoded score. - Read 18 bytes as ASCII values:
4.0199999999999996
. This could be parsed into a double value if necessary. - Read one byte:
01
=1
, the length of the next member. Read that byte:64
=d
(ASCII) - Read one more byte:
FE
=254
(decimal). This means the score is positive infinity. - Read the next byte:
01
. Again, the length of the next member. Read that byte:61
=a
. - Read the next byte:
12
=18
(decimal). Read the next 18 bytes and interpret them as ASCII:""3.1899999999999999
- Read one byte:
01
. Again, the length of the next member. Read that byte:65
=e
. - Read one more byte:
FF
=255
(decimal). This means the score is negative infinity.
The resulting sorted set will be:
{ "e" => "-inf", "a" => "3.189999999999999", "c" => "4.0199999999999996", "d" => "+inf" }
Hash Encoding
- First, the
size
of the hash is read from the stream using Length Encoding - Next
2 * size
strings are read from the stream using String Encoding (alternate strings are key and values)
Example:
2 us washington india delhi
represents the map
{"us" => "washington", "india" => "delhi"}
Zipmap Encoding
A Zipmap is a hashmap that has been serialized to a string. In essence, the key value pairs are stored sequentially. Looking up a key in this structure is O(N). This structure is used instead of a dictionary when the number of key value pairs are small.
To parse a zipmap, first a string is read from the stream using String Encoding. The contents of this string represent the zipmap.
The structure of a zipmap within this string is as follows:
<zmlen><len>"foo"<len><free>"bar"<len>"hello"<len><free>"world"<zmend>
zmlen
: 1 byte that holds the size of the zip map. If it is greater than or equal to 254, value is not used. You will have to iterate the entire zip map to find the length.len
: the length of the following string, which can be either a key or a value. This length is stored in either 1 byte or 5 bytes (yes, it differs from Length Encoding described above). If the first byte is between 0 and 252, that is the length of the zipmap. If the first byte is 253, then the next 4 bytes read as an unsigned integer represent the length of the zipmap. 254 and 255 are invalid values for this field.free
: This is always 1 byte, and indicates the number of free bytes after the value. For example, if the value of a key is “America” and its get updated to “USA”, 4 free bytes will be available.zmend
: Always255
. Indicates the end of the zipmap.
Example:
18 02 06 4D 4B 44 31 47 36 01 00 32 05 59 4E 4E 58 4b 04 00 46 37 54 49 FF ..
- Start by decoding this using String Encoding. You will notice that
0x18
(24
in decimal) is the length of the string. Accordingly, we will read the next 24 bytes i.e. up toFF
- Now, we are parsing the string starting at
02 06...
using the Zipmap Encoding 02
is the number of entries in the hashmap.06
is the length of the next string. Since this is less than 254, we don’t have to read any additional bytes- We read the next 6 bytes i.e.
4d 4b 44 31 47 36
to get the key “MKD1G6” 01
is the length of the next string, which would be the value00
is the number of free bytes- We read the next 1 byte, which is
0x32
. Thus, we get our value"2"
- In this case, the free bytes is
0
, so we don’t skip anything 05
is the length of the next string, in this case a key.- We read the next 5 bytes
59 4e 4e 58 4b
, to get the key"YNNXK"
04
is the length of the next string, which is a value00
is the number of free bytes after the value- We read the next 4 bytes i.e.
46 37 54 49
to get the value"F7TI"
- Finally, we encounter
FF
, which indicates the end of this zip map - Thus, this zip map represents the hash
{"MKD1G6" => "2", "YNNXK" => "F7TI"}
Ziplist Encoding
A Ziplist is a list that has been serialized to a string. In essence, the elements of the list are stored sequentially along with flags and offsets to allow efficient traversal of the list in both directions.
To parse a ziplist, first a string is read from the stream using String Encoding. The contents of this string represent the ziplist.
The structure of a ziplist within this string is as follows:
<zlbytes><zltail><zllen><entry><entry><zlend>
zlbytes
: a 4 byte unsigned integer representing the total size in bytes of the ziplist. The 4 bytes are in little endian format - the least significant bit comes first.zltail
: a 4 byte unsigned integer in little endian format. It represents the offset to the tail (i.e. last) entry in the ziplistzllen
: This is a 2 byte unsigned integer in little endian format. It represents the number of entries in this ziplistentry
: An entry represents an element in the ziplist. Details belowzlend
: Always255
. It represents the end of the ziplist.
Each entry in the ziplist has the following format :
<length-prev-entry><special-flag><raw-bytes-of-entry>
length-prev-entry
: stores the length of the previous entry, or 0 if this is the first entry. This allows easy traversal of the list in the reverse direction. This length is stored in either 1 byte or in 5 bytes. If the first byte is less than or equal to 253, it is considered as the length. If the first byte is 254, then the next 4 bytes are used to store the length. The 4 bytes are read as an unsigned integer.
special-flag
: This flag indicates whether the entry is a string or an integer. It also indicates the length of the string, or the size of the integer. The various encodings of this flag are shown below:
Bytes | Length | Meaning |
---|---|---|
00pppppp |
1 byte | String value with length less than or equal to 63 bytes (6 bits) |
01pppppp|qqqqqqqq |
2 bytes | String value with length less than or equal to 16383 bytes (14 bits) |
10______|<4 byte> |
5 bytes | Next 4 byte contain an unsigned int. String value with length greater than or equal to 16384 bytes |
1100____ |
3 bytes | Integer encoded as 16 bit signed (2 bytes) |
1101____ |
5 bytes | Integer encoded as 32 bit signed (4 bytes) |
1110____ |
9 bytes | Integer encoded as 64 bit signed (8 bytes) |
1111____ |
4 bytes | Integer encoded as 24 bit signed (3 bytes) |
raw-byte
: After the special flag, the raw bytes of entry follow. The number of bytes was previously determined as part of the special flag.
Example:
23 23 00 00 00 1E 00 00 00 04 00 00 E0 FF FF FF FF FF
FF FF 7F 0A D0 FF FF 00 00 06 C0 FC 3F 04 C0 3F 00 FF ...
- Start by decoding this using String Encoding.
23
is the length of the string (35 in decimal), therefore we will read the next 35 bytes tillff
- Now, we are parsing the string starting at
23 00 00 ...
using Ziplist encoding - The first 4 bytes
23 00 00 00
represent the total length in bytes of this ziplist. Notice that this is in little endian format - The next 4 bytes
1e 00 00 00
represent the offset to the tail entry.1E
= 30 (in decimal), and this is a 0 based offset. 0th position =23
, 1st position =00
and so on. It follows that the last entry starts at04 c0 3f 00 ..
- The next 2 bytes
04 00
represent the number of entries in this list as a big endian 16 bit integer.04 00 = 4
in decimal. - From now on, we start reading the entries
00
represents the length of previous entry.0
indicates this is the first entry.E0
is the special flag. Since it starts with the bit pattern1110____
, we read the next 8 bytes as an integer. This is the first entry of the list.- We now start the second entry
0A
is the length of the previous entry.0A
=10
in decimal. 10 bytes = 1 byte for prev. length + 1 byte for special flag + 8 bytes for integer.D0
is the special flag. Since it starts with the bit pattern1101____
, we read the next 4 bytes as an integer. This is the second entry of the list- We now start the third entry
06
is the length of previous entry. 6 bytes = 1 byte for prev. length + 1 byte for special flag + 4 bytes for integerC0
is the special flag. Since it starts with the bit pattern1100____
, we read the next 2 bytes as an integer. This is the third entry of the list- We now start the last entry
04
is length of previous entryC0
indicates a 2 byte number- We read the next 2 bytes, which gives us our fourth entry
- Finally, we encounter
FF
, which tells us we have consumed all elements in this ziplist. - Thus, this ziplist stores the values
[0x7fffffffffffffff, 65535, 16380, 63]
Intset Encoding
An Intset is a binary search tree of integers. The binary tree is implemented in an array of integers. An intset is used when all the elements of the set are integers. An Intset has support for up to 64 bit integers. As an optimization, if the integers can be represented in fewer bytes, the array of integers will be constructed from 16 bit or 32 bit integers. When a new element is inserted, the implementation takes care to upgrade if necessary.
Since an Intset is a binary search tree, the numbers in this set will always be sorted.
An Intset has an external interface of a Set.
To parse an Intset, first a string is read from thee stream using String Encoding. The contents of this string represent the Intset.
Within this string, the Intset has a very simple layout :
<encoding><length-of-contents><contents>
encoding
: is a 32 bit unsigned integer. It has 3 possible values - 2, 4 or 8. It indicates the size in bytes of each integer stored in contents. And yes, this is wasteful - we could have stored the same information in 2 bits.length-of-contents
: is a 32 bit unsigned integer, and indicates the length of the contents arraycontents
: is an array of $length-of-contents bytes. It contains the binary tree of integers
Example
14 04 00 00 00 03 00 00 00 FC FF 00 00 FD FF 00 00 FE FF 00 00 ...
- Start by decoding this using “String Encoding”.
14
is the length of the string, therefore we will read the next 20 bytes till00
- Now, we start interpreting the string starting at
04 00 00 ...
- The first 4 bytes
04 00 00 00
is the encoding. Since this evaluates to 4, we know we are dealing with 32 bit integers - The next 4 bytes
03 00 00 00
is the length of contents. So, we know we are dealing with 3 integers, each 4 byte long - From now on, we read in groups of 4 bytes, and convert it into a unsigned integer
- Thus, our intset looks like
[0x0000FFFC, 0x0000FFFD, 0x0000FFFE]
. Notice that the integers are in little endian format i.e. least significant bit came first.
Sorted Set in Ziplist Encoding
A sorted set in ziplist encoding is stored just like the Ziplist described above. Each element in the sorted set is followed by its score in the ziplist.
Example
['Manchester City', 1, 'Manchester United', 2, 'Tottenham', 3]
As you see, the scores follow each element.
Hashmap in Ziplist Encoding
In this, key=value pairs of a hashmap are stored as successive entries in a ziplist.
Note: This was introduced in RDB version 4. This deprecates the zipmap encoding that was used in earlier versions.
Example
{"us" => "washington", "india" => "delhi"}
is stored in a ziplist as :
["us", "washington", "india", "delhi"]
Quicklist Encoding
RDB Version 7 introduced a new variant of list encoding: Quicklist.
Quicklist is a linked list of ziplists. Quicklist combines the memory efficiency of small ziplists with the extensibility of a linked list allowing us to create space-efficient lists of any length.
To parse a quicklist, first a string is read from the stream using String Encoding. The contents of this string represent the ziplist.
The structure of a quicklist within this string is as follows:
<len><ziplist><ziplist>...
len
: This is the number of nodes of the linked list, length-encodedziplist
: A string that wraps a ziplist, parse it with Ziplist encoding
A complete list needs to be constructed from all elements of all ziplists.
Example:
01 1F 1F 00 00 00 17 00 00 00 02 00 00 0B 6F 6E 65 2D 65 6C 65 6D 65 6E 74 0D 05 65 6C 65 6D 32 FF
- Start by reading the first byte in order to get the list length in length-encoding. In this case it is
01
, so the list contains just one node. - What follows is a ziplist containing all elements for this node.
- Proceed by reading the ziplist as specified in Ziplist Encoding.
- Read the blob, string-encoded
- For this read the length,
0x1F
, so 31 bytes up to and includingFF
- Read these 31 bytes.
- Read the number of bytes of the ziplist as a 32 bit unsigned integer:
1F 00 00 00
=31
(decimal). - Read the tail offset as a 32 bit unsigned integer:
17 00 00 00
=23
(decimal). It’s zero-based, so this points to0D 05 65 6C ...
- Read the length of the ziplist as a 16 bit unsigned integer:
02 00
=2
(decimal). - Now read two ziplist entries as follows:
- Read the next byte:
00
, as this is the first item and has no predecessors item. - Read the next byte:
0B
=11
(decimal). As this starts with the bit pattern00
it is treated as the length. - Read 11 bytes from the stream:
6F 6E 65 2D 65 6C 65 6D 65 6E 74
="one-element"
(ASCII) - This is the first element.
- Proceed with the second item.
- Read one byte:
0D
=13
(decimal). This is the length of the previous item (length of 11 + flag + prev-length) - Read the next byte:
05
. This is the size of the following string. - Read the next 5 bytes:
65 6C 65 6D 32
=elem2
. This is the second list element. - Read the next byte:
FF
. This marks the end of the ziplist (as we read 2 elements as determined before) - Thus, the Quicklist stores the list
["one-element", "elem2"]
.
CRC64 Checksum
Starting with RDB Version 5, an 8 byte CRC64 checksum is added to the end of the file. It is possible to disable this checksum via a parameter in redis.conf. When the checksum is disabled, this field will have zeroes.