RLP编解码规则

Mar 10, 2021

递归式长度前缀(Recursive Length Prefix)是以太坊用来对数据进行序列化和反序列的编码规则，这里是参考以太网维基[^1]的一篇小小笔记。

编码

适用于RLP编码规则的数据结构

字符串，'doge'，aka. 字节数组
字符串的嵌套列表 ['doge', 'is', ['a', 'e-god'], ['backwards', ['V']]]

规则

三条针对字符串的编码规则

对于单字节，且值在[0x00, 0x7f]范围内(ASCII 码表)，自身即是RLP编码
- ["a"] 编码为 [0x61]
- ["0"] 编码为 [0x00]
- ["15"] 编码为 [0x0f]
对于字符串，且其长度小于55字节，RLP编码由[前缀 , 字符串] 构成，prefix = 0x80 + len(s)。因此，前缀的范围在 [0x80, 0xb7] 0xb7 = 0x80 + 55
- [“cat”] 编码为[0x83, ‘c’, ‘a’, ’t’]
- [“1024”] 编码为 [0x82, 0x04, 0x00]
对于长度大于或等于55字节的字符串，RLP编码由[前缀, 字符串长度, 字符串] 构成，prefix = 0xb7 + len(len(s)) (字符串长度值的二进制长度)。前缀的范围在[0xb8, 0xbf] 0xbf = 0xb7 + 8，因为字符串长度不得超过 2**64字节，长度值的二进制不超过64位，即8字节。
- [“Lorem ipsum dolor sit amet, consectetur adipisicing elit”]
  - “Lorem ipsum dolor sit amet, consectetur adipisicing elit” 长度为0x38，长度值0x38仅有一个字节
  - 前缀prefix = 0xb7+1 = 0xb8
  - 编码结果为[ 0xb8, 0x38, ‘L’, ‘o’, ‘r’, ’e’, ’m’, ’ ‘, … , ’e’, ’l’, ‘i’, ’t’ ]

两条针对嵌套列表的编码规则

递归式编码列表：先从最内层列表开始。如果列表长度小于55字节，其RLP编码由[前缀 , 字符串] 构成，prefix = 0xc0 + len(s)。因此前缀的范围在 [0xc0, 0xf7] 0xf7 = 0xc0 + 55；对于外层列表，其RLP编码由[前缀，子列表1 RLP，子列表2 RLP，…]构成，prefix = 0xc0 + len(sublist1) + len(sublist2) + …
- [“cat”, “dog”]
  - "cat"长度为0x03，子列表前缀0x80 + 0x03 = 0x83
  - "dog"长度为0x03，子列表前缀0x80 + 0x03 = 0x83
  - 前缀prefix = 0xc0 + 0x04 + 0x04 = 0x08
  - 编码结果为[0xc8, 0x83, 'c', 'a', 't', 0x83, 'd', 'o', 'g']
对于长度大于55字节的列表，RLP编码由[前缀, 列表长度, 子列表1RLP, 子列表2RLP …]，前缀 prefix = 0xf7 + len(len(sublist1)) + len(len(sublist2)) + …。前缀的范围在[0xf8 0xff] 0xff = 0xf7 + 8，因为字符串长度不得超过 2**64字节，二进制长度不得超过64位，即8字节。
- [“The length of this sentence is more than 55 bytes, “, “I know it because I pre-designed it”]
  - "The length of this sentence is more than 55 bytes, " 长度为 0x33，根据规则2得到 0x80 + 0x33 = 0xb3
  - "I know it because I pre-designed it" 长度为 0x23，根据规则2得到 0x80 + 0x23 = 0xa3
  - 列表总长度 0x33 + 0x23 + 0x02 = 0x58，长度值的二进制宽度为1字节
  - 前缀 prefix = 0xf7 + 1 = 0xf8
  - 编码结果为[0xf8, 0x58, 0xb3, 'T', 'h', ..., 'e', 's', ',', ' ', 0xa3, 'I', ' ', 'k', ..., 'i', 't']

代码

def rlp_encode(input):
    if isinstance(input,str):
        if len(input) == 1 and ord(input) < 0x80: return input
        else: return encode_length(len(input), 0x80) + input
    elif isinstance(input,list):
        output = ''
        for item in input: output += rlp_encode(item)
        return encode_length(len(output), 0xc0) + output

def encode_length(L,offset):
    if L < 56:
         return chr(L + offset)
    elif L < 256**8:
         BL = to_binary(L)
         return chr(len(BL) + offset + 55) + BL
    else:
         raise Exception("input too long")

def to_binary(x):
    if x == 0:
        return ''
    else:
        return to_binary(int(x / 256)) + chr(x % 256)

解码

规则

RLP解码器的输入一律被视作二进制数组，输入的首字节称作 prefix，解码过程如下：

如果首字节在范围[0x00, 0x7f]，目标数据是字符串，即是该字节
如果首字节在范围[0x80, 0xb7]，目标数据是字符串，其长度为prefix - 0x80，紧跟在首字节之后
如果首字节在范围[0xb8, 0xbf]，目标数据是字符串，超过55字节，首字节之后即长度，长度值的宽度为prefix - 0xb7，根据该长度值往后读取目标数据
如果首字节在范围[0xc0, 0xf7]，目标数据是列表，总长度为prefix - 0xc0，首字节之后紧跟第一个列表长度值，其范围在 [0x80, 0xb7]，根据长度值往后读取一个目标数据，之后是第二个列表长度值，如此往下继续读取
如果首字节在范围[0xf8 0xff] ，目标数据是列表，超过55字节，首字节之后为列表总长度，长度值的宽度为prefix - 0xf8，总长度之后开始读取第一个子列表，按照以上规则进行解码。

代码

def rlp_decode(input):
    if len(input) == 0:
        return
    output = ''
    (offset, dataLen, type) = decode_length(input)
    if type is str:
        output = instantiate_str(substr(input, offset, dataLen))
    elif type is list:
        output = instantiate_list(substr(input, offset, dataLen))
    output + rlp_decode(substr(input, offset + dataLen))
    return output

def decode_length(input):
    length = len(input)
    if length == 0:
        raise Exception("input is null")
    prefix = ord(input[0])
    if prefix <= 0x7f:
        return (0, 1, str)
    elif prefix <= 0xb7 and length > prefix - 0x80:
        strLen = prefix - 0x80
        return (1, strLen, str)
    elif prefix <= 0xbf and length > prefix - 0xb7 and length > prefix - 0xb7 + to_integer(substr(input, 1, prefix - 0xb7)):
        lenOfStrLen = prefix - 0xb7
        strLen = to_integer(substr(input, 1, lenOfStrLen))
        return (1 + lenOfStrLen, strLen, str)
    elif prefix <= 0xf7 and length > prefix - 0xc0:
        listLen = prefix - 0xc0;
        return (1, listLen, list)
    elif prefix <= 0xff and length > prefix - 0xf7 and length > prefix - 0xf7 + to_integer(substr(input, 1, prefix - 0xf7)):
        lenOfListLen = prefix - 0xf7
        listLen = to_integer(substr(input, 1, lenOfListLen))
        return (1 + lenOfListLen, listLen, list)
    else:
        raise Exception("input don't conform RLP encoding form")

def to_integer(b):
    length = len(b)
    if length == 0:
        raise Exception("input is null")
    elif length == 1:
        return ord(b[0])
    else:
        return ord(substr(b, -1)) + to_integer(substr(b, 0, -1)) * 256

https://eth.wiki/fundamentals/rlp ↩︎