为了能够修改 code object,我定义了一个很小的类用来复制 code object,同时能够按我们的需求修改相应的值,然后重新生成一个新的 code object。
class MutableCodeObject(object): args_name = ("co_argcount", "co_kwonlyargcount", "co_nlocals", "co_stacksize", "co_flags", "co_code", "co_consts", "co_names", "co_varnames", "co_filename", "co_name", "co_firstlineno", "co_lnotab", "co_freevars", "co_cellvars") def __init__(self, initial_code): self.initial_code = initial_code for attr_name in self.args_name: attr = getattr(self.initial_code, attr_name) if isinstance(attr, tuple): attr = list(attr) setattr(self, attr_name, attr) def get_code(self): args = [] for attr_name in self.args_name: attr = getattr(self, attr_name) if isinstance(attr, list): attr = tuple(attr) args.append(attr) return self.initial_code.__class__(*args)这个类用起来很方便,解决了上面提到的 code object 不可变的问题。
>>> x = lambda y : 2 >>> m = MutableCodeObject(x.__code__) >>> m <new_code.MutableCodeObject object at 0x7f3f0ea546a0> >>> m.co_consts [None, 2] >>> m.co_consts[1] = '3' >>> m.co_name = 'truc' >>> m.get_code() <code object truc at 0x7f3f0ea2bc90, file "<stdin>", line 1>测试我们的新操作码
我们现在拥有了注入DEBUG_OP的所有工具,让我们来验证下我们的实现是否可用。我们将我们的操作码注入到一个最简单的函数中:
from new_code import MutableCodeObject def op_target(*args): print("WOOT") print("op_target called with args <{0}>".format(args)) def nop(): pass new_nop_code = MutableCodeObject(nop.__code__) new_nop_code.co_code = b"\x00" + new_nop_code.co_code[0:3] + b"\x00" + new_nop_code.co_code[-1:] new_nop_code.co_stacksize += 3 nop.__code__ = new_nop_code.get_code() import dis dis.dis(nop) nop() # Don't forget that ./python is our custom Python implementing DEBUG_OP hakril@computer ~/python/CPython3.5 % ./python proof.py 8 0 <0> 1 LOAD_CONST 0 (None) 4 <0> 5 RETURN_VALUE WOOT op_target called with args <([], <frame object at 0x7fde9eaebdb0>)> WOOT op_target called with args <([None], <frame object at 0x7fde9eaebdb0>)>看起来它成功了!有一行代码需要说明一下new_nop_code.co_stacksize += 3
co_stacksize 表示 code object 所需要的堆栈的大小
操作码DEBUG_OP往堆栈中增加了三项,所以我们需要为这些增加的项预留些空间
现在我们可以将我们的操作码注入到每一个 Python 函数中了!
重写字节码
正如我们在上面的例子中所看到的那样,重写 Pyhton 的字节码似乎 so easy。为了在每一个操作码之间注入我们的操作码,我们需要获取每一个操作码的偏移量,然后将我们的操作码注入到这些位置上(把我们操作码注入到参数上是有坏处大大滴)。这些偏移量也很容易获取,使用dis.Bytecode ,就像这样 。
def add_debug_op_everywhere(code_obj): # We get every instruction offset in the code object offsets = [instr.offset for instr in dis.Bytecode(code_obj)] # And insert a DEBUG_OP at every offset return insert_op_debug_list(code_obj, offsets) def insert_op_debug_list(code, offsets): # We insert the DEBUG_OP one by one for nb, off in enumerate(sorted(offsets)): # Need to ajust the offsets by the number of opcodes already inserted before # That's why we sort our offsets! code = insert_op_debug(code, off + nb) return code # Last problem: what does insert_op_debug looks like?