Implementing a compiler

Final result

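The language is dynamically typed, with built-in int, str, list, dict (borrowed from Python) and function.

After a function has run, the variables created inside it are kept.

I did not want to implement all of these types myself, so the whole thing is written in Python.

The entire language is built on the Auto variable, so two statements are enough to run the compiler: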

    a = Auto(open("hello.pym").read(), name='__main__')
    a.call()

call goes through the following stages:

1) tokenize: lexical analysis

2) stepize: generate bytecode

3) step: execute the bytecode
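As a rough illustration of these three stages, here is a minimal sketch that handles only integer expressions with + and -. The function names mirror the stages, but their bodies are hypothetical simplifications and not the author's code; the step names PUSH_CONST, ADD and SUB are the ones that appear in the bytecode dump further down.

```python
import re

def tokenize(source):
    """Lexical analysis: split the text into integer and operator tokens."""
    return re.findall(r"\d+|[+-]", source)

def stepize(tokens):
    """Compile a flat `int (op int)*` token list into stack-machine steps."""
    ops = {"+": "ADD", "-": "SUB"}
    steps = [("PUSH_CONST", int(tokens[0]))]
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        steps.append(("PUSH_CONST", int(operand)))
        steps.append((ops[op], None))
    return steps

def run(steps):
    """Execute the steps one by one on a value stack."""
    stack = []
    for kind, arg in steps:
        if kind == "PUSH_CONST":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if kind == "ADD" else left - right)
    return stack[0]

steps = stepize(tokenize("10 - 3 + 5"))
result = run(steps)
```

Each stage only consumes the output of the previous one, which is why the real call method can simply chain them.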

Given the input code

    fab = function() {
        value = [1,1];
        size = len(value);
        function get(pos) {
            pos -= 1;
            if (pos < 0) {
                return -1;
            } else if (pos < size) {
                return value[pos];
            } else {
                while (pos >= size) {
                    value += [value[size-1]+value[size-2]];
                    size += 1;
                }
                return value[pos];
            }
        }
    };
    fab();
    print fab.get(10);
    print fab.get(20);

it produces the output 55 and 6765.

Bytecode generated during the run, with the two printed results interleaved:

    [STEPIZE __main__]
      0 PUSH_VAR (str)fab
      1 PUSH_CONST (str)\n value = [1,1];\n size = len(value);\n function get(pos) {\n pos -= 1;\n if (pos < 0) {\n return -1;\n } else if (pos < size) {\n return value[pos];\n } else {\n while (pos >= size) {\n value += [value[size-1]+value[size-2]];\n size += 1;\n }\n return value[pos];\n }\n }
      2 PUSH_CONST (list)[]
      3 MAKE_FUNCTION (NoneType)None
      4 ASSIGN (NoneType)None
      5 PUSH_VAR (str)fab
      6 CALL (int)0
      7 POP_ALL (NoneType)None
      8 PUSH_VAR (str)fab
      9 PUSH_CONST (str)get
     10 GET_METHOD (NoneType)None
     11 PUSH_CONST (int)10
     12 CALL (int)1
     13 PRINT (NoneType)None
     14 PUSH_VAR (str)fab
     15 PUSH_CONST (str)get
     16 GET_METHOD (NoneType)None
     17 PUSH_CONST (int)20
     18 CALL (int)1
     19 PRINT (NoneType)None

    [STEPIZE fab]
      0 PUSH_VAR (str)value
      1 PUSH_CONST (int)1
      2 PUSH_CONST (int)1
      3 BUILD_LIST (int)2
      4 ASSIGN (NoneType)None
      5 PUSH_VAR (str)size
      6 PUSH_VAR (str)len
      7 PUSH_VAR (str)value
      8 CALL (int)1
      9 ASSIGN (NoneType)None
     10 PUSH_VAR (str)get
     11 PUSH_CONST (str)\n pos -= 1;\n if (pos < 0) {\n return -1;\n } else if (pos < size) {\n return value[pos];\n } else {\n while (pos >= size) {\n value += [value[size-1]+value[size-2]];\n size += 1;\n }\n return value[pos];\n }
     12 PUSH_CONST (list)['pos']
     13 MAKE_FUNCTION (NoneType)None
     14 ASSIGN (NoneType)None

    [STEPIZE get]
      0 PUSH_VAR (str)pos
      1 PUSH_VAR (str)pos
      2 PUSH_CONST (int)1
      3 SUB (NoneType)None
      4 ASSIGN (NoneType)None
      5 PUSH_VAR (str)pos
      6 PUSH_CONST (int)0
      7 LT (NoneType)None
      8 JUMP_IF_FALSE (int)13
      9 PUSH_CONST (int)1
     10 NEG (NoneType)None
     11 RETURN (NoneType)None
     12 JUMP (int)52
     13 PUSH_VAR (str)pos
     14 PUSH_VAR (str)size
     15 LT (NoneType)None
     16 JUMP_IF_FALSE (int)22
     17 PUSH_VAR (str)value
     18 PUSH_VAR (str)pos
     19 GET_ITEM (NoneType)None
     20 RETURN (NoneType)None
     21 JUMP (int)52
     22 PUSH_VAR (str)pos
     23 PUSH_VAR (str)size
     24 GE (NoneType)None
     25 JUMP_IF_FALSE (int)48
     26 PUSH_VAR (str)value
     27 PUSH_VAR (str)value
     28 PUSH_VAR (str)value
     29 PUSH_VAR (str)size
     30 PUSH_CONST (int)1
     31 SUB (NoneType)None
     32 GET_ITEM (NoneType)None
     33 PUSH_VAR (str)value
     34 PUSH_VAR (str)size
     35 PUSH_CONST (int)2
     36 SUB (NoneType)None
     37 GET_ITEM (NoneType)None
     38 ADD (NoneType)None
     39 BUILD_LIST (int)1
     40 ADD (NoneType)None
     41 ASSIGN (NoneType)None
     42 PUSH_VAR (str)size
     43 PUSH_VAR (str)size
     44 PUSH_CONST (int)1
     45 ADD (NoneType)None
     46 ASSIGN (NoneType)None
     47 JUMP (int)22
     48 PUSH_VAR (str)value
     49 PUSH_VAR (str)pos
     50 GET_ITEM (NoneType)None
     51 RETURN (NoneType)None
    55

The second call, fab.get(20), prints the identical [STEPIZE get] listing (steps 0 to 51) once more, because the function body is tokenized and compiled again on every call, and then prints the result:

    6765

Implementation of Auto

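Every Auto has a value, which holds a Python built-in object.

Every Auto has a namespace, which holds its internal names (name plus value).

Every Auto has a belongto, which records which Auto's namespace it belongs to.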
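The three fields together form a chain of scopes. The sketch below uses a hypothetical Scope class as a stand-in for Auto and a hypothetical lookup method to show how a name can be resolved by walking the belongto links outward; it is an illustration of the idea, not the author's implementation.

```python
class Scope:
    """Hypothetical stand-in for Auto: a value, a namespace and an owner."""

    def __init__(self, value=None, belongto=None):
        self.value = value
        self.namespace = {}
        self.belongto = belongto

    def lookup(self, name):
        """Search this namespace, then each enclosing owner in turn."""
        scope = self
        while scope is not None:
            if name in scope.namespace:
                return scope.namespace[name]
            scope = scope.belongto
        return None

# __main__ owns fab, and fab owns get, as in the example program.
main = Scope()
main.namespace["size"] = Scope(2)
fab = Scope(belongto=main)
get = Scope(belongto=fab)

found = get.lookup("size")     # resolved two levels up, in main
missing = get.lookup("nope")   # not defined anywhere in the chain
```

The PUSH_VAR branch of step_once in the listing performs the same kind of walk, except that an unknown name yields a fresh Auto(None) instead of None.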

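When an Auto is used as a function (call):

It first checks whether it is a buildin.

If it is not, the str stored in value is taken as the function body and argnames as the parameter names, and tokenize, stepize and the step loop are run on it.

If it is, the buildin is executed.

The body can freely read and modify variables from enclosing scopes or the global scope.

Variables created inside it are added to its namespace.

It always returns exactly one value.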
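The call rules can be sketched in a few lines. Here Func is a hypothetical simplification of a callable Auto, and its body is an ordinary Python callable receiving the namespace dict rather than a source string to compile; both are assumptions made to keep the example short.

```python
class Func:
    """Hypothetical simplification of a callable Auto."""

    def __init__(self, body=None, argnames=None, buildin=None):
        self.body = body                  # stands in for the source string in value
        self.argnames = argnames or []
        self.buildin = buildin
        self.namespace = {}               # survives between calls

    def call(self, args=None):
        args = args or []
        if self.buildin is not None:      # built-ins bypass everything else
            return self.buildin(args)
        if self.body is None:
            raise TypeError("uncallable")
        # Bind arguments by position into the function's own namespace.
        self.namespace.update(zip(self.argnames, args))
        return self.body(self.namespace)  # exactly one return value

def counter_body(ns):
    # A variable created here stays in the namespace after the call ends.
    ns["count"] = ns.get("count", 0) + ns["step"]
    return ns["count"]

counter = Func(counter_body, argnames=["step"])
first = counter.call([2])
second = counter.call([3])
length = Func(buildin=lambda args: len(args[0])).call([[1, 1]])
```

Because the namespace is never reset, the second call sees the count left behind by the first, which is the behavior that lets fab.get reach value and size after fab() has finished.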

    # Python 2 code. The debug flags output_short, show_tokens, show_steps and
    # show_var, and the functions tokenize and stepize, are expected to be
    # defined at module level.
    class Auto(object):
        def __init__(self,value=None,belongto=None,argnames=None,name=None,buildin=None):
            self.value = value
            self.belongto = belongto
            self.argnames = argnames or []
            self.buildin = buildin
            if name:
                self.namespace = {'self':self, '__name__':Auto(name)}
            else:
                self.namespace = {'self':self}

        def __str__(self):
            s = str(self.value).replace("\n",r"\n")
            if output_short and len(s)>15:
                return s[:10]+'...'
            return s

        def __repr__(self):
            s = str(self.value).replace("\n",r"\n")
            if output_short and len(s)>15:
                return "Auto("+s[:10]+'...'+")"
            return "Auto("+s+")"

        def call(self, args=None):
            if self.buildin != None:
                return self.buildin(args)
            self.stack = []
            if not isinstance(self.value, str):
                raise Exception("uncallable")
            if args:
                for x,y in zip(self.argnames,args):
                    y.belongto = self
                    y.namespace['__name__'] = Auto(x)
                    self.namespace[x] = y
            funcname = self.namespace.get('__name__')
            if not funcname:
                funcname = ''
            else:
                funcname = str(funcname.value)
            if show_tokens:
                print "\n[TOKENIZE "+funcname+"]"
            tokens = tokenize(self.value)
            if show_steps:
                print "\n[STEPIZE "+funcname+"]"
            self.steps = stepize(tokens)
            if show_steps:
                for i,x in enumerate(self.steps):
                    print " {0:3}".format(i),x
            # run steps
            if show_var:
                print "\n[CALL "+funcname+"]"
            self.l = len(self.steps)
            self.i = 0
            while self.i < self.l:
                self.step_once()
            if show_var:
                print "[END "+funcname+"]\n"
            if self.stack:
                return self.stack[0]
            else:
                return Auto(None)

        def step_once(self):
            t = self.steps[self.i]
            if show_var:
                print self.i,":",t
            self.i += 1
            if t.type == "PUSH_VAR":
                # look the name up locally, then in each enclosing Auto
                a = self.namespace.get(t.value)
                b = self.belongto
                while a == None and b != None:
                    a = b.namespace.get(t.value, None)
                    b = b.belongto
                if a == None:
                    a = Auto(None)
                    a.namespace['__name__'] = Auto(t.value)
                    a.belongto = self
                self.stack.append(a)
            elif t.type == "ASSIGN":
                a = self.stack.pop()
                b = self.stack.pop()
                name = b.namespace['__name__']
                if b.belongto != None:
                    a.namespace['__name__'] = name
                    a.belongto = b.belongto
                    b.belongto.namespace[name.value] = a
                else:
                    a.namespace['__name__'] = name
                    a.belongto = self
                    self.namespace[name.value] = a
            elif t.type == "PUSH_CONST":
                self.stack.append(Auto(t.value))
            elif t.type == "POP_ALL":
                self.stack = []
            elif t.type == "GE":
                a = self.stack.pop()
                b = self.stack.pop()
                self.stack.append(Auto(b.value >= a.value))
            elif t.type == "GT":
                a = self.stack.pop()
                b = self.stack.pop()
                self.stack.append(Auto(b.value > a.value))
            elif t.type == "LE":
                a = self.stack.pop()
                b = self.stack.pop()
                self.stack.append(Auto(b.value <= a.value))
            elif t.type == "LT":
                a = self.stack.pop()
                b = self.stack.pop()
                self.stack.append(Auto(b.value < a.value))
            elif t.type == "EQ":
                a = self.stack.pop()
                b = self.stack.pop()
                self.stack.append(Auto(b.value == a.value))
            elif t.type == "NE":
                a = self.stack.pop()
                b = self.stack.pop()
                self.stack.append(Auto(b.value != a.value))
            elif t.type == "ADD":
                a = self.stack.pop()
                b = self.stack.pop()
                self.stack.append(Auto(b.value + a.value))
            elif t.type == "SUB":
                a = self.stack.pop()
                b = self.stack.pop()
                self.stack.append(Auto(b.value - a.value))
            elif t.type == "MUL":
                b = self.stack.pop()
                a = self.stack.pop()
                self.stack.append(Auto(b.value * a.value))
            elif t.type == "DIV":
                a = self.stack.pop()
                b = self.stack.pop()
                self.stack.append(Auto(b.value / a.value))
            elif t.type == "AND":
                a = self.stack.pop()
                b = self.stack.pop()
                self.stack.append(Auto(b.value and a.value))
            elif t.type == "OR":
                a = self.stack.pop()
                b = self.stack.pop()
                self.stack.append(Auto(b.value or a.value))
            elif t.type == "NOT":
                a = self.stack.pop()
                self.stack.append(Auto(not a.value))
            elif t.type == "NEG":
                a = self.stack.pop()
                if isinstance(a.value, str):
                    # negating a string reverses it
                    self.stack.append(Auto(a.value[::-1]))
                else:
                    self.stack.append(Auto(-a.value))
            elif t.type == "JUMP_IF_FALSE":
                a = self.stack.pop()
                if not a.value:
                    self.i = int(t.value)
            elif t.type == "JUMP":
                self.i = int(t.value)
            elif t.type == "PRINT":
                for x in self.stack:
                    print x,
                print
                self.stack = []
            elif t.type == "GET_METHOD":
                a = self.stack.pop()
                b = self.stack.pop()
                c = b.namespace.get(a.value,Auto(None))
                c.belongto = b
                self.stack.append(c)
            elif t.type == "CALL":
                # stack[-0:] would return the whole stack, so slice from an explicit index
                args = self.stack[len(self.stack)-t.value:]
                for x in range(t.value):
                    self.stack.pop()
                a = self.stack.pop()
                self.stack.append(a.call(args))
            elif t.type == "RETURN":
                a = self.stack.pop()
                self.stack = [a]
                self.i = self.l
            elif t.type == "MAKE_FUNCTION":
                a = self.stack.pop()
                b = self.stack.pop()
                if isinstance(b.value, str) and isinstance(a.value, list):
                    self.stack.append(Auto(b.value,argnames=a.value))
                else:
                    self.stack.append(Auto(None))
            elif t.type == 'BUILD_LIST':
                # same explicit slice as in CALL, so that a count of 0 gives []
                l = self.stack[len(self.stack)-t.value:]
                for x in range(t.value):
                    self.stack.pop()
                self.stack.append(Auto(l))
            elif t.type == 'BUILD_MAP':
                m = {}
                for x in range(t.value):
                    v = self.stack.pop()
                    i = self.stack.pop()
                    m[i.value] = v
                self.stack.append(Auto(m))
            elif t.type == 'GET_ITEM':
                a = self.stack.pop()
                b = self.stack.pop()
                if isinstance(a.value, int) and isinstance(b.value, list):
                    if a.value < len(b.value):
                        c = b.value[a.value]
                    else:
                        c = Auto(None)
                elif isinstance(a.value, int) and isinstance(b.value, str):
                    if a.value < len(b.value):
                        c = Auto(b.value[a.value])
                    else:
                        c = Auto(None)
                elif isinstance(a.value, str) and isinstance(b.value, dict):
                    c = b.value.get(a.value,Auto(None))
                else:
                    raise Exception("error in getitem")
                c.belongto = b
                self.stack.append(c)
            else:
                raise Exception('cannot step '+t.type)
            if show_var:
                print " "*40,self.stack
                print " "*40,self.namespace

        def func_register(self,name,func):
            self.namespace[name] = Auto("<buildin-function "+name+'>', buildin=func, name=name)

TOKENIZE

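This is again a brute-force implementation, with one catch: the integer rule must not consume a minus sign, otherwise tokenizing goes wrong.

For example, take x-1; if int is defined as -?\d+, the tokenizer reads -1 as a single integer and the subtraction operator disappears.

So int should be \d+, and the minus sign is parsed separately.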
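The difference is easy to reproduce with Python's re module. The scan helper and its token pattern below are assumptions made for this demonstration, not the article's tokenizer; only the two integer patterns come from the text.

```python
import re

def scan(source, int_pattern):
    """Split source into names, integers and single-character operators."""
    pattern = r"[A-Za-z_]\w*|" + int_pattern + r"|\S"
    return re.findall(pattern, source)

# With the optional sign, "-1" is swallowed as one integer token.
greedy = scan("x-1;", r"-?\d+")    # ['x', '-1', ';']

# With digits only, the minus sign survives as its own token.
correct = scan("x-1;", r"\d+")     # ['x', '-', '1', ';']
```

A unary minus such as the one in return -1; is then left to the compiler, which is consistent with the NEG step that follows PUSH_CONST (int)1 in the bytecode for get.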

STEPIZE

Since this part is written by hand, it is kept simple and brute-force.

<stmt> => 'print' ( <expr5> (',' <expr5>)* )? ';'

| if-statement | while-statement | function-statement | name-statement

| 'continue' ';' | 'break' ';' | 'return' <expr5>? ';' | <expr5> ';'

name-statement => 'name' ';' | 'name' 'assign' <
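The grammar is cut off at this point, but the first production is complete and can be sketched as a hand-written recursive-descent routine. In this sketch <expr5> is reduced to a single name or integer token, and the (type, value) token tuples as well as the function names parse_atom and parse_print are assumptions for illustration. The emitted step names are the ones seen in the bytecode dump.

```python
def parse_atom(tokens, pos, steps):
    """Stand-in for <expr5>: accept one name or integer token."""
    kind, value = tokens[pos]
    if kind == "int":
        steps.append(("PUSH_CONST", int(value)))
    elif kind == "name":
        steps.append(("PUSH_VAR", value))
    else:
        raise SyntaxError("expected name or int, got %r" % (value,))
    return pos + 1

def parse_print(tokens, pos, steps):
    """'print' ( <expr5> (',' <expr5>)* )? ';'"""
    if tokens[pos] != ("print", "print"):
        raise SyntaxError("expected 'print'")
    pos += 1
    if tokens[pos][0] != ";":
        # first expression, then any number of ", expression" pairs
        pos = parse_atom(tokens, pos, steps)
        while tokens[pos][0] == ",":
            pos = parse_atom(tokens, pos + 1, steps)
    if tokens[pos][0] != ";":
        raise SyntaxError("expected ';'")
    steps.append(("PRINT", None))
    return pos + 1

tokens = [("print", "print"), ("name", "x"), (",", ","), ("int", "10"), (";", ";")]
steps = []
end = parse_print(tokens, 0, steps)
```

Each expression leaves one value on the stack and a single PRINT step comes last, which fits the way the PRINT branch of step_once prints every stack entry and then clears the stack.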
