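Preface
While reading the PyTorch source code, I came across material on how Python code is compiled and executed. To make that easier to follow, I studied parts of Inside The Python Virtual Machine. Most of this post is drawn from that book; see the original for the details.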
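Brief summary
Compiling a Python program involves the following steps:
1. Convert the source code into an AST (abstract syntax tree).
2. Generate the symbol table.
3. Generate the code object from the AST.

Converting source code to an AST
Whenever a Python module is run from the command line, the contents of the module file are split into valid Python tokens, and an error is reported if a syntax error is found.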
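    # ==== test.py ====
    a = 1
    b = 1
    c = a + b
    print(c)

    # ==== test.py ====
    # Tokenize test.py and print every token.
    from tokenize import tokenize

    # tokenize() expects a readline callable that returns bytes, so open in binary mode.
    with open("./test.py", "rb") as f:
        for t in tokenize(f.readline):
            print(t)

The output is as follows: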
TokenInfo(type=62 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=1 (NAME), string='a', start=(1, 0), end=(1, 1), line='a = 1\r\n')
TokenInfo(type=54 (OP), string='=', start=(1, 2), end=(1, 3), line='a = 1\r\n')
TokenInfo(type=2 (NUMBER), string='1', start=(1, 4), end=(1, 5), line='a = 1\r\n')
TokenInfo(type=4 (NEWLINE), string='\r\n', start=(1, 5), end=(1, 7), line='a = 1\r\n')
TokenInfo(type=1 (NAME), string='b', start=(2, 0), end=(2, 1), line='b = 1\r\n')
TokenInfo(type=54 (OP), string='=', start=(2, 2), end=(2, 3), line='b = 1\r\n')
TokenInfo(type=2 (NUMBER), string='1', start=(2, 4), end=(2, 5), line='b = 1\r\n')
TokenInfo(type=4 (NEWLINE), string='\r\n', start=(2, 5), end=(2, 7), line='b = 1\r\n')
TokenInfo(type=1 (NAME), string='c', start=(3, 0), end=(3, 1), line='c = a + b\r\n')
TokenInfo(type=54 (OP), string='=', start=(3, 2), end=(3, 3), line='c = a + b\r\n')
TokenInfo(type=1 (NAME), string='a', start=(3, 4), end=(3, 5), line='c = a + b\r\n')
TokenInfo(type=54 (OP), string='+', start=(3, 6), end=(3, 7), line='c = a + b\r\n')
TokenInfo(type=1 (NAME), string='b', start=(3, 8), end=(3, 9), line='c = a + b\r\n')
TokenInfo(type=4 (NEWLINE), string='\r\n', start=(3, 9), end=(3, 11), line='c = a + b\r\n')
TokenInfo(type=1 (NAME), string='print', start=(4, 0), end=(4, 5), line='print(c)')
TokenInfo(type=54 (OP), string='(', start=(4, 5), end=(4, 6), line='print(c)')
TokenInfo(type=1 (NAME), string='c', start=(4, 6), end=(4, 7), line='print(c)')
TokenInfo(type=54 (OP), string=')', start=(4, 7), end=(4, 8), line='print(c)')
TokenInfo(type=4 (NEWLINE), string='', start=(4, 8), end=(4, 9), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(5, 0), end=(5, 0), line='')
CPython builds a parse tree from the tokenizer output and then converts it into an AST...
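The compilation steps summarized earlier can be observed from Python itself. The following is only a minimal sketch, not a description of CPython's internals: it assumes the standard library modules ast and symtable and the built-in compile() are close enough stand-ins for the internal stages, and it inlines the contents of test.py as a string (the name SOURCE is introduced here purely for illustration).

```python
import ast
import symtable

# Same statements as test.py, inlined so the sketch is self-contained.
SOURCE = "a = 1\nb = 1\nc = a + b\nprint(c)\n"

# Step 1: parse the source text into an AST (the root is a Module node).
tree = ast.parse(SOURCE, filename="test.py")

# Each top-level statement becomes one node in the module body.
statement_types = [type(node).__name__ for node in tree.body]
print(statement_types)

# ast.dump renders a node as text; the exact layout varies by Python version.
print(ast.dump(tree.body[2]))

# Step 2: build the symbol table for the module scope.
table = symtable.symtable(SOURCE, "test.py", "exec")
identifiers = sorted(table.get_identifiers())
print(identifiers)

# Step 3: compile() also accepts an AST and returns a code object.
code = compile(tree, "test.py", "exec")
print(type(code).__name__)

# Executing the code object runs the original program.
namespace = {}
exec(code, namespace)
```

The first three statements appear as Assign nodes and the print call as an Expr node wrapping a Call. The symbol table lists print alongside a, b, and c because every name used in the module scope is recorded, whether it is assigned or only referenced.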