You don’t realize the value of a debugger until you’ve worked on a complicated or hard-to-visualize project using a development environment that provides decent debugging capabilities. Want to know where you are in code execution? What’s taking so long? Just pause it and check. Wonder what value is assigned to that variable? Mouse over it. Want to skip a bunch of code and continue running from a different section? Go for it.
When print(variable_name) is just not enough to give you an idea of what’s going on with your project ― or when adding such statements is just annoying ― debuggers are a great help in figuring things out.
Python already gives us a built-in debugger in the form of pdb (a command-line tool), but with a community as awesome as this language has, there are plenty of other options with graphical interfaces provided by the many development environments that already exist. The most common ones are JetBrains’ PyCharm, Wingware’s Wing IDE, and even Microsoft’s Visual Studio Community.
However, we are not here to discuss how your debugger is better than his, or which one is prettier or more elegant. We’re here to learn a little more about how simple it is to write a Python debugger that steps through your code, in the hopes of gaining some understanding of a few language internals, scratching an itch I’ve had for a long time, and getting you more excited about this neat language and its ecosystem.
Now let’s get to it.
A quick primer on how Python code is organized and processed
Contrary to what a decent number of people believe, Python is actually a compiled language. Whenever you execute code, your module is run through a compiler that spits out bytecode, which for imported modules is cached in .pyc files (inside a __pycache__ directory on Python 3). The bytecode is what later gets executed, one instruction at a time.
In fact, the core of CPython that runs a program is little more than a gigantic switch statement running in a loop: it looks at each bytecode instruction and dispatches on what that operation is intended to do.
Compiled bytecode is wrapped internally in code objects, and the dis and inspect modules can be used to examine them. Code objects are immutable structures that, although referenced by other objects (like functions), contain no references to mutable objects themselves.
You can easily look at the bytecode that represents any given source through dis.dis(). Just give it a try with a random function or class. It’s a neat little exercise that’ll help you visualize what’s going on. The exact opcodes and offsets vary between Python versions, but the output will look something like this:
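To make the "switch in a loop" idea concrete, here is a toy sketch of such a dispatch loop written in Python itself. It is only an illustration under loud assumptions: the run function, the (opname, argument) tuple format, and the program list are all made up for this example, and CPython's real loop is written in C and handles far more opcodes.

```python
def run(instructions, variables):
    """Evaluate (opname, argument) pairs on a tiny stack machine (toy model)."""
    stack = []
    for opname, arg in instructions:
        # One branch per opcode: the moral equivalent of the big switch.
        if opname == "LOAD_FAST":
            stack.append(variables[arg])
        elif opname == "LOAD_CONST":
            stack.append(arg)
        elif opname == "BINARY_ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opname == "BINARY_MULTIPLY":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opname == "STORE_FAST":
            variables[arg] = stack.pop()
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError("unknown opcode: " + opname)
    return None


# Hypothetical hand-written "bytecode" for: x = a + b; return x * 2
program = [
    ("LOAD_FAST", "a"),
    ("LOAD_FAST", "b"),
    ("BINARY_ADD", None),
    ("STORE_FAST", "x"),
    ("LOAD_FAST", "x"),
    ("LOAD_CONST", 2),
    ("BINARY_MULTIPLY", None),
    ("RETURN_VALUE", None),
]

result = run(program, {"a": 3, "b": 2})
print(result)
```

With a = 3 and b = 2 the toy machine returns 10.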
>>> def sample(a, b):
...     x = a + b
...     y = x * 2
...     print('Sample: ' + str(y))
...
>>> import dis
>>> dis.dis(sample)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_ADD
              7 STORE_FAST               2 (x)

  3          10 LOAD_FAST                2 (x)
             13 LOAD_CONST               1 (2)
             16 BINARY_MULTIPLY
             17 STORE_FAST               3 (y)

  4          20 LOAD_GLOBAL              0 (print)
             23 LOAD_CONST               2 ('Sample: ')
             26 LOAD_GLOBAL              1 (str)
             29 LOAD_FAST                3 (y)
             32 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             35 BINARY_ADD
             36 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             39 POP_TOP
             40 LOAD_CONST               0 (None)
             43 RETURN_VALUE
Notice that each block of bytecode references its respective line in the source code in the left column, and that it’s not a one-to-one relationship. There can be multiple smaller (one could even say atomic) operations that make up a higher-level instruction.
A frame object in Python is what represents an execution frame. It contains a reference to the code object that’s currently executing, the local variables that it’s running with, the global names (variables) that are available, and references to any related frames (like the parent that spawned it).
There are a lot more details about these objects to discuss, but hopefully this is enough to whet your appetite. We won’t need much more for the purposes of our debugger, though you should check out the Diving Deeper section for links on where to look next.
The sys module
Python provides a number of utilities in its standard library through the sys module. Not only are there things like sys.path to get the module search path or sys.platform to help find details about the OS on which you are running, but there’s also sys.settrace() and sys.setprofile() to help write language tools.
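The line-to-instruction relationship can also be checked programmatically. The sketch below groups opcode names by the source line they were compiled from, using dis.findlinestarts() and dis.get_instructions(). The helper name instructions_per_line is made up for this example, and the opcodes you get depend on your Python version.

```python
import dis


def sample(a, b):
    x = a + b
    y = x * 2
    print('Sample: ' + str(y))


def instructions_per_line(func):
    """Map each source line number to the opcode names compiled from it."""
    code = func.__code__
    # findlinestarts() yields (bytecode offset, line number) pairs.
    line_starts = dict(dis.findlinestarts(code))
    grouped = {}
    current_line = None
    for instr in dis.get_instructions(code):
        if instr.offset in line_starts:
            current_line = line_starts[instr.offset]
        grouped.setdefault(current_line, []).append(instr.opname)
    return grouped


per_line = instructions_per_line(sample)
for lineno, opnames in per_line.items():
    print(lineno, opnames)
```

Each of the three lines in the body of sample should map to several opcodes, which is exactly the one-to-many relationship visible in the dis output.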
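As a small sketch of what a frame exposes, the following uses inspect.currentframe() to grab the running frame and read its code object, local names, and parent frame. The function names inner and outer are made up for this example, and inspect.currentframe() may return None on implementations other than CPython.

```python
import inspect


def inner(value):
    # The frame currently executing this function (CPython-specific support).
    frame = inspect.currentframe()
    parent = frame.f_back  # the frame that called us
    return {
        "function": frame.f_code.co_name,
        "local_names": sorted(frame.f_locals),
        "parent_function": parent.f_code.co_name,
        "shares_globals": frame.f_globals is parent.f_globals,
    }


def outer():
    return inner(42)


info = outer()
print(info)
```

Here f_code, f_locals, f_globals, and f_back correspond to the code object, local variables, global names, and parent frame described above.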
Yes, you read that right: Python already has built-in hooks to help analyze code and interact with program execution. The sys.settrace() function lets us register a callback that runs whenever execution enters a new frame, and hands us a reference to that frame object, which in turn provides the code object we’re working with.
For a quick example of how this looks, let’s reuse the function from earlier:
def sample(a, b):
    x = a + b
    y = x * 2
    print('Sample: ' + str(y))
Assuming that, every time a new frame is executed, we want a callback that prints the code object and line number it’s executing, we can define it as:
def trace_calls(frame, event, arg):
    if frame.f_code.co_name == "sample":
        print(frame.f_code)
Now it’s simply a matter of setting it as our trace callback:
import sys

sys.settrace(trace_calls)
And executing sample(3, 2) should produce:
$ python debugger.py
<code object sample at 0x0000000000B46C90, file ".\test.py", line 123>
Sample: 10
We need the if statement to filter out other function calls; otherwise you’ll see a whole bunch of things that you don’t care about, especially when printing to the screen. Try it!
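To see what the unfiltered callback receives, here is a minimal sketch that records the name of every Python function whose frame is entered, instead of printing it. The names helper, trace_all_calls, and seen are made up for this example, and sample is simplified so that it calls another Python function.

```python
import sys

seen = []  # names of every Python-level function that was entered


def helper(value):
    return value * 2


def sample(a, b):
    return helper(a + b)


def trace_all_calls(frame, event, arg):
    # The global trace function is invoked with event == "call" for each new frame.
    if event == "call":
        seen.append(frame.f_code.co_name)
    return None  # no line-by-line tracing in this sketch


sys.settrace(trace_all_calls)
try:
    sample(3, 2)
finally:
    sys.settrace(None)  # always remove the hook when done

print(seen)
```

Every Python function called while the hook is installed shows up in seen, which is why a real program produces so much noise without a filter. Functions implemented in C, such as print and str, do not create Python frames and are not reported this way.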
The code and frame objects have quite a few fields to describe what they represent. These include things like the file being executed, the function, variable names, arguments, line numbers, and the list goes on. They are fundamental to the execution of any Python code, and you can go through the language documentation for more details.
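A few of those fields can be read directly from a function's code object. This short sketch only collects them into a dictionary (the summary name is made up for this example).

```python
def sample(a, b):
    x = a + b
    y = x * 2
    print('Sample: ' + str(y))


code = sample.__code__
summary = {
    "name": code.co_name,               # function name
    "argcount": code.co_argcount,       # number of positional arguments
    "varnames": code.co_varnames,       # arguments first, then other locals
    "first_line": code.co_firstlineno,  # line where the function is defined
    "filename": code.co_filename,       # file the code was compiled from
}

for field, value in summary.items():
    print(field, "=", value)
```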
What if we want to debug every line?
The trace mechanism sets subsequent callbacks depending on the return value of the first callback. Returning None means that you’re finished with that frame, while returning another function effectively sets it as the trace function inside that frame.
Let’s see how this looks:
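The return-value rule can be checked with a small sketch that records events in a list rather than printing them. The names traced, untraced, global_trace, local_trace, and events are made up for this example. Line positions are stored as offsets from the function's first line so they do not depend on where the code sits in a file.

```python
import sys

events = []  # (function name, event, line offset within the function)


def traced(a, b):
    total = a + b
    return total * 2


def untraced():
    return "ignored"


def local_trace(frame, event, arg):
    # Receives 'line', 'return' and 'exception' events for one frame.
    offset = frame.f_lineno - frame.f_code.co_firstlineno
    events.append((frame.f_code.co_name, event, offset))
    return local_trace  # keep tracing this frame


def global_trace(frame, event, arg):
    # Receives a 'call' event each time a new frame is entered.
    if frame.f_code.co_name == "traced":
        return local_trace  # trace this frame line by line
    return None  # do not trace inside any other frame


sys.settrace(global_trace)
try:
    traced(3, 2)
    untraced()
finally:
    sys.settrace(None)  # always remove the hook when done

for entry in events:
    print(entry)
```

Only the frame for traced gets a local trace function, so nothing from untraced is recorded.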
5  def sample(a, b):
6      x = a + b
7      y = x * 2
8      print('Sample: ' + str(y))
9
10 def trace_calls(frame, event, arg):
11     if frame.f_code.co_name == "sample":
12         print(frame.f_code)
13         return trace_lines
14     return
15
16 def trace_lines(frame, event, arg):
17     print(frame.f_lineno)
Now, if we execute the same code as before, we can see it print the line numbers as we progress through it:
$ python .\test.py
<code object sample at 0x0000000000