Tracing Memory Leaks In Python

Recently, I ran into a hard-to-put-your-finger-on issue: a memory leak. Memory usage would increase gradually until the load on the server became so high that it stopped responding. Restarting the server got us out of the pinch, but that was just kicking the can down the road. Until the issue is found and fixed, it will keep coming back.

An added challenge for me was to do this for a process that was already running.

Luckily for me, I found a post on Stack Overflow with an answer that looked exactly like what I was looking for so I gave it a shot. It worked out very well. Now I’m armed with a tool ready for the next memory leak that comes my way.

There are 2 modules that we will utilize for this analysis:

  • tracemalloc
    • Available as a built-in module starting in Python 3.4
    • A debug tool to trace memory blocks allocated by Python
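  • pyrasite
    • A third-party package, installed with pip
    • A tool for injecting code into a running Python process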
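
To get a feel for what tracemalloc reports before attaching to a live process, here is a minimal sketch that can be run on its own. The function name measure_allocation and the list-building workload are made up for illustration; only the tracemalloc calls come from the standard library.

```python
import tracemalloc


def measure_allocation(n_items=100_000):
    """Return (current, peak) traced bytes after building a list of n_items ints."""
    tracemalloc.start()
    try:
        # Hypothetical workload: any allocation made while tracing is counted
        data = [i * 2 for i in range(n_items)]
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current, peak


if __name__ == "__main__":
    current, peak = measure_allocation()
    print(f"current={current / 1024:.1f} KiB, peak={peak / 1024:.1f} KiB")
```

Only memory allocated after tracemalloc.start() is counted, which matters later when tracing is switched on inside a process that has already been running for a while.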

The steps below were run on an Ubuntu 18.04 LTS server running Python 3.6.

pyrasite

Installing and getting pyrasite to run turned out to be a bit tricky so I’m documenting those steps here.

Install pyrasite by running the following:

pip install pyrasite

On Ubuntu, since 10.10, the scope of ptrace is restricted for security reasons. So if you want to trace the memory allocation of any Python process running on your system, you will need to perform the following additional step as root:

echo 0 > /proc/sys/kernel/yama/ptrace_scope

To make this change permanent, set kernel.yama.ptrace_scope to 0 in /etc/sysctl.d/10-ptrace.conf. The values 0 and 1 mean the following:

  • 0 - Standard scope: PTRACE can be used by any process on any other process that shares the same uid and is dumpable (i.e. it has not changed uid via setuid and has not cleared its dumpable flag with prctl(PR_SET_DUMPABLE)). CAP_SYS_PTRACE overrides any limitations.
  • 1 - Child-only: PTRACE can be used by any process on any child process that shares the same uid and is dumpable. CAP_SYS_PTRACE overrides any limitations.

In my case, I was only looking to troubleshoot child processes, so I was OK with leaving this setting unchanged.
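
Before changing anything, it can help to check which value is currently in effect. The following is a small sketch, not part of pyrasite: the helper names read_ptrace_scope and describe are my own, and the descriptions only cover the two values discussed above.

```python
from pathlib import Path

PTRACE_SCOPE_PATH = "/proc/sys/kernel/yama/ptrace_scope"

# Only 0 and 1 are described in this post; other values are reported as-is.
DESCRIPTIONS = {
    0: "standard scope: same-uid processes can be traced",
    1: "child-only: only child processes can be traced",
}


def read_ptrace_scope(path=PTRACE_SCOPE_PATH):
    """Return the ptrace_scope value as an int, or None if the file is missing."""
    try:
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None


def describe(scope):
    """Turn a ptrace_scope value into a short human-readable message."""
    if scope is None:
        return "Yama ptrace_scope is not available on this system"
    return DESCRIPTIONS.get(scope, f"value {scope} (not covered in this post)")


if __name__ == "__main__":
    print(describe(read_ptrace_scope()))
```

Reading the file does not require root; only writing to it does.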

Now that pyrasite is installed, I attempt to run pyrasite-shell. You need to provide the pid of the process you want to connect to. For the sake of this post, let’s assume that pid to be 476. I issue the following command:

pyrasite-shell 476

I wait for a good while and nothing happens. It just hangs. So I press Ctrl+C and get this stack trace:

  File "/usr/local/bin/pyrasite-shell", line 11, in <module>
    sys.exit(shell())
  File "/usr/local/lib/python2.7/dist-packages/pyrasite/tools/shell.py", line 30, in shell
    ipc.connect()
  File "/usr/local/lib/python2.7/dist-packages/pyrasite/ipc.py", line 95, in connect
    self.wait()
  File "/usr/local/lib/python2.7/dist-packages/pyrasite/ipc.py", line 151, in wait
    (clientsocket, address) = self.server_sock.accept()
  File "/usr/lib/python2.7/socket.py", line 202, in accept
    sock, addr = self._sock.accept()
KeyboardInterrupt

A quick search brings me to this GitHub issue where one answer says to specify verbose=True as the third parameter in this method call.

With that change in place, running pyrasite-shell again showed me what the problem was:

pyrasite-shell 476

/bin/sh: 1: gdb: not found

I install gdb on Ubuntu by running the following command:

apt update && apt install gdb
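
Since pyrasite-shell hangs silently when gdb is missing, a quick pre-flight check can save some head-scratching. This is a rough sketch using the standard library's shutil.which; the function name missing_tools is my own, and gdb is the only dependency I know about from the error above.

```python
import shutil


def missing_tools(tools=("gdb",)):
    """Return the names from `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Install before running pyrasite-shell:", ", ".join(missing))
    else:
        print("gdb found at", shutil.which("gdb"))
```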

I run pyrasite-shell for the third time and the program hangs again. I force quit and see the following error (I still have verbose=True from the earlier change):

'PyGILState_Ensure' has unknown return type; cast the call to its declared return type
'PyRun_SimpleString' has unknown return type; cast the call to its declared return type
History has not yet reached $1.

^CTraceback (most recent call last):
  File "/usr/bin/pyrasite-shell", line 11, in <module>
    load_entry_point('pyrasite==2.0', 'console_scripts', 'pyrasite-shell')()
  File "/usr/lib/python2.7/site-packages/pyrasite/tools/shell.py", line 30, in shell
    ipc.connect()
  File "/usr/lib/python2.7/site-packages/pyrasite/ipc.py", line 95, in connect
    self.wait()
  File "/usr/lib/python2.7/site-packages/pyrasite/ipc.py", line 151, in wait
    (clientsocket, address) = self.server_sock.accept()
  File "/usr/lib/python2.7/socket.py", line 206, in accept
    sock, addr = self._sock.accept()
KeyboardInterrupt

Yet another GitHub issue describes this exact problem. Luckily, another user has already fixed it; unfortunately, the change has not yet been merged into pyrasite. So I install that user’s patched version of pyrasite by running the following command:

pip install git+https://github.com/diamond-lizard/pyrasite.git

Now when I run pyrasite-shell again, I get the shell.

pyrasite-shell 476

Pyrasite Shell 2.0
Connected to '/your/python/process'
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(DistantInteractiveConsole)

>>>

Huzzah!

tracemalloc Top 10

Now run the following snippet to trace memory allocation.

import tracemalloc

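# Start tracing memory blocks allocated by Python
tracemalloc.start()

# ... run your application ...

# Capture the current allocations and group them by source line
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')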

print("[ Top 10 ]")
for stat in top_stats[:10]:
    print(stat)
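
A single snapshot shows what is using memory right now, but for a leak it is often more telling to see what grew between two points in time. Here is a minimal sketch using Snapshot.compare_to from the standard library. The names top_growth and leaky_workload are made up, and the workload is just a stand-in for a leak; in a live process you would instead take two snapshots some time apart.

```python
import tracemalloc


def top_growth(workload, limit=10):
    """Run `workload` between two snapshots and return the largest differences."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        retained = workload()  # keep the result alive so it shows up as growth
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to sorts by absolute size difference, largest first
    return after.compare_to(before, "lineno")[:limit]


def leaky_workload():
    # Hypothetical stand-in for a leak: a large list that is never released
    return [bytes(1000) for _ in range(1000)]


if __name__ == "__main__":
    print("[ Top 10 differences ]")
    for stat in top_growth(leaky_workload):
        print(stat)
```

Each entry reports the size difference and the file and line responsible, so a line that keeps climbing across repeated comparisons is a good place to start looking.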

That’s it.

Happy tracing!