Testing your stuff is very important. That’s why we do performance testing at work, and usually everything turns out to be fine. But one day an unpleasant news have come: server under load have used all available RAM and went to swap. That is not good. Not at all.
That huge spike is how swapping server looks like
Day one. Profiling.
Profiler is a great tool, but it did not give us answer why the memory usage grows like crazy. PHP interpreter was using ~18M or memory on each request in question (we are using Symfony 2 with some bundles, that’s why). I was looking at numbers and scratching my head. The day have ended there.
Days two and three. More profiling.
I knew that some extensions could leak. Like cURL for example. Here is a bug report #65458. It’s for 5.5.2, but there are report for 5.4 somewhere, and that’s exactly what we are using. So i had put down simple test – command line script – trying to catch cURL with a hand in cookie jar. Well, you guessed it, right? xhprof reported same memory consumption, no matter how long the script was working. Back to square one.
Day four. Time for low level sorcery.
So PHP works in interesting and safe way: at the end of each request/script run it will drop all memory that have been allocated by your code and call it a day. Or at least that’s what i read several times in manuals. Knowing how things are working (or not working) it was time to get to real tools. Digging through the internet i have found mentions of Valgrind, a powerful set of tools for monitoring applications. Just what i need!
Apparently Valgrind runs any command you want to check for memory leaks and tracks how much memory allocated/freed. This has some (significant) overhead, but it totally worth it.
Now, there is this page in PHP manual: https://bugs.php.net/bugs-getting-valgrind-log.php. According to it, you cannot just use Valgrind with PHP because Zend uses it’s own memory management, but that feature can be disables (oh, blessing) during runtime. Let’s do it!
$ export ZEND_DONT_UNLOAD_MODULES=1; export USE_ZEND_ALLOC=0; valgrind --tool=memcheck --num-callers=30 --track-origins=yes --leak-check=full --log-file=php.log php ~/1.php
Valgrind will generate huge log file, filled with 1,2, 100 bytes lost messages, we have to look for those, marked ‘definitely lost’. And that’s what I have found:
==24504== 16 bytes in 1 blocks are definitely lost in loss record 24 of 383
==24504== at 0x4A069EE: malloc (vg_replace_malloc.c:270)
==24504== by 0x126A6708: recode_malloc (in /usr/lib64/librecode.so.0.0.0)
==24504== by 0x126A42E5: declare_implied_surface (in /usr/lib64/librecode.so.0.0.0)
==24504== by 0x126A55FA: recode_new_outer (in /usr/lib64/librecode.so.0.0.0)
==24504== by 0x123FD11A: ??? (in /opt/rh/php54/root/usr/lib64/php/modules/recode.so)
==24504== by 0x5E48E2: zend_startup_module_ex (in /opt/rh/php54/root/usr/bin/php)
==24504== by 0x5ED9B4: zend_hash_apply (in /opt/rh/php54/root/usr/bin/php)
==24504== by 0x5E8039: zend_startup_modules (in /opt/rh/php54/root/usr/bin/php)
==24504== by 0x584949: php_module_startup (in /opt/rh/php54/root/usr/bin/php)
==24504== by 0x689DBC: ??? (in /opt/rh/php54/root/usr/bin/php)
==24504== by 0x68B728: ??? (in /opt/rh/php54/root/usr/bin/php)
==24504== by 0x3828A1ED1C: (below main) (in /lib64/libc-2.12.so)
And the summary wasn’t pleasant at all:
==24504== LEAK SUMMARY:
==24504== definitely lost: 10,794 bytes in 1,033 blocks
==24504== indirectly lost: 720 bytes in 7 blocks
==24504== possibly lost: 0 bytes in 0 blocks
==24504== still reachable: 240,825 bytes in 3,254 blocks
==24504== suppressed: 0 bytes in 0 blocks
==24504== Reachable blocks (those to which a pointer was found) are not shown.
What is ‘recode’ extension? Never heard of it. Let’s disable it and see what happens!
$ export ZEND_DONT_UNLOAD_MODULES=1; export USE_ZEND_ALLOC=0; valgrind --tool=memcheck --num-callers=30 --track-origins=yes --leak-check=full --log-file=php-no-recode.log php ~/1.php
==32272== LEAK SUMMARY:
==32272== definitely lost: 0 bytes in 0 blocks
==32272== indirectly lost: 0 bytes in 0 blocks
==32272== possibly lost: 0 bytes in 0 blocks
==32272== still reachable: 238,194 bytes in 3,251 blocks
==32272== suppressed: 0 bytes in 0 blocks
==32272== Reachable blocks (those to which a pointer was found) are not shown.
So it turns out not PHP code or Apache was leaking, but librecode itself. Fun stuff.
Yeah, those were four intense days. But some lessons learned:
- use xhprof for a long running php scripts, daemons etc. ;
- valgrind is your friend;
- disable unused PHP extensions;
- test. Test. Test.