Today we are extremely excited to announce the v0.5 release of Pyston, our high performance Python JIT. We’ve been a bit quiet for the past few months, and that’s because we’ve been working on some behind-the-scenes technology that we are finally ready to unveil. It might be a bit less shiny than some other things we could have worked on, but this change makes Pyston much more ready to use.
Pyston is now using reference counting.
Reference counting (“refcounting”), is a form of automatic memory management. It’s usually viewed as slower and less sophisticated than using a tracing garbage collector (a “GC”), the predominant technique in modern languages. All past versions of Pyston contained tracing garbage collectors, and much of our work from 0.4 to 0.5 was tearing it out in favor of refcounting.
Why did we do this? In short, because CPython (the main Python implementation) uses refcounting. We used a GC initially to try to get more performance. But applying a tracing GC to a refcounting C API, such as the one that Python has, is risky and comes with many performance pitfalls. And most challengingly, Pyston wants to support the large amount of code that has been written that relies on the special properties that refcounting provides (predictable immediate destruction). We found that we had to go to greater and greater lengths to support these programs, and there were also cases where we wouldn’t be able to support the applications in their current form.
So we decided to bite the bullet and convert to refcounting, with the goal of getting better application compatibility.
How did we do?
We are very happy to announce: we can run NumPy, unmodified.
Specifically: on their latest release (v1.11), we run their entire test suite with one test failure, for which they’ve accepted our patch. For their latest trunk, we have three test failures. We do need to use a modified version of part of their build chain (Cython), and we are currently slower on the test suite than CPython.
Regardless, we are very happy with this result, especially because we will continue to improve both the compatibility and performance.
There are quite a few non-refcounting features that made it into this release as well:
- Signal handling
- Frame introspection of exited frames
- Generator cleanup
- Support for more C API functions, such as custom tracebacks
- and many more small fixes than we can list here
These are a large part of our progress on NumPy, and they also help us run other tricky libraries such as py.test, lxml, and cffi. We’ve also greatly reduced the number of modifications that we maintain to the Python standard libraries and C extensions. Overall, refcounting was a big investment, but it’s bought us compatibility wins that we would have had a very hard time getting otherwise.
Unfortunately, since performance wasn’t our goal for this release, we did slide backwards a bit. v0.5 is about 10% slower than v0.4 was, largely due to the change to refcounting. We are okay with the regression since we explicitly focused on compatibility for the last six months, and our refcounting implementation still has many available optimizations.
As a side note, the “conventional wisdom” is that refcounting should have been even slower compared to using a GC. We attribute this mainly to the compatibility restrictions that hampered our GC implementation.
There is a lot of low-hanging performance fruit available to us right now which we have been explicitly avoiding while we finished refcounting. Now would be a great time to consider contributing since we have more ideas than we can implement ourselves. This is especially true when it comes to NumPy performance.
Currently, we take about twice as long to run the NumPy test suite as CPython does. We don’t know how this will translate to performance on real NumPy programs, but we do know that much of the slowdown falls into two categories: the first is NumPy hits code paths that are otherwise-rare in Pyston and are currently unoptimized. The second is a bit more subtle: NumPy frequently calls from C code back into the Python runtime, which is expensive for us because it doesn’t benefit from our JIT (in addition to being previously-rare). We have techniques inside Pyston to handle these situations and invoke our JIT from C code, and we’d like to start exposing that so that NumPy and other libraries can use it.
We apologize — again — for the lengthy release cycle. We didn’t expect refcounting to take this long, and we even knew that it would take longer than we expected. We’re planning on doing another blog post to talk about what the difficulties were with it and go into more of the technical details of our refcounting system.
Moving forward, our plan for 0.6 is to focus on performance. We would love help from the community on identifying what is important to make performant. We could work on making the NumPy test suite fast, but it may not end up translating to real NumPy workloads.
We’re at the point that trying out Pyston should be easy; it won’t benefit all workloads, but it should be easy to drop it in and see if it does. To test it out, try
docker run -it pyston/pyston
or check out our readme for other options for obtaining Pyston. To try NumPy, use the “pyston/pyston-numpy” image instead.
We have quite a few optimization ideas lined up, and the pressure has been strong to delay the 0.5 release “just one more week” so that we have time to include some of them. Expect to see an 0.5.1 release that improves performance.
Refcounting brings Pyston one step closer to being a drop-in replacement for CPython. There is still much more work to do, but we feel like with refcounting we’ve reached a threshold where we’d like to start getting Pyston into peoples’ hands. It’s still very much beta software, so there are many rough edges and unoptimized casses. But we want your feedback on what’s working and what’s not.
Finally, we would like to thank all of our open source contributors who have contributed to this release, and especially Nexedi for their employment of Boxiang Sun, one of our core contributors who helped greatly with the NumPy support.
- Boxiang Sun
- Dong-hee Na
- Rudi Chen
- Long Ang
- Tony Narlock
- Felipe Volpone
- Daniel Milde
- Krish Monut
- Jacek Wielemborek