Pyston 0.3: Self-hosting Sufficiency

We’ve been working hard over the past five months and are very happy to release Pyston 0.3, the newest version of our high-performance Python implementation. The biggest features of this release are the ability to run all of our internal scripts on Pyston, and improved performance.  We also have some exciting news to share about our project status and plans.

Language compatibility

Self-hosting, or running a compiler through itself, is one of the best ways to demonstrate language compatibility. Pyston isn’t a static compiler and isn’t written in Python, so “self-hosting” is a bit of a misnomer / attention grabber, but we do have a number of internal Python scripts of varying complexity, and with this release we can now run them all on Pyston. The most complex of these is our test runner, which spawns multiple threads, launches subprocesses to run the tests, calls pickle to load the expected results, and reports back to the user. In the process it executes a few thousand lines of code across a few dozen standard libraries and extension modules.
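
To give a concrete picture, here is a minimal sketch of the kind of script the test runner is. This is not our actual runner (the file layout and the ./pyston invocation are made up for illustration), but it exercises the same features: threads, subprocesses, and pickle. It is kept Python-2 compatible, since that is the dialect Pyston targets.

```python
from __future__ import print_function
import pickle
import subprocess
import threading

def run_test(test_path, results):
    # Run one test in a subprocess and capture its stdout:
    proc = subprocess.Popen(["./pyston", test_path], stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    # The expected output is stored pickled next to the test file
    # (a hypothetical layout, for illustration):
    with open(test_path + ".expected", "rb") as f:
        expected = pickle.load(f)
    results[test_path] = (out == expected)

def main(test_paths):
    results = {}
    threads = [threading.Thread(target=run_test, args=(p, results))
               for p in test_paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for path in sorted(results):
        print("%s: %s" % (path, "PASS" if results[path] else "FAIL"))

if __name__ == "__main__":
    main(["tests/attrs.py", "tests/dicts.py"])  # hypothetical test files
```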

Unfortunately, we make fairly little use of our self-hosting ability at the moment.  We only have a single Python script that’s actually involved in building Pyston, and even then only tangentially.  And we can’t default to running our tester in self-host mode, since a bug that breaks the test runner could make all the tests spuriously pass.  But at least we have the ability.

For some quantitative stats of debatable value, we can look at how many of the Python standard libraries and extension modules we can import.  (Note: this is just importing the library correctly, not testing any of its functionality beyond that.  Hopefully in the 0.4 release we can say how many of the CPython test cases we can pass.)  At the time of our 0.2 release, we were able to import 56 top-level standard libraries, and 12 standard extension modules.  Now, with the 0.3 release, we are able to import 117 libraries and 27 extension modules, which is more than twice as many.
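
A count like this is straightforward to gather; the sketch below shows the idea (it’s an illustration, not our actual measurement script, and the module list is abbreviated).

```python
from __future__ import print_function

def count_importable(names):
    # Try to import each top-level module; any failure counts as
    # "can't import yet".  This only checks that the import succeeds,
    # not that the module's functionality works.
    importable = []
    for name in names:
        try:
            __import__(name)
            importable.append(name)
        except Exception:
            pass
    return importable

# In the real measurement, `names` would list every top-level standard
# library (and, separately, every standard extension module):
names = ["collections", "json", "pickle", "sqlite3", "zlib"]
ok = count_importable(names)
print("%d/%d importable" % (len(ok), len(names)))
```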

We still have a long way to go, though, since this is only about half of the libraries and extension modules in CPython (though we don’t have to support all of them immediately).  Thankfully, our C API support is becoming fairly mature, and while it was originally intended for supporting C extension modules, it works just as well for supporting CPython’s internal code.  We’ve gotten to the point that we can often copy large swaths of code from CPython into Pyston without modification, and while it’s hard to measure, I think we currently compile about as much CPython code into Pyston as code that we wrote ourselves.  So without really intending it, we’ve been adopting a “CPython with a replaced core” architecture and moving away from the “completely from scratch” model we started with.  Regardless of whether we fully adopt that strategy, we’re currently able to reuse large amounts of CPython’s implementation, which lets us move much faster.

Performance

We were hesitant to announce performance numbers in the 0.1 and 0.2 releases, since both of those releases focused on longer-term investments (getting the core infrastructure in place, and language features, respectively) from which we didn’t want to get distracted.  In the past month or so, though, we’ve finally taken the time to go back, expand our benchmark suite, and pick some of the low-hanging fruit that we skipped during the initial implementation, and we’re happy to talk about how we’re doing. The result is that we are now (on our small benchmark suite) faster than CPython!  We are currently 1% faster than CPython by geometric mean, with individual benchmarks varying between 2x faster and 2x slower.  You can see more details and up-to-date benchmark results at speed.pyston.org.  (A hearty thanks to the PyPy team for the performance tracking software.)
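
To be concrete about what the geometric mean means here: take each benchmark’s time ratio (Pyston time divided by CPython time) and average them multiplicatively, so that a 2x win and a 2x loss cancel out exactly. A quick sketch, with made-up ratios:

```python
import math

def geomean(ratios):
    # Multiplicative average: exp of the mean of the logs.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical per-benchmark ratios of (Pyston time / CPython time);
# values below 1.0 mean Pyston is faster on that benchmark:
ratios = [0.5, 2.0, 0.8, 1.2, 1.0]
print("geomean: %.3f" % geomean(ratios))  # ~0.99, i.e. ~1% faster overall
```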

“1% faster than CPython” is clearly not our overall performance target, but we are happy with the speed at which we got here, and the amount of optimization headroom we still have.  Moving forward, we could continue working on optimizations and have more impressive benchmark results, but we’re taking this milestone as a signal that we should shift focus back to feature work again.

If we were to break down our performance versus CPython, we (unsurprisingly) have better steady-state performance but worse startup time.  As a rough measure of how our benchmark suite balances the two, the geometric mean of the benchmark runtimes is 6.0 seconds; it’s hard to tell whether this matches the balance of our target server workloads.

  • Most of our startup time comes from LLVM JIT-compiling our code.  This doesn’t mean that LLVM is to blame: our AST interpreter is fairly slow, which forces us to tier out of it into the LLVM JIT often.  We also generate some very large LLVM IR in order to support our frame introspection, which slows down compilation.  We have a number of ideas for improving startup time on both fronts: make the LLVM JIT faster, and tier into it less often (a toy sketch of the tiering tradeoff follows this list).
  • For steady-state performance, we tend to do well at executing our JIT’ed code, but our memory system, though much better than it was in 0.2, is still not as good as CPython’s or other implementations’.  Most of our speedup comes from our inline caching mechanisms (sketched after this list), and we still have a lot of headroom for more type speculation and LLVM optimizations, since we currently do almost none of either.
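
To illustrate the tiering tradeoff from the first bullet, here is a toy sketch (with a hypothetical threshold and stand-in functions; Pyston’s real logic lives in its C++ runtime): every function starts in the slow interpreter, and once it has been called enough times we pay a large one-time compilation cost to get a fast steady-state path.

```python
HOTNESS_THRESHOLD = 25  # hypothetical; tuning this trades startup for warmup

def interpret(code):
    # Stand-in for the (slow) AST interpreter.
    return eval(code)

def compile_with_llvm(code):
    # Stand-in for LLVM compilation: expensive once, fast thereafter.
    compiled = compile(code, "<jit>", "eval")
    return lambda: eval(compiled)

class Function(object):
    def __init__(self, code):
        self.code = code
        self.ncalls = 0
        self.jitted = None

    def call(self):
        self.ncalls += 1
        if self.jitted is not None:
            return self.jitted()      # fast steady-state path
        if self.ncalls >= HOTNESS_THRESHOLD:
            self.jitted = compile_with_llvm(self.code)  # slow, happens once
            return self.jitted()
        return interpret(self.code)   # slow startup path
```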
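
And for the second bullet, a minimal sketch of the idea behind an inline cache, written in Python for readability (Pyston’s caches are emitted as patchable machine code, one per call site): remember where an attribute was found for the last-seen type, and only redo the generic lookup when the type changes.

```python
class GetattrCache(object):
    # One cache per attribute-access site; monomorphic (a single entry).
    def __init__(self, attr):
        self.attr = attr
        self.cached_type = None
        self.cached_value = None

    def lookup(self, obj):
        tp = type(obj)
        if tp is self.cached_type:
            return self.cached_value  # hit: no dict walks at all
        # Miss: do the generic class-hierarchy lookup, then cache it.
        for klass in tp.__mro__:
            if self.attr in klass.__dict__:
                self.cached_type = tp
                self.cached_value = klass.__dict__[self.attr]
                return self.cached_value
        raise AttributeError(self.attr)
```

This toy version only handles class-level attributes (like methods) and a single cache entry; real inline caches also cover instance dictionaries, hold multiple entries, and get invalidated when classes are mutated.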

Project plans

On the project management side, we now have multiple people working full time on the project, in addition to the part-time help we’ve been getting!  With the additional resources we’ve been able to move more quickly (you can see an uptick in GitHub commits), and we’ve set some aggressive goals for running Dropbox on Pyston.  We’re very excited about how much we’re going to be able to get done.

Our goal moving forward is to continue expanding the fraction of the language+runtime that we support, and maintain certain performance targets as we go.  Our current performance target is 1x CPython, but we may loosen it in order to prioritize feature work, since that tends to be more time-sensitive (blocks more things) than performance work.  We’ll be targeting larger and larger applications to run under Pyston, with the ultimate target being the Dropbox server codebase.

Conclusion

As always, you can find our code on GitHub.  We’ve released a binary that may or may not run on your system; it’s available for you to play with if you’re interested, but remember that this is still an alpha and not ready for real use.  If you run into issues or would like to contribute, please let us know!

5 thoughts on “Pyston 0.3: Self-hosting Sufficiency”

  1. I’m glad the pyston team is making progress on this.

    I am particularly interested given the recent development of mypy and Guido’s desire to incorporate it into the python specification. I would think that providing type annotations would lead to greater speed-ups for this project than for CPython.

  2. “Our current performance target is 1x CPython, but we may loosen it[…]”

    I am now confused. What is the current goal of pyston? I thought it was to be faster than cpython, or does it now have other goals?

    • Sorry, I should have been more clear there — that’s our target during our ramp-up phase, while the focus is on language + extension module support. The overall project goal is to be faster than CPython, but on a workload we don’t yet support, so we don’t want to get distracted with over-optimizing microbenchmarks since there’s no guarantee that those optimizations would ultimately be positive (or even non-negative).

  3. I don’t find it important to have fast startup times. When writing desktop applications or daemons in python the process will be running for long times and start only once, so this will be of no consequence.
