Announcing 3.7-3.10 support and a new direction

September 29, 2022 Kevin Modzelewski

We’re very excited to announce that today we are releasing a new version of Pyston-lite, our Python JIT-as-an-extension-module, with the headline feature of supporting Python versions 3.7 through 3.10 (Mac and Linux). Previously we only supported Python 3.8, and support for other versions was one of our most-requested features.

To install Pyston-lite, simply do pip install pyston_lite_autoload and your Python environment will be sped up by about 10-25%.
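Since this release targets Python 3.7 through 3.10 on Mac and Linux, a script could check whether the running interpreter falls in that range before relying on the JIT. The following is a minimal sketch, not part of Pyston-lite: the function name `is_supported` is hypothetical, and mapping "Mac and Linux" to the `sys.platform` values `darwin` and `linux` is an assumption.

```python
import sys

# Assumed from the announcement: Python 3.7 through 3.10, Mac and Linux only.
SUPPORTED_MINORS = range(7, 11)
SUPPORTED_PLATFORMS = ("linux", "darwin")


def is_supported(version_info=sys.version_info, platform=sys.platform):
    """Return True if the interpreter matches the announced support range.

    Hypothetical helper for illustration; Pyston-lite does not ship it.
    """
    major, minor = version_info[0], version_info[1]
    return (
        major == 3
        and minor in SUPPORTED_MINORS
        and platform.startswith(SUPPORTED_PLATFORMS)
    )


if __name__ == "__main__":
    print("Pyston-lite supported here:", is_supported())
```

Passing the version and platform as parameters keeps the check easy to exercise for interpreters other than the one currently running.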

We didn’t think that we’d be able to support multiple versions of Python at once, but this was enabled by some strategic changes that the rest of this post is about.

Background on Pyston-lite

Our original product, which we’re retroactively calling Pyston-full, is a fork of the entire CPython codebase. Having users install a fully-custom version of Python lets us make changes across the Python implementation, leading to the most optimizations and largest speedups.

The flip side is that it is fairly intensive to set up. While we believe Pyston-full is one of the most highly-compatible alternative Python implementations available, it can be difficult to switch Python implementations regardless of the ease of use of either implementation. Compounding this, we decided to break the ABI, which requires users to recompile extension modules. In theory this is not a big deal, but in practice the lack of available binary packages is a significant disincentive to use an alternative implementation.

The sum of all of this was that while we were very happy to achieve a 30% speedup with Pyston-full, it was very difficult to get people to start using it. We decided to try a different form factor: a pip-installable extension module called Pyston-lite.

Pyston-lite reception

It’s only been a few months since we released Pyston-lite, but the numbers since then have been striking: we are getting 100x more downloads per day for Pyston-lite than Pyston-full. Download counts can be misleading, and in our case it seems like most of the downloads are driven by a single project’s CI system, but we think the difference is still quite meaningful.

Because of this we’ve decided to make the strategic change to focus on Pyston-lite as our core product. We’ve heard many people ask for better Python performance, but our experience seems to say that a full alternative implementation is not a particularly appealing solution to this ask. So while it’s a bit difficult to accept that we are now providing a 10% speedup instead of 30%, we’ve decided that it’s much more important to provide something that people are willing to use.

What this means

The very first implication of this is that we can provide support for multiple Python versions, and we are releasing 3.7 through 3.10 support today. Originally with Pyston-full it was untenable to actively develop across four different major versions of Python due to the sheer amount of code changes that would need to be supported. But by focusing our efforts on just the interpreter, it’s now feasible to have a single implementation that contains the appropriate guards and ifdefs to target any of the Python versions we support.

Secondly, this means that we can no longer use many of our runtime optimizations. Fortunately, the non-interpreter changes we’ve made are generally less controversial than the interpreter changes, and we are working to submit them upstream back to CPython. Our first change has already been accepted and provides a few percent speedup. We’re having some unanticipated difficulty with the process of rebasing our other changes from 3.8 to 3.12, so this work is still in progress.

We think of the breakdown roughly as follows: of our roughly 30% original speedup, 10% is going into Pyston-lite, 10% was done independently by the CPython team between 3.8 and main, and the remaining 10% we are hoping to contribute back upstream.
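As a rough sanity check on this breakdown, three independent 10% speedups can be combined to see what total they imply. The sketch below assumes the speedups compound multiplicatively, which the post does not state; the function name is hypothetical.

```python
def compound_speedup_percent(parts_percent):
    """Combine speedups multiplicatively and return the total as a percentage.

    Assumption: each part is an independent speedup that multiplies with the
    others. The post only describes the breakdown as rough.
    """
    factor = 1.0
    for part in parts_percent:
        factor *= 1.0 + part / 100.0
    return (factor - 1.0) * 100.0


# Pyston-lite, independent CPython work, and planned upstream contributions.
total = compound_speedup_percent([10, 10, 10])
print(f"{total:.1f}%")  # prints 33.1%
```

Under that assumption the three parts give about 33%, which is consistent with the "roughly 30%" figure.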

In the longer-term future we are planning to submit our JIT upstream as well, but we expect retargeting it to 3.11 to be significantly more work than the other versions due to the extensive amount of changes that were made to the interpreter in that version.

Organizational changes

We’re very excited about these changes and the additional usage that we hope they unlock, but they also represent a fairly large reduction in scope for the Pyston project. This means that Marius and I (Kevin) are planning to gradually reduce our time investment in the project, and in particular we have both made the decision to leave Anaconda. We’ve been very happy with our original decision to join them, and I believe they have been happy as well, and the arrangement has ended on good terms. We’ve talked about ways we can continue to collaborate but no decisions have been made yet.

Final thoughts

Our goal is to get to the point that new versions of Python contain most of our speedups, while for those who are stuck on older versions, or prefer to use them, we offer as many speedups via Pyston-lite as possible. With the release of 3.7-3.10 support today we’re getting much closer to that goal.

Our aim is still to be the magical way of getting better Python performance, and we hope that our release today makes it even more of a no-brainer: just type a single command and get better performance.

As always, if you have any questions or want to reach out for any reason, feel free to file an issue on our GitHub or chat with us in our Discord; we hope to hear from you!

Appendix: current benchmark numbers

These numbers are all versus a 3.8 baseline on a c6i.xlarge EC2 instance:

	pyperformance	macrobenchmarks
Pyston (full) 2.3.5	+65%	+28%
pyston-lite 2.3.5 on 3.8	+28%	+10%
CPython 3.11rc2	+26%	+12%

Announcing Pyston-lite: our Python JIT as an extension module

June 8, 2022 Kevin Modzelewski

Today we’re very excited to announce Pyston-lite, a JIT for Python that is easily installable as an extension module. We’ve taken the core technology of Pyston and repackaged it so that you can install it through your existing Python package manager, making it dramatically easier to use. Pyston-lite doesn’t contain all of the optimizations of regular Pyston, but it is roughly 10-25% faster than stock Python 3.8 depending on the workload and we are not done optimizing it.

When we started Pyston v2 two years ago we originally built it as a PEP 523 extension module. However, we quickly ran into optimizations that are prevented by that API, and we decided to fork the entire CPython codebase instead. We knew that ease of use was a primary factor in getting people to switch Python implementations, so we made Pyston as easy to install as possible, for example by offering a portable package and a conda package. Still, we kept noticing friction in the transition process that put off potential users. So we decided to do the one thing we could do to make the process even easier and offer an extension module again.

As a bonus, this is the first time that Mac users can use the Pyston JIT.

We’ve also released version v2.3.4 of full Pyston, which contains additional optimizations since v2.3.3. It is approximately 6% faster than v2.3.3 on pyperformance, for a total speedup of 66%. This release particularly improves the performance of Python floats and speeds up benchmarks like richards by 65% over v2.3.3.

The way you write your code can affect how well Pyston can optimize it; Kevin gave a talk at PyCon about this and the video is now available, where he gives some examples of how a smart optimizer introduces some new considerations for the programmer.

Using Pyston-lite

We think we’ve made it as easy as it can get now: to start using a JIT for Python 3.8 on Linux or Mac, simply run one of the following, depending on your package manager:

pip install pyston_lite_autoload

conda install pyston_lite_autoload -c pyston -c conda-forge

This will install our JIT and configure the current environment to automatically use it until you uninstall the package. You don’t have to create a new environment or recompile any packages, and we are not aware of any compatibility issues or limitations in the code that can be run.

We offer two packages: pyston_lite, which contains our JIT, and pyston_lite_autoload which automatically injects our JIT into the Python process at startup. If you want programmatic control over enabling the JIT you can call pyston_lite.enable() without installing the autoload package, and if you install the autoload package you can disable the JIT injection by setting the DISABLE_PYSTON=1 environment variable.
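For programmatic control, the call to `pyston_lite.enable()` can be wrapped so that a program still runs when the package is missing. This is a minimal sketch: the wrapper name `enable_pyston_lite` is hypothetical, and honoring `DISABLE_PYSTON` in our own code is an assumption, since the variable is only described above for the autoload package.

```python
import importlib
import os


def enable_pyston_lite(environ=os.environ):
    """Try to turn on the Pyston-lite JIT; return True if enable() was called."""
    # Assumption: we choose to respect DISABLE_PYSTON=1 here as well. The post
    # only documents this variable for pyston_lite_autoload.
    if environ.get("DISABLE_PYSTON") == "1":
        return False
    try:
        pyston_lite = importlib.import_module("pyston_lite")
    except ImportError:
        # Package not installed: keep running on the stock interpreter.
        return False
    pyston_lite.enable()  # the entry point named in the post
    return True


if __name__ == "__main__":
    print("JIT enabled:", enable_pyston_lite())
```

Calling the wrapper once at the very start of a program, before the main workload runs, is probably the simplest way to use it.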

Caveats

There are many things that we’re still hoping to add to Pyston-lite, but we wanted to launch an early version of it to get feedback and gauge interest. In particular, we’re planning on adding:

More optimizations. We’ve ported most of the Pyston optimizations to Pyston-lite, but there are still a few more that we can port with additional work.
Support for more Python versions. This current release only supports Python 3.8, but since Pyston-lite has a much smaller surface area we believe we can target multiple releases, unlike Pyston which will only target a single release for the foreseeable future.
Working with upstream CPython to add more JIT APIs. We currently are unable to use all of our techniques in Pyston-lite due to having less control over the system, but we are in discussions with the CPython team to add more JIT hooks to Python 3.12. Ideally we will be able to offer an extension module for 3.12 that has the same performance as a full fork of CPython.

A final caveat is that the performance is quite variable: we’ve seen very different performance results on different processors, with better improvements on a recent AMD processor than a moderately-old Intel processor. This is in addition to the workload-sensitivity that is inherent in a Python optimizer.
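Given this variability, it may be worth timing your own workload on your own hardware with the JIT on and off. The sketch below assumes `pyston_lite_autoload` is installed in the environment being measured and uses the `DISABLE_PYSTON=1` variable described above; the helper name and the sample workload are hypothetical.

```python
import os
import subprocess
import sys
import time


def time_command(command, env_overrides=None):
    """Run a command once and return its wall-clock duration in seconds."""
    env = dict(os.environ)
    env.update(env_overrides or {})
    start = time.perf_counter()
    subprocess.run(command, env=env, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical workload: replace with the script you actually care about.
    workload = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
    with_jit = time_command(workload)
    without_jit = time_command(workload, {"DISABLE_PYSTON": "1"})
    print(f"JIT on: {with_jit:.3f}s, JIT off: {without_jit:.3f}s")
```

A single run is noisy, so repeating each measurement several times and comparing the results back-to-back would give a more trustworthy picture.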

Final words

We hope you’ll try Pyston-lite out using the command above, because speeding up your Python code will not get any easier than this. If you have any trouble or questions you can find us on Discord and GitHub — we’d love to hear about your experience and which features you’d like us to prioritize.

Addendum: benchmarking

As more people are running Python benchmarks we wanted to make sure to publish our benchmarking methodology. The following numbers were generated on EC2 instances, either c6i.2xlarge or c6g.2xlarge, using an Ubuntu 20.04 AMI. The baseline was the Ubuntu-provided Python 3.8.10. The script used was essentially

git clone https://github.com/pyston/python-macrobenchmarks
cd python-macrobenchmarks
git checkout c7dbe453
bash compare.sh python3.8 pyston

We found that while EC2 instances exhibit performance drift over time, varying by ±1% over the course of a day, when benchmarks are run back-to-back the results are surprisingly consistent. We chose to use EC2 instances as representative of user environments, and for ease of reproduction.

We got the following results:

	pyperformance (x86)	pyperformance (ARM)	Pyston macrobenchmarks (x86)	Pyston macrobenchmarks (ARM)
Pyston 2.3.4	+66%	+54%	+35%	+25%
Pyston-lite 2.3.4	+28%	+25%	+8%	+8%
Pyston-lite 2.3.4 Mac	+27%	+39%	+8%	+5%
CPython 3.11.0b3	+15%	+10%	+8%	+5%
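The post does not spell out how percentages like these are derived from raw timings. One common convention, assumed here rather than taken from the Pyston scripts, is to compute each benchmark's baseline-to-candidate time ratio and report the geometric mean across the suite. The function names below are hypothetical.

```python
import math


def speedup_percent(baseline_seconds, candidate_seconds):
    """Speedup of one benchmark, e.g. 1.28s -> 1.00s is about +28%."""
    return (baseline_seconds / candidate_seconds - 1.0) * 100.0


def geomean_speedup_percent(pairs):
    """Geometric-mean speedup over (baseline_seconds, candidate_seconds) pairs.

    Assumption: a geometric mean is used to aggregate across benchmarks.
    """
    ratios = [baseline / candidate for baseline, candidate in pairs]
    log_mean = sum(math.log(ratio) for ratio in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0


if __name__ == "__main__":
    # Made-up timings, for illustration only.
    timings = [(1.28, 1.00), (2.50, 2.00)]
    print(f"{geomean_speedup_percent(timings):+.0f}%")
```

A geometric mean keeps one unusually fast or slow benchmark from dominating the aggregate, which is why it is often preferred for ratios.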

Pyston v2.3.3: ARM support

April 1, 2022 Kevin Modzelewski

Today we’ve released Pyston v2.3.3, the latest version of our faster implementation of Python. The headline feature of this release is 64-bit ARM support, making Pyston available on ARM servers, M1 Macs via Docker, Raspberry Pis that run a 64-bit operating system, and other 64-bit ARM systems. We also have a small number of performance and compatibility improvements.

Our speedup on ARM (30% on a Graviton EC2 instance) is comparable to our speedup for x86 (34% on an Intel i7-6700). Our warmup times are very low, so even low-powered processors such as the Raspberry Pi will benefit from Pyston. These numbers come from our macrobenchmark suite so we are hopeful that they are what you will achieve in practice.

You can download the latest release from our GitHub. We’ve provided a large number of ARM packages in our pyston conda channel, and the easiest way to use them is to download one of our PystonConda installers. pip is of course supported as well, but we don’t have the ability to provide pre-built packages for that format, so you will have to compile from source. This is typically seamless, but sometimes you need to install a Fortran or Rust compiler. We’ve also updated the pyston/pyston Docker image for those who prefer that format.

We’d love to hear how these work for you! Feel free to file a bug report in our issue tracker or get in touch with us on our Discord server.

The Pyston Blog

Year: 2022