Tue May 11 2021

Ritchie Vink, Machine Learning Engineer, writes Polars, one of the fastest DataFrame libraries in Python and Rust

Prompted by the lack of mature DataFrame libraries in the Rust programming language, machine learning engineer Ritchie Vink created Polars, one of the fastest DataFrame libraries in Rust, which can also be used from Python.

Ritchie, who works at the Amsterdam-based AI consultancy Xomnia, began this project in May 2020 to address this gap in the Rust ecosystem.

“It all started as a pet project that I thought should be good enough for the needs I had for a specific assignment,” says Ritchie. “As things developed and took a more serious direction, however, I started putting more effort into making the project production-ready.”

The Netherlands-based machine learning engineer named the library Polars, and it is now considered one of the fastest DataFrame libraries in Rust and Python. Since then, Polars has been drawing increasing attention from thought leaders in the data industry. The library also has its own dedicated website.

What makes the Polars DataFrame Library unique?

“Polars is a multi-threaded DataFrame library, meaning that it allows using all the cores of a computer at the same time to achieve its full processing potential,” explains Ritchie. This sets Polars apart from some of the most widely used DataFrame libraries in Python, such as pandas, which typically runs on a single CPU core while the other cores remain idle.

“Even though there are libraries in Python that allow concurrency with multi-processing, they require copying data between processes, which makes them inefficient,” he explains. Ritchie has also explained the technical novelties of Polars in detail, including its memory model, which is based on Apache Arrow’s memory model.
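
To make the copying cost concrete, here is a minimal Python sketch. It is not taken from Polars and is only an illustration under stated assumptions: it uses nothing but the standard library, and it relies on the fact that Python's multiprocessing module passes objects between processes by pickling them. The helper name send_to_worker is hypothetical, and the hand-off is simulated inside a single process rather than measured across real workers.

```python
import pickle


def send_to_worker(obj):
    """Simulate handing an object to another process (hypothetical helper).

    The object is serialized to bytes and then rebuilt from those bytes,
    which is the kind of round trip multi-processing performs for its arguments.
    """
    payload = pickle.dumps(obj)        # bytes that would cross the process boundary
    received = pickle.loads(payload)   # the copy the worker would reconstruct
    return payload, received


rows = list(range(100_000))
payload, received = send_to_worker(rows)

print(received == rows)   # True: the values are identical
print(received is rows)   # False: the worker holds a separate copy
print(len(payload))       # number of bytes that had to be serialized
```

Threads inside a single process can read the same memory directly, so a multi-threaded design does not need to pay this serialization and copying cost for each hand-off.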

“It might not seem like much to speed up something that is already in the millisecond realm, but we are very excited about the possibilities brought by the development of Polars,” comments Tim Paauw, CTO at Xomnia, which is proudly sponsoring Polars.

“Speed and memory efficiency matter when you consider that these DataFrame techniques are very widely used. It enables processing of larger datasets without the need for all kinds of extra infrastructure. Such improvements have a big impact on project agility and sustainable computing, among others,” adds Tim.

What’s next for Polars?

At the moment, Polars has a public API in Python and Rust, and a JavaScript API is under development. It can speed up any application that uses DataFrames, such as APIs and ETL procedures, as well as other analysis applications and decision support systems.

“I hope that Polars becomes more widely used in the future, and that it incorporates more and more front-end languages,” adds Ritchie, who is dedicated to continuously enhancing the architecture of Polars.

“Xomnia firmly believes in the application of open source software; we have been using that to help many of our clients over the years,” says Ollie Dapper, co-founder of Xomnia. “By powering the development of Polars, we aim to give back to the open source community and support the passions of our colleagues, which is something that we highly value at Xomnia.”

About Ritchie Vink

Ritchie has over six years of experience in the field of machine learning engineering. Since 2017, he has been working as a machine learning engineer at Xomnia. He is also a blogger, interested in exploring topics such as Bayesian statistics, neural networks and algorithms in depth.