RAPIDS 0.14 Release: S.C.A.L.I.N.G

The small things add up. It was time for some organization and cleanup, so the RAPIDS 0.14 Release focuses on many quality of life improvements. In 0.14, we concentrated on the details that generally get put on the back burner during major releases — improving documentation, pushing down bug counts, removing legacy code, adding more performance and regression testing in CI/CD, solving issues, and adding a few new features in cuML and cuGraph. That said, we still managed time to pull off a few amazing achievements as well.

Before continuing, let me first say, like everyone else in the community, we are all deeply impacted by the global struggle to stay safe, healthy, and live together equally. Join us and our community in whatever capacity you can — donate, volunteer, protest, tweet, code, educate. The time is now. As part of the open-source community, we are held accountable to speak up and do what’s right.

I am excited to let the world know about something the amazing RAPIDS team has been working on for months: a proper benchmark to show how GPU-driven data science and analytics can improve your life. When I started in “big data,” there were many different vendors and solutions, and it wasn’t always easy to compare them. TPC introduced a series of queries that simulate common business workflow issues covering things as simple as large joins or as complex as NLP with sentiment detection. I want to be clear that these initial results are unofficial. That said, they are very exciting, and we’ve open-sourced all our work.

The benchmark is broken down by “scale factors,” and what I’d like to highlight here is SF10,000 (SF10K), or 10 terabyte scale. The average speed-up vs. the current leading CPU solution is more than 12x on the Volta architecture. At the smaller size, SF1,000 (SF1K), or 1 terabyte, the results are even more striking: speed-ups of more than 350x. When you can do more, you do more. (For additional details, see the on-demand presentation Scaling Up And Out in Python: Optimizing 1TB+ Datasets on Distributed GPU Systems with Dask and RAPIDS from AnacondaCon 2020, presented by RAPIDS engineers Ben Zaitlen and Nick Becker.) But what we have been able to do with the new A100 is even more impressive.
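To give a concrete feel for what “scaling up and out” looks like in code, here is a minimal sketch of a distributed GPU join with Dask and RAPIDS. This is not the benchmark implementation; it assumes a single node with one or more NVIDIA GPUs, the dask-cuda and dask_cudf packages installed, and hypothetical file paths and column names.

```python
# Minimal sketch: a distributed GPU join and aggregation with Dask + RAPIDS.
# Assumptions: a node with one or more NVIDIA GPUs, dask-cuda and dask_cudf
# installed, and hypothetical Parquet paths / column names.
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf

if __name__ == "__main__":
    # One Dask worker per visible GPU on this node.
    cluster = LocalCUDACluster()
    client = Client(cluster)

    # Lazily read partitioned Parquet data onto the GPUs.
    orders = dask_cudf.read_parquet("orders/*.parquet")
    customers = dask_cudf.read_parquet("customers/*.parquet")

    # A large distributed join plus an aggregation, executed on the GPUs.
    joined = orders.merge(customers, on="customer_id", how="inner")
    revenue = joined.groupby("region")["order_total"].sum()

    print(revenue.compute())

    client.close()
    cluster.close()
```

The same script scales out to multiple nodes by connecting the Client to a scheduler whose workers were started with dask-cuda-worker, rather than a LocalCUDACluster.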

I’m sure you were as excited as I was when NVIDIA announced the A100. I’m even more excited to tell you what we’ve been able to accomplish with 128 A100 GPUs across 16 DGX A100 systems. At SF10K, the average speed-up is ~19.5x vs. the leading CPU solution, and at SF1K, we saw speed-ups of more than 433x! Plus, we are just getting started! Read more about the results and the engineering behind the benchmark in the blog from Nick Becker and Paul Mahler here. Again, I want to thank the open-source community for all they’ve done to allow us to do so much so quickly.

Our goal is to make RAPIDS as usable and performant as possible wherever data science is done. We will continue to work with more open-source projects to further democratize acceleration and efficiency in data science. Within the 0.14 release, there are many new additions and extensions to the RAPIDS ecosystem that are important to showcase.

The RAPIDS team works closely with major cloud providers and open source hyperparameter optimization (HPO) solutions (such as Ray) to provide code samples so you can get started with HPO in minutes on the cloud of your choice. See the example code here.
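As a rough illustration of the pattern those samples follow, here is a minimal random-search sketch around a GPU-accelerated cuML model. It is not the cloud example code: the synthetic data, search space, and trial count are assumptions, and in practice an HPO framework such as Ray Tune or a managed cloud service would drive the trials and parallelize them across GPUs.

```python
# Minimal sketch: random-search HPO around a GPU-accelerated cuML model.
# The data, search space, and trial count below are illustrative assumptions.
import random

from cuml.datasets import make_classification
from cuml.ensemble import RandomForestClassifier
from cuml.metrics import accuracy_score

# Synthetic classification data generated directly on the GPU.
X, y = make_classification(n_samples=100_000, n_features=20,
                           n_informative=10, random_state=0)
X, y = X.astype("float32"), y.astype("int32")  # dtypes cuML expects
split = 80_000
X_train, y_train = X[:split], y[:split]
X_valid, y_valid = X[split:], y[split:]

def train_and_score(params):
    """Train one cuML random forest and return validation accuracy."""
    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)
    return accuracy_score(y_valid, model.predict(X_valid))

# Hypothetical search space; an HPO framework would normally manage this.
search_space = {"n_estimators": [50, 100, 200], "max_depth": [8, 12, 16]}

best = None
for _ in range(5):  # a handful of random trials
    params = {k: random.choice(v) for k, v in search_space.items()}
    score = train_and_score(params)
    if best is None or score > best[0]:
        best = (score, params)

print("best accuracy:", best[0], "with params:", best[1])
```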

Working with Plotly, DataShader, and the RAPIDS viz stack, we added a new demo to explore gigabyte datasets within Plotly Dash, all accelerated by RAPIDS and GPUs. Read about the collaboration in Plotly’s blog, how we built the demo in our RAPIDS Medium blog, and check out the RAPIDS Plotly community webpage for more information on how to get the code and build demos like this yourself.
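The core pattern behind the demo is worth spelling out: do the heavy aggregation on the GPU with cuDF, then hand only the small aggregated result to Plotly Dash for rendering. Below is a minimal sketch of that pattern, not the demo itself; it assumes a recent Dash (2.x) release plus cuDF, and the Parquet file and column names are hypothetical.

```python
# Minimal sketch: aggregate on the GPU with cuDF, render with Plotly Dash.
# Assumptions: Dash 2.x, cuDF installed, and a hypothetical "trips.parquet"
# with "pickup_borough" and "trip_id" columns.
import cudf
import plotly.express as px
from dash import Dash, dcc, html

# Load and aggregate a large dataset entirely on the GPU.
gdf = cudf.read_parquet("trips.parquet")
counts = (gdf.groupby("pickup_borough")
             .agg({"trip_id": "count"})
             .reset_index()
             .rename(columns={"trip_id": "trips"}))

# Only the tiny aggregated result is copied to the CPU for plotting.
fig = px.bar(counts.to_pandas(), x="pickup_borough", y="trips")

app = Dash(__name__)
app.layout = html.Div([
    html.H3("Trips per borough (aggregated on the GPU)"),
    dcc.Graph(figure=fig),
])

if __name__ == "__main__":
    app.run_server(debug=True)
```

The actual demo goes much further, with Datashader rendering and interactive callbacks, but the division of labor is the same idea: cuDF for compute, Plotly for display.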

Our colleagues at NVIDIA released NVTabular, a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte-scale datasets used to train deep learning-based recommender systems. It provides a high-level abstraction to simplify code and accelerates computation on the GPU using the RAPIDS cuDF library. See how it works here. Speaking of recommenders, the NVIDIA RAPIDS.ai team also placed first in the most recent ACM RecSys Challenge using RAPIDS! Congrats to them, and we look forward to their paper on the winning solution.
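To give a flavor of the preprocessing NVTabular automates, here is a small cuDF-only sketch of typical categorify and normalize steps. This is deliberately not the NVTabular API; it is the same kind of transformation written by hand on a toy GPU DataFrame with hypothetical columns.

```python
# Illustration only: categorify / normalize steps of the kind NVTabular
# automates at terabyte scale, written here directly against cuDF.
# This is not the NVTabular API; columns and values are hypothetical.
import cudf

gdf = cudf.DataFrame({
    "user_id": ["a", "b", "a", "c", "b"],
    "item_id": ["x", "x", "y", "z", "y"],
    "price":   [9.99, 4.50, 12.00, 3.25, 7.75],
    "clicked": [1, 0, 1, 0, 1],
})

# Categorify: map string categories to integer codes suitable for embeddings.
for col in ["user_id", "item_id"]:
    gdf[col] = gdf[col].astype("category").cat.codes

# Normalize: standardize a continuous feature to zero mean, unit variance.
gdf["price"] = (gdf["price"] - gdf["price"].mean()) / gdf["price"].std()

print(gdf)
```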

There are a few more updates from the community to note. BlazingSQL now supports out-of-core query execution, which enables queries to operate on datasets dramatically larger than available GPU memory. And UCX-Py adds InfiniBand support as well as Multi-Node Multi-GPU NVLink and InfiniBand tests.
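For context, here is a minimal sketch of what a BlazingSQL query looks like from Python; out-of-core execution means a query like this can now work on data that does not fit in GPU memory. The table name, file path, and query below are hypothetical.

```python
# Minimal sketch: running SQL on the GPU with BlazingSQL.
# Assumptions: the blazingsql package is installed alongside RAPIDS, and
# "yellow_tripdata.parquet" with these columns is a hypothetical dataset.
from blazingsql import BlazingContext

bc = BlazingContext()

# Register a Parquet dataset as a SQL table; data is read at query time.
bc.create_table("taxi", "yellow_tripdata.parquet")

# Run a SQL query on the GPU; the result comes back as a cuDF DataFrame.
result = bc.sql("""
    SELECT passenger_count, AVG(fare_amount) AS avg_fare
    FROM taxi
    GROUP BY passenger_count
""")

print(result)
```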

One of the primary focuses of this release has been improving the experience for end-users of RAPIDS. To that end, we have focused on resolving open issues, paying down tech debt, and improving documentation across the board.

We think the improvements will make it easier for anyone to pick up and be effective with RAPIDS — we hope you agree! As always, documentation can be found at https://docs.rapids.ai/.

Because we are moving fast (and the whole point of RAPIDS is to provide blistering speed), we want to ensure we do not introduce performance regressions into our codebase. To that end, we have strengthened our internal CI/CD systems to include additional performance and regression testing. Many thanks to our ops team for pushing this through!

Another significant change in this release relates to notebooks-contrib, which houses community notebooks. Many of these notebooks are no longer actively used by the community, and maintaining them requires quite a bit of focus from RAPIDS engineering. In the spirit of focusing on fewer, higher-quality notebooks, we are planning to deprecate official support for them. Fear not, they are not going anywhere: they will be moved into a separate repo where users can still view, maintain, and fork them if needed. The only change is that the RAPIDS engineering team will no longer focus on keeping them up to date. Keep an eye out for more details on this soon.

Here’s a quick summary of what we accomplished in this release:

In RAPIDS 0.14, we refined documentation, continued to work with the community on integrations, pushed down the bug count, expanded C++ examples, and added more tests in CI/CD. In 0.15, we will add more cuStreamz functionality, cyBERT 2.0, and more. We will focus on stability at scale, hardening features, and preparing for 1.0.

As always, we want to thank all of you for using RAPIDS and contributing to our ever-growing GPU-accelerated ecosystem. Please check out our latest release deck or join in on the conversation on Slack or GitHub.
