The Data Daily

Reduce Overhead and Get Straight to Work With Personal Compute in Databricks

The Databricks Lakehouse enables organizations to use a single, unified platform for all their data, analytics, and AI workloads. Projects built on these workloads often start with data copied onto laptops or personal virtual machines for rapid iteration, but when they mature or need to move to production, practitioners often have to suffer through painful migrations to official infrastructure.

Today, we are thrilled to announce Personal Compute in Databricks, rolling out this week on AWS and Azure. Personal Compute provides users with a quick and simple path for developing from start to finish on Databricks while giving administrators the access and infrastructure controls they need to maintain peace of mind.

Combining Personal Compute with the Databricks Notebook for data-native development, the Workspace for managed file storage, and Repos for version control, Databricks offers a fully-hosted development experience in the Lakehouse that is as familiar as a user's laptop and that can seamlessly scale from small, everyday workloads to massive, big data workloads on large Spark clusters.

When granted access, users can create Personal Compute resources through either the Compute page or the Databricks Notebook [AWS, Azure]. These are single-machine, all-purpose compute resources that are compatible with Unity Catalog, are available with either CPUs or GPUs, and use the latest version of the Databricks Runtime for Machine Learning (MLR).

The "Compute" page now includes a new shortcut button for creating a Personal Compute resource, and from the "Compute" page users can create a Personal Compute resource in two steps:

Users can also follow the traditional path of clicking "Create Cluster" at the top of the "Compute" page and choosing Personal Compute in the policy dropdown. Once their Personal Compute resource starts, it will be available for their use.

Users can also create a Personal Compute resource from a notebook in only three steps:

Users have the option to choose the resource's name, instance type, and runtime version as well as any other fields set as required in the policy by their administrator. Once their Personal Compute resource is running, their notebook will connect to it automatically. Just like any other cluster, their Personal Compute resource will be available for them to use across the work they do in Databricks, not just with the notebook through which it was created.
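For readers who script their setup rather than clicking through the UI, the same choices (name, instance type, runtime version, policy) can be expressed as a request body for the Clusters API endpoint POST /api/2.0/clusters/create. The following is a minimal sketch, not the official workflow: the helper name build_personal_compute_request is hypothetical, the policy ID is a placeholder you would look up in your own workspace, and the single-node settings follow the documented convention for single-node clusters, which the Personal Compute policy may already apply for you.

```python
import json


def build_personal_compute_request(
    cluster_name: str,
    policy_id: str,
    spark_version: str,
    node_type_id: str,
    autotermination_minutes: int = 0,
) -> dict:
    """Build a request body for POST /api/2.0/clusters/create.

    Hypothetical helper: it only assembles the payload and sends nothing.
    """
    return {
        "cluster_name": cluster_name,
        # Placeholder: the ID of the Personal Compute policy in your workspace.
        "policy_id": policy_id,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # Single-machine compute: no workers, Spark runs locally on the driver.
        "num_workers": 0,
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
        # 0 explicitly disables auto-termination.
        "autotermination_minutes": autotermination_minutes,
    }


if __name__ == "__main__":
    body = build_personal_compute_request(
        cluster_name="my-personal-compute",
        policy_id="POLICY_ID_PLACEHOLDER",          # assumption, not a real ID
        spark_version="11.3.x-cpu-ml-scala2.12",    # example ML runtime key
        node_type_id="i3.xlarge",                   # example AWS instance type
    )
    print(json.dumps(body, indent=2))
```

Fields that the policy fixes must match the policy's values, so in practice an administrator's customizations determine which of these a user can actually vary.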

We are also super excited about how significantly Personal Compute simplifies compute access and management in Databricks. Today, most workflows in Databricks take users through some form of compute management, and this is largely overhead that is disconnected from the focus of users' work. It also adds to administrators' management burden by requiring them to monitor the compute resources created by their users to control costs. With Personal Compute, administrators have a direct path to give users the ability to create laptop-like compute with guardrails so users can focus on the job they are trying to get done.

Users with access to Personal Compute can create compute resources with the following properties:

Auto-termination [AWS, Azure, GCP] is also available for Personal Compute resources but disabled by default.

Workspace administrators can manage access to the Personal Compute policy on individual workspaces using the cluster policies UI [AWS, Azure], which enables the addition of individual users or groups to the policy's ACLs on that workspace.
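Outside the UI, policy ACLs can also be managed through the Permissions API for cluster policies. The sketch below only builds the request path and body for a PATCH call, which adds or updates entries without replacing the existing list. The helper name build_policy_access_update, the policy ID, and the user and group names are all hypothetical; CAN_USE is the permission level that applies to cluster policies.

```python
import json


def build_policy_access_update(policy_id, users=(), groups=()):
    """Return (path, body) for a PATCH request granting CAN_USE on a policy.

    Hypothetical helper: it assembles the request and sends nothing.
    """
    path = f"/api/2.0/permissions/cluster-policies/{policy_id}"
    # One ACL entry per individual user...
    acl = [{"user_name": u, "permission_level": "CAN_USE"} for u in users]
    # ...and one per group.
    acl += [{"group_name": g, "permission_level": "CAN_USE"} for g in groups]
    return path, {"access_control_list": acl}


if __name__ == "__main__":
    path, body = build_policy_access_update(
        "POLICY_ID_PLACEHOLDER",              # assumption, not a real ID
        users=["analyst@example.com"],        # made-up user
        groups=["data-science"],              # made-up group
    )
    print("PATCH", path)
    print(json.dumps(body, indent=2))
```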

Additionally, account administrators can enable or disable access to the Personal Compute policy for all users in their account using the Personal Compute account setting. [NOTE: This is only available on AWS initially. It will be added to Azure in the coming months.]

The default value for the Personal Compute setting is ON. During the initial rollout, account administrators will be able to switch the setting ON or OFF before the system begins to read it and use it to determine account-wide access to the policy.

The Personal Compute default policy can be customized by overriding certain properties [AWS, Azure]. Unlike traditional cluster policies, though, Personal Compute has the following properties fixed by Databricks:

To customize a workspace's Personal Compute policy, a workspace administrator can follow these steps:
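As a rough illustration of what an override can look like in policy-definition form (this is not the UI procedure itself), the sketch below builds a small overrides document using the cluster policy definition language. It assumes that node_type_id and autotermination_minutes are among the attributes an administrator is allowed to override; the helper name build_policy_overrides and the instance types are made up for the example.

```python
import json

# Constraint types defined by the cluster policy definition language.
POLICY_TYPES = {
    "fixed", "forbidden", "allowlist", "blocklist", "regex", "range", "unlimited",
}


def build_policy_overrides(node_types, autotermination_minutes):
    """Build a policy overrides document (hypothetical helper).

    Restricts instance types to an allowlist and pins auto-termination
    to a fixed value that is hidden from users in the UI.
    """
    node_types = list(node_types)
    if not node_types:
        raise ValueError("at least one node type is required")
    overrides = {
        "node_type_id": {
            "type": "allowlist",
            "values": node_types,
            "defaultValue": node_types[0],
        },
        "autotermination_minutes": {
            "type": "fixed",
            "value": autotermination_minutes,
            "hidden": True,
        },
    }
    # Guard against typos in the constraint type names.
    for attribute, rule in overrides.items():
        if rule["type"] not in POLICY_TYPES:
            raise ValueError(f"unknown policy type for {attribute}")
    return overrides


if __name__ == "__main__":
    doc = build_policy_overrides(["i3.xlarge", "i3.2xlarge"], 60)
    print(json.dumps(doc, indent=2))
```

An allowlist keeps the instance choice open but bounded, while a fixed, hidden value removes a setting from the user's view entirely; which of the two fits depends on how much freedom the administrator wants to leave.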

Over the week of October 10, 2022, the Personal Compute default policy will be rolling out to all premium- and enterprise-tier workspaces on AWS and all premium-tier workspaces on Azure. On AWS, the account setting for account-wide access to Personal Compute will roll out simultaneously with default value ON, and the setting's value will take effect beginning November 16, 2022. We look forward to bringing Personal Compute to GCP in the coming months, and we will bring the same account-wide access control switch to all clouds in the same timeframe.

We're excited to see what our customers think of Personal Compute! If you're on AWS or Azure, please check it out and let us know what you think.
