Visualize BigQuery data in Jupyter notebooks  |  Google Cloud
A notebook provides an environment in which to author and execute code. A notebook is essentially a source artifact, saved as an IPYNB file. It can contain descriptive text content, executable code blocks, and output rendered as interactive HTML.

Structurally, a notebook is a sequence of cells. A cell is a block of input text that is evaluated to produce results. Cells can be of three types: code cells, which contain executable code; Markdown cells, which contain formatted text that is rendered as HTML; and raw cells, whose content is passed through without evaluation.

The following image shows a Markdown cell followed by a Python code cell and then by the output of that code cell.

Each opened notebook is associated with a running session (also known as a kernel in Python). This session executes all the code in the notebook, and it manages the state. The state includes the variables with their values, functions and classes, and any existing Python modules that you load.
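As a minimal illustration of this shared state (plain Python, independent of any cloud product), a variable or function defined in one cell remains available to later cells in the same session:

    # Cell 1: define a variable and a function; both become part of the session state.
    import math

    radius = 3.0

    def circle_area(r):
        return math.pi * r ** 2

    # Cell 2: the names defined above are still available, because the same
    # kernel (session) holds the state across cells.
    print(circle_area(radius))  # 28.274333882308138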

In Google Cloud, you can use a Vertex AI Workbench notebook-based environment to query and explore data, develop and train a model, and run your code as part of a pipeline. In this tutorial, you create a managed notebook instance on Vertex AI Workbench and then explore BigQuery data within the JupyterLab interface.

In this section, you set up a JupyterLab instance on Google Cloud so that you can create managed notebooks.

In this section, you open JupyterLab and explore the BigQuery resources that are available in a managed notebooks instance.
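Although the JupyterLab interface lets you browse these resources visually, the sketch below shows a roughly equivalent programmatic exploration using the BigQuery client library for Python. The dataset reference bigquery-public-data.google_trends is an assumption used for illustration, not a value taken from this copy of the tutorial.

    # A minimal sketch: list datasets in the current project and tables in a
    # public dataset, using the google-cloud-bigquery client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses the notebook's default project and credentials

    # Datasets in the project the notebook runs in.
    for dataset in client.list_datasets():
        print(dataset.dataset_id)

    # Tables in an assumed public dataset.
    for table in client.list_tables("bigquery-public-data.google_trends"):
        print(table.table_id)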

In this section, you write SQL directly in notebook cells and read data from BigQuery into the Python notebook.
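As a rough sketch of the second half of that workflow, the client library can read query results straight into a pandas DataFrame. The table bigquery-public-data.usa_names.usa_1910_2013 and the query text are illustrative placeholders, not the tutorial's own statement; the magic commands described next wrap this same library.

    # Hedged sketch: run a SQL query and load the results into pandas.
    from google.cloud import bigquery

    client = bigquery.Client()

    sql = """
        SELECT name, SUM(number) AS total
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        GROUP BY name
        ORDER BY total DESC
        LIMIT 10
    """

    # to_dataframe() returns a pandas.DataFrame; recent client versions use the
    # BigQuery Storage API for the download when that client is installed.
    df = client.query(sql).to_dataframe()
    print(df.head())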

Magic commands that use a single or double percentage character (% or %%) let you use minimal syntax to interact with BigQuery within the notebook. The BigQuery client library for Python is automatically installed in a managed notebook. Behind the scenes, the magic command uses the BigQuery client library for Python to run the given query, convert the results to a pandas DataFrame, optionally save the results to a variable, and then display the results.

Note: As of version 1.26.0 of the google-cloud-bigquery Python package, the BigQuery Storage API is used by default to download results from the %%bigquery magics.
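A minimal sketch of that pattern follows. It assumes the google.cloud.bigquery IPython extension is available (it ships with the client library, which is preinstalled in managed notebooks); the query and the variable name results_df are placeholders:

    %load_ext google.cloud.bigquery

Then, in a new cell, the cell magic runs the query, converts the result to a pandas DataFrame, and saves it to the named variable:

    %%bigquery results_df
    -- Placeholder query; any valid BigQuery SQL works here.
    SELECT CURRENT_DATE() AS today, 42 AS answer

In a later cell, the DataFrame is part of the session state like any other variable:

    results_df.head()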

To get the number of regions by country in the dataset, enter the following statement:
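The exact statement is not preserved in this copy of the tutorial; the sketch below is a plausible reconstruction that assumes the public Google Trends table bigquery-public-data.google_trends.international_top_terms and counts distinct region codes per country for the most recent refresh date:

    %%bigquery
    -- Hedged reconstruction: count distinct regions per country in the
    -- assumed international_top_terms table.
    SELECT
      country_code,
      country_name,
      COUNT(DISTINCT region_code) AS num_regions
    FROM
      `bigquery-public-data.google_trends.international_top_terms`
    WHERE
      refresh_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
    GROUP BY
      country_code, country_name
    ORDER BY
      num_regions DESC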
The output is similar to the following:

Query complete after 0.07s: 100%|██████████| 4/4 [00:00
