Atoti is a Python library and a JupyterLab extension for creating data-viz widgets, such as pivot tables and charts, in the same notebook that is used to build the data model. In this article, we will:
- Explain how to install the library
- Show how it changes the notebook workflow
- Start analyzing data with a small use case.
Note: This article has been tested against Atoti 0.3.1.
Installation
To get started, create a new Conda environment, activate it, and run this shell command to install the atoti Conda package and its companion packages:
conda install atoti jupyterlab jupyterlab-atoti nodejs openjdk python
Note: This adds the Atoti extension to JupyterLab, which triggers a rebuild of JupyterLab’s front-end assets, so the command can take several minutes to return.
Once this is done, you can start JupyterLab:
jupyter lab
Building and exploring the model
For the purpose of this guide, we’ll work on this popular Kaggle data set of trending YouTube video statistics. The goal will be to define some key metrics and create practical visualizations.
We start with some data prep with Pandas:
import pandas as pd

videos_df = pd.read_csv(
    "USvideos.csv",
    usecols=[
        "category_id",
        "channel_title",
        "title",
        "trending_date",
        "video_id",
        "views",
    ],
)

# Parse trending date and split it into year/month/day columns.
trending_date = pd.to_datetime(
    videos_df["trending_date"], format="%y.%d.%m"
)
videos_df["trending_date"] = trending_date.dt.date
videos_df["trending_year"] = trending_date.dt.year
videos_df["trending_month"] = trending_date.dt.month
videos_df["trending_day"] = trending_date.dt.day

videos_df.sample(5)
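The format string "%y.%d.%m" is easy to misread because the day comes before the month. Here is a minimal sketch of how it behaves, using two made-up values (assumptions, not rows taken from the data set):

```python
import pandas as pd

# Hypothetical sample values laid out as two-digit year, then day, then month.
sample = pd.Series(["17.14.11", "18.01.06"])

parsed = pd.to_datetime(sample, format="%y.%d.%m")

# "17.14.11" is read as 14 November 2017, and "18.01.06" as 1 June 2018.
print(parsed.dt.year.tolist())   # [2017, 2018]
print(parsed.dt.month.tolist())  # [11, 6]
print(parsed.dt.day.tolist())    # [14, 1]
```

Passing an explicit format avoids letting Pandas guess the order of the fields, which could silently swap day and month.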

import json
from pathlib import Path

# Parse JSON file holding mapping between a category ID and its title, and make a DataFrame out of it.
category_data = json.loads(Path("US_category_id.json").read_text())
data = [
    [int(item["id"]), item["snippet"]["title"]]
    for item in category_data["items"]
]
categories_df = pd.DataFrame(data, columns=["id", "category_title"])

categories_df.head()
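The category parsing relies on a specific JSON layout: a top-level "items" list whose entries hold an "id" and a "snippet" containing a "title". The sketch below runs the same logic on a small inline document. The two entries are illustrative assumptions, not a copy of the real file:

```python
import json

import pandas as pd

# Hypothetical excerpt with the layout the parsing code expects.
raw = """
{
  "items": [
    {"id": "1", "snippet": {"title": "Film & Animation"}},
    {"id": "10", "snippet": {"title": "Music"}}
  ]
}
"""

category_data = json.loads(raw)

# The IDs are strings in the JSON, so they are cast to int.
rows = [
    [int(item["id"]), item["snippet"]["title"]]
    for item in category_data["items"]
]
sample_categories_df = pd.DataFrame(rows, columns=["id", "category_title"])

print(sample_categories_df)
```

The cast to int is presumably there so that the IDs have the same type as the category_id column read from the CSV file, which matters when the two are matched against each other.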

Now that the DataFrames are prepped, we can create the Atoti analytical cube:
import atoti as tt

# An Atoti session is a bit like PySpark context
session = tt.create_session()

# Load the Pandas DataFrames into the Atoti session.
# The API also supports loading CSV files, Parquet files, Spark DataFrames, and soon Arrow Tables.
videos_store = session.read_pandas(
    videos_df,
    # These are the DataFrame's columns that make each row unique
    keys=["video_id", "trending_date"],
    store_name="videos"
)
categories_store = session.read_pandas(
    categories_df,
    keys=["id"],
    store_name="categories"
)

# Join the two stores together (this keeps the data normalized).
videos_store.join(categories_store, mapping={"category_id": "id"})

cube = session.create_cube(videos_store, mode="manual")
In this case, we choose the manual cube creation mode so that we can shape the cube ourselves afterwards. By default, however, the cube structure is inferred from the types of the stores’ columns.
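The keys passed to read_pandas are meant to make each row unique, and the join can only find a category title for rows whose category_id appears in the categories. Both assumptions can be checked in Pandas before loading anything. This is a minimal sketch on hypothetical miniature DataFrames, and it does not call any Atoti API:

```python
import pandas as pd

# Hypothetical miniature versions of videos_df and categories_df.
videos = pd.DataFrame(
    {
        "video_id": ["a", "a", "b"],
        "trending_date": ["2017-11-14", "2017-11-15", "2017-11-14"],
        "category_id": [10, 10, 99],
    }
)
categories = pd.DataFrame(
    {"id": [1, 10], "category_title": ["Film & Animation", "Music"]}
)

# Count rows that repeat an earlier (video_id, trending_date) pair.
duplicate_keys = videos.duplicated(subset=["video_id", "trending_date"]).sum()

# List category_id values that have no matching id in the categories.
unmatched = sorted(set(videos["category_id"]) - set(categories["id"]))

print(duplicate_keys)
print(unmatched)
```

Here no key is duplicated, and category 99 is reported as unmatched. On the real data, a non-zero duplicate count would suggest picking different key columns.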
We also create analytical hierarchies – extra available axes in pivot tables or charts:
# A channel has multiple videos and each video can be renamed so it can have multiple titles.
cube.hierarchies["video"] = [
    videos_store["channel_title"],
    videos_store["video_id"],
    videos_store["title"]
]

# The trending date can also be organized with multiple levels.
cube.hierarchies["trending_date"] = [
    videos_store["trending_year"],
    videos_store["trending_month"],
    videos_store["trending_day"],
    videos_store["trending_date"]
]

# The category hierarchy has a single level: the category title.
cube.hierarchies["category"] = [categories_store["category_title"]]

cube

From there, we can create visualizations to get a sense of the data set. The visualize method on Cube instances outputs an interactive widget that can be built with mouse & keyboard inputs – no code needed.

A widget showing that there are almost always 200 trending videos per day

A widget showing the 10 channels with the most accumulated trending days

Drilling down the trending_date hierarchy while showing the numbers of trending videos per category
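The numbers shown in the first two widgets can be cross-checked with plain Pandas group-bys. The sketch below uses a hypothetical five-row sample and assumes that "accumulated trending days" means the number of (video, day) rows per channel:

```python
import pandas as pd

# Hypothetical sample: one row per video per trending day.
videos = pd.DataFrame(
    {
        "trending_date": [
            "2017-11-14",
            "2017-11-14",
            "2017-11-15",
            "2017-11-15",
            "2017-11-15",
        ],
        "channel_title": ["X", "Y", "X", "X", "Y"],
        "video_id": ["a", "c", "a", "b", "c"],
    }
)

# Number of distinct trending videos for each day.
videos_per_day = videos.groupby("trending_date")["video_id"].nunique()

# Channels ranked by their total number of (video, day) rows, top 10 only.
top_channels = videos.groupby("channel_title").size().nlargest(10)

print(videos_per_day)
print(top_channels)
```

On the full data set, videos_per_day should sit near 200 for most days if the first widget's observation holds.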
We’ve created these widgets without defining any specific metrics, but one of Atoti’s strengths is building a data model with aggregated indicators:
views_max = tt.agg.max(videos_store["views"])
views_per_video = tt.agg.single_value(views_max, on=["video_id"])
cube.measures["views"] = tt.agg.sum(views_per_video)
Adding the views metric to cube.measures makes it directly available in the Atoti JupyterLab extension:

Drilling down on the video hierarchy to see the most viewed channels and their corresponding videos
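One way to read the views measure is as a two-step aggregation: take the highest view count seen for each video, then sum those values across videos. The Pandas sketch below illustrates that reading on hypothetical numbers. It is an interpretation of the measure definition, not Atoti's implementation:

```python
import pandas as pd

# Hypothetical sample: video "a" trends on two days with a growing view count.
videos = pd.DataFrame(
    {
        "channel_title": ["X", "X", "X", "Y"],
        "video_id": ["a", "a", "b", "c"],
        "views": [100, 250, 40, 70],
    }
)

# Step 1: keep the highest view count for each video.
views_per_video = videos.groupby(["channel_title", "video_id"])["views"].max()

# Step 2: sum the per-video values, per channel and overall.
views_per_channel = views_per_video.groupby(level="channel_title").sum()
total_views = views_per_video.sum()

# A plain sum over all rows counts video "a" twice.
naive_total = videos["views"].sum()

print(views_per_channel)
print(total_views, naive_total)  # 360 460
```

If views is a running total for each video, as the sample assumes, summing every row would count a video once per trending day, which is why the maximum is taken first.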
Let’s define another metric that will give us the aggregated distinct count of trending videos:
cube.measures["trending_videos"] = tt.agg.count_distinct(
    videos_store["video_id"], on=["video_id"]
)

Sorting categories by number of trending videos
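The distinct count behind this widget has a direct Pandas counterpart, which can serve as a sanity check. Here is a minimal sketch on a hypothetical sample, where a video trending on two days is counted only once:

```python
import pandas as pd

# Hypothetical sample: video "a" appears on two trending days.
videos = pd.DataFrame(
    {
        "category_title": ["Music", "Music", "Music", "Comedy"],
        "video_id": ["a", "a", "b", "c"],
    }
)

# Count distinct video IDs per category, largest first.
trending_videos = (
    videos.groupby("category_title")["video_id"]
    .nunique()
    .sort_values(ascending=False)
)

print(trending_videos)
```

Music has three rows but only two distinct videos, so it is reported as 2.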
Let’s make one more widget:

Plotting the number of views vs. the number of trending videos per channel
Sharing our insights
We can publish all the widgets we’ve built in JupyterLab to the Atoti dashboarding app:

Publishing a widget to the app and opening it there
Widgets published in the app can be added to dashboards with additional features such as quick filters and multi-selection filtering:

Filtering a dashboard in the app by category and then by channel
The dashboarding application is a “safe” environment: all the queries are read-only so there is no risk of breaking the model or tampering with its data.
You can share a link to your Atoti app to show it to other people.
If you would like to know more, head over to the documentation.