Selecting data and serving a dashboard

This recitation was created by Nastya Oguienko and Justin Bois based on the lessons on dashboarding from previous versions of this course.

[1]:

import os
data_path = "../data/"

import numpy as np
import pandas as pd

import scipy.stats as st

import bokeh.io
import bokeh.layouts
import bokeh.plotting
import bokeh.models

import iqplot

bokeh.io.output_notebook()

serve = False
notebook_url = "localhost:8888"


Important! Interactive control of graphics does not work in Google Colab. You have to run these notebooks in JupyterLab on your local machine!

Also, dashboards will not appear in the HTML-rendered version of this notebook. You are therefore encouraged to download and run this notebook on your local machine.

A plot of summary data

We will use the same data set of ant joint locations. Let’s load it in and prep it, including computing the distance traveled, as we did in the last part of the lesson.

[2]:

# Load data without comments
df = pd.read_csv(os.path.join(data_path, "ant_joint_locations.zip"), comment="#")

interpixel_distance = 0.08  # cm

# Create position columns in units of cm
df["x (cm)"] = df["x_coord"] * interpixel_distance
df["y (cm)"] = df["y_coord"] * interpixel_distance

# Create time column in units of seconds
df["time (sec)"] = df["frame"] / 28


def distance_traveled(df):
    """Compute cumulative distance traveled for one body part of one ant."""
    x_diff = df["x (cm)"].diff()
    y_diff = df["y (cm)"].diff()
    return np.cumsum(np.sqrt(x_diff**2 + y_diff**2))


df["distance traveled (cm)"] = (
    df.groupby(["ID", "bodypart"])
    .apply(distance_traveled)
    .reset_index(level=["ID", "bodypart"], drop=True)
)

# Take a look to remind ourselves
df.head()

[2]:

	frame	beetle_treatment	bodypart	x_coord	y_coord	likelihood	x (cm)	y (cm)	time (sec)	distance traveled (cm)
0	0	dalotia	head	73.086	193.835	1.0	5.84688	15.50680	0.000000	NaN
1	1	dalotia	head	73.730	194.385	1.0	5.89840	15.55080	0.035714	0.067752
2	2	dalotia	head	75.673	195.182	1.0	6.05384	15.61456	0.071429	0.235761
3	3	dalotia	head	77.319	196.582	1.0	6.18552	15.72656	0.107143	0.408629
4	4	dalotia	head	78.128	197.891	1.0	6.25024	15.83128	0.142857	0.531735
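
To check the logic of distance_traveled() on numbers that are easy to verify by hand, here is a minimal sketch on a made-up three-point path. The toy data frame and its values are assumptions for illustration and are not part of the ant data set. Each step is the hypotenuse of a 3-4-5 right triangle, so each step is 5 cm long.

```python
import numpy as np
import pandas as pd

# Hypothetical toy path: two steps, each the hypotenuse of a 3-4-5 triangle
toy = pd.DataFrame({"x (cm)": [0.0, 3.0, 6.0], "y (cm)": [0.0, 4.0, 8.0]})

# Step lengths; the first entry is NaN because the first point has no predecessor
step = np.sqrt(toy["x (cm)"].diff() ** 2 + toy["y (cm)"].diff() ** 2)

# Running total; pandas skips the NaN when accumulating but keeps it in place
cumulative = step.cumsum()

# Total distance is the largest cumulative value; max() ignores the NaN
total = cumulative.max()
```

Here cumulative is NaN, 5.0, 10.0 and total is 10.0. The leading NaN is the same one that appears in the first row of the output above.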

We may want to summarize the motion of the ants by the total distance traveled. Let’s compute that and store the result in a new data frame.

[3]:

df_dist = (
    df.groupby(["beetle_treatment", "ID", "bodypart"])["distance traveled (cm)"]
    .max()
    .reset_index()
)

# Take a look
df_dist

[3]:

	beetle_treatment	ID	bodypart	distance traveled (cm)
0	dalotia	0	abdomen	1256.637437
1	dalotia	0	antenna_left	2688.416512
2	dalotia	0	antenna_right	2800.528436
3	dalotia	0	head	1647.465193
4	dalotia	0	thorax	1266.693540
5	dalotia	1	abdomen	1143.534573
6	dalotia	1	antenna_left	2892.714768
7	dalotia	1	antenna_right	2856.248616
8	dalotia	1	head	1575.459175
9	dalotia	1	thorax	1135.457988
10	dalotia	2	abdomen	1068.647668
11	dalotia	2	antenna_left	3427.051189
12	dalotia	2	antenna_right	3810.080622
13	dalotia	2	head	1870.329342
14	dalotia	2	thorax	1289.644406
15	dalotia	3	abdomen	2169.129372
16	dalotia	3	antenna_left	4687.207298
17	dalotia	3	antenna_right	5569.482037
18	dalotia	3	head	3383.078720
19	dalotia	3	thorax	2435.800519
20	dalotia	4	abdomen	1879.510454
21	dalotia	4	antenna_left	3449.479980
22	dalotia	4	antenna_right	3330.416362
23	dalotia	4	head	2059.613435
24	dalotia	4	thorax	1642.574170
25	dalotia	5	abdomen	1383.706414
26	dalotia	5	antenna_left	2677.861333
27	dalotia	5	antenna_right	2561.370168
28	dalotia	5	head	1735.869976
29	dalotia	5	thorax	1326.177297
30	sceptobius	6	abdomen	912.727949
31	sceptobius	6	antenna_left	2657.582883
32	sceptobius	6	antenna_right	2287.451179
33	sceptobius	6	head	1205.616500
34	sceptobius	6	thorax	588.067617
35	sceptobius	7	abdomen	339.701993
36	sceptobius	7	antenna_left	1531.338615
37	sceptobius	7	antenna_right	2389.643450
38	sceptobius	7	head	420.652691
39	sceptobius	7	thorax	238.159884
40	sceptobius	8	abdomen	500.156206
41	sceptobius	8	antenna_left	2853.945585
42	sceptobius	8	antenna_right	2777.918093
43	sceptobius	8	head	1085.719023
44	sceptobius	8	thorax	703.824390
45	sceptobius	9	abdomen	357.735190
46	sceptobius	9	antenna_left	2382.851423
47	sceptobius	9	antenna_right	2488.580833
48	sceptobius	9	head	887.401463
49	sceptobius	9	thorax	546.723268
50	sceptobius	10	abdomen	661.166480
51	sceptobius	10	antenna_left	2693.747130
52	sceptobius	10	antenna_right	2614.627036
53	sceptobius	10	head	1181.083980
54	sceptobius	10	thorax	826.188143
55	sceptobius	11	abdomen	504.217197
56	sceptobius	11	antenna_left	2003.843440
57	sceptobius	11	antenna_right	2048.341855
58	sceptobius	11	head	726.746776
59	sceptobius	11	thorax	514.148107

To visualize this summary, we can make a strip plot, using the thorax as the body part. Thinking ahead, when we make it, we will include a tap tool, which enables selection of a glyph by clicking on it. We will also include a hover tool so we can see which ant/beetle treatment each glyph represents.

[4]:

def strip_plot(df_dist):
    strip = iqplot.strip(
        df_dist.loc[df_dist["bodypart"] == "thorax", :],
        q="distance traveled (cm)",
        cats="beetle_treatment",
        q_axis="x",
        palette=["#7570b3", "#1b9e77"],
        frame_height=100,
        frame_width=350,
        tools="pan,box_zoom,wheel_zoom,reset,tap,save",
        tooltips=[("ant ID", "@ID"), ("beetle", "@beetle_treatment")],
        toolbar_location="above",
    )

    # Always start at zero
    strip.x_range.start = 0

    return strip

Let’s see if it works.

[5]:

bokeh.io.show(strip_plot(df_dist))

This summary plot reveals, for example, that ant 3 is highly active (you can see it’s ant 3 by hovering over the top point) and that ant 11 is lethargic. In our dashboard, we would like to include this summary plot so that clicking on a glyph automatically updates the other plots to show the selected ant and beetle treatment.

Adding more interactions and more plots to the dashboard

Let’s add everything we planned to our dashboard. This includes more options for selecting the data to visualize and more plots: a plot of the x and y positions versus time (with x and y drawn as separate lines), a plot of distance traveled, and a summary plot.

We have to think about widgets we want to add. We know our data includes multiple ants, multiple body parts, and multiple beetle treatments. Rather than making a new plot for each possible combination, we can add multiple interactive elements to our dashboard. We will add drop-down lists to choose the beetle treatment, ant ID, and body part to track.

Notice that the possible ant IDs differ between beetle treatments. For the Dalotia beetle, we have ant IDs 0–5. For Sceptobius, we have 6–11. So, our ant ID drop-down list must change when we change the beetle treatment. To do this, we add the helper function update_ant_ID_selector, which updates the options in ant_ID_selector when the beetle treatment is changed. It is a callback function: it is triggered by a change in the beetle_selector drop-down list and sets the ant ID options (and the selected ant ID) to match the chosen treatment.
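
The filtering step behind such a dependent drop-down can be sketched without any widgets. The helper name ant_ID_options and the toy data frame below are hypothetical. The sketch sorts the IDs as integers before converting them to strings for the drop-down, because sorting the strings themselves would place "10" before "6".

```python
import pandas as pd

# Hypothetical toy data with the same two columns as the real data frame
toy = pd.DataFrame(
    {
        "beetle_treatment": [
            "dalotia",
            "dalotia",
            "sceptobius",
            "sceptobius",
            "sceptobius",
        ],
        "ID": [1, 0, 10, 6, 6],
    }
)


def ant_ID_options(df, beetle_treatment):
    """Return ant IDs for one beetle treatment as numerically sorted strings."""
    inds = df["beetle_treatment"] == beetle_treatment
    return [str(i) for i in sorted(df.loc[inds, "ID"].unique())]


dalotia_options = ant_ID_options(toy, "dalotia")
sceptobius_options = ant_ID_options(toy, "sceptobius")
```

With this toy data, dalotia_options is ["0", "1"] and sceptobius_options is ["6", "10"].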

We will use the same principles for the code as in the previous part of this lesson. First, we will define functions that create a ColumnDataSource object for the plots. Next, we will define plotting functions that take a cds argument and return separate plot objects. We will create multiple widgets: a time slider and drop-down lists for the beetle treatment, ant ID, and body part. We will write callback functions and use the widgets’ on_change() methods to link those elements to each other and to the ColumnDataSource object. Notice again that this time we have an additional callback function to link the ant_ID_selector to the beetle_selector. Get ready for a large code cell!

[6]:

# data preparation function
def extract_sub_df(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Extract sub data frame for body part of
    one ant over a time range."""
    inds = (
        (df["ID"] == ant_ID)
        & (df["bodypart"] == bodypart)
        & (df["time (sec)"] >= time_range[0])
        & (df["time (sec)"] <= time_range[1])
    )

    return df.loc[inds, :]


def build_cds(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Builds a ColumnDataSource object from the part of a given data frame,
    based on given ant_ID, bodypart and time range."""
    ant_ID = int(ant_ID)
    cds = bokeh.models.ColumnDataSource(
        extract_sub_df(df, ant_ID, bodypart, time_range)
    )

    return cds


# plotting functions
def plot_xy(cds):
    """Plot the x and y positions of an ant body part over time."""
    p_xy = bokeh.plotting.figure(
        width=550,
        frame_height=100,
        x_axis_label="time (sec)",
        y_axis_label="position (cm)",
        x_range=[0, np.max(df["time (sec)"])],
        y_range=[0, 20],
        toolbar_location="above"
    )

    p_xy.add_layout(bokeh.models.Legend(), "right")

    p_xy.legend.click_policy="hide"

    p_xy.line(
        x="time (sec)",
        y="x (cm)",
        source=cds,
        color="#ef8a62",
        legend_label="x",
        line_width=2,
    )

    p_xy.line(
        x="time (sec)",
        y="y (cm)",
        source=cds,
        color="#67a9cf",
        legend_label="y",
        line_width=2,
    )

    return p_xy


def plot_distance_traveled(cds):
    """Make a plot of distance traveled."""
    p_dt = bokeh.plotting.figure(
        width=300,
        height=250,
        x_axis_label="time (sec)",
        y_axis_label="distance traveled (cm)",
    )

    p_dt.line(
        x="time (sec)",
        y="distance traveled (cm)",
        source=cds,
        color="#ef8a62",
        line_width=2,
    )

    return p_dt


def traj_plot(cds):
    """Make a plot of the trajectory in x-y plane."""
    p = bokeh.plotting.figure(
        width=350,
        height=250,
        x_axis_label="x (cm)",
        y_axis_label="y (cm)",
        x_range=[0, 20],
        y_range=[0, 20],
    )

    color_mapper = bokeh.models.LinearColorMapper(
        palette="Viridis256",
        low=0,
        high=np.max(df["time (sec)"]),
    )
    color_bar = bokeh.models.ColorBar(
        color_mapper=color_mapper, title="time (sec)", width=10
    )

    p.circle(
        x="x (cm)",
        y="y (cm)",
        source=cds,
        size=2,
        color={"field": "time (sec)", "transform": color_mapper},
    )

    p.add_layout(color_bar, "right")

    return p


# creating widgets

# Create beetle treatment selector drop-down list
beetle_selector = bokeh.models.Select(
    title="beetle treatment",
    options=sorted(list(df["beetle_treatment"].unique())),
    value="dalotia",
    width=300,
)

# Create bodypart selector drop-down list
bodypart_selector = bokeh.models.Select(
    title="body part",
    options=sorted(list(df["bodypart"].unique())),
    value="thorax",
    width=120,
)

# Create ant ID selector drop-down list
ant_ID_selector = bokeh.models.Select(
    title="Ant ID",
    options=[
        str(i)
        for i in sorted(
            df.loc[df["beetle_treatment"] == beetle_selector.value, "ID"].unique()
        )
    ],
    width=120,
    value="0",
)

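# Create a range slider for time; callbacks will listen to value_throttled,
# which is updated only when the mouse button is released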
time_interval_slider_throttled = bokeh.models.RangeSlider(
    title="time (sec)",
    start=df["time (sec)"].min(),
    end=df["time (sec)"].max(),
    step=1,
    value=(df["time (sec)"].min(), df["time (sec)"].max()),
    width=300,
    value_throttled=(df["time (sec)"].min(), df["time (sec)"].max()),
)


# callback functions

# Additional callback function that links beetle_selector to ant_ID_selector.
# Note that it only updates ant_ID_selector and does not touch cds directly.
def update_ant_ID_selector(attr, old, new):
    inds = df["beetle_treatment"] == beetle_selector.value
    options = [str(i) for i in sorted(df.loc[inds, "ID"].unique())]
    ant_ID_selector.options = options
    ant_ID_selector.value = options[0]


def callback(attr, old, new):
    # Slider values
    time_range = time_interval_slider_throttled.value_throttled

    # Body part selector
    bodypart = bodypart_selector.value

    # Ant ID selector
    ant_ID = ant_ID_selector.value

    # Renewing data in cds
    new_cds = build_cds(df, ant_ID, bodypart, (time_range[0], time_range[1]))

    cds.data.update(new_cds.data)
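
The chain from widget to callback to data source used above can be seen in miniature in the following sketch. It assumes a toy data frame, and the names toy_cds, toy_selector, rows_for, and toy_callback are hypothetical. Assigning to value in Python stands in for a user picking an option in the browser. The sketch replaces the data by assignment, whereas the callback above uses cds.data.update(new_cds.data) for the same purpose.

```python
import bokeh.models
import pandas as pd

# Hypothetical toy data: two body parts, two time points each
toy = pd.DataFrame(
    {
        "bodypart": ["head", "head", "thorax", "thorax"],
        "time (sec)": [0.0, 1.0, 0.0, 1.0],
        "x (cm)": [1.0, 1.5, 2.0, 2.5],
    }
)


def rows_for(bodypart):
    """Extract the rows of the toy data frame for one body part."""
    return toy.loc[toy["bodypart"] == bodypart, :]


toy_cds = bokeh.models.ColumnDataSource(rows_for("head"))
toy_selector = bokeh.models.Select(options=["head", "thorax"], value="head")


def toy_callback(attr, old, new):
    # Keep the same ColumnDataSource object and swap its data, so any
    # glyph already drawn from toy_cds would pick up the change
    new_cds = bokeh.models.ColumnDataSource(rows_for(new))
    toy_cds.data = dict(new_cds.data)


toy_selector.on_change("value", toy_callback)

# Stand-in for a user choosing "thorax" in the drop-down list
toy_selector.value = "thorax"
```

After the last line, toy_cds holds the thorax rows even though no plot or server is involved.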

We have defined the functions; now let’s build the initial ColumnDataSource object:

[7]:

ant_ID = ant_ID_selector.value
bodypart = bodypart_selector.value
beetle = beetle_selector.value

cds = build_cds(df, ant_ID, bodypart)

We have made and connected almost all of the plots and widgets (but have not rendered them). Whenever the ant ID, body part, or time interval selection changes, the plots will update.

Our task now is to add the summary plot. It should respond to the body part widget so that it shows the distances for the selected body part. So, let’s write a callback function (the third in our code) to do that. We will not regenerate the whole plot, but rather update its data source. To extract the data source from a Bokeh plot, we need to dig into its glyph renderers. If the plot is called p, its ColumnDataSource is p.renderers[i].data_source, where i is the index of the set of glyphs we are considering. For strip plots generated by iqplot, there is only a single data source, so i is always 0.

[8]:

def update_strip(attr, old, new):
    # Update data source
    bodypart = bodypart_selector.value

    strip.renderers[0].data_source.data["distance traveled (cm)"] = df_dist.loc[
        df_dist["bodypart"] == bodypart, "distance traveled (cm)"
    ].values
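
As a minimal sketch of reaching a plot's data source through its renderers, the following uses a toy scatter plot rather than an iqplot strip plot. The names toy_source and p and the numbers are made up for illustration.

```python
import bokeh.models
import bokeh.plotting

# Hypothetical toy data source with two glyphs
toy_source = bokeh.models.ColumnDataSource(
    data={"ID": [0, 1], "distance traveled (cm)": [1.0, 2.0]}
)

p = bokeh.plotting.figure()
p.scatter(x="distance traveled (cm)", y="ID", source=toy_source)

# The only glyph renderer on the plot carries the data source we passed in
renderer_source = p.renderers[0].data_source

# Overwriting one column updates the data the glyphs are drawn from
renderer_source.data["distance traveled (cm)"] = [3.0, 4.0]
```

Because renderer_source and toy_source are the same object, the new column values are visible through either name.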

Now that we have the plot set up, we can write a callback for when data are selected. The callback must take three arguments, attr, old, and new. Here, attr is the name of the attribute that changed ("indices"), and old and new are the lists of indices of the selected data points before and after the change. A small deselect_ant() callback function, which would remove the selection from the summary plot when we change the beetle in the drop-down list, is a useful companion, though it does not appear in the cell below.

[9]:

def select_ant(attr, old, new):
    """Update widgets for selection on strip plot."""
    # Extract data source
    source = strip.renderers[0].data_source

    # Use try block in case no data are selected; new is then empty
    # and new[0] raises an IndexError
    try:
        # Get index of selected glyph
        ind = new[0]

        # Set widget values
        beetle_selector.value = source.data["beetle_treatment"][ind]
        ant_ID_selector.value = str(source.data["ID"][ind])
    except IndexError:
        pass
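
To see what a selection callback receives, here is a minimal sketch with a toy ColumnDataSource. All names are hypothetical. In particular, clear_selection() is only a guess at what a function like deselect_ant() might do, namely emptying the list of selected indices. Assigning to indices in Python stands in for tapping a glyph in the browser.

```python
import bokeh.models

# Hypothetical toy data source with three glyphs
toy_source = bokeh.models.ColumnDataSource(
    data={
        "ID": [0, 1, 6],
        "beetle_treatment": ["dalotia", "dalotia", "sceptobius"],
    }
)

calls = []


def record_selection(attr, old, new):
    """Record the arguments Bokeh passes to a selection callback."""
    calls.append((attr, list(old), list(new)))


def clear_selection(source):
    """Hypothetical stand-in for deselect_ant(): select nothing."""
    source.selected.indices = []


toy_source.selected.on_change("indices", record_selection)

# Stand-in for tapping the third glyph
toy_source.selected.indices = [2]

# Remove the selection again
clear_selection(toy_source)
```

The callback is called twice: first with an empty old and new equal to [2], then with old equal to [2] and an empty new. The second call is the case select_ant() has to guard against, since new[0] fails on an empty list.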

Now that the callback is defined, we need it to be called whenever the selection changes. We do this with the on_change() method of the selected attribute of a ColumnDataSource.

[10]:

strip = strip_plot(df_dist)
strip.renderers[0].data_source.selected.on_change("indices", select_ant)

# Register callbacks for all other widgets
beetle_selector.on_change("value", update_ant_ID_selector)
bodypart_selector.on_change("value", callback, update_strip)
ant_ID_selector.on_change("value", callback)
time_interval_slider_throttled.on_change("value_throttled", callback)

All the pieces are now in place! Let’s lay it out!

[11]:

widgets = bokeh.layouts.column(
    time_interval_slider_throttled,
    bokeh.models.Spacer(height=20),
    beetle_selector,
    bokeh.models.Spacer(height=20),
    bokeh.layouts.row(
        ant_ID_selector,
        bokeh.models.Spacer(width=15),
        bodypart_selector,
    ),
)

row1 = bokeh.layouts.row(
    traj_plot(cds),
    bokeh.models.Spacer(width=20),
    plot_distance_traveled(cds),
    bokeh.models.Spacer(width=20),
    widgets
)
row2 = bokeh.layouts.row(plot_xy(cds), bokeh.models.Spacer(width=30), strip)

dashboard = bokeh.layouts.column(row1, bokeh.models.Spacer(height=20), row2)


def app(doc):
    doc.add_root(dashboard)

Before we plot our awesome dashboard, let’s talk about one more cool feature of Bokeh.

Deploying a dashboard on a stand-alone browser tab

Bokeh lets you move your dashboard from a prototype in a notebook to its own stand-alone app in a separate browser tab (you can read the docs about that). All you have to do is use bokeh.io.curdoc().add_root(dashboard) instead of the usual bokeh.io.show(app). You can then serve the dashboard by entering bokeh serve --show name_of_notebook.ipynb on the command line. You have to be in the folder that contains the notebook to do that, or you can give the path to the notebook if you want to serve it from a different place on your machine. Try it yourself: download this notebook, set serve = True in the first cell, and run:

bokeh serve --show name_of_notebook.ipynb

[13]:

if serve:
    bokeh.io.curdoc().add_root(dashboard)
else:
    bokeh.io.show(app, notebook_url=notebook_url)

Computing environment

[13]:

%load_ext watermark
%watermark -v -p numpy,scipy,pandas,bokeh,iqplot,jupyterlab

Python implementation: CPython
Python version       : 3.11.4
IPython version      : 8.12.0

numpy     : 1.24.3
scipy     : 1.10.1
pandas    : 1.5.3
bokeh     : 3.2.1
iqplot    : 0.3.3
jupyterlab: 4.0.4