Selecting data and serving a dashboard

This recitation was created by Nastya Oguienko and Justin Bois based on the lessons on dashboarding from previous versions of this course.


[1]:
import os
data_path = "../data/"

import numpy as np
import pandas as pd

import scipy.stats as st

import bokeh.io
import bokeh.layouts
import bokeh.plotting
import bokeh.models

import iqplot

bokeh.io.output_notebook()

serve = False
notebook_url = "localhost:8888"

Important! Interactive control of graphics does not work in Google Colab. You have to run these notebooks in Jupyter Lab on your local machine!

Also, dashboards will not appear in the HTML-rendered version of this notebook. You are therefore encouraged to download and run this notebook on your local machine.

A plot of summary data

We will use the same data set of ant joint locations. Let’s load it in and prep it, including computing the distance traveled, as we did in the last part of the lesson.

[2]:
# Load data without comments
df = pd.read_csv(os.path.join(data_path, "ant_joint_locations.zip"), comment="#")

interpixel_distance = 0.08  # cm

# Create position columns in units of cm
df["x (cm)"] = df["x_coord"] * interpixel_distance
df["y (cm)"] = df["y_coord"] * interpixel_distance

# Create time column in units of seconds
df["time (sec)"] = df["frame"] / 28


def distance_traveled(df):
    """Compute cumulative distance traveled for one body part of one ant."""
    x_diff = df["x (cm)"].diff()
    y_diff = df["y (cm)"].diff()
    return np.cumsum(np.sqrt(x_diff**2 + y_diff**2))


df["distance traveled (cm)"] = (
    df.groupby(["ID", "bodypart"])
    .apply(distance_traveled)
    .reset_index(level=["ID", "bodypart"], drop=True)
)

# Take a look to remind ourselves
df.head()
[2]:
   frame  beetle_treatment  ID  bodypart  x_coord  y_coord  likelihood   x (cm)    y (cm)  time (sec)  distance traveled (cm)
0      0           dalotia   0      head   73.086  193.835         1.0  5.84688  15.50680    0.000000                     NaN
1      1           dalotia   0      head   73.730  194.385         1.0  5.89840  15.55080    0.035714                0.067752
2      2           dalotia   0      head   75.673  195.182         1.0  6.05384  15.61456    0.071429                0.235761
3      3           dalotia   0      head   77.319  196.582         1.0  6.18552  15.72656    0.107143                0.408629
4      4           dalotia   0      head   78.128  197.891         1.0  6.25024  15.83128    0.142857                0.531735
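
To make the NaN in the first row and the accumulation concrete, here is a minimal sketch on a made-up four-point track (the coordinates are illustrative and are not taken from the data set). It applies the same difference, square-root, and cumulative-sum steps as distance_traveled().

```python
import numpy as np
import pandas as pd

# Hypothetical track: the point moves 5 cm, stays put, then moves 5 cm again
track = pd.DataFrame(
    {"x (cm)": [0.0, 3.0, 3.0, 6.0], "y (cm)": [0.0, 4.0, 4.0, 8.0]}
)

# Step length between consecutive frames; the first entry is NaN
# because there is no previous frame to difference against
step = np.sqrt(track["x (cm)"].diff() ** 2 + track["y (cm)"].diff() ** 2)

# Running total; pandas skips the leading NaN when accumulating
cumulative = step.cumsum()

# The total distance is the last (and largest) value of the running total
total = cumulative.max()

print(cumulative.tolist())  # [nan, 5.0, 5.0, 10.0]
print(total)  # 10.0
```

Because the running total never decreases, taking its maximum for each ant and body part gives the total distance traveled.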

We may want to summarize the motion of the ants by the total distance traveled. Let’s compute that and store the result in a new data frame.

[3]:
df_dist = (
    df.groupby(["beetle_treatment", "ID", "bodypart"])["distance traveled (cm)"]
    .max()
    .reset_index()
)

# Take a look
df_dist
[3]:
beetle_treatment ID bodypart distance traveled (cm)
0 dalotia 0 abdomen 1256.637437
1 dalotia 0 antenna_left 2688.416512
2 dalotia 0 antenna_right 2800.528436
3 dalotia 0 head 1647.465193
4 dalotia 0 thorax 1266.693540
5 dalotia 1 abdomen 1143.534573
6 dalotia 1 antenna_left 2892.714768
7 dalotia 1 antenna_right 2856.248616
8 dalotia 1 head 1575.459175
9 dalotia 1 thorax 1135.457988
10 dalotia 2 abdomen 1068.647668
11 dalotia 2 antenna_left 3427.051189
12 dalotia 2 antenna_right 3810.080622
13 dalotia 2 head 1870.329342
14 dalotia 2 thorax 1289.644406
15 dalotia 3 abdomen 2169.129372
16 dalotia 3 antenna_left 4687.207298
17 dalotia 3 antenna_right 5569.482037
18 dalotia 3 head 3383.078720
19 dalotia 3 thorax 2435.800519
20 dalotia 4 abdomen 1879.510454
21 dalotia 4 antenna_left 3449.479980
22 dalotia 4 antenna_right 3330.416362
23 dalotia 4 head 2059.613435
24 dalotia 4 thorax 1642.574170
25 dalotia 5 abdomen 1383.706414
26 dalotia 5 antenna_left 2677.861333
27 dalotia 5 antenna_right 2561.370168
28 dalotia 5 head 1735.869976
29 dalotia 5 thorax 1326.177297
30 sceptobius 6 abdomen 912.727949
31 sceptobius 6 antenna_left 2657.582883
32 sceptobius 6 antenna_right 2287.451179
33 sceptobius 6 head 1205.616500
34 sceptobius 6 thorax 588.067617
35 sceptobius 7 abdomen 339.701993
36 sceptobius 7 antenna_left 1531.338615
37 sceptobius 7 antenna_right 2389.643450
38 sceptobius 7 head 420.652691
39 sceptobius 7 thorax 238.159884
40 sceptobius 8 abdomen 500.156206
41 sceptobius 8 antenna_left 2853.945585
42 sceptobius 8 antenna_right 2777.918093
43 sceptobius 8 head 1085.719023
44 sceptobius 8 thorax 703.824390
45 sceptobius 9 abdomen 357.735190
46 sceptobius 9 antenna_left 2382.851423
47 sceptobius 9 antenna_right 2488.580833
48 sceptobius 9 head 887.401463
49 sceptobius 9 thorax 546.723268
50 sceptobius 10 abdomen 661.166480
51 sceptobius 10 antenna_left 2693.747130
52 sceptobius 10 antenna_right 2614.627036
53 sceptobius 10 head 1181.083980
54 sceptobius 10 thorax 826.188143
55 sceptobius 11 abdomen 504.217197
56 sceptobius 11 antenna_left 2003.843440
57 sceptobius 11 antenna_right 2048.341855
58 sceptobius 11 head 726.746776
59 sceptobius 11 thorax 514.148107

To visualize this summary, we can make a strip plot, using the thorax as the body part. Thinking ahead, when we make it, we will include a tap tool, which enables selection of a glyph by clicking on it. We will also include a hover tool so we can see which ant/beetle treatment each glyph represents.

[4]:
def strip_plot(df_dist):
    strip = iqplot.strip(
        df_dist.loc[df_dist["bodypart"] == "thorax", :],
        q="distance traveled (cm)",
        cats="beetle_treatment",
        q_axis="x",
        palette=["#7570b3", "#1b9e77"],
        frame_height=100,
        frame_width=350,
        tools="pan,box_zoom,wheel_zoom,reset,tap,save",
        tooltips=[("ant ID", "@ID"), ("beetle", "@beetle_treatment")],
        toolbar_location="above",
    )

    # Always start at zero
    strip.x_range.start = 0

    return strip

Let’s see if it works.

[5]:
bokeh.io.show(strip_plot(df_dist))

This summary plot exposes, for example, that ant 3 is highly active (you can see it’s ant 3 by hovering over the top point), and ant 11 is lethargic. In our dashboard, we would like to include this summary plot and have a click on a glyph automatically update the other plots to show the selected ant and beetle treatment.

Adding more interactions and more plots to the dashboard

Let’s add everything we planned to our dashboard. This includes more options for selecting the data to visualize and more plots: a plot of the x and y positions versus time (with x and y as separate lines), a plot of distance traveled, and a summary plot.

We have to think about widgets we want to add. We know our data includes multiple ants, multiple body parts, and multiple beetle treatments. Rather than making a new plot for each possible combination, we can add multiple interactive elements to our dashboard. We will add drop-down lists to choose the beetle treatment, ant ID, and body part to track.

Notice that the possible ant IDs change between beetle treatments. For the Dalotia beetle, we have ant IDs 0–5. For Sceptobius, we have 6–11. So, our Ant ID drop-down list must change when we change the beetle treatment. To do this, we add the helper function update_ant_ID_selector, which updates the options in ant_ID_selector when the beetle treatment changes. Notice that this is a callback function: it reads the current value of the beetle_selector drop-down list and sets the ant ID options accordingly.
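
The lookup of valid ant IDs for a treatment can be sketched on its own, without any widgets. The toy data frame and the function name id_options below are made up for illustration. The sketch sorts the IDs as numbers before converting them to strings, since sorting the strings themselves would put "10" ahead of "9".

```python
import pandas as pd

# Toy stand-in for the real data frame (values are illustrative only)
toy = pd.DataFrame(
    {
        "beetle_treatment": ["dalotia"] * 3 + ["sceptobius"] * 3,
        "ID": [1, 0, 0, 10, 9, 11],
    }
)


def id_options(df, treatment):
    """Return the ant IDs for one treatment as strings, in numeric order."""
    ids = df.loc[df["beetle_treatment"] == treatment, "ID"].unique()

    # Sort while the IDs are still numbers, then convert for the drop-down
    return [str(i) for i in sorted(ids)]


print(id_options(toy, "dalotia"))  # ['0', '1']
print(id_options(toy, "sceptobius"))  # ['9', '10', '11']
```

The drop-down options have to be strings, which is why the conversion happens at the very end.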

We will use the same principles for the code as in the previous part of this lesson. First, we will define functions that create a ColumnDataSource object for the plots. Next, we will define plotting functions that take a cds argument and return separate plot objects. We will create multiple widgets: a time slider and drop-down lists for the beetle treatment, ant ID, and body part. We will write callback functions and use the on_change() methods of the widgets to link those elements together and to the ColumnDataSource object. Again, notice that this time we have an additional callback function to link the ant_ID_selector to the beetle_selector. Get ready for a large code cell!

[6]:
# data preparation function
def extract_sub_df(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Extract sub data frame for body part of
    one ant over a time range."""
    inds = (
        (df["ID"] == ant_ID)
        & (df["bodypart"] == bodypart)
        & (df["time (sec)"] >= time_range[0])
        & (df["time (sec)"] <= time_range[1])
    )

    return df.loc[inds, :]


def build_cds(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Builds a ColumnDataSource object from the part of a given data frame,
    based on given ant_ID, bodypart and time range."""
    ant_ID = int(ant_ID)
    cds = bokeh.models.ColumnDataSource(
        extract_sub_df(df, ant_ID, bodypart, time_range)
    )

    return cds


# plotting functions
def plot_xy(cds):
    """Plot the x and y positions of an ant body part over time."""
    p_xy = bokeh.plotting.figure(
        width=550,
        frame_height=100,
        x_axis_label="time (sec)",
        y_axis_label="position (cm)",
        x_range=[0, np.max(df["time (sec)"])],
        y_range=[0, 20],
        toolbar_location="above"
    )

    p_xy.add_layout(bokeh.models.Legend(), "right")

    p_xy.legend.click_policy = "hide"

    p_xy.line(
        x="time (sec)",
        y="x (cm)",
        source=cds,
        color="#ef8a62",
        legend_label="x",
        line_width=2,
    )

    p_xy.line(
        x="time (sec)",
        y="y (cm)",
        source=cds,
        color="#67a9cf",
        legend_label="y",
        line_width=2,
    )

    return p_xy


def plot_distance_traveled(cds):
    """Make a plot of distance traveled."""
    p_dt = bokeh.plotting.figure(
        width=300,
        height=250,
        x_axis_label="time (sec)",
        y_axis_label="distance traveled (cm)",
    )

    p_dt.line(
        x="time (sec)",
        y="distance traveled (cm)",
        source=cds,
        color="#ef8a62",
        line_width=2,
    )

    return p_dt


def traj_plot(cds):
    """Make a plot of the trajectory in x-y plane."""
    p = bokeh.plotting.figure(
        width=350,
        height=250,
        x_axis_label="x (cm)",
        y_axis_label="y (cm)",
        x_range=[0, 20],
        y_range=[0, 20],
    )

    color_mapper = bokeh.models.LinearColorMapper(
        palette="Viridis256",
        low=0,
        high=np.max(df["time (sec)"]),
    )
    color_bar = bokeh.models.ColorBar(
        color_mapper=color_mapper, title="time (sec)", width=10
    )

    p.circle(
        x="x (cm)",
        y="y (cm)",
        source=cds,
        size=2,
        color={"field": "time (sec)", "transform": color_mapper},
    )

    p.add_layout(color_bar, "right")

    return p


# creating widgets

# Create beetle treatment selector drop-down list
beetle_selector = bokeh.models.Select(
    title="beetle treatment",
    options=sorted(list(df["beetle_treatment"].unique())),
    value="dalotia",
    width=300,
)

# Create bodypart selector drop-down list
bodypart_selector = bokeh.models.Select(
    title="body part",
    options=sorted(list(df["bodypart"].unique())),
    value="thorax",
    width=120,
)

# Create ant ID selector drop-down list, initially holding the IDs
# for the treatment selected in beetle_selector (sorted numerically)
ant_ID_selector = bokeh.models.Select(
    title="Ant ID",
    options=[
        str(i)
        for i in sorted(
            df.loc[df["beetle_treatment"] == beetle_selector.value, "ID"].unique()
        )
    ],
    width=120,
    value="0",
)

# Create a slider for time
time_interval_slider_throttled = bokeh.models.RangeSlider(
    title="time (sec)",
    start=df["time (sec)"].min(),
    end=df["time (sec)"].max(),
    step=1,
    value=(df["time (sec)"].min(), df["time (sec)"].max()),
    width=300,
    value_throttled=(df["time (sec)"].min(), df["time (sec)"].max()),
)


# callback functions

# here is an additional callback function which links beetle_selector to ant_ID_selector
# notice, that it only updates the ant_ID_selector and does not affect cds
def update_ant_ID_selector(attr, old, new):
    inds = df["beetle_treatment"] == beetle_selector.value
    options = [str(i) for i in sorted(df.loc[inds, "ID"].unique())]
    ant_ID_selector.options = options
    ant_ID_selector.value = options[0]


def callback(attr, old, new):
    # Slider values
    time_range = time_interval_slider_throttled.value_throttled

    # Body part selector
    bodypart = bodypart_selector.value

    # Ant ID selector
    ant_ID = ant_ID_selector.value

    # Renewing data in cds
    new_cds = build_cds(df, ant_ID, bodypart, (time_range[0], time_range[1]))

    cds.data.update(new_cds.data)
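
The way one widget can drive another is easier to see in isolation. The sketch below uses two hypothetical Select widgets and a made-up ids_by_treatment dictionary; none of these names are part of the dashboard. It assumes the widgets are not attached to a document, in which case assigning a new value from Python should call the registered on_change callback right away.

```python
import bokeh.models

# Hypothetical widgets and lookup table, for illustration only
treatment_select = bokeh.models.Select(
    title="treatment", options=["a", "b"], value="a"
)
id_select = bokeh.models.Select(title="ID", options=["0", "1"], value="0")
ids_by_treatment = {"a": ["0", "1"], "b": ["2", "3"]}

# Record of the arguments Bokeh passes to the callback
calls = []


def update_ids(attr, old, new):
    """Reset the ID drop-down whenever the treatment changes."""
    calls.append((attr, old, new))
    options = ids_by_treatment[new]
    id_select.options = options
    id_select.value = options[0]


treatment_select.on_change("value", update_ids)

# Changing the value from Python triggers the callback, just as a
# user choosing a new entry in the browser would in a running app
treatment_select.value = "b"

print(calls)
print(id_select.options, id_select.value)
```

The callback receives the name of the changed property together with its old and new values, which is why every callback in this lesson has the signature (attr, old, new).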

Now that we have defined the functions, let’s build the initial ColumnDataSource object:

[7]:
ant_ID = ant_ID_selector.value
bodypart = bodypart_selector.value
beetle = beetle_selector.value

cds = build_cds(df, ant_ID, bodypart)

We have made and connected almost all of the plots and widgets (but have not rendered them). Whenever the ant ID, body part, or time interval selection changes, the plots will update.

Our task now is to add the summary plot. It should respond to the body part widget, so that it shows the total distance traveled for the selected body part. So, let’s write a proper callback function (the third callback function in our code) to do that. We will not regenerate the whole plot, but rather update its data source. To extract the data source from a Bokeh plot, we need to dig into its glyph renderers. If the plot is called p, its ColumnDataSource is p.renderers[i].data_source, where i is the index of the set of glyphs we are considering. For strip plots generated by iqplot, there is only a single data source, so i is always 0.

[8]:
def update_strip(attr, old, new):
    # Update data source
    bodypart = bodypart_selector.value

    strip.renderers[0].data_source.data["distance traveled (cm)"] = df_dist.loc[
        df_dist["bodypart"] == bodypart, "distance traveled (cm)"
    ].values
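
As a minimal sketch of digging a data source out of a plot, the example below builds a small figure by hand instead of using iqplot. The column names and numbers are made up for illustration. It checks that p.renderers[0].data_source is the same object that was passed to the glyph method, so writing a new column into it changes what the plot displays.

```python
import bokeh.models
import bokeh.plotting

# Made-up data for illustration
source = bokeh.models.ColumnDataSource({"x": [1, 2, 3], "y": [4, 5, 6]})

p = bokeh.plotting.figure()
p.scatter(x="x", y="y", source=source)

# The first (and here only) glyph renderer holds the data source
recovered = p.renderers[0].data_source
print(recovered is source)  # True

# Replacing a column of the same length updates the plotted values
recovered.data["y"] = [40, 50, 60]
print(list(source.data["y"]))  # [40, 50, 60]
```

The replacement column must have the same length as the other columns, which holds in update_strip() because every body part has one entry per ant.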

Now that we have the plot set up, we can write a callback for when data are selected. The callback must take three arguments, attr, old, and new; here, old and new are the lists of indices of the selected data points before and after the change. I have also written a small deselect_ant() callback function which removes the selection from the summary plot when we change the beetle in the drop-down list.

[9]:
def select_ant(attr, old, new):
    """Update widgets for selection on strip plot."""
    # Extract data source
    source = strip.renderers[0].data_source

    # If no glyph is selected, new is an empty list and there is nothing to do
    try:
        # Get index of selected glyph
        ind = new[0]

        # Set widget values
        beetle_selector.value = source.data["beetle_treatment"][ind]
        ant_ID_selector.value = str(source.data["ID"][ind])
    except IndexError:
        pass
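
The deselect_ant() callback mentioned above does not appear in the code of this section. A minimal sketch of what it might look like follows, assuming it only needs to empty the list of selected indices on the strip plot's data source. The stand-alone source and beetle_select widget here are hypothetical stand-ins for the dashboard's own objects.

```python
import bokeh.models

# Hypothetical stand-ins for the strip plot's data source and the
# beetle drop-down list
source = bokeh.models.ColumnDataSource({"ID": [0, 1, 2]})
beetle_select = bokeh.models.Select(
    options=["dalotia", "sceptobius"], value="dalotia"
)

# Pretend the user has clicked the second glyph
source.selected.indices = [1]


def deselect_ant(attr, old, new):
    """Clear any selection on the data source."""
    source.selected.indices = []


beetle_select.on_change("value", deselect_ant)

# Changing the treatment clears the selection
beetle_select.value = "sceptobius"
print(source.selected.indices)  # []
```

One thing to check before wiring such a callback to beetle_selector in the dashboard: select_ant() itself sets beetle_selector.value, so clicking an ant from the other treatment would presumably clear the selection that was just made.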

Now that the callback is defined, we need to make sure it is called whenever the selection changes. We do this with the on_change() method of the selected attribute of the plot’s ColumnDataSource.

[10]:
strip = strip_plot(df_dist)
strip.renderers[0].data_source.selected.on_change("indices", select_ant)

# writing on_change for all other widgets
beetle_selector.on_change("value", update_ant_ID_selector)
bodypart_selector.on_change("value", callback, update_strip)
ant_ID_selector.on_change("value", callback)
time_interval_slider_throttled.on_change("value_throttled", callback)

All the pieces are now in place! Let’s lay it out!

[11]:
widgets = bokeh.layouts.column(
    time_interval_slider_throttled,
    bokeh.models.Spacer(height=20),
    beetle_selector,
    bokeh.models.Spacer(height=20),
    bokeh.layouts.row(
        ant_ID_selector,
        bokeh.models.Spacer(width=15),
        bodypart_selector,
    ),
)

row1 = bokeh.layouts.row(
    traj_plot(cds),
    bokeh.models.Spacer(width=20),
    plot_distance_traveled(cds),
    bokeh.models.Spacer(width=20),
    widgets
)
row2 = bokeh.layouts.row(plot_xy(cds), bokeh.models.Spacer(width=30), strip)

dashboard = bokeh.layouts.column(row1, bokeh.models.Spacer(height=20), row2)


def app(doc):
    doc.add_root(dashboard)

Before we plot our awesome dashboard, let’s talk about one more cool feature of Bokeh.

Deploying a dashboard on a stand-alone browser tab

Bokeh lets you move your dashboard from a prototype in a notebook to its own stand-alone app in a separate browser tab (you can read the docs about that). All you have to do is use bokeh.io.curdoc().add_root() instead of the usual bokeh.io.show(app). You can then serve the dashboard by entering bokeh serve --show name_of_notebook.ipynb on the command line (you have to be in the folder that contains the notebook, or else give the path to the notebook if you want to serve it from a different place on the machine). Try it yourself: download this notebook, set serve = True in the first cell, and run the following on the command line.

bokeh serve --show name_of_notebook.ipynb
[13]:
if serve:
    bokeh.io.curdoc().add_root(dashboard)
else:
    bokeh.io.show(app, notebook_url=notebook_url)

Computing environment

[13]:
%load_ext watermark
%watermark -v -p numpy,scipy,pandas,bokeh,iqplot,jupyterlab
Python implementation: CPython
Python version       : 3.11.4
IPython version      : 8.12.0

numpy     : 1.24.3
scipy     : 1.10.1
pandas    : 1.5.3
bokeh     : 3.2.1
iqplot    : 0.3.3
jupyterlab: 4.0.4