Selecting data and serving a dashboard
This recitation was created by Nastya Oguienko and Justin Bois based on the lessons on dashboarding from previous versions of this course.
[1]:
import os
data_path = "../data/"
import numpy as np
import pandas as pd
import scipy.stats as st
import bokeh.io
import bokeh.layouts
import bokeh.plotting
import bokeh.models
import iqplot
bokeh.io.output_notebook()
serve = False
notebook_url = "localhost:8888"
Important! Interactive control of graphics does not work in Google Colab. You have to run these notebooks in Jupyter Lab on your local machine!
Also, dashboards will not appear in the HTML-rendered version of this notebook. You are therefore encouraged to download and run this notebook on your local machine.
A plot of summary data
We will use the same ant tracking data set. Let’s load it in and prep it, including computing the distance traveled, as we did in the last part of the lesson.
[2]:
# Load data without comments
df = pd.read_csv(os.path.join(data_path, "ant_joint_locations.zip"), comment="#")
interpixel_distance = 0.08 # cm
# Create position columns in units of cm
df["x (cm)"] = df["x_coord"] * interpixel_distance
df["y (cm)"] = df["y_coord"] * interpixel_distance
# Create time column in units of seconds
df["time (sec)"] = df["frame"] / 28
def distance_traveled(df):
    """Compute cumulative distance traveled for one body part of one ant."""
    x_diff = df["x (cm)"].diff()
    y_diff = df["y (cm)"].diff()
    return np.cumsum(np.sqrt(x_diff**2 + y_diff**2))
df["distance traveled (cm)"] = (
    df.groupby(["ID", "bodypart"])
    .apply(distance_traveled)
    .reset_index(level=["ID", "bodypart"], drop=True)
)
# Take a look to remind ourselves
df.head()
[2]:
| | frame | beetle_treatment | ID | bodypart | x_coord | y_coord | likelihood | x (cm) | y (cm) | time (sec) | distance traveled (cm) |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | dalotia | 0 | head | 73.086 | 193.835 | 1.0 | 5.84688 | 15.50680 | 0.000000 | NaN |
1 | 1 | dalotia | 0 | head | 73.730 | 194.385 | 1.0 | 5.89840 | 15.55080 | 0.035714 | 0.067752 |
2 | 2 | dalotia | 0 | head | 75.673 | 195.182 | 1.0 | 6.05384 | 15.61456 | 0.071429 | 0.235761 |
3 | 3 | dalotia | 0 | head | 77.319 | 196.582 | 1.0 | 6.18552 | 15.72656 | 0.107143 | 0.408629 |
4 | 4 | dalotia | 0 | head | 78.128 | 197.891 | 1.0 | 6.25024 | 15.83128 | 0.142857 | 0.531735 |
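To see what distance_traveled() computes, here is a minimal sketch on a made-up four-point track (the toy data frame and its values are assumptions for illustration, not part of the data set). The first frame has no predecessor, so diff() gives NaN there. The pandas cumsum() skips that NaN rather than propagating it, which is consistent with the table above, where only the first row is NaN.

```python
import numpy as np
import pandas as pd

# Toy track (assumed values): two 3-4-5 steps with a pause in between
toy = pd.DataFrame({"x (cm)": [0.0, 3.0, 3.0, 6.0], "y (cm)": [0.0, 4.0, 4.0, 8.0]})

# Step lengths between consecutive frames; the first entry is NaN
step = np.sqrt(toy["x (cm)"].diff() ** 2 + toy["y (cm)"].diff() ** 2)

# Running total; pandas skips the leading NaN instead of propagating it
dist = step.cumsum()
print(dist.tolist())
```

The running total stays flat across the frame where the toy track does not move, and the last entry is the total path length.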
We may want to summarize the motion of the ants by the total distance traveled. Let’s compute that and store the result in a new data frame.
[3]:
df_dist = (
    df.groupby(["beetle_treatment", "ID", "bodypart"])["distance traveled (cm)"]
    .max()
    .reset_index()
)
# Take a look
df_dist
[3]:
| | beetle_treatment | ID | bodypart | distance traveled (cm) |
---|---|---|---|---|
0 | dalotia | 0 | abdomen | 1256.637437 |
1 | dalotia | 0 | antenna_left | 2688.416512 |
2 | dalotia | 0 | antenna_right | 2800.528436 |
3 | dalotia | 0 | head | 1647.465193 |
4 | dalotia | 0 | thorax | 1266.693540 |
5 | dalotia | 1 | abdomen | 1143.534573 |
6 | dalotia | 1 | antenna_left | 2892.714768 |
7 | dalotia | 1 | antenna_right | 2856.248616 |
8 | dalotia | 1 | head | 1575.459175 |
9 | dalotia | 1 | thorax | 1135.457988 |
10 | dalotia | 2 | abdomen | 1068.647668 |
11 | dalotia | 2 | antenna_left | 3427.051189 |
12 | dalotia | 2 | antenna_right | 3810.080622 |
13 | dalotia | 2 | head | 1870.329342 |
14 | dalotia | 2 | thorax | 1289.644406 |
15 | dalotia | 3 | abdomen | 2169.129372 |
16 | dalotia | 3 | antenna_left | 4687.207298 |
17 | dalotia | 3 | antenna_right | 5569.482037 |
18 | dalotia | 3 | head | 3383.078720 |
19 | dalotia | 3 | thorax | 2435.800519 |
20 | dalotia | 4 | abdomen | 1879.510454 |
21 | dalotia | 4 | antenna_left | 3449.479980 |
22 | dalotia | 4 | antenna_right | 3330.416362 |
23 | dalotia | 4 | head | 2059.613435 |
24 | dalotia | 4 | thorax | 1642.574170 |
25 | dalotia | 5 | abdomen | 1383.706414 |
26 | dalotia | 5 | antenna_left | 2677.861333 |
27 | dalotia | 5 | antenna_right | 2561.370168 |
28 | dalotia | 5 | head | 1735.869976 |
29 | dalotia | 5 | thorax | 1326.177297 |
30 | sceptobius | 6 | abdomen | 912.727949 |
31 | sceptobius | 6 | antenna_left | 2657.582883 |
32 | sceptobius | 6 | antenna_right | 2287.451179 |
33 | sceptobius | 6 | head | 1205.616500 |
34 | sceptobius | 6 | thorax | 588.067617 |
35 | sceptobius | 7 | abdomen | 339.701993 |
36 | sceptobius | 7 | antenna_left | 1531.338615 |
37 | sceptobius | 7 | antenna_right | 2389.643450 |
38 | sceptobius | 7 | head | 420.652691 |
39 | sceptobius | 7 | thorax | 238.159884 |
40 | sceptobius | 8 | abdomen | 500.156206 |
41 | sceptobius | 8 | antenna_left | 2853.945585 |
42 | sceptobius | 8 | antenna_right | 2777.918093 |
43 | sceptobius | 8 | head | 1085.719023 |
44 | sceptobius | 8 | thorax | 703.824390 |
45 | sceptobius | 9 | abdomen | 357.735190 |
46 | sceptobius | 9 | antenna_left | 2382.851423 |
47 | sceptobius | 9 | antenna_right | 2488.580833 |
48 | sceptobius | 9 | head | 887.401463 |
49 | sceptobius | 9 | thorax | 546.723268 |
50 | sceptobius | 10 | abdomen | 661.166480 |
51 | sceptobius | 10 | antenna_left | 2693.747130 |
52 | sceptobius | 10 | antenna_right | 2614.627036 |
53 | sceptobius | 10 | head | 1181.083980 |
54 | sceptobius | 10 | thorax | 826.188143 |
55 | sceptobius | 11 | abdomen | 504.217197 |
56 | sceptobius | 11 | antenna_left | 2003.843440 |
57 | sceptobius | 11 | antenna_right | 2048.341855 |
58 | sceptobius | 11 | head | 726.746776 |
59 | sceptobius | 11 | thorax | 514.148107 |
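Because a cumulative distance never decreases within a track, its maximum is the same as its last non-missing value. A small sketch with invented numbers (the toy frame, its values, and the names `toy`, `totals`, and `last` are assumptions) shows the grouping step in isolation:

```python
import pandas as pd

col = "distance traveled (cm)"
toy = pd.DataFrame(
    {
        "ID": [0, 0, 0, 1, 1, 1],
        col: [float("nan"), 1.5, 4.0, float("nan"), 0.5, 0.75],
    }
)

# One row per ant: the largest cumulative distance in each group
totals = toy.groupby("ID")[col].max().reset_index()

# For a nondecreasing column, the last non-missing value gives the same answer
last = toy.groupby("ID")[col].last().reset_index()
```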
To visualize this summary, we can make a strip plot, using the thorax as the body part. Thinking ahead, when we make it, we will include a tap tool, which enables selection of a glyph by clicking on it. We will also include a hover tool so we can see which ant/beetle treatment each glyph represents.
[4]:
def strip_plot(df_dist):
    strip = iqplot.strip(
        df_dist.loc[df_dist["bodypart"] == "thorax", :],
        q="distance traveled (cm)",
        cats="beetle_treatment",
        q_axis="x",
        palette=["#7570b3", "#1b9e77"],
        frame_height=100,
        frame_width=350,
        tools="pan,box_zoom,wheel_zoom,reset,tap,save",
        tooltips=[("ant ID", "@ID"), ("beetle", "@beetle_treatment")],
        toolbar_location="above",
    )
    # Always start at zero
    strip.x_range.start = 0
    return strip
Let’s see if it works….
[5]:
bokeh.io.show(strip_plot(df_dist))
This summary plot exposes, for example, that ant 3 is highly active (you can see it’s ant 3 by hovering over the rightmost point), while ant 7 has the shortest thorax track. In our dashboard, we would like to include this summary plot and enable clicking on a glyph to automatically update the other plots so that they show the selected ant/beetle treatment.
Adding more interactions and more plots to the dashboard
Let’s add everything we planned to our dashboard. This includes more options for selecting the data to visualize and more plots: a plot of the x and y positions versus time (with x and y drawn as separate paths), a plot of distance traveled, a plot of the trajectory in the x-y plane, and a summary plot.
We have to think about widgets we want to add. We know our data includes multiple ants, multiple body parts, and multiple beetle treatments. Rather than making a new plot for each possible combination, we can add multiple interactive elements to our dashboard. We will add drop-down lists to choose the beetle treatment, ant ID, and body part to track.
Notice that the possible ant IDs change between beetle treatments. For the Dalotia beetle, we have ant IDs 0–5. For Sceptobius, we have 6–11. So, our Ant ID drop-down list must change when we change the beetle treatment. To do this, we add the helper function update_ant_ID_selector(), which updates the options in ant_ID_selector when the beetle treatment is changed. Notice that this is a callback function: it returns nothing, but resets the options and value of ant_ID_selector so that they always match the current value of the beetle_selector drop-down list.
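The dependent drop-down idea can be tried in isolation. The sketch below is a minimal illustration under stated assumptions: the lookup table `ids_by_treatment` and the names `treatment_select`, `id_select`, and `update_id_options` are hypothetical stand-ins, and the callback is called by hand rather than by a running Bokeh server.

```python
import bokeh.models

# Hypothetical lookup table standing in for the real data frame
ids_by_treatment = {"dalotia": ["0", "1", "2"], "sceptobius": ["6", "7", "8"]}

treatment_select = bokeh.models.Select(
    title="beetle treatment", options=sorted(ids_by_treatment), value="dalotia"
)
id_select = bokeh.models.Select(
    title="Ant ID", options=ids_by_treatment["dalotia"], value="0"
)

def update_id_options(attr, old, new):
    """Repopulate the dependent drop-down when the treatment changes."""
    options = ids_by_treatment[new]
    id_select.options = options
    # Reset the value too, so it is never left pointing at a stale option
    id_select.value = options[0]

# Registration, as it would be done in a dashboard
treatment_select.on_change("value", update_id_options)

# Call the callback by hand, the way Bokeh would after the user picks a treatment
update_id_options("value", "dalotia", "sceptobius")
```

Resetting the value along with the options matters: otherwise the ant ID widget could keep displaying an ID that does not exist for the newly chosen treatment.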
We will use the same principles for the code as in the previous part of this lesson. First, we will define functions that create the ColumnDataSource object for the plots. Next, we will define plotting functions that take a cds argument and return separate plot objects. We will create multiple widgets: a time slider and drop-down lists for the beetle treatment, ant ID, and body part. We will write callback functions and register them with the widgets’ on_change() methods to link those elements to each other and to the ColumnDataSource object. Again, notice that this time we have an additional callback function that links ant_ID_selector to beetle_selector. Get ready for a large code cell!
[6]:
# data preparation function
def extract_sub_df(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Extract sub data frame for body part of
    one ant over a time range."""
    inds = (
        (df["ID"] == ant_ID)
        & (df["bodypart"] == bodypart)
        & (df["time (sec)"] >= time_range[0])
        & (df["time (sec)"] <= time_range[1])
    )
    return df.loc[inds, :]
def build_cds(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Builds a ColumnDataSource object from the part of a given data frame,
    based on given ant_ID, bodypart and time range."""
    ant_ID = int(ant_ID)
    cds = bokeh.models.ColumnDataSource(
        extract_sub_df(df, ant_ID, bodypart, time_range)
    )
    return cds
# plotting functions
def plot_xy(cds):
    """Plot the x and y positions of an ant body part over time."""
    p_xy = bokeh.plotting.figure(
        width=550,
        frame_height=100,
        x_axis_label="time (sec)",
        y_axis_label="position (cm)",
        x_range=[0, np.max(df["time (sec)"])],
        y_range=[0, 20],
        toolbar_location="above"
    )
    p_xy.add_layout(bokeh.models.Legend(), "right")
    p_xy.legend.click_policy="hide"
    p_xy.line(
        x="time (sec)",
        y="x (cm)",
        source=cds,
        color="#ef8a62",
        legend_label="x",
        line_width=2,
    )
    p_xy.line(
        x="time (sec)",
        y="y (cm)",
        source=cds,
        color="#67a9cf",
        legend_label="y",
        line_width=2,
    )
    return p_xy
def plot_distance_traveled(cds):
    """Make a plot of distance traveled."""
    p_dt = bokeh.plotting.figure(
        width=300,
        height=250,
        x_axis_label="time (sec)",
        y_axis_label="distance traveled (cm)",
    )
    p_dt.line(
        x="time (sec)",
        y="distance traveled (cm)",
        source=cds,
        color="#ef8a62",
        line_width=2,
    )
    return p_dt
def traj_plot(cds):
    """Make a plot of the trajectory in x-y plane."""
    p = bokeh.plotting.figure(
        width=350,
        height=250,
        x_axis_label="x (cm)",
        y_axis_label="y (cm)",
        x_range=[0, 20],
        y_range=[0, 20],
    )
    color_mapper = bokeh.models.LinearColorMapper(
        palette="Viridis256",
        low=0,
        high=np.max(df["time (sec)"]),
    )
    color_bar = bokeh.models.ColorBar(
        color_mapper=color_mapper, title="time (sec)", width=10
    )
    p.circle(
        x="x (cm)",
        y="y (cm)",
        source=cds,
        size=2,
        color={"field": "time (sec)", "transform": color_mapper},
    )
    p.add_layout(color_bar, "right")
    return p
# creating widgets
# Create beetle treatment selector drop-down list
beetle_selector = bokeh.models.Select(
    title="beetle treatment",
    options=sorted(list(df["beetle_treatment"].unique())),
    value="dalotia",
    width=300,
)
# Create bodypart selector drop-down list
bodypart_selector = bokeh.models.Select(
    title="body part",
    options=sorted(list(df["bodypart"].unique())),
    value="thorax",
    width=120,
)
# Create ant ID selector drop-down list
ant_ID_selector = bokeh.models.Select(
    title="Ant ID",
    options=sorted(
        [
            str(i)
            for i in df.loc[
                df["beetle_treatment"] == df["beetle_treatment"].unique()[0], "ID"
            ].unique()
        ]
    ),
    width=120,
    value="0",
)
# Create a slider for time
time_interval_slider_throttled = bokeh.models.RangeSlider(
    title="time (sec)",
    start=df["time (sec)"].min(),
    end=df["time (sec)"].max(),
    step=1,
    value=(df["time (sec)"].min(), df["time (sec)"].max()),
    width=300,
    value_throttled=(df["time (sec)"].min(), df["time (sec)"].max()),
)
# callback functions
# here is an additional callback function which links beetle_selector to ant_ID_selector
# notice, that it only updates the ant_ID_selector and does not affect cds
def update_ant_ID_selector(attr, old, new):
    inds = df["beetle_treatment"] == beetle_selector.value
    options = [str(i) for i in sorted(df.loc[inds, "ID"].unique())]
    ant_ID_selector.options = options
    ant_ID_selector.value = options[0]
def callback(attr, old, new):
    # Slider values
    time_range = time_interval_slider_throttled.value_throttled
    # Body part selector
    bodypart = bodypart_selector.value
    # Ant ID selector
    ant_ID = ant_ID_selector.value
    # Renewing data in cds
    new_cds = build_cds(df, ant_ID, bodypart, (time_range[0], time_range[1]))
    cds.data.update(new_cds.data)
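One detail in build_cds() is easy to miss: a Select widget stores its value as a string, while the ID column holds integers, hence the int(ant_ID) conversion. The sketch below uses an invented three-row frame and a hypothetical helper, `select_rows()`, that mirrors extract_sub_df() without the body part. It shows that comparing against the string matches nothing, and that the default time range of (-inf, inf) keeps every row for the chosen ant.

```python
import numpy as np
import pandas as pd

# Invented data: two frames for ant 0, one frame for ant 1
toy = pd.DataFrame({"ID": [0, 0, 1], "time (sec)": [0.0, 1.0, 0.0]})

def select_rows(df, ant_ID, time_range=(-np.inf, np.inf)):
    """Boolean-mask selection mirroring extract_sub_df(), minus the body part."""
    inds = (
        (df["ID"] == ant_ID)
        & (df["time (sec)"] >= time_range[0])
        & (df["time (sec)"] <= time_range[1])
    )
    return df.loc[inds, :]

n_string = len(select_rows(toy, "0"))            # widget value used as-is
n_int = len(select_rows(toy, int("0")))          # converted to int first
n_window = len(select_rows(toy, 0, (0.5, 2.0)))  # restricted time window
print(n_string, n_int, n_window)
```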
We have defined the functions; let’s now build the initial ColumnDataSource object:
[7]:
ant_ID = ant_ID_selector.value
bodypart = bodypart_selector.value
beetle = beetle_selector.value
cds = build_cds(df, ant_ID, bodypart)
We have made and connected almost all of the plots and widgets (but have not rendered them). Whenever the ant ID, body part, or time interval selection changes, the plots will update.
Our task now is to add the summary plot. It should respond to the body part widget so that it shows the total distance for the selected body part. So, let’s write a callback function (the third callback function in our code) to do that. We will not regenerate the whole plot, but rather update its data source. To extract the data source from a Bokeh plot, we need to dig into its glyph renderers. If the plot is called p, its ColumnDataSource is p.renderers[i].data_source, where i is the index of the set of glyphs we are considering. For strip plots generated by iqplot, there is only a single data source, so i is always 0.
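As a quick check of the renderers idea, here is a minimal sketch on a plain Bokeh figure rather than an iqplot strip plot (the data and the names `source` and `found` are made up for illustration). It retrieves the data source through the first glyph renderer and replaces one column in place.

```python
import bokeh.models
import bokeh.plotting

# Made-up data for a single set of glyphs
source = bokeh.models.ColumnDataSource({"x": [1, 2, 3], "y": [4, 5, 6]})
p = bokeh.plotting.figure(width=300, height=200)
p.scatter(x="x", y="y", source=source)

# The first (and here only) glyph renderer holds the data source
found = p.renderers[0].data_source

# Replacing a column with one of the same length updates the existing plot's data
found.data["y"] = [6, 5, 4]
```

The object pulled out of the renderer is the very ColumnDataSource passed in when the glyphs were drawn, so editing it changes what the existing plot displays without rebuilding the figure.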
[8]:
def update_strip(attr, old, new):
    # Update data source
    bodypart = bodypart_selector.value
    strip.renderers[0].data_source.data["distance traveled (cm)"] = df_dist.loc[
        df_dist["bodypart"] == bodypart, "distance traveled (cm)"
    ].values
Now that we have the plot set up, we can write a callback for when data are selected. The callback must take three arguments, attr, old, and new. Here, attr is the name of the attribute that changed ("indices"), and old and new are the lists of indices of the selected data points before and after the change. I have also written a small deselect_ant() callback function which removes the selection from the summary plot when we change the beetle treatment in the drop-down list.
[9]:
def select_ant(attr, old, new):
    """Update widgets for selection on strip plot."""
    # Extract data source
    source = strip.renderers[0].data_source
    # new is an empty list when nothing is selected; indexing it raises
    # IndexError, in which case we leave the widgets untouched
    try:
        # Get index of selected glyph
        ind = new[0]
        # Set widget values
        beetle_selector.value = source.data["beetle_treatment"][ind]
        ant_ID_selector.value = str(source.data["ID"][ind])
    except IndexError:
        pass
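The selection logic can be exercised without a running dashboard. In the sketch below, the data source contents and the names `picked`, `on_select`, and `deselect` are assumptions for illustration. An explicit emptiness test is used as an alternative to the try block. Since deselect_ant() is not shown in this notebook, `deselect` is only a hypothetical stand-in that assumes clearing `selected.indices` is the intended behavior.

```python
import bokeh.models

# Made-up summary data: one row per glyph
source = bokeh.models.ColumnDataSource(
    {"ID": [0, 1, 6], "beetle_treatment": ["dalotia", "dalotia", "sceptobius"]}
)
picked = []  # records (treatment, ID) for each non-empty selection

def on_select(attr, old, new):
    """Record the first selected row; ignore an empty selection."""
    if not new:
        return
    ind = new[0]
    picked.append((source.data["beetle_treatment"][ind], source.data["ID"][ind]))

def deselect(attr, old, new):
    """Hypothetical stand-in for deselect_ant(): clear the selection."""
    source.selected.indices = []

# Registration, as it would be done in a dashboard
source.selected.on_change("indices", on_select)

# Call the callbacks by hand, the way Bokeh would in response to user actions
on_select("indices", [], [2])               # a tap on the third glyph
on_select("indices", [2], [])               # the selection is cleared
deselect("value", "dalotia", "sceptobius")  # the treatment drop-down changes
```

Clearing the selection hands the selection callback an empty list, which is why it needs to handle that case gracefully.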
Now that the callback is defined, we need to register it so that it runs whenever the selection changes. We do this with the on_change() method of the selected attribute of a ColumnDataSource.
[10]:
strip = strip_plot(df_dist)
strip.renderers[0].data_source.selected.on_change("indices", select_ant)
# writing on_change for all other widgets
beetle_selector.on_change("value", update_ant_ID_selector)
bodypart_selector.on_change("value", callback, update_strip)
ant_ID_selector.on_change("value", callback)
time_interval_slider_throttled.on_change("value_throttled", callback)
All the pieces are now in place! Let’s lay it out!
[11]:
widgets = bokeh.layouts.column(
    time_interval_slider_throttled,
    bokeh.models.Spacer(height=20),
    beetle_selector,
    bokeh.models.Spacer(height=20),
    bokeh.layouts.row(
        ant_ID_selector,
        bokeh.models.Spacer(width=15),
        bodypart_selector,
    ),
)
row1 = bokeh.layouts.row(
    traj_plot(cds),
    bokeh.models.Spacer(width=20),
    plot_distance_traveled(cds),
    bokeh.models.Spacer(width=20),
    widgets
)
row2 = bokeh.layouts.row(plot_xy(cds), bokeh.models.Spacer(width=30), strip)
dashboard = bokeh.layouts.column(row1, bokeh.models.Spacer(height=20), row2)
def app(doc):
    doc.add_root(dashboard)
Before we plot our awesome dashboard, let’s talk about one more cool feature of Bokeh.
Deploying a dashboard on a stand-alone browser tab
Bokeh lets you move your dashboard from a prototype in a notebook to its own stand-alone app in a separate browser tab (you can read the docs about that). All you have to do is add the layout to the current document with bokeh.io.curdoc().add_root() instead of calling the usual bokeh.io.show(app). You can then serve the dashboard by entering bokeh serve --show name_of_notebook.ipynb on the command line (you have to be in the folder which contains the notebook to do that, or you can use the path to the notebook if you want to serve it from a different place on the machine). Try that yourself: download this notebook and set serve = True in the first cell.
bokeh serve --show name_of_notebook.ipynb
[13]:
if serve:
    bokeh.io.curdoc().add_root(dashboard)
else:
    bokeh.io.show(app, notebook_url=notebook_url)
Computing environment
[13]:
%load_ext watermark
%watermark -v -p numpy,scipy,pandas,bokeh,iqplot,jupyterlab
Python implementation: CPython
Python version : 3.11.4
IPython version : 8.12.0
numpy : 1.24.3
scipy : 1.10.1
pandas : 1.5.3
bokeh : 3.2.1
iqplot : 0.3.3
jupyterlab: 4.0.4