For this first project we had to come up with a good idea on how to build a physical visualization of
data coming from the US Environmental Protection Agency, reporting mainly information about air quality
and pollutant concentration over time and across the United States.
Brainstorming

When I started thinking about what kind of visualization I could build out of the data,
I came up with several different ideas. After some initial experiments I had to drop a couple
of them because I soon realized they were not working as I expected. I was then left with another
pair of choices, but neither of them really satisfied me. The reason I was not happy with those
visualization concepts was that they were very close to being just a physical 3D representation
of a visualization I would have done using a computer. What I really wanted to do was to try and represent
the data using a completely different approach. The problem was that, given the type of data, every
exotic visualization that came to my mind was worse than its traditional counterparts, or it was just
completely silly... This somewhat reminded me of Chindogu...
Chindogu
Chindogu (literally: unusual tool) is the Japanese art of creating tools with the purpose of solving
everyday problems. The drawback is that these tools are so awkward (or embarrassing...) to use that it
becomes pointless to use them to solve the problem they were intended for... Well... why not try this
on a visualization and see what actually comes out?
Simple Guide to Building an Awkward Visualization
My objective for this project became to build a visualization that:
- Used a completely different way of representing data: something that just could not be put on a display.
- Was formally correct (i.e. it actually contained the data I wanted to visualize)
- Was intuitive to use (after a brief explanation or through a legend)
Now, that is an almost complete definition of a good visualization. The remaining point is that
a visualization should be efficient at communicating its content (that is, the data it represents)
to the user. I decided to invert just this requirement: in other words, I wanted to build a
visualization that, although formally correct and intuitive to use, represented the data in
a completely awkward way.
Choosing Data
I decided to represent data for two pollutants (carbon monoxide and particulate matter)
over a period of seven years, and for seven different sites in Illinois. The sites were chosen
so as to have almost complete data for both pollutants in the considered time period (2001 - 2007).
Also, I chose to represent national trends for the two pollutants in the same period.
I took the data from time graphs and put it in a spreadsheet to be able to easily consult it later on.
The total number of datapoints is 2 (pollutants) x 7 (years) x (7 + 1) (sites + national trend) = 112 datapoints.
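As a quick sanity check on that count, the combinations can be enumerated directly. This is only a sketch: the site labels are placeholders, since the seven sites are not named here.

```python
from itertools import product

POLLUTANTS = ["CO", "PM"]          # carbon monoxide, particulate matter
YEARS = list(range(2001, 2008))    # 2001..2007 inclusive, seven years

# Placeholder labels: the actual site names are not listed in this write-up.
SERIES = [f"site_{i}" for i in range(1, 8)] + ["national"]

# One datapoint per (pollutant, year, series) combination.
datapoints = list(product(POLLUTANTS, YEARS, SERIES))
print(len(datapoints))
```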
Ingredients

This is the list of things I used to build my Chindogu visualization:
- Hard box paper.
- Some wood boards left after assembling a piece of Ikea furniture...
- Styrofoam.
- Colored thumbtacks.
- 2 medicine dispensers.
- Cotton Swabs.
- Lots and lots of pennies.

Having planned the project construction steps on paper, building it was pretty straightforward.
The basic visualization skeleton consists of a map of the chosen area (Illinois) with an indication of
each considered site. Each site is linked with a set of small, closed boxes. Each one of the boxes is tagged
with a year. Thus, opening a box unveils data for a specific site, at a specific year. Of course, more boxes
can be opened at the same time to 'show' more data. The entire process is extremely intuitive. One can
open a row of boxes, to show time varying data for the same site, open a column to show data for a specific
year in different sites, or open an arbitrary region. Only data the user is specifically interested in gets
shown.
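The open/close interaction can be thought of as toggling cells in a site-by-year grid. The following is a rough model of that idea, not a description of how the physical object was built: the site labels and function names are made up for illustration, and the national trend row is left out for brevity.

```python
# Placeholder site labels; the real sites are not named in this write-up.
SITES = [f"site_{i}" for i in range(1, 8)]
YEARS = list(range(2001, 2008))

# Every box starts closed.
is_open = {(site, year): False for site in SITES for year in YEARS}

def open_row(site):
    """Open all boxes of one site: time-varying data for that site."""
    for year in YEARS:
        is_open[(site, year)] = True

def open_column(year):
    """Open one year across all sites."""
    for site in SITES:
        is_open[(site, year)] = True

def open_region(sites, years):
    """Open an arbitrary block of boxes."""
    for site in sites:
        for year in years:
            is_open[(site, year)] = True

def visible():
    """Return the (site, year) pairs whose boxes are currently open."""
    return sorted(key for key, opened in is_open.items() if opened)
```

Calling `open_row("site_1")` followed by `visible()` would list the seven boxes for that site and nothing else, which is exactly the filtering behavior described above.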
Adding Data

After the basic visualization structure was built, it was time to actually add data to it
(that is, fill the boxes). Each box had to contain data about two pollutants.
This data was represented through... pennies and cotton swab sticks. In particular:
- The number of pennies was used to represent the level of Carbon Monoxide.
- The length of the cotton swab stick was used to represent the level of Particulate Matter.
Data was discretized using eight different levels per pollutant: this means that each box can contain
one to eight pennies, and a stick of one of eight different lengths. Due to the heavy discretization I applied,
I decided to completely avoid adding a legend to map the number of pennies / length of sticks to actual,
physical value intervals. Actually, if you are not an expert, even knowing the exact concentration of a pollutant brings little information.
Is a Carbon Monoxide concentration of 6 ppm a big value or not? For this reason, the
data contained in a single box has little value by itself: it has to be compared with data coming from
other boxes, representing different sites, years, or the national trend.
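For illustration, here is one way such an eight-level discretization could be done. The binning rule is an assumption: equal-width bins between a chosen minimum and maximum are used here, and the function name `discretize` and its parameters are hypothetical.

```python
def discretize(value, lo, hi, levels=8):
    """Map a value in [lo, hi] to an integer level from 1 to `levels`.

    Assumes equal-width bins; the rule used for the physical build
    may have been different.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Clamp so out-of-range readings land in the first or last level.
    clamped = min(max(value, lo), hi)
    width = (hi - lo) / levels
    level = int((clamped - lo) / width) + 1
    # A value exactly equal to hi would otherwise overflow into level + 1.
    return min(level, levels)
```

With made-up bounds of 0 and 8, `discretize(3.5, 0, 8)` gives level 4, which would translate to four pennies or the fourth-shortest stick. Only the relative level survives, which is why boxes have to be compared with each other.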
Why this is NOT a Good Visualization
First of all, it is worth noting how this way of representing the chosen dataset respects the criteria
of being formally correct and intuitive:
- It correctly represents all the data: even if data is presented in an approximate, discretized form
it is still possible to compare different datapoints to have a qualitative feeling of how pollutant
concentrations change over time and in different places.
- Accessing and interpreting information from a specific datapoint is straightforward.
That said, this representation of data still fails at effectively communicating information to the user.
What are its main drawbacks?
- First of all, the representation of a single datapoint, even if easy to explain, is complex to
process. To understand the content of a datapoint I would have to count the number of pennies
and / or interpret the length of the stick. Often this will require actually taking these objects
out of their box and handling them together with the contents of another box to understand their meaning. This
also means that observing time varying data from the same site, or comparing more than two datapoints, becomes
extremely complex.
- Another flaw in the visualization is the presence of data filtering (i.e. opening
and closing the boxes). Filtering is in general useful when a dataset is too big to be effectively
perceived all at the same time. This is not the case for the current dataset, since it would be possible
to display it completely without confusing the user. In this case, filtering becomes an
unnecessary burden. It is worth underlining that filtering here is not bad because
it is not intuitive: it is actually pretty easy to understand how it works.
It is bad because it adds complexity to the interaction without bringing any significant
advantage in communicating the meaning of data.
Conclusion

I realize what I did on this project is pretty different from the actual requirements. We were supposed
to do a physical visualization right, while I deliberately did part of it wrong. Nonetheless, I found it extremely
interesting to see the effects of breaking one 'guideline' in the design of a visualization, while
respecting the others. Communicating information to the user is probably the single most important aspect
of visualization. This is the reason why failing on this point turned my novel data representation approach
into a Chindogu :) Breaking a different rule, while still negative, may have a less
'destructive' impact on the final result. For instance, designing a less intuitive representation may bring
initial confusion to the user, or could require longer training. But once the user understands how the
visualization 'works', they can still get useful information from it, suffering a lighter penalty in terms
of efficiency.