AAG April 07, 2017: Boston, MA
Abstract
The embodiment of surveillant technologies provides a means and site of production of data to be consumed by both the self and the medical gaze, with physical and emotional consequences. Analyzing discourse on the subreddit r/diabetes, I examine assemblages of surveillant technologies that render an ever-increasing quantified self for those using insulin pumps and glucose monitors. Haraway (1990) brought cyborgs to the fore in the early 1990s, and Lupton & Seymour (2000) describe the cyborg patient: “When hooked up to medical (and other) technologies, the patient’s body becomes a cyborg, a juncture of human flesh and machine” (p. 56). Bodies are rendered regulatable through the use of embodied and disembodied technologies. People with any type of diabetes may be treated with insulin, which can be self-administered through multiple daily injections or through an insulin pump. Insulin pumps, like insulin itself, are proprietary. There is a growing do-it-yourself movement around hacking the cyborg self: open-source communities have made headway in generating new technology, reappropriating old devices, and implementing everyday hacks of hardware and its fleshy interface (Forlano, 2016). Lupton (2016) describes these devices and data as intermingling within a data economy. I argue that, in the case of diabetes, they are used in surveillance and the medical gaze. Devices used to manage diabetes quantify the self and datify the device user. These data, as perceived by medical practitioners, family members, friends, strangers and, last but not least, the self, can have significant effects on everyday life, socio-spatial relations and emotional health.
Methods and Methodologies: Mixed Methods & Reddit Data Using R
I have used netnography (Kozinets, 2015) to explore the online community r/diabetes. This work includes observation (lurking) with screenshots and keyword searches through the Reddit GUI (graphical user interface), and I am using R to analyze user-generated textual data and its metadata. I chose Reddit data because it is available. I also chose Reddit because its terms of service (TOS) give users no expectation of privacy. This is part of a much larger project, but for today's presentation I will focus on using R to explore a large, unstructured (textual) data set.
What is R?
Geographers such as Kitchin & Dodge (2011) discuss the importance and pervasiveness of code and software in our everyday lives. R is a programming language for statistical computing. It was derived from S, a language developed at Bell Labs in the 1970s and 1980s (R Core Team, 2013). R is open source and distributed under the GNU General Public License. Its large, active community of contributors is one of its greatest assets. One of the most prolific contributors is Hadley Wickham, who wrote or co-wrote all of the packages I use in this project (Wickham & Francois, 2015).
Geographers like Crooks & Chouinard (2006) have used the proprietary software NVivo to analyze text. Others, such as Elwood & Cope (2009), Aitken & Kwan (2010), Elwood (2010), and Wilson (2015), have reappropriated traditionally quantitative tools for qualitative, critical and mixed-methods research in GIS. There is some interest in using R for mapping and spatial analysis (Brunsdon & Comber, 2015). However, little has been done with, or written about, reappropriating traditionally quantitative open-source tools like R for the analysis of qualitative data.
I suggest that this type of reappropriation could be considered an epistemological or methodological challenge that “disrupts efforts to constrain epistemological diversity” (Elwood, 2010, p. 106). This research therefore uses both inductive and deductive approaches, which Elwood (2010) asserts “challenge[s] the proposition that epistemologies are necessarily separate and singular” (p. 107).
Using R to analyze social media data is not a panacea for examining larger amounts of data. Nor is it a claim that this approach is new and therefore superior. Instead, it is what DeLyser & Sui (2014) call “an embrace of engaged methodological pluralism, where different and divergent methods flourish to tackle issues from different angles” (p. 303).
library(dplyr) # dplyr is a library written by Hadley Wickham for easy manipulation of data. See https://cran.r-project.org/web/packages/dplyr/index.html to download or http://dplyr.tidyverse.org/
library(readr) # Hadley Wickham made readr to make it easier for reading many kinds of tabular data.
library(ggplot2) # This is a plotting system for R, also by Hadley Wickham. See http://ggplot2.org/.
d <- read_csv("~/Downloads/reddit_diabetes.csv")
# This data set was accessed through the Reddit API and is publicly available.
# read_csv() returns a data frame (a tibble): a spreadsheet-like data structure
# that is worked with through code rather than a graphical user interface
# (although it could be argued that this, too, is a kind of graphical user interface).
The following functions are built into dplyr:
mutate(): adds new variables that are functions of existing variables.
select(): picks variables based on their names.
filter(): picks cases based on their values.
summarise(): reduces multiple values down to a single summary.
arrange(): changes the ordering of the rows.
You can also write your own functions. Below I create the filter_by_anything function for keyword searches.
filter_by_anything <- function(.data, pattern) { # pattern: the regular expression (or plain text) to match.
  .data %>%
    filter(grepl(pattern, body)) # grepl() returns a logical vector: TRUE for each comment body that matches.
}
How do data producers access their own data from meters, pumps and continuous glucose monitors (CGMs)?
d %>%
filter_by_anything("data") %>%
as.data.frame()
# One of the comments this yielded was interesting for answering our question:
# "There is so much going on in this field right now. In the past, people were pretty limited in what they could do with technology to manage their diabetes,because the type of technology wasn't considered in the FDA approval process (i.e. it had to jump through all the same hoops that a new pump model would, even if it was just an app that was reading data, not writing it). Even at that time, people were still coming up with ideas. In particular, I can think of a user here who set up an IFTTT program that would let him text his sugar to a number and then it would be automatically logged in a Google spreadsheet.He shared the code here and I was able to set it up for myself, too. I probably still have the code snippet somewhere.\n\nNow there has been a proliferation of diabetes related apps. There are some that gamify your care. An example is #[MySugr](https://mysugr.com/) where you make very detailed log entries to get more points to "feed the monster" and if you hit a certain number ... <truncated>"
I might also search for other pertinent keywords that would surface discussions of how commenters access their data. Here I use an inclusive or to gather comments that contain any or all of the search terms.
d %>%
filter(grepl(" API ", body, ignore.case=TRUE) | # The | operator means "or" as in Boolean logic (inclusive or)
# Spacing matters: padding API with spaces excludes words that merely contain the letters "api" (e.g. "rapid").
grepl("Wireless", body, ignore.case=TRUE) |
grepl(" hack ", body, ignore.case=TRUE) |
grepl("blue tooth", body, ignore.case=TRUE) |
grepl("bluetooth", body, ignore.case=TRUE) |
grepl("reverse engineer", body, ignore.case=TRUE)) %>%
sample_n(35) %>%
as.data.frame()
Another pertinent question: What sorts of problems occur with the fleshy side of the interface?
d %>%
filter(grepl("infection", body, ignore.case=TRUE) &
grepl("site", body, ignore.case=TRUE)) %>%
sample_n(35) %>%
as.data.frame()
The comments reveal that the fleshy interface must be moved to a new site every few days. Otherwise, the site can become infected and/or cause poor insulin infusion, resulting in high blood glucose.
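There are many ways to look at unstructured data using R and dplyr, and they are limited largely by the researcher's imagination. As these examples show, the approach yields rich data, which are ultimately filtered through the researcher. Mixing methods in this way also has limitations. For example, keyword searches surface only the comments that use the researcher's chosen terms, and a random sample shows only a fraction of the matching comments.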
The larger Reddit comment corpus is publicly available at https://archive.org/details/2015_reddit_comments_corpus. To make this data set and code available and accessible, I am hosting them in an Amazon Web Services (AWS) S3 bucket. From a feminist methodological perspective, it is also important to make my research available to the community whose data I worked with. Once the project is completed, I will post my GitHub repository and my R notebook (https://s3.amazonaws.com/gentryhanks-analysis/RedditAnalysis.nb.html) to the subreddit. Using an R notebook also makes the research readily available, because the notebook automatically generates HTML that can be hosted on the web immediately.
Insulin pump and continuous glucose monitor users generate large amounts of data, but often have trouble accessing or making sense of that data, not to mention the embodied toll this data collection takes, often causing infection or damage to the fleshy side of the interface.
Discussion Questions
What about the trustworthiness of internet data (Hartig, 2008)? There is little incentive to lie in this subreddit. What about the ethics of netnography?
Theoretical hurdles?
Bibliography
Aitken, S. C., & Kwan, M.-P. (2010). GIS as qualitative research: Knowledge, participatory politics and cartographies of affect. The SAGE Handbook of Qualitative Geography, 287–304.
Brunsdon, C., & Comber, L. (2015). An introduction to R for spatial analysis and mapping. Sage.
Crooks, V. A., & Chouinard, V. (2006). An embodied geography of disablement: Chronically ill women’s struggles for enabling places in spaces of health care and daily life. Health & Place, 12(3), 345–352.
DeLyser, D., & Sui, D. (2014). Crossing the qualitative-quantitative chasm III: Enduring methods, open geography, participatory research, and the fourth paradigm. Progress in Human Geography, 38(2), 294–307.
Elwood, S. (2010). Mixed methods: Thinking, doing, and asking in multiple ways. In The SAGE handbook of qualitative geography (pp. 94–114). Los Angeles; London: SAGE.
Elwood, S., & Cope, M. S. (Eds.). (2009). Qualitative GIS: A mixed methods approach. Sage Publications.
Forlano, L. (2016). Hacking the feminist body. Journal of Peer Production, 8.
Haraway, D. J. (1990). Simians, cyborgs, and women: The reinvention of nature. New York: Routledge.
Hartig, O. (2008). Trustworthiness of data on the web. In Proceedings of the STI Berlin & CSW PhD Workshop. Citeseer.
Kitchin, R., & Dodge, M. (2011). Code/space: Software and everyday life. MIT Press.
Kozinets, R. V. (2015). Netnography: Redefined. Sage.
Lupton, D. (2016). Foreword: Lively devices, lively data and lively leisure studies. Leisure Studies, 1–3.
Lupton, D., & Seymour, W. (2000). Technology, selfhood and physical disability. Social Science & Medicine, 50(12), 1851–1862.
R Core Team. (2013). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-project.org/
Wickham, H., & Francois, R. (2015). dplyr: A grammar of data manipulation. R package version 0.4.
Wilson, M. W. (2015). Paying attention, digital media, and community-based critical GIS. Cultural Geographies, 22(1), 177–191.
---
title: "Diabetes and the Quantified Self: Hacking Hard, Fleshy Interfaces"

output: html_notebook
bibliography: dissertation_bibliography.bib
csl: apa.csl
---
### Gentry Hanks, Queen's University, Twitter: \@gentryhanks
### AAG April 07, 2017: Boston, MA
# Abstract

The embodiment of surveillant technologies provides a means and site of production of data to be consumed by both the self and the medical gaze, with physical and emotional consequences. Analyzing discourse on the subreddit r/diabetes, I examine assemblages of surveillant technologies that render an ever-increasing quantified self for those using insulin pumps and glucose monitors. @haraway_simians_1990 brought cyborgs to the fore in the early 1990s, and @lupton_technology_2000 describe the cyborg patient: "When hooked up to medical (and other) technologies, the patient's body becomes a cyborg, a juncture of human flesh and machine" (p. 56). Bodies are rendered regulatable through the use of embodied and disembodied technologies. People with any type of diabetes may be treated with insulin, which can be self-administered through multiple daily injections or through an insulin pump. Insulin pumps, like insulin itself, are proprietary. There is a growing do-it-yourself movement around hacking the cyborg self: open-source communities have made headway in generating new technology, reappropriating old devices, and implementing everyday hacks of hardware and its fleshy interface [@forlano_hacking_2016]. @lupton2016foreword describes these devices and data as intermingling within a data economy. I argue that, in the case of diabetes, they are used in surveillance and the medical gaze. Devices used to manage diabetes quantify the self and datify the device user. These data, as perceived by medical practitioners, family members, friends, strangers and, last but not least, the self, can have significant effects on everyday life, socio-spatial relations and emotional health.

# Methods and Methodologies: Mixed Methods & Reddit Data Using R
I have used netnography [@robert2015netnography] to explore the online community r/diabetes. This work includes observation (lurking) with screenshots and keyword searches through the Reddit GUI (graphical user interface), and I am using R to analyze user-generated textual data and its metadata. I chose Reddit data because it is available. I also chose Reddit because its terms of service (TOS) give users no expectation of privacy. This is part of a much larger project, but for today's presentation I will focus on using R to explore a large, unstructured (textual) data set.

### What is R?

Geographers such as @kitchin_code_2011 discuss the importance and pervasiveness of code and software in our everyday lives. R is a programming language for statistical computing. It was derived from S, a language developed at Bell Labs in the 1970s and 1980s [@R_2013]. R is open source and distributed under the GNU General Public License. Its large, active community of contributors is one of its greatest assets. One of the most prolific contributors is Hadley Wickham, who wrote or co-wrote all of the packages I use in this project [@wickham2015dplyr].

Geographers like @crooks2006embodied have used the proprietary software NVivo to analyze text. Others, such as @elwood2009qualitative, @aitken2010gis, @elwood_mixed_2010, and @wilson2015paying, have reappropriated traditionally quantitative tools for qualitative, critical and mixed-methods research in GIS. There is some interest in using R for mapping and spatial analysis [@brunsdon2015introduction]. However, little has been done with, or written about, reappropriating traditionally quantitative open-source tools like R for the analysis of qualitative data.

I suggest that this type of reappropriation could be considered an epistemological or methodological challenge that "disrupts efforts to constrain epistemological diversity" [@elwood_mixed_2010, p. 106]. This research therefore uses both inductive and deductive approaches, which @elwood_mixed_2010 asserts "challenge[s] the proposition that epistemologies are necessarily separate and singular" (p. 107).

Using R to analyze social media data is not a panacea for examining larger amounts of data. Nor is it a claim that this approach is new and therefore superior. Instead, it is what @delyser2014crossing call "an embrace of engaged methodological pluralism, where different and divergent methods flourish to tackle issues from different angles" (p. 303).


```{r, message=FALSE}
library(dplyr) # dplyr is a library written by Hadley Wickham for easy manipulation of data. See https://cran.r-project.org/web/packages/dplyr/index.html to download or http://dplyr.tidyverse.org/
library(readr) # Hadley Wickham made readr to make it easier for reading many kinds of tabular data.
library(ggplot2) # This is a plotting system for R, also by Hadley Wickham. See http://ggplot2.org/.
```
```{r, message=FALSE}

d <- read_csv("~/Downloads/reddit_diabetes.csv")
# This data set was accessed through the Reddit API and is publicly available.
# read_csv() returns a data frame (a tibble): a spreadsheet-like data structure
# that is worked with through code rather than a graphical user interface
# (although it could be argued that this, too, is a kind of graphical user interface).
```
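
The rest of the notebook relies on a `body` column (comment text) and a `created_utc` column (timestamp), so it can help to declare those column types explicitly when reading the file. The chunk below is a minimal sketch that uses a small hypothetical CSV written to a temporary file. Its column types are assumptions about the export, not a description of the actual file; in particular, it treats `created_utc` as numeric Unix seconds.

```{r}
library(readr)

# Hypothetical two-row CSV with the two columns the notebook uses.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "body,created_utc",
  "\"My pump site got infected again\",1420070400",
  "\"Has anyone used the pump's API?\",1420156800"
), toy_path)

# Declaring column types up front stops readr from guessing them
# from the first rows of the file.
toy_csv <- read_csv(toy_path, col_types = cols(
  body = col_character(),
  created_utc = col_double()
))

str(toy_csv)
```

If the column names match, the same `col_types` argument could be passed to the `read_csv()` call above.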



The following functions are built into dplyr:

- `mutate()` adds new variables that are functions of existing variables.
- `select()` picks variables based on their names.
- `filter()` picks cases based on their values.
- `summarise()` reduces multiple values down to a single summary.
- `arrange()` changes the ordering of the rows.
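
To make the list concrete, here is a small sketch that chains the five verbs on a hypothetical toy data frame. The columns `author`, `body` and `score` were invented for this illustration and may not be present in `d`.

```{r}
library(dplyr)

# Hypothetical toy comments standing in for the Reddit data.
toy_verbs <- data.frame(
  author = c("a", "b", "c", "d"),
  body = c("site infection again", "new CGM data app", "bluetooth pump", "cyborg life"),
  score = c(5, 12, 3, 8),
  stringsAsFactors = FALSE
)

verbs_demo <- toy_verbs %>%
  mutate(n_chars = nchar(body)) %>%   # mutate(): new variable computed from an existing one
  select(author, score, n_chars) %>%  # select(): keep columns by name
  filter(score > 4) %>%               # filter(): keep rows whose values meet a condition
  arrange(desc(score))                # arrange(): reorder rows, highest score first

# summarise(): collapse many rows into a single summary row.
score_summary <- toy_verbs %>%
  summarise(mean_score = mean(score), n_comments = n())
```

Here `verbs_demo` keeps authors `b`, `d` and `a`, in that order. `score_summary` reports four comments with a mean score of 7.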

You can also write your own functions.
Below I create the `filter_by_anything` function for keyword searches.
```{r}
filter_by_anything <- function(.data, pattern) { # pattern: the regular expression (or plain text) to match.
  .data %>%
    filter(grepl(pattern, body)) # grepl() returns a logical vector: TRUE for each comment body that matches.
}
```
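
One possible extension forwards extra arguments to `grepl()`. It is sketched below on the assumption that case-insensitive matching is sometimes wanted. The name `filter_by_pattern()` and the toy comments are hypothetical, and the original `filter_by_anything()` is left unchanged.

```{r}
library(dplyr)

# Hypothetical variant of filter_by_anything(): extra arguments in `...`
# (e.g. ignore.case = TRUE or perl = TRUE) are passed straight to grepl().
filter_by_pattern <- function(.data, pattern, ...) {
  # Compute the match vector outside filter() so that `...` is forwarded
  # as an ordinary function argument.
  matches_pattern <- grepl(pattern, .data$body, ...)
  .data %>%
    filter(matches_pattern)
}

toy_data_comments <- data.frame(
  body = c("Downloading DATA from my meter", "Pump data in a spreadsheet", "Site rotation tips"),
  stringsAsFactors = FALSE
)

case_sensitive   <- filter_by_pattern(toy_data_comments, "data")
case_insensitive <- filter_by_pattern(toy_data_comments, "data", ignore.case = TRUE)
```

For example, `d %>% filter_by_pattern("data", ignore.case = TRUE)` would also return comments that write "Data" or "DATA".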

### How do data producers access their own data from meters, pumps and continuous glucose monitors (CGMs)?
```{r}
d %>%
  filter_by_anything("data") %>%
  as.data.frame()
# One of the comments this yielded was interesting for answering our question:
# "There is so much going on in this field right now. In the past, people were pretty limited in what they could do with technology to manage their diabetes,because the type of technology wasn't considered in the FDA approval process (i.e. it had to jump through all the same hoops that a new pump model would, even if it was just an app that was reading data, not writing it). Even at that time, people were still coming up with ideas. In particular, I can think of a user here who set up an IFTTT program that would let him text his sugar to a number and then it would be automatically logged in a Google spreadsheet.He shared the code here and I was able to set it up for myself, too. I probably still have the code snippet somewhere.\n\nNow there has been a proliferation of diabetes related apps. There are some that gamify your care. An example is #[MySugr](https://mysugr.com/) where you make very detailed log entries to get more points to "feed the monster" and if you hit a certain number ... <truncated>"

```
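
Before reading individual comments, it may help to gauge how common each search term is. The hypothetical helper `keyword_counts()` below counts, for each keyword, how many comments match it (ignoring case). The toy comments were invented for this illustration.

```{r}
# Hypothetical helper: for each keyword, count the comments whose text
# matches it, ignoring case. Returns a named integer vector.
keyword_counts <- function(texts, keywords) {
  vapply(keywords,
         function(k) sum(grepl(k, texts, ignore.case = TRUE)),
         integer(1))
}

toy_bodies <- c("I export CGM data to a spreadsheet",
                "My meter data syncs over Bluetooth",
                "Site infection after four days")

counts <- keyword_counts(toy_bodies, c("data", "bluetooth", "infection"))
counts
```

On the real data, a call such as `keyword_counts(d$body, c("data", "API", "bluetooth"))` would return one count per keyword.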

I might also search for other pertinent keywords that would surface discussions of how commenters access their data. Here I use an inclusive or to gather comments that contain any or all of the search terms.
```{r}
d %>%
  filter(grepl(" API ", body, ignore.case=TRUE) | # The | operator means "or" as in Boolean logic (inclusive or)
           # Spacing matters: padding API with spaces excludes words that merely contain the letters "api" (e.g. "rapid").
           grepl("Wireless", body, ignore.case=TRUE) |
           grepl(" hack ", body, ignore.case=TRUE) |
           grepl("blue tooth", body, ignore.case=TRUE) |
           grepl("bluetooth", body, ignore.case=TRUE) |
           grepl("reverse engineer", body, ignore.case=TRUE)) %>%
  sample_n(35) %>%
  as.data.frame()
```
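
Padding a term with spaces excludes words that merely contain it. However, it also misses the term when it touches punctuation or sits at the start or end of a comment (e.g. "an API?"). One alternative uses the `\\b` word-boundary anchor and folds the search terms into a single alternation pattern. The sketch below assumes that Perl-compatible regular expressions (`perl = TRUE`) are acceptable, and its example strings are hypothetical.

```{r}
# Search terms from the chunk above; "blue ?tooth" matches both
# "bluetooth" and "blue tooth" because the space is optional.
access_terms <- c("API", "wireless", "hack", "blue ?tooth", "reverse engineer")

# One pattern replaces the chain of grepl(...) | grepl(...) calls.
# \\b marks a word boundary, so "rapid" does not match "api".
access_pattern <- paste0("\\b(", paste(access_terms, collapse = "|"), ")\\b")

examples <- c("Does the pump have an API?",  # term followed by punctuation
              "API access would help",       # term at the very start
              "The rapid acting insulin",    # "api" only inside a word
              "Paired it over blue tooth")   # two-word spelling

padded_match   <- grepl(" API ", examples, ignore.case = TRUE)
boundary_match <- grepl(access_pattern, examples, ignore.case = TRUE, perl = TRUE)
```

Here `padded_match` is `FALSE` for every example, while `boundary_match` is `TRUE, TRUE, FALSE, TRUE`. In the notebook, the filter would become `filter(grepl(access_pattern, body, ignore.case = TRUE, perl = TRUE))`. Word boundaries also change the behaviour for "wireless", which originally matched inside longer words such as "wirelessly". Like the padded " hack ", `\\bhack\\b` still does not match "hacking".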

### The next question I'd like to explore is:  How do people in this community feel about wearing an embodied medical device in everyday life?

```{r}
# Here I filter by the keyword "cyborg", use arrange() to order the comments from oldest to newest by created_utc, and convert the result with as.data.frame() so that the comment bodies print in full.
d %>%
  filter_by_anything("cyborg") %>%
  arrange(created_utc) %>%
  as.data.frame()
```
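
The chunk above sorts by `created_utc` directly. The sketch below converts that column to readable dates and counts matching comments per year. It assumes `created_utc` holds Unix epoch seconds in UTC, which is typical of Reddit's comment data but should be checked against the actual file. The toy comments and timestamps are hypothetical.

```{r}
library(dplyr)

# Hypothetical comments; created_utc is assumed to be Unix epoch seconds (UTC).
toy_cyborg <- data.frame(
  body = c("Cyborg status achieved", "Proud cyborg", "Feeling like a cyborg"),
  created_utc = c(1420070400, 1293840000, 1356998400),
  stringsAsFactors = FALSE
)

dated <- toy_cyborg %>%
  mutate(created = as.POSIXct(created_utc, origin = "1970-01-01", tz = "UTC")) %>%
  arrange(created_utc)  # oldest to newest, as in the chunk above

# Count comments per calendar year (in UTC).
per_year <- dated %>%
  mutate(year = as.integer(format(created, "%Y"))) %>%
  group_by(year) %>%
  summarise(n_comments = n())
```

On the real data, the same `mutate()` step could be added before `arrange(created_utc)`. If `created_utc` was read as text, it would need `as.numeric()` first.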

### Another pertinent question: What sorts of problems occur with the fleshy side of the interface?

```{r}
d %>%
  filter(grepl("infection", body, ignore.case=TRUE) &
           grepl("site", body, ignore.case=TRUE)) %>%
  sample_n(35) %>%
  as.data.frame()
  
```
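
Two caveats apply to `sample_n(35)`. It stops with an error if a filter returns fewer than 35 rows, and without a fixed seed each knit draws a different sample. The sketch below shows a hypothetical helper, `sample_up_to()`, that caps the sample size and fixes the seed. The seed value of 2017 is an arbitrary assumption.

```{r}
library(dplyr)

# Hypothetical helper: draw up to `size` random rows, reproducibly.
sample_up_to <- function(comments, size, seed = 2017) {
  n_draw <- min(size, nrow(comments))  # never ask for more rows than exist
  set.seed(seed)                       # same seed, same sample on every knit
  sample_n(comments, n_draw)
}

toy_sites <- data.frame(
  body = c("site infection", "infected site", "site looked red"),
  stringsAsFactors = FALSE
)

drawn       <- sample_up_to(toy_sites, 35)  # only 3 rows exist, so 3 are drawn
drawn_again <- sample_up_to(toy_sites, 35)  # identical to `drawn`
```

In the pipes above, `sample_n(35)` could be replaced by `sample_up_to(35)`. Because `set.seed()` resets R's global random number generator, later random draws in the notebook also become deterministic.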

The comments reveal that the fleshy interface must be moved to a new site every few days. Otherwise, the site can become infected and/or cause poor insulin infusion, resulting in high blood glucose.

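There are many ways to look at unstructured data using R and dplyr, and they are limited largely by the researcher's imagination. As these examples show, the approach yields rich data, which are ultimately filtered through the researcher. Mixing methods in this way also has limitations. For example, keyword searches surface only the comments that use the researcher's chosen terms, and a random sample shows only a fraction of the matching comments.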
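
`ggplot2` is loaded at the top of the notebook but not otherwise used here. Another way to look at the data is to plot how many comments match each keyword. The sketch below does this for made-up toy comments, so its keywords and counts are illustrative only.

```{r}
library(ggplot2)

toy_plot_bodies <- c("CGM data in a spreadsheet", "Site infection again",
                     "Bluetooth meter data", "Another site infection")
keywords <- c("data", "infection", "bluetooth")

# Count matching comments per keyword, ignoring case.
keyword_df <- data.frame(
  keyword = keywords,
  n_comments = vapply(keywords,
                      function(k) sum(grepl(k, toy_plot_bodies, ignore.case = TRUE)),
                      integer(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

# Bar chart with the most frequent keywords first.
p <- ggplot(keyword_df, aes(x = reorder(keyword, -n_comments), y = n_comments)) +
  geom_col() +
  labs(x = "Keyword", y = "Comments matching")
p
```

The same plot could be built from the real comments by replacing `toy_plot_bodies` with `d$body`.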


The larger Reddit comment corpus is publicly available at https://archive.org/details/2015_reddit_comments_corpus. To make this data set and code available and accessible, I am hosting them in an Amazon Web Services (AWS) S3 bucket. From a feminist methodological perspective, it is also important to make my research available to the community whose data I worked with. Once the project is completed, I will post my GitHub repository and my R notebook (https://s3.amazonaws.com/gentryhanks-analysis/RedditAnalysis.nb.html) to the subreddit. Using an R notebook also makes the research readily available, because the notebook automatically generates HTML that can be hosted on the web immediately.

Insulin pump and continuous glucose monitor users generate large amounts of data, but often have trouble accessing or making sense of that data, not to mention the embodied toll this data collection takes, often causing infection or damage to the fleshy side of the interface.


# Discussion Questions
What about the trustworthiness of internet data [@hartig2008trustworthiness]? There is little incentive to lie in this subreddit.
What about the ethics of netnography?

Theoretical hurdles?

# I'd like to thank Christopher Peters, the users of r/diabetes, Queen's University, and the organizers of this session.




http://www.gentryhanks.com

Visit my R notebook and GitHub repo on r/diabetes reddit data! 
R notebook: http://bit.ly/2nI79VL  
GitHub: http://bit.ly/2ngWlSR 




# Bibliography
