Setup

Load packages

library(ggplot2)
library(dplyr)

Load data

load("brfss2013.RData")

Part 1: Data

Introduction

The Behavioral Risk Factor Surveillance System (BRFSS) is the nation’s premier system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world.

Data Collection and Generalizability

With technical and methodological assistance from CDC, state health departments use in-house interviewers or contract with telephone call centers or universities to administer the BRFSS surveys continuously through the year. The survey is conducted using Random Digit Dialing (RDD) techniques on both landlines and cell phones.

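Adults 18 years or older are asked to take part in the survey. The number of interviews within each state varies with funding and with the size of regions, such as health districts, within each state. Because respondents are reached through random sampling of telephone numbers, the findings can reasonably be generalized to adults in the surveyed areas who can be reached by phone, although non-response may still introduce bias.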

Causality

BRFSS is an observational study: respondents are surveyed, not randomly assigned to conditions. Causal conclusions are therefore not appropriate, and the data are only sufficient to show associations.


Part 2: Research questions

Research question 1: Is there an association between general health (genhlth variable) and the frequency of feeling nervous in the past 30 days (misnervs variable)?

It would be interesting to know whether a person's psychological condition is linked in any way to their general health.

Research question 2: Is there an association between having exercised in the past 30 days (exerany2 variable) and BMI (X_bmi5 variable), for males and females (sex variable)?

This would help us better understand the association between recent exercise and a person's BMI.

Research question 3: Is there an association between BMI (X_bmi5 variable) and the computed vegetable intake in times per day (vegeda1_ variable), for males and females (sex variable)?

An association between BMI and vegetable consumption might help in designing a daily diet to control body weight.


Part 3: Exploratory data analysis

Research question 1:

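# Keep respondents with both variables recorded, then collapse general health
# into "Poor" versus every other answer (labelled "Good").
brfss2013_nerv_genhlth <- brfss2013 %>%
  filter(!is.na(misnervs), !is.na(genhlth)) %>%
  mutate(gen_hlth = ifelse(genhlth == "Poor", "Poor", "Good"))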

ggplot(brfss2013_nerv_genhlth, aes(x = misnervs, fill = gen_hlth)) +
  geom_bar(position = "fill") +
  scale_fill_discrete(name = "General Health") +
  xlab("Frequency of nervous days") +
  ylab("Proportion")

brfss2013_nerv_genhlth %>% group_by(misnervs) %>% summarise(prop_poor_hlth = sum(gen_hlth == "Poor") / n())
## # A tibble: 5 × 2
##   misnervs prop_poor_hlth
##     <fctr>          <dbl>
## 1      All     0.30637255
## 2     Most     0.19014085
## 3     Some     0.08803955
## 4 A little     0.03477207
## 5     None     0.03039751

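The table above (prop_poor_hlth: proportion of respondents in poor general health) shows that the proportion reporting poor general health rises steadily with how often respondents felt nervous in the past 30 days, from about 3% among those who were never nervous to about 31% among those who felt nervous all of the time. Note that "Good" here groups every answer other than "Poor".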
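The recoding and the per-group proportion can be traced on a small example. The sketch below uses a made-up data frame (the values are not from BRFSS) and base R only, so the object name toy and its contents are assumptions for illustration. It relies on the fact that the mean of a logical vector is the proportion of TRUE values, which is equivalent to sum(gen_hlth == "Poor") / n().

```r
# Made-up toy data: NOT BRFSS values, for illustration only.
toy <- data.frame(
  misnervs = factor(
    c("All", "All", "Some", "Some", "Some", "Some",
      "None", "None", "None", NA, "None"),
    levels = c("All", "Some", "None")
  ),
  genhlth = factor(
    c("Poor", "Fair", "Poor", "Good", "Very good", "Fair",
      "Good", "Excellent", "Fair", "Good", NA)
  )
)

# Keep only rows where both variables were recorded.
toy <- toy[!is.na(toy$misnervs) & !is.na(toy$genhlth), ]

# Collapse general health into "Poor" versus everything else.
toy$gen_hlth <- ifelse(toy$genhlth == "Poor", "Poor", "Good")

# Proportion in poor health within each nervousness category:
# the mean of a logical vector is the share of TRUE values.
prop_poor <- tapply(toy$gen_hlth == "Poor", toy$misnervs, mean)
print(prop_poor)
```

In this toy data the two rows with a missing value are dropped before the proportions are computed, which mirrors the filtering step in the analysis.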

Research question 2:

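# Keep respondents with exercise status, BMI and sex recorded.
# X_bmi5 stores BMI with two implied decimal places, hence the division by 100.
brfss2013_exer_bmi <- brfss2013 %>%
  filter(!is.na(exerany2), !is.na(X_bmi5), !is.na(sex)) %>%
  mutate(bmi = X_bmi5 / 100)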

ggplot(brfss2013_exer_bmi, aes(x = exerany2, y = bmi)) +
  geom_boxplot() +
  facet_wrap(~sex) +
  xlab("Exercise condition") +
  ylab("BMI")

brfss2013_exer_bmi %>% group_by(sex, exerany2) %>% summarise(median_bmi = median(bmi), iqr_bmi = IQR(bmi))
## Source: local data frame [4 x 4]
## Groups: sex [?]
## 
##      sex exerany2 median_bmi iqr_bmi
##   <fctr>   <fctr>      <dbl>   <dbl>
## 1   Male      Yes      27.12    5.95
## 2   Male       No      28.13    7.15
## 3 Female      Yes      25.79    7.28
## 4 Female       No      28.25    9.20

The table above (median_bmi: median BMI, iqr_bmi: interquartile range of BMI) shows that both the median and the interquartile range of BMI are higher among respondents who reported no exercise in the past 30 days. The gap in medians is about 1 BMI point for males (28.13 versus 27.12) and about 2.5 points for females (28.25 versus 25.79). However, these are descriptive statistics only, so we cannot rule out the possibility that the differences arose by chance.
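One way to probe the "by chance" question is a permutation test on the difference in medians. The sketch below is not part of the original analysis: it runs on simulated values (not BRFSS data), the helper median_diff and the choice of 2000 shuffles are assumptions for illustration, and it ignores the BRFSS survey weights, so it should be read as an outline of the idea only.

```r
set.seed(1)  # make the simulation reproducible

# Simulated BMI values: NOT BRFSS data, chosen only to illustrate the method.
bmi      <- c(rnorm(200, mean = 27, sd = 4), rnorm(200, mean = 28, sd = 5))
exercise <- rep(c("Yes", "No"), each = 200)

# Hypothetical helper: difference in median BMI (No minus Yes).
median_diff <- function(values, groups) {
  median(values[groups == "No"]) - median(values[groups == "Yes"])
}
observed <- median_diff(bmi, exercise)

# Shuffle the exercise labels many times to see which differences
# appear when there is no association by construction.
n_perm   <- 2000
shuffled <- replicate(n_perm, median_diff(bmi, sample(exercise)))

# Two-sided permutation p-value, with the usual +1 correction.
p_value <- (sum(abs(shuffled) >= abs(observed)) + 1) / (n_perm + 1)
print(c(observed = observed, p_value = p_value))
```

A small p-value would indicate that a difference as large as the observed one is rarely produced by random relabelling. With the real data, the same steps would be applied within each sex.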

Research question 3:

# Keep respondents with vegetable intake, BMI and sex recorded.
brfss2013_vege_bmi <- brfss2013 %>%
  filter(!is.na(vegeda1_), !is.na(X_bmi5), !is.na(sex)) %>%
  mutate(bmi = X_bmi5 / 100)

ggplot(brfss2013_vege_bmi, aes(x = bmi, y = log(vegeda1_ + 1), colour = sex)) +
  geom_point(shape = 19, alpha = 1/4) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_colour_discrete(name = "Sex") +
  xlab("BMI") +
  ylab("Computed Vegetable Intake In Times Per Day (Log)")

brfss2013_vege_bmi %>% group_by(sex) %>% summarise(corr_bmi_vege_activity = cor(bmi, vegeda1_))
## # A tibble: 2 × 2
##      sex corr_bmi_vege_activity
##   <fctr>                  <dbl>
## 1   Male           -0.008739697
## 2 Female           -0.037264968

The table above (corr_bmi_vege_activity: correlation between BMI and computed vegetable intake in times per day) shows a negative correlation for both males (about -0.009) and females (about -0.037). Both values are very close to zero, so any linear tendency to eat vegetables less often at higher BMI is very weak. Here again, we cannot tell from these descriptive statistics whether this reflects a real trend in the data or is merely due to chance.
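A correlation coefficient on its own does not say how uncertain the estimate is. As a hedged sketch on simulated values (not BRFSS data, with a weak negative dependence built in by assumption), cor.test from base R's stats package reports a confidence interval and p-value alongside the estimate. The Spearman version is rank-based, so it gives the same answer on the raw intake values and on the log(x + 1) scale used in the plot, whereas the Pearson correlation in the table is computed on the raw scale.

```r
set.seed(2)  # make the simulation reproducible

# Simulated values: NOT BRFSS data. The weak negative slope is an assumption.
bmi  <- rnorm(500, mean = 27, sd = 5)
vege <- pmax(0, 2 - 0.01 * bmi + rnorm(500, sd = 1))

# Pearson correlation with a 95% confidence interval and p-value.
pearson <- cor.test(bmi, vege)

# Spearman rank correlation; exact = FALSE avoids warnings caused by ties.
spearman     <- cor.test(bmi, vege, method = "spearman", exact = FALSE)
spearman_log <- cor.test(bmi, log(vege + 1), method = "spearman", exact = FALSE)

print(pearson$estimate)
print(pearson$conf.int)
print(pearson$p.value)
print(spearman$estimate)
```

With a sample as large as BRFSS, even a correlation near zero can come out statistically significant, so the size of the estimate matters as much as the p-value.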