Setup

Load packages

library(ggplot2)
library(dplyr)
library(grid)

Load data

load("C:/Users/wangjiaxin/Desktop/StatsR/brfss2013.RData")

Part 1: Data

Generalizability: According to the 2013 BRFSS Overview, landline respondents were randomly selected. The overview does not mention random sampling for cellular respondents. If random sampling was applied to both landline and cellular respondents, the survey results should be generalizable to the population of non-institutionalized adults aged 18 and older residing in the US.

Causality: The survey is an observational study conducted through telephone interviews, with no random assignment. The data can therefore show associations between behaviors and health conditions, but cannot be used to establish causal relationships between them.


Part 2: Research questions

Research question 1: What is the relationship between eating fruit and general health status?

I want to know whether people who eat fruits more frequently tend to enjoy better health in general. Variables used for answering this question are fruit1 and genhlth.

Research question 2: What is the relationship between sleep time, marital status and mental health?

It is of interest to know whether differences in marital status and mental health are associated with differences in sleep time. Variables used for answering this question are sleptim1, marital and menthlth.

Research question 3: What is the relationship between employment status, alcohol consumption and difficulty concentrating or remembering?

I am interested in knowing whether employment status is related to alcohol consumption, and whether alcohol consumption is associated with people's ability to concentrate or remember. Variables used for answering this question are employ1, avedrnk2 and decide.


Part 3: Exploratory data analysis

Research question 1:

#select variables for analysis
fruitHlth <- select(brfss2013,fruit1,genhlth)

#remove missing (NA) values and values that do not fall into any category stated in the Codebook
fruitHlth <- na.omit(fruitHlth)
fruitHlth <- subset(fruitHlth,fruit1!=2)

#recode fruit1 values using value labels provided in the Codebook
#(all comparisons are made on the numeric codes; assigning labels one at a time would turn the column into character and make later >= / <= comparisons alphabetical)
fruitHlth <- fruitHlth %>%
  mutate(fruit1 = case_when(
    fruit1 == 0                   ~ "Never",
    fruit1 == 300                 ~ "Less than one/month",
    fruit1 >= 301 & fruit1 <= 399 ~ "Times/month",
    fruit1 >= 201 & fruit1 <= 299 ~ "Times/week",
    fruit1 >= 101 & fruit1 <= 199 ~ "Times/day"
  ))

#summary statistics
table(fruitHlth)
##                      genhlth
## fruit1                Excellent Very good  Good  Fair  Poor
##   Less than one/month       161       337   363   186    93
##   Never                    2230      4018  5850  3604  2071
##   Times/day               42963     72240 60619 24587 10030
##   Times/month             18404     39679 39380 18428  7749
##   Times/week              15871     33625 33282 14794  5697
#plot1: frequency of eating fruit
ggplot(fruitHlth,aes(x=fruit1))+geom_bar()+xlab("frequency")+ggtitle("Frequency of Eating Fruits")

#plot2: frequency of eating fruit & health status (each bar is scaled to 100%)
ggplot(fruitHlth,aes(x=fruit1,fill=factor(genhlth)))+
  geom_bar(position="fill")+
  scale_y_continuous(labels=scales::percent)+
  xlab("frequency")+ylab("percentage")+
  ggtitle("Frequency of Eating Fruits & Health Status")+
  scale_fill_discrete(name="health status")

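Interpretation: The most common response is eating fruit at least once a day, but this group is a plurality (about 46% of respondents) rather than a majority. Daily fruit eaters are the most likely to report excellent health (about 20%, compared with roughly 12-15% in the other groups). People who never eat fruit are the most likely to report poor health (about 12%, compared with roughly 5-8% in the other groups).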
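The percentages behind this interpretation can be computed from the counts printed by table(fruitHlth). A minimal base-R sketch: the counts below are typed in from that output rather than recomputed from brfss2013, and prop.table() with margin = 1 turns each row into the health-status mix within one fruit-eating group, which is what the position = "fill" bar chart displays. On the real data, prop.table(table(fruitHlth), 1) should give the same row proportions without retyping anything.

```r
# Counts typed in from the table(fruitHlth) output above
counts <- matrix(
  c(  161,   337,   363,   186,    93,
     2230,  4018,  5850,  3604,  2071,
    42963, 72240, 60619, 24587, 10030,
    18404, 39679, 39380, 18428,  7749,
    15871, 33625, 33282, 14794,  5697),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    fruit1  = c("Less than one/month", "Never", "Times/day",
                "Times/month", "Times/week"),
    genhlth = c("Excellent", "Very good", "Good", "Fair", "Poor")))

# Share of all respondents in each fruit-eating group (percent)
group_share <- 100 * rowSums(counts) / sum(counts)
print(round(group_share, 1))

# Row proportions: health-status mix within each fruit-eating group (each row sums to 1)
row_props <- prop.table(counts, margin = 1)
print(round(100 * row_props[, c("Excellent", "Poor")], 1))
```

Row proportions are the right comparison here because the groups differ greatly in size; raw counts would mostly reflect how many people fall into each frequency category.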

Research question 2:

#remove missing values and create a summary table (mean sleep time and mean number of days mental health was not good, grouped by marital status)
sleep_mari_mental <- select(brfss2013,sleptim1,marital,menthlth)
sleep_mari_mental <- na.omit(sleep_mari_mental)
sleep_mari_mental <- sleep_mari_mental %>%
  group_by(marital) %>%
  summarise(mean(sleptim1),mean(menthlth)) %>%
  data.frame()
sleep_mari_mental
##                           marital mean.sleptim1. mean.menthlth.
## 1                         Married       7.083589       2.564756
## 2                        Divorced       6.908390       4.753517
## 3                         Widowed       7.222120       3.026592
## 4                       Separated       6.695169       7.124865
## 5                   Never married       7.002431       4.215164
## 6 A member of an unmarried couple       6.967522       4.461825
#sort by mean sleep time (descending) and rename columns
sleep_mari_mental <- arrange(sleep_mari_mental,desc(mean.sleptim1.))
names(sleep_mari_mental) <- c("marital_status","mean_sleeptime","mean_mentalnotgood")

#create plot showing relationship between marital status and sleep time. To avoid overlapping axis labels, "A member of an unmarried couple" is relabelled "one/couple".
#note: the y axis is zoomed to 6-8 hours, which makes differences between groups look larger than they are
plot_mari_slep<-ggplot(sleep_mari_mental)+
  geom_col(aes(x=marital_status,y=mean_sleeptime,fill=mean_sleeptime),alpha=0.75)+
  coord_cartesian(ylim=c(6,8))+
  ylab("sleep time (hours)")+
  theme_minimal()+
  theme(axis.title.y=element_text(margin=margin(0,20,0,0)))+
  scale_fill_gradient(low="lightblue",high="darkblue",name="sleep time")+
  scale_x_discrete(labels=c("A member of an unmarried couple"="one/couple"))

#create plot showing relationship between marital status and days of poor mental health. To avoid overlapping axis labels, "A member of an unmarried couple" is relabelled "one/couple".
plot_mari_ment<-ggplot(sleep_mari_mental)+
  geom_col(aes(x=marital_status,y=mean_mentalnotgood,fill=mean_mentalnotgood),alpha=0.75)+
  ylab("days mental health not good")+
  theme_minimal()+
  theme(axis.title.y=element_text(margin=margin(0,20,0,0)))+
  scale_fill_gradient(low="pink",high="red",name="days mental health not good")+
  scale_x_discrete(labels=c("A member of an unmarried couple"="one/couple"))

#bind the two plots together for comparison
grid.newpage() 
grid.draw(rbind(ggplotGrob(plot_mari_slep),ggplotGrob(plot_mari_ment),size="last"))

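Interpretation: On average, widowed people sleep the most (about 7.2 hours) and separated people sleep the least (about 6.7 hours), a gap of roughly half an hour. Married people report the fewest days of poor mental health in the past 30 days (about 2.6), while separated people report the most (about 7.1 days, roughly 2.4 days more than the next-highest group, divorced people, at about 4.8). Separated people therefore tend to report both more days of poor mental health and shorter sleep.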
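The gaps described above can be computed directly from the group means printed in the summary output. A minimal base-R sketch: the values are typed in from that table (not recomputed from brfss2013), and the column names marital, sleep and mental are chosen here for illustration only.

```r
# Group means typed in from the summary output above (not recomputed from brfss2013)
means <- data.frame(
  marital = c("Married", "Divorced", "Widowed", "Separated",
              "Never married", "A member of an unmarried couple"),
  sleep   = c(7.083589, 6.908390, 7.222120, 6.695169, 7.002431, 6.967522),
  mental  = c(2.564756, 4.753517, 3.026592, 7.124865, 4.215164, 4.461825),
  stringsAsFactors = FALSE
)

# Groups with the longest and shortest average sleep, and the gap in hours
longest   <- means$marital[which.max(means$sleep)]
shortest  <- means$marital[which.min(means$sleep)]
sleep_gap <- max(means$sleep) - min(means$sleep)

# Gap in mean poor-mental-health days between the highest and second-highest group
sorted_mental <- sort(means$mental, decreasing = TRUE)
mental_gap    <- sorted_mental[1] - sorted_mental[2]

cat(longest, "vs", shortest, "- sleep gap:", round(sleep_gap, 2), "hours\n")
cat("Highest minus second-highest mental-health days:", round(mental_gap, 2), "\n")
```

The sleep gap comes out at about 0.53 hours and the mental-health gap at about 2.37 days, which is why the comparison is stated relative to the next-highest group rather than to "others" in general.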

Research question 3:

#select variables for analysis and remove missing values
emp_alc_dec<-select(brfss2013,employ1,avedrnk2,decide)
emp_alc_dec<-na.omit(emp_alc_dec)

#data frame for people with difficulty concentrating or remembering
alc_emp_dyes <- emp_alc_dec %>%
  filter(decide=="Yes") %>%
  group_by(employ1) %>%
  summarise(avedrnk=mean(avedrnk2)) %>%
  data.frame()
#label every row; a single value is recycled, so this works for any number of groups
alc_emp_dyes$dif <- "Yes"

#data frame for people without difficulty concentrating or remembering
alc_emp_dno <- emp_alc_dec %>%
  filter(decide=="No") %>%
  group_by(employ1) %>%
  summarise(avedrnk=mean(avedrnk2)) %>%
  data.frame()
#label every row; a single value is recycled, so this works for any number of groups
alc_emp_dno$dif <- "No"

#combine the two data frames into one for visualization 
alc_emp_dyn <- rbind.data.frame(alc_emp_dyes,alc_emp_dno)

#bar chart to compare people with and without difficulties
ggplot(alc_emp_dyn,aes(x=employ1,y=avedrnk,fill=dif))+
  geom_col(position="dodge")+
  xlab("employment status")+
  ylab("average drinks/day in past 30 days")+
  scale_fill_discrete(name="difficulty")+
  scale_x_discrete(labels=c("employed","self-employed","noWork>=1yr","noWork<1yr","homemaker","student","retired","can't work"))

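Interpretation: On average, people who report difficulty concentrating or remembering drank more alcohol per day over the past 30 days than people who do not. Students consumed the most alcohol per day, while homemakers and retired people consumed much less than the other groups. Two caveats apply: avedrnk2 is only asked of respondents who reported drinking in the past 30 days, so these averages describe drinkers only; and because the data are observational, the chart shows an association, not evidence that drinking affects the ability to concentrate or remember.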
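The two per-group summaries in this section are built in separate steps and then stacked with rbind. As a minimal base-R sketch of an alternative, assuming the same column names as emp_alc_dec (employ1, avedrnk2, decide), aggregate() can compute the mean for every employment-by-difficulty combination in one call and returns only combinations that actually occur, so no row count needs to be known in advance. The small data frame below is made up purely to show the shape of the result; it is not BRFSS data.

```r
# Made-up rows for illustration only (NOT BRFSS data); same column names as emp_alc_dec
toy <- data.frame(
  employ1  = factor(c("Employed for wages", "Employed for wages", "Employed for wages",
                      "Retired", "Retired", "Retired")),
  avedrnk2 = c(4, 2, 3, 2, 1, 1),
  decide   = factor(c("Yes", "No", "No", "Yes", "No", "No"), levels = c("Yes", "No"))
)

# Mean drinks per day for every employment-status x difficulty combination present
alc_emp_dyn_alt <- aggregate(avedrnk2 ~ employ1 + decide, data = toy, FUN = mean)

# Rename to the column names used by the plotting code (employ1, avedrnk, dif)
names(alc_emp_dyn_alt)[names(alc_emp_dyn_alt) == "avedrnk2"] <- "avedrnk"
names(alc_emp_dyn_alt)[names(alc_emp_dyn_alt) == "decide"]   <- "dif"
alc_emp_dyn_alt$dif <- as.character(alc_emp_dyn_alt$dif)

print(alc_emp_dyn_alt)
```

Run on emp_alc_dec instead of toy, the result should have the same columns as alc_emp_dyn and could be passed to the same ggplot() call; the row order differs, which does not affect the dodged bar chart.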