Is there a relationship between how often a respondent attends religious services and how satisfied, on the whole, the respondent is with the work he or she does?
Studying these variables is a matter of curiosity, because it is very interesting when two seemingly unrelated variables turn out to be actually related.
Others who may care about the answer to this question are employers questioning the cost-benefit of including worship facilities at places of employment. Alternatively, places of worship may care about the results of this study in order to better attract more worshippers to their services - for example, advertising their service schedules in areas known to have higher/lower levels of job satisfaction.
The data set used is the General Social Survey (GSS). This is a sociological survey used to collect data on demographic characteristics and attitudes of residents of the United States.
The data was collected from randomly chosen, non-institutionalized, English or Spanish speaking individuals age 18 and over in the United States over the years 1972-2012.
The data is a collection of 29 surveys taken during that 40-year period. The data was collected using computer-assisted personal interviews, personal interviews or telephone interviews.
At the beginning, in the years 1972 to 1974, the methods used in data collection were only semi-random. In later years, additional funding allowed the survey to produce full probability samples. Households were chosen at random, and then an adult individual was chosen at random from that household to be the respondent.
The cases in this study are individual people. Specifically, the cases are English or Spanish speaking Americans aged 18 years or over who do not reside in an institution.
The two variables I will be studying are attend - how often the respondent attends religious services, and satjob - how satisfied the respondent is with the work (job or housework) the respondent does. Both variables are ordinal categorical - that is, they both take values that represent levels (in this case of attendance and satisfaction).
The type of study is observational using random sampling. The supporting information on the type of study is found in Appendix A of the GSS study codebook. All respondents in any given year were asked the same questions and their responses were observed.
The population of interest is all non-institutionalized, English and Spanish speaking persons 18 years of age or older, living in the United States. Since the sampling was fairly random, the findings from this analysis can be generalized to that population.
One potential source of bias in this study is the oversampling of some groups and the undersampling of others. For example, in some surveys, men in full-time employment were undersampled, which could introduce some ‘non-response’ bias.
However, Appendix A of the GSS study codebook contains methods to overcome this and other types of bias so that the data is still generalizable to the population.
These data cannot be used to establish causal links between the variables of interest. This is because causal links can only be established if the subjects of the study are randomly assigned to experimental and control groups. In this study, some form of random sampling was used, but there was no random assignment.
Summary statistics are shown in the following table. The number of people who responded to the two questions included in this study is n = 34393.
Attendance | Very Satisfied | Mod. Satisfied | A Little Dissat | Very Dissatisfied |
---|---|---|---|---|
Never | 0 | 0 | 0 | 0 |
Lt Once A Year | 1368 | 1344 | 379 | 161 |
Once A Year | 2480 | 2294 | 670 | 255 |
Sevrl Times A Yr | 2537 | 2128 | 542 | 229 |
Once A Month | 1460 | 1227 | 285 | 106 |
2-3X A Month | 1768 | 1497 | 396 | 148 |
Nrly Every Week | 1093 | 798 | 187 | 93 |
Every Week | 4282 | 2789 | 617 | 216 |
More Thn Once Wk | 1763 | 978 | 216 | 87 |
The following mosaic plot visualizes the two variables to be studied. For illustrative purposes, frequency of attendance is shown on the x-axis as explanatory variable and job satisfaction is shown on the y-axis as response variable. Since we are studying correlation, it does not matter which variable is on which axis.
The exploratory data analysis seems to suggest that there is a positive association between frequency of attendance at religious services and job satisfaction. It seems that the more satisfied a respondent is with his/her job, the more likely he/she is to attend religious services frequently (or the other way around).
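The table and mosaic plot described above can be rebuilt from the published counts alone. The following is a minimal sketch in base R, not the code used in the original analysis; the object names `observed` and `row_props` are illustrative, and the empty ‘Never’ row is left out.

```r
# Observed counts copied from the summary table (the all-zero "Never" row is dropped).
observed <- matrix(
  c(1368, 1344, 379, 161,
    2480, 2294, 670, 255,
    2537, 2128, 542, 229,
    1460, 1227, 285, 106,
    1768, 1497, 396, 148,
    1093,  798, 187,  93,
    4282, 2789, 617, 216,
    1763,  978, 216,  87),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    attend = c("Lt Once A Year", "Once A Year", "Sevrl Times A Yr",
               "Once A Month", "2-3X A Month", "Nrly Every Week",
               "Every Week", "More Thn Once Wk"),
    satjob = c("Very Satisfied", "Mod. Satisfied",
               "A Little Dissat", "Very Dissatisfied")
  )
)

# The grand total should match the reported n = 34393.
n <- sum(observed)
print(n)

# Share of each satisfaction level within each attendance level (each row sums to 1).
row_props <- prop.table(observed, margin = 1)
print(round(row_props, 3))

# Mosaic plot with attendance on the x-axis and job satisfaction on the y-axis.
mosaicplot(observed, main = "Job satisfaction by religious attendance",
           xlab = "attend", ylab = "satjob", las = 1, color = TRUE)
```

In the row proportions, the share of ‘Very Satisfied’ respondents rises from roughly 42% in the lowest attendance group to roughly 58% in the highest, which is the pattern the plot is meant to show.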
The null hypothesis is that frequency of attendance at religious services is independent of job satisfaction. The alternative hypothesis is that frequency of attendance at religious services and job satisfaction are dependent, that is, contrary to the null hypothesis, they are somehow related.
The following conditions for the chi-square test are met:
Sampling was without replacement and n < 10% of the population. Without replacement means that once a person was randomly chosen to be interviewed, he/she would not be a candidate to be chosen again. n < 10% of the population means that the number of respondents to these two questions (34393) is less than 10% of the target population. Assuming a US population of 300 million, even if our target population is only half that, 34393 is still far less than 10% of it.
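Each case only contributes to one cell in the table. This condition is ensured by the nature of the questions - multiple choice with one answer. For example, a respondent cannot be both ‘very satisfied’ and ‘a little dissatisfied’ with his/her job.
Each cell has at least 5 expected cases. The smallest expected count reported in the inference output is 81.74, far above that threshold.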
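The 10% condition above is simple arithmetic, sketched here in R. The population figures are the rough assumptions stated in the text (300 million, halved to be conservative), not official estimates, and the variable names are illustrative.

```r
# Number of respondents who answered both questions (from the summary table).
n <- 34393

# Assumption taken from the text: a US population of about 300 million,
# with the target population conservatively set to half of that.
us_population <- 300e6
target_population <- us_population / 2

# Percentage of the target population that was sampled.
sampled_pct <- 100 * n / target_population
print(round(sampled_pct, 4))

# The 10% condition holds when the sample is under a tenth of the population.
ten_percent_ok <- n < 0.10 * target_population
print(ten_percent_ok)
```

Under these assumptions the sample is about 0.02% of the target population, so the condition holds with a very wide margin.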
The method of inference to be used in this study is the chi-square test of independence. This method was chosen as it is a simple and reliable way to determine whether two categorical variables, at least one of which has more than 2 levels, are dependent or not.
The chi-squared method works for hypothesis testing only. In this case the null hypothesis of no dependence between the variables is assumed. On this assumption, the expected count for each cell is calculated by multiplying the row total by the column total and dividing by the table total. The result is the number of cases one would expect to find in that cell were the variables independent.
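As a worked example of the expected-count rule, the sketch below recomputes one cell (‘Lt Once A Year’ and ‘Very Satisfied’) from the counts in the summary table. The variable names are illustrative only.

```r
# Observed counts for one row and one column of the summary table.
row_lt_once_a_year <- c(1368, 1344, 379, 161)   # "Lt Once A Year" row
col_very_satisfied <- c(1368, 2480, 2537, 1460,
                        1768, 1093, 4282, 1763) # "Very Satisfied" column
n <- 34393                                      # table total

# Expected count under independence: row total * column total / table total.
expected <- sum(row_lt_once_a_year) * sum(col_very_satisfied) / n
print(round(expected, 2))  # 1583.88, the value listed in the inference output
```

The observed count in that cell is 1368, noticeably fewer than the roughly 1584 expected under independence.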
The chi-squared test statistic is then calculated by summing, over all cells, the squared difference between the observed and expected counts divided by the expected count. Squaring accentuates the larger differences, while dividing by the expected count scales each term.
\(\chi^2 = \sum_{i=1}^k \frac{(O_i-E_i)^2}{E_i}\)
Where \(O_i\) is the observed count in cell \(i\), \(E_i\) is the expected count in cell \(i\), and \(k\) is the total number of cells in the table.
The distribution of the chi-squared statistic is a right-skewed curve, where the skew is determined by the degrees of freedom parameter, \(df\) given by the following formula:
\(df = (R-1) \times (C-1)\)
Where \(R\) is the number of rows in the table and \(C\) is the number of columns in the table.
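The two formulas above can be applied by hand to the counts in the summary table. This is a minimal sketch in base R rather than the method used in the original analysis, which relies on the course's inference function; the names `observed`, `expected`, `chi_sq` and `check` are illustrative, and `chisq.test` is used only as a cross-check.

```r
# Observed counts from the summary table, one row per attendance level
# (the all-zero "Never" row is dropped), one column per satisfaction level.
observed <- matrix(
  c(1368, 1344, 379, 161,
    2480, 2294, 670, 255,
    2537, 2128, 542, 229,
    1460, 1227, 285, 106,
    1768, 1497, 396, 148,
    1093,  798, 187,  93,
    4282, 2789, 617, 216,
    1763,  978, 216,  87),
  ncol = 4, byrow = TRUE
)

# Expected counts under independence: row total * column total / table total.
n <- sum(observed)
expected <- outer(rowSums(observed), colSums(observed)) / n

# Chi-squared statistic: sum over all cells of (O - E)^2 / E.
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (R - 1) * (C - 1).
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

print(round(chi_sq, 2))
print(df)

# Cross-check with base R's built-in test of independence.
check <- chisq.test(observed)
print(check$statistic)
```

With 8 rows and 4 columns the degrees of freedom come to 21, and the statistic agrees with the value of about 389.43 reported in the inference output.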
In the next section an R function called ‘inference’ will calculate the chi-squared test statistic, which is a point on the x-axis of the distribution curve. The function will then calculate the corresponding p-value, which is the upper-tail area under the distribution curve beyond that point.
```r
# Load the course's inference() helper, then run the chi-square test of independence.
source("http://bit.ly/dasi_inference")
inference(y = droplevels(gss$attend), x = gss$satjob,
          est = "proportion", type = "ht", alternative = "greater",
          method = "theoretical",
          eda_plot = FALSE, inf_plot = TRUE, sum_stats = FALSE)
```
```
## Response variable: categorical, Explanatory variable: categorical
## Chi-square test of independence
## H_0: Response and explanatory variable are independent.
## H_A: Response and explanatory variable are dependent.
## Check conditions: expected counts
##                   x
## y                  Very Satisfied Mod. Satisfied A Little Dissat
##   Lt Once A Year          1583.88        1234.40          311.27
##   Once A Year             2775.68        2163.24          545.49
##   Sevrl Times A Yr        2647.59        2063.41          520.32
##   Once A Month            1499.13        1168.36          294.62
##   2-3X A Month            1855.16        1445.83          364.59
##   Nrly Every Week         1057.38         824.07          207.80
##   Every Week              3849.62        3000.22          756.55
##   More Thn Once Wk        1482.57        1155.45          291.36
##                   x
## y                  Very Dissatisfied
##   Lt Once A Year              122.45
##   Once A Year                 214.58
##   Sevrl Times A Yr            204.68
##   Once A Month                115.90
##   2-3X A Month                143.42
##   Nrly Every Week              81.74
##   Every Week                  297.61
##   More Thn Once Wk            114.62
##
##  Pearson's Chi-squared test
##
## data:  y_table
## X-squared = 389.4298, df = 21, p-value < 2.2e-16
```
The chi-square value, at more than \(389\), is very large and thus the p-value is tiny - less than \(2.2 \times 10^{-16}\)! Since the p-value is almost zero, the null hypothesis can be rejected. That is, we reject the hypothesis that attendance at religious services and job satisfaction are independent.
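The p-value is the upper-tail area of the chi-squared distribution beyond the test statistic, and it can be recomputed from the reported statistic and degrees of freedom. The sketch below uses base R's `pchisq`; the variable names are illustrative.

```r
# Test statistic and degrees of freedom as reported in the inference output.
chi_sq <- 389.4298
df <- 21

# Upper-tail area of the chi-squared distribution beyond the statistic.
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
print(p_value)

# R reports "< 2.2e-16" whenever the p-value falls below machine epsilon.
print(p_value < .Machine$double.eps)
```

The "2.2e-16" in the output is therefore a reporting floor rather than the p-value itself, which is smaller still.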
Rephrased as a conditional probability, the p-value is the probability of obtaining a chi-square statistic at least as large as the one observed in a random sample of 34393 people, given that the variables ‘attend’ and ‘satjob’ are in fact independent. That probability is almost zero.
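The p-value is not itself the probability of committing a type I error (concluding that the variables are dependent when in truth they are not); that rate is fixed by the chosen significance level. Because the p-value is far below any conventional significance level, the null hypothesis is rejected whichever such level is chosen.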
Aside from chi-square hypothesis testing, no other methods were applicable. Hence there is no confidence interval to compare with these results.
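This study has shown a statistically significant association between the frequency of attendance at religious services and job satisfaction among non-institutionalized English and Spanish speaking Americans aged 18 years and over. With a sample this large, a tiny p-value is strong evidence that an association exists; it does not by itself show that the association is strong.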
Note, however, that correlation does not imply causation. There may well be confounding variables that account for the dependence between the two variables studied.
While studying these variables I noticed that the correlation between job satisfaction and religious attendance seems different for different religions. I also noticed that regardless of religion or religious observance, a large proportion were satisfied with their job. Also, intriguingly none of the respondents answered ‘never’ to the question of frequency of attendance at religious services.
Future research could try to identify whether confounding variables lie behind the relationship between job satisfaction and religious observance, and what they are. Additionally, it could be interesting to see if the dependence between these variables actually holds across all different religions. Finally, another study could compare job satisfaction among the more irreligious sections of the population.
Smith, Tom W., Michael Hout, and Peter V. Marsden. General Social Survey, 1972-2012 [Cumulative File]. ICPSR34802-v1. Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut / Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2013-09-11. doi:10.3886/ICPSR34802.v1
The link to download the R data set is here. The above data set is an extract from the full General Social Survey (GSS) Cumulative File 1972-2012, which was modified for the Coursera Data Analysis and Statistical Inference course.
The link to the GSS study codebook is here.
This appendix shows one page of the data and displays the variables used in the project. Note that the data shown here is not cleaned: some cells contain ‘NA’. However, NA values were ignored while doing the inference.
## attend satjob
## 1 Once A Year A Little Dissat
## 2 Every Week <NA>
## 3 Once A Month Mod. Satisfied
## 4 <NA> Very Satisfied
## 5 <NA> <NA>
## 6 Once A Year Mod. Satisfied
## 7 Every Week Very Satisfied
## 8 <NA> A Little Dissat
## 9 Sevrl Times A Yr Mod. Satisfied
## 10 More Thn Once Wk Mod. Satisfied
## 11 Every Week <NA>
## 12 Every Week Very Satisfied
## 13 Every Week Very Satisfied
## 14 Every Week Mod. Satisfied
## 15 More Thn Once Wk Very Satisfied
## 16 2-3X A Month Mod. Satisfied
## 17 Every Week <NA>
## 18 Every Week <NA>
## 19 Every Week Very Satisfied
## 20 2-3X A Month <NA>
## 21 2-3X A Month Very Satisfied
## 22 Once A Year Very Satisfied
## 23 Lt Once A Year Mod. Satisfied
## 24 <NA> <NA>
## 25 Sevrl Times A Yr <NA>
## 26 Every Week Mod. Satisfied
## 27 Once A Year Mod. Satisfied
## 28 Once A Year A Little Dissat
## 29 Sevrl Times A Yr Mod. Satisfied
## 30 Every Week Very Satisfied
## 31 Sevrl Times A Yr Mod. Satisfied
## 32 Every Week Very Satisfied
## 33 Every Week <NA>
## 34 Every Week <NA>
## 35 2-3X A Month Mod. Satisfied
## 36 Sevrl Times A Yr Mod. Satisfied
## 37 Nrly Every Week <NA>
## 38 More Thn Once Wk <NA>
## 39 Once A Year Mod. Satisfied
## 40 More Thn Once Wk Mod. Satisfied
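The first eight rows of the listing above illustrate how rows with a missing answer drop out of the analysis. This is a sketch under the assumption that the inference function discards incomplete pairs the same way base R's `table` does by default; the data frame name `sample_rows` is illustrative.

```r
# First eight rows of the listing above (NA marks a missing answer).
sample_rows <- data.frame(
  attend = c("Once A Year", "Every Week", "Once A Month", NA,
             NA, "Once A Year", "Every Week", NA),
  satjob = c("A Little Dissat", NA, "Mod. Satisfied", "Very Satisfied",
             NA, "Mod. Satisfied", "Very Satisfied", "A Little Dissat"),
  stringsAsFactors = TRUE
)

# Keep only the rows where both questions were answered.
complete <- sample_rows[complete.cases(sample_rows), ]
n_complete <- nrow(complete)
print(n_complete)

# table() excludes NA values by default, so it counts the same rows.
counts <- table(sample_rows$attend, sample_rows$satjob)
n_tabulated <- sum(counts)
print(n_tabulated)
```

Only rows 1, 3, 6 and 7 have both answers, so four of the eight rows would contribute to the contingency table.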