Introduction

Is there a relationship between how often the respondent attends religious services and how satisfied, on the whole, the respondent is with the work he or she does?

Studying these variables is a matter of curiosity, because it is very interesting when two seemingly unrelated variables turn out to be actually related.

Others who may care about the answer to this question are employers questioning the cost-benefit of including worship facilities at places of employment. Alternatively, places of worship may care about the results of this study in order to better attract more worshippers to their services - for example, advertising their service schedules in areas known to have higher/lower levels of job satisfaction.

Data

The data set used is the General Social Survey (GSS). This is a sociological survey used to collect data on demographic characteristics and attitudes of residents of the United States.

Data Collection

The data was collected from randomly chosen, non-institutionalized, English or Spanish speaking individuals age 18 and over in the United States over the years 1972-2012.

The data is a collection of 29 surveys taken during that 40-year period. The data was collected using computer-assisted personal interviews, personal interviews or telephone interviews.

At the beginning, in the years 1972 to 1974, the methods used in data collection were only semi-random. In later years, additional funding allowed the survey to produce full probability samples. Households were chosen at random, and then an adult individual was chosen at random from that household to be the respondent.

Cases

The cases in this study are individual people. Specifically, the cases are English or Spanish speaking Americans aged 18 years or over who do not reside in an institution.

Variables

The two variables I will be studying are attend - how often the respondent attends religious services, and satjob - how satisfied the respondent is with the work (job or housework) the respondent does. Both variables are ordinal categorical - that is, they both take values that represent levels (in this case of attendance and satisfaction).

Type of Study

The type of study is observational using random sampling. The supporting information on the type of study is found in Appendix A of the GSS study codebook. All respondents in any given year were asked the same questions and their responses were observed.

Scope of Inference - Generalizability

The population of interest is all non-institutionalized, English and Spanish speaking persons 18 years of age or older, living in the United States. Since the sampling was fairly random, the findings from this analysis can be generalized to that population.

A potential source of bias in this study is the oversampling of some populations and the undersampling of others. For example, in some surveys, men in full-time employment were undersampled, so there could be some ‘non-response’ bias.

However, Appendix A of the GSS study codebook contains methods to overcome this and other types of bias so that the data is still generalizable to the population.

Scope of Inference - Causality

These data cannot be used to establish causal links between the variables of interest. This is because causal links can only be established if the subjects of the study are randomly assigned to experimental and control groups. In this study, some form of random sampling was used, but there was no random assignment.

Exploratory data analysis

Summary Statistics

Summary statistics are shown in the following table. The number of people who responded to the two questions included in this study is n = 34393.

                  Very Satisfied  Mod. Satisfied  A Little Dissat  Very Dissatisfied
Never                          0               0                0                  0
Lt Once A Year              1368            1344              379                161
Once A Year                 2480            2294              670                255
Sevrl Times A Yr            2537            2128              542                229
Once A Month                1460            1227              285                106
2-3X A Month                1768            1497              396                148
Nrly Every Week             1093             798              187                 93
Every Week                  4282            2789              617                216
More Thn Once Wk            1763             978              216                 87

Data visualization

The following mosaic plot visualizes the two variables to be studied. For illustrative purposes, frequency of attendance is shown on the x-axis as the explanatory variable and job satisfaction is shown on the y-axis as the response variable. Since we are studying association rather than causation, it does not matter which variable is on which axis.
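A minimal sketch of how such a mosaic plot could be drawn in base R from the counts in the summary table is given here. The matrix name `counts`, the plot title and the axis labels are assumptions made for this illustration, and the empty ‘Never’ row is left out.

```r
# Observed counts copied from the summary table; the empty 'Never' row is omitted.
# `counts` is a name chosen for this sketch, not one used in the original analysis.
counts <- matrix(
  c(1368, 1344, 379, 161,
    2480, 2294, 670, 255,
    2537, 2128, 542, 229,
    1460, 1227, 285, 106,
    1768, 1497, 396, 148,
    1093,  798, 187,  93,
    4282, 2789, 617, 216,
    1763,  978, 216,  87),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    attend = c("Lt Once A Year", "Once A Year", "Sevrl Times A Yr", "Once A Month",
               "2-3X A Month", "Nrly Every Week", "Every Week", "More Thn Once Wk"),
    satjob = c("Very Satisfied", "Mod. Satisfied", "A Little Dissat", "Very Dissatisfied")
  )
)

# Rows (attend) go on the x-axis and columns (satjob) on the y-axis.
mosaicplot(counts, main = "Job satisfaction by religious attendance",
           xlab = "attend", ylab = "satjob", las = 2, color = TRUE)
```

In a mosaic plot the width of each column reflects how many respondents fall in that attendance level, and the height of each tile reflects the share of that level in each satisfaction category.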

Preliminary Findings

The exploratory data analysis seems to suggest that there is a positive association between frequency of attendance at religious services and job satisfaction. It seems that the more a respondent is satisfied with his/her job, the more likely he/she is to attend religious services frequently (or the other way around).
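One way to check this impression numerically is to compare, for each attendance level, the share of respondents who are ‘Very Satisfied’. The sketch below uses the counts from the summary table; the names `counts` and `row_props` are assumptions introduced for illustration.

```r
# Observed counts copied from the summary table; the empty 'Never' row is omitted.
counts <- matrix(
  c(1368, 1344, 379, 161,
    2480, 2294, 670, 255,
    2537, 2128, 542, 229,
    1460, 1227, 285, 106,
    1768, 1497, 396, 148,
    1093,  798, 187,  93,
    4282, 2789, 617, 216,
    1763,  978, 216,  87),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    attend = c("Lt Once A Year", "Once A Year", "Sevrl Times A Yr", "Once A Month",
               "2-3X A Month", "Nrly Every Week", "Every Week", "More Thn Once Wk"),
    satjob = c("Very Satisfied", "Mod. Satisfied", "A Little Dissat", "Very Dissatisfied")
  )
)

# Proportion of each satisfaction level within each attendance level (rows sum to 1).
row_props <- prop.table(counts, margin = 1)

# Share of 'Very Satisfied' respondents per attendance level.
print(round(row_props[, "Very Satisfied"], 3))
```

The ‘Very Satisfied’ share rises from roughly 42% in the lowest attendance level to roughly 58% in the highest, which is consistent with the pattern described above.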

Inference

Hypotheses

The null hypothesis is that frequency of attendance at religious services is independent of job satisfaction. The alternative hypothesis is that frequency of attendance at religious services and job satisfaction are dependent, that is, contrary to the null hypothesis, they are somehow related.

Conditions for Inference

The following conditions for the chi-square test are met:

  1. Independence: Each respondent’s answers to the question of religious service attendance and job satisfaction can be assumed to be independent of each other because:
    • The survey was random.
    • Sampling was without replacement and n < 10% of population. Without replacement means that once a person was randomly chosen to be interviewed, he/she would not be a candidate to be randomly chosen again. n < 10% of population means that the number of respondents to these two questions (34393) is less than 10% of the target population. Assuming a US population of 300 million, even if our target population is half that, 34393 is still much less than 10% of the target population.

    • Each case only contributes to one cell in the table. This condition is ensured by the nature of the questions - multiple choice with one answer. For example, a respondent cannot both be ‘very satisfied’ and ‘a little dissatisfied’ with his/her job.

  2. Sample size: Each particular scenario (i.e. cell) must have at least 5 expected cases. In this study every cell has an expected count well above 5, except in the row labeled ‘Never’. However, since no respondents in any of the satjob categories answered that they had ‘never’ been to a religious service, this row may be removed without affecting the hypothesis test. Please refer to the Inference section of this paper to see the cleaned data table with the ‘Never’ row removed.

Methods for Inference

The method of inference to be used in this study is the chi-square test of independence. This method was chosen as it is a simple and reliable way to determine whether two categorical variables, at least one of which has more than 2 levels, are dependent or not.

The chi-squared method works for hypothesis testing only. In this case the null hypothesis of no dependence between the variables is assumed. On this assumption, the expected count for each cell is calculated by multiplying the row total by the column total and dividing by the table total. This result is the number of cases one would expect to find in that cell were the variables independent.
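The expected-count calculation described above can be sketched in a few lines of base R. The names `counts` and `expected` are assumptions introduced for this illustration; the counts are copied from the summary table with the empty ‘Never’ row removed.

```r
# Observed counts copied from the summary table; the empty 'Never' row is omitted.
counts <- matrix(
  c(1368, 1344, 379, 161,
    2480, 2294, 670, 255,
    2537, 2128, 542, 229,
    1460, 1227, 285, 106,
    1768, 1497, 396, 148,
    1093,  798, 187,  93,
    4282, 2789, 617, 216,
    1763,  978, 216,  87),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    attend = c("Lt Once A Year", "Once A Year", "Sevrl Times A Yr", "Once A Month",
               "2-3X A Month", "Nrly Every Week", "Every Week", "More Thn Once Wk"),
    satjob = c("Very Satisfied", "Mod. Satisfied", "A Little Dissat", "Very Dissatisfied")
  )
)

# Expected count per cell under independence: row total * column total / table total.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
print(round(expected, 2))

# Sample-size condition: every expected count should be at least 5.
print(min(expected) >= 5)
```

The rounded values should agree with the expected counts printed by the inference function in the Performing the Inference section.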

The chi-squared test statistic is then calculated by summing, over all cells, the squared difference between the observed and expected counts divided by the expected count. Squaring accentuates the larger differences, while dividing by the expected count scales each term.

\(\chi^2 = \sum_{i=1}^k \frac{(O_i-E_i)^2}{E_i}\)

Where \(O_i\) is the observed count in cell \(i\), \(E_i\) is the expected count in cell \(i\) and k is the total number of cells in the table.
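As a rough illustration, the statistic can be computed by hand in base R from the observed counts. The names `counts`, `expected` and `chi_sq` are assumptions made for this sketch; the counts are those of the summary table without the empty ‘Never’ row.

```r
# Observed counts copied from the summary table; the empty 'Never' row is omitted.
counts <- matrix(
  c(1368, 1344, 379, 161,
    2480, 2294, 670, 255,
    2537, 2128, 542, 229,
    1460, 1227, 285, 106,
    1768, 1497, 396, 148,
    1093,  798, 187,  93,
    4282, 2789, 617, 216,
    1763,  978, 216,  87),
  ncol = 4, byrow = TRUE
)

# Expected counts under independence.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Sum of (observed - expected)^2 / expected over all cells.
chi_sq <- sum((counts - expected)^2 / expected)
print(chi_sq)
```

The result should be close to the X-squared value reported in the Performing the Inference section.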

The distribution of the chi-squared statistic is a right-skewed curve, where the skew is determined by the degrees of freedom parameter, \(df\) given by the following formula:

\(df = (R-1) \times (C-1)\)

Where R is the number of rows in the table and C is the number of columns in the table.

In the next section an R function called ‘inference’ will calculate the chi-squared test statistic, which is a point on the x-axis of the distribution curve. The function will then calculate the corresponding p-value, which is the upper tail area under the distribution curve to the right of that point.
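The degrees of freedom and the upper tail area can also be sketched directly with base R's `pchisq`. The variable names are assumptions made for this illustration, and the statistic value is the one reported in the output of the next section.

```r
# Table dimensions after dropping the empty 'Never' row: 8 attendance levels, 4 satisfaction levels.
n_rows <- 8
n_cols <- 4
df <- (n_rows - 1) * (n_cols - 1)

# Chi-squared statistic as reported in the inference output.
chi_sq <- 389.4298

# p-value: area under the chi-squared density to the right of the statistic.
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
print(df)
print(p_value)
```

Using `lower.tail = FALSE` rather than `1 - pchisq(...)` avoids losing precision when the tail area is extremely small.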

Performing the Inference

source("http://bit.ly/dasi_inference")
inference(y = droplevels(gss$attend), x = gss$satjob, est = "proportion",
          type = "ht", alternative = "greater", method = "theoretical",
          eda_plot = FALSE, inf_plot = TRUE, sum_stats = FALSE)
## Response variable: categorical, Explanatory variable: categorical
## Chi-square test of independence
## H_0: Response and explanatory variable are independent.
## H_A: Response and explanatory variable are dependent.
## Check conditions: expected counts
##                   x
## y                  Very Satisfied Mod. Satisfied A Little Dissat
##   Lt Once A Year          1583.88        1234.40          311.27
##   Once A Year             2775.68        2163.24          545.49
##   Sevrl Times A Yr        2647.59        2063.41          520.32
##   Once A Month            1499.13        1168.36          294.62
##   2-3X A Month            1855.16        1445.83          364.59
##   Nrly Every Week         1057.38         824.07          207.80
##   Every Week              3849.62        3000.22          756.55
##   More Thn Once Wk        1482.57        1155.45          291.36
##                   x
## y                  Very Dissatisfied
##   Lt Once A Year              122.45
##   Once A Year                 214.58
##   Sevrl Times A Yr            204.68
##   Once A Month                115.90
##   2-3X A Month                143.42
##   Nrly Every Week              81.74
##   Every Week                  297.61
##   More Thn Once Wk            114.62
## 
##  Pearson's Chi-squared test
## 
## data:  y_table
## X-squared = 389.4298, df = 21, p-value < 2.2e-16
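As a cross-check that does not depend on the course-provided helper, the same test can be run with base R's `chisq.test` on the observed counts. This is a sketch rather than the method used above; the names `counts` and `result` are assumptions introduced for illustration.

```r
# Observed counts copied from the summary table; the empty 'Never' row is omitted.
counts <- matrix(
  c(1368, 1344, 379, 161,
    2480, 2294, 670, 255,
    2537, 2128, 542, 229,
    1460, 1227, 285, 106,
    1768, 1497, 396, 148,
    1093,  798, 187,  93,
    4282, 2789, 617, 216,
    1763,  978, 216,  87),
  ncol = 4, byrow = TRUE
)

# Pearson's chi-squared test of independence on the contingency table.
result <- chisq.test(counts)
print(result$statistic)  # chi-squared statistic
print(result$parameter)  # degrees of freedom
print(result$p.value)    # upper tail p-value
```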

Interpretation of Results

The chi-square value, at more than \(389\), is very large and thus the p-value is tiny - less than \(0.00000000000000022\)! Since the p-value is almost zero, the null hypothesis can be rejected. That is, we reject the hypothesis that attendance at religious services and job satisfaction are independent.

Rephrasing the obtained p-value as a conditional probability: if the variables ‘attend’ and ‘satjob’ were in fact independent, the probability of obtaining a random sample of 34393 people with a chi-square statistic at least as large as the one observed would be almost zero.

The p-value is also far below any conventional significance level (for example, 0.05), so the null hypothesis would be rejected even under a very strict tolerance for a type I error, that is, for concluding that the variables are dependent when in truth they are not.

Aside from chi-square hypothesis testing, no other methods were applicable. Hence there is no confidence interval to compare with these results.

Conclusion

This study has shown a statistically significant association between the frequency of attendance at religious services and job satisfaction among English and Spanish speaking Americans aged 18 years and over.

Note however, that association does not imply causality. There may well be confounding variables that account for the dependence between these two studied variables.

While studying these variables I noticed that the association between job satisfaction and religious attendance seems different for different religions. I also noticed that regardless of religion or religious observance, a large proportion of respondents were satisfied with their job. Also, intriguingly, none of the respondents answered ‘never’ to the question of frequency of attendance at religious services.

Future research could be done to detect whether there are confounding variables between job satisfaction and religious observance and, if so, what they are. Additionally it could be interesting to see if the dependence between these variables actually holds across all different religions. Finally, another study could be done to compare the job satisfaction differences among the more irreligious sections of the population.

References

Smith, Tom W., Michael Hout, and Peter V. Marsden. General Social Survey, 1972-2012 [Cumulative File]. ICPSR34802-v1. Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut /Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2013-09-11. doi:10.3886/ICPSR34802.v1

The link to download the R data set is here. The above data set is an extract from the full General Social Survey (GSS) Cumulative File 1972-2012 which was modified for the Coursera Data Analysis and Statistical Inference course.

The link to the GSS study codebook is here.

Appendix

This appendix shows one page of the data and displays the variables used in the project. Note that the data shown here is not cleaned, so there are cells with ‘NA’. However, NA values were ignored while doing the inference.

##              attend          satjob
## 1       Once A Year A Little Dissat
## 2        Every Week            <NA>
## 3      Once A Month  Mod. Satisfied
## 4              <NA>  Very Satisfied
## 5              <NA>            <NA>
## 6       Once A Year  Mod. Satisfied
## 7        Every Week  Very Satisfied
## 8              <NA> A Little Dissat
## 9  Sevrl Times A Yr  Mod. Satisfied
## 10 More Thn Once Wk  Mod. Satisfied
## 11       Every Week            <NA>
## 12       Every Week  Very Satisfied
## 13       Every Week  Very Satisfied
## 14       Every Week  Mod. Satisfied
## 15 More Thn Once Wk  Very Satisfied
## 16     2-3X A Month  Mod. Satisfied
## 17       Every Week            <NA>
## 18       Every Week            <NA>
## 19       Every Week  Very Satisfied
## 20     2-3X A Month            <NA>
## 21     2-3X A Month  Very Satisfied
## 22      Once A Year  Very Satisfied
## 23   Lt Once A Year  Mod. Satisfied
## 24             <NA>            <NA>
## 25 Sevrl Times A Yr            <NA>
## 26       Every Week  Mod. Satisfied
## 27      Once A Year  Mod. Satisfied
## 28      Once A Year A Little Dissat
## 29 Sevrl Times A Yr  Mod. Satisfied
## 30       Every Week  Very Satisfied
## 31 Sevrl Times A Yr  Mod. Satisfied
## 32       Every Week  Very Satisfied
## 33       Every Week            <NA>
## 34       Every Week            <NA>
## 35     2-3X A Month  Mod. Satisfied
## 36 Sevrl Times A Yr  Mod. Satisfied
## 37  Nrly Every Week            <NA>
## 38 More Thn Once Wk            <NA>
## 39      Once A Year  Mod. Satisfied
## 40 More Thn Once Wk  Mod. Satisfied