Introduction:

Research Question: “Does people’s level of education affect their confidence in finding an equally good job with another employer?”

Usually, the higher someone’s education level, the higher their confidence in their abilities. We will examine whether level of education makes people feel more confident about finding another job with the same income and benefits as the one they have now. On one hand, I expect that people with no college studies have lower confidence in their abilities and therefore will not readily look for another equally good job elsewhere. On the other hand, I expect graduates to be more stuck in finding a new job because of their knowledge level.

Data:

Data collection: The General Social Survey took a random sample of residents of the United States and asked respondents a range of questions about their attitudes on several issues, such as national spending priorities, marijuana use, crime and punishment, race relations, quality of life, and confidence in institutions, as well as about their demographic characteristics. The data were collected through computer-assisted personal interviews, face-to-face interviews, and telephone interviews.

Cases: Specific individuals were observed, both English and Spanish speaking, 18 years of age or older, living in the United States. These cases are observational and retrospective as the data was generated via a question/answer survey requesting information about current and past events.

Variables: Both variables are categorical.

degree: A categorical ordinal variable (it has ordered levels) that records the respondent’s highest degree in education.

jobfind: Also a categorical ordinal variable (it has ordered levels). The original survey question associated with this variable asks how easy it would be for the respondent to find a job with another employer with approximately the same income and fringe benefits they now have.

Study: Since the data are collected in a way that does not directly interfere with how the data arise, and the researchers merely observe, our study is observational. Subjects were sampled randomly from the population. Additionally, no control or treatment group was assigned by an experimenter.

Scope of inference - generalizability: The desired population of interest is all people in the United States, but given the limitations of the survey, the sample represents only people aged 18 and over who speak English or Spanish and were living in the United States at the time of each survey. Because the survey data were combined over multiple years, and the questions and format were not identical from year to year, non-response bias is quite strong in some areas. In particular, a question that was not asked in earlier years has no responses for those years, which biases comparisons across years. Even so, the sample is still representative and the results are generalizable to the population of interest, provided we keep these sources of bias in mind.

Scope of inference - causality: These data cannot be used to establish causal links between the variables of this study, because it’s an observational study.

Exploratory data analysis:

We make a subset of the gss dataset containing the two variables of interest

sgss = subset(gss, select = c(degree, jobfind))

We exclude NA values

sgss_na = na.omit(sgss)

Dimensions of new subset without NA values

dim(sgss_na)
## [1] 18518     2

The variables of interest

names(sgss_na)
## [1] "degree"  "jobfind"

Frequency tables

table(sgss_na$degree)
## 
## Lt High School    High School Junior College       Bachelor       Graduate 
##           2194          10025           1333           3281           1685
table(sgss_na$jobfind)
## 
##     Very Easy Somewhat Easy      Not Easy 
##          4829          6088          7601

Relative frequency tables

table(sgss_na$degree)/18518
## 
## Lt High School    High School Junior College       Bachelor       Graduate 
##     0.11847932     0.54136516     0.07198402     0.17717896     0.09099255
table(sgss_na$jobfind)/18518
## 
##     Very Easy Somewhat Easy      Not Easy 
##     0.2607733     0.3287612     0.4104655
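The relative frequencies above divide by the hard-coded sample size 18518. A minimal sketch of the same calculation without the hard-coded number, assuming the counts are re-entered by hand from the frequency table above (the name `degree_counts` is introduced here for illustration only):

```r
# Counts copied from the frequency table of the degree variable above.
degree_counts <- c("Lt High School" = 2194, "High School" = 10025,
                   "Junior College" = 1333, "Bachelor" = 3281,
                   "Graduate" = 1685)

# prop.table() divides every entry by the overall total,
# so the sample size never has to be typed in.
degree_props <- prop.table(degree_counts)
degree_props
```

On the actual data, `prop.table(table(sgss_na$degree))` would be the equivalent call, and it keeps working if the subset changes size.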

Barplots

barplot(table(sgss_na$degree))

barplot(table(sgss_na$jobfind))

barplot(table(sgss_na$degree)/18518)

barplot(table(sgss_na$jobfind)/18518)

Mosaicplot for the two categorical variables

mosaicplot(table(sgss_na$degree, sgss_na$jobfind))

The mosaic plot displays how easy respondents feel it would be to find an equally good job with another employer, conditional on their highest degree.

“Relative frequency segmented bar plot visualizing the two categorical variables”

plot(sgss_na$degree, sgss_na$jobfind)

This plot visualizes the relative frequencies of the jobfind variable for each of the five categories of the degree variable. For example, among respondents whose highest degree is High School, roughly 25% find it very easy to find an equally good job with another employer, roughly 32% find it somewhat easy, and roughly 43% find it not easy. As the respondents’ highest degree increases, the proportion of those who find it not easy to find an equally good job tends to decrease (from roughly 53% for Lt High School to roughly 34% for Graduate). That makes us believe that the level of respondents’ education may affect their confidence in finding another equally good job.

The exploratory data analysis suggests that people’s level of education probably affects their confidence in finding an equally good job with another employer.
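The percentages read off the plot can also be checked numerically. A minimal sketch, assuming the two-way table is rebuilt by hand from the counts reported in this document (the names `counts` and `row_props` are introduced here for illustration only); `prop.table` with `margin = 1` divides each row by its row total:

```r
# Counts copied from the degree-by-jobfind contingency table in this report.
counts <- matrix(
  c( 496,  542, 1156,
    2562, 3164, 4299,
     383,  459,  491,
     902, 1292, 1087,
     486,  631,  568),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    degree  = c("Lt High School", "High School", "Junior College",
                "Bachelor", "Graduate"),
    jobfind = c("Very Easy", "Somewhat Easy", "Not Easy")
  )
)

# Row-wise proportions: the distribution of jobfind within each degree level.
row_props <- prop.table(counts, margin = 1)
round(row_props, 3)
```

On the actual data, `prop.table(table(sgss_na$degree, sgss_na$jobfind), margin = 1)` would give the same table directly.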

Inference:

Here we have a contingency table of the two variables

table(sgss_na$degree, sgss_na$jobfind)
##                 
##                  Very Easy Somewhat Easy Not Easy
##   Lt High School       496           542     1156
##   High School         2562          3164     4299
##   Junior College       383           459      491
##   Bachelor             902          1292     1087
##   Graduate             486           631      568

We want to check whether there is a relationship between people’s level of education and their confidence in finding an equally good job, so we state the hypotheses as follows:

H0: “There is nothing going on here”. Education level and confidence in finding a new job are independent. Confidence in finding a new job does not vary by education level.

HA: “There is something going on here”. Education level and confidence in finding a new job are dependent. Confidence in finding a new job does vary by education level.

We want to evaluate the relationship between two categorical variables, at least one of which has more than two levels, so we will do a chi-square independence test.

We will check the conditions for the chi-square test:

  1. Independence: We have a random sample, and the sample size is less than 10% of the population. Also, each case contributes to only one cell in the table, so the sampled observations are independent.

  2. Sample size: Each cell has at least 5 expected cases, so the second condition is also met.
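The sample-size condition can be checked directly by computing every expected count at once. A minimal sketch, assuming the contingency table is re-entered by hand from the counts shown above (the names `counts` and `expected` are introduced here for illustration only); under independence, each expected count is the row total times the column total divided by the grand total:

```r
# Counts copied from the degree-by-jobfind contingency table above
# (rows: Lt High School, High School, Junior College, Bachelor, Graduate;
#  columns: Very Easy, Somewhat Easy, Not Easy).
counts <- matrix(
  c( 496,  542, 1156,
    2562, 3164, 4299,
     383,  459,  491,
     902, 1292, 1087,
     486,  631,  568),
  nrow = 5, byrow = TRUE
)

# Expected count for each cell: row total * column total / grand total.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# The condition holds if the smallest expected count is at least 5.
min(expected)
all(expected >= 5)
```

The same matrix of expected counts is also available as `chisq.test(counts)$expected`.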

We will calculate the proportions of the three levels of jobfind variable

proportionVeryEasy = nrow(subset(sgss_na, jobfind == "Very Easy"))/nrow(sgss_na)
proportionSomewhatEasy = nrow(subset(sgss_na, jobfind == "Somewhat Easy"))/nrow(sgss_na)
proportionNotEasy = nrow(subset(sgss_na, jobfind == "Not Easy"))/nrow(sgss_na)

Now we will calculate the expected numbers of people in each level of the degree variable for those who responded Very Easy in the jobfind variable.

nrow(subset(sgss_na, degree == "Lt High School"))*proportionVeryEasy
## [1] 572.1366
nrow(subset(sgss_na, degree == "High School"))*proportionVeryEasy
## [1] 2614.252
nrow(subset(sgss_na, degree == "Junior College"))*proportionVeryEasy
## [1] 347.6108
nrow(subset(sgss_na, degree == "Bachelor"))*proportionVeryEasy
## [1] 855.5972
nrow(subset(sgss_na, degree == "Graduate"))*proportionVeryEasy
## [1] 439.403

Now we will calculate the expected numbers of people in each level of the degree variable for those who responded Somewhat Easy in the jobfind variable.

nrow(subset(sgss_na, degree == "Lt High School"))*proportionSomewhatEasy
## [1] 721.3021
nrow(subset(sgss_na, degree == "High School"))*proportionSomewhatEasy
## [1] 3295.831
nrow(subset(sgss_na, degree == "Junior College"))*proportionSomewhatEasy
## [1] 438.2387
nrow(subset(sgss_na, degree == "Bachelor"))*proportionSomewhatEasy
## [1] 1078.666
nrow(subset(sgss_na, degree == "Graduate"))*proportionSomewhatEasy
## [1] 553.9626

Finally we will calculate the expected numbers of people in each level of the degree variable for those who responded Not Easy in the jobfind variable.

nrow(subset(sgss_na, degree == "Lt High School"))*proportionNotEasy
## [1] 900.5613
nrow(subset(sgss_na, degree == "High School"))*proportionNotEasy
## [1] 4114.917
nrow(subset(sgss_na, degree == "Junior College"))*proportionNotEasy
## [1] 547.1505
nrow(subset(sgss_na, degree == "Bachelor"))*proportionNotEasy
## [1] 1346.737
nrow(subset(sgss_na, degree == "Graduate"))*proportionNotEasy
## [1] 691.6344

Now we will calculate the test statistic:

observed = table(sgss_na$degree, sgss_na$jobfind)
expected = outer(rowSums(observed), colSums(observed))/sum(observed)
chisquare = sum((observed - expected)^2/expected)
chisquare

Each term of the sum is the squared difference between an observed and an expected count, divided by the whole expected count. With the counts in the contingency table above, the statistic comes to roughly 284.6.

Now we calculate the degrees of freedom:

df = (5-1)*(3-1)
df
## [1] 8

Finally we will calculate the p-value

pchisq(chisquare, df, lower.tail = FALSE)

For a statistic this large on 8 degrees of freedom, the p-value is far below 0.001.

We find a very small p-value, so we reject the null hypothesis in favor of the alternative. These data provide convincing evidence that level of education and confidence in finding an equally good job are associated.
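One way to cross-check the hand calculation is R's built-in `chisq.test`. A minimal sketch, assuming the contingency table is re-entered by hand from the counts shown earlier (the names `counts`, `result`, and `manual` are introduced here for illustration only):

```r
# Counts copied from the degree-by-jobfind contingency table in this report
# (rows: Lt High School, High School, Junior College, Bachelor, Graduate;
#  columns: Very Easy, Somewhat Easy, Not Easy).
counts <- matrix(
  c( 496,  542, 1156,
    2562, 3164, 4299,
     383,  459,  491,
     902, 1292, 1087,
     486,  631,  568),
  nrow = 5, byrow = TRUE
)

# Pearson's chi-square test of independence on the two-way table.
result <- chisq.test(counts)

# The same statistic by hand: sum over cells of (O - E)^2 / E.
manual <- sum((counts - result$expected)^2 / result$expected)

result$statistic   # test statistic
result$parameter   # degrees of freedom
result$p.value     # p-value
```

If the hand-computed statistic and `result$statistic` disagree, the usual culprit is operator precedence in the denominator: `x/a*b` is evaluated as `(x/a)*b`, not `x/(a*b)`.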

It is possible that there is a causal relation between the two variables, but the type of analysis that we conducted here is simply not sufficient to deduce a causal relationship.

Conclusion:

In conclusion, this is an observational study: we did not run an experiment but simply observed. We examined the relationship between two categorical variables, both with more than two levels, using the chi-square test of independence. We rejected the null hypothesis stated earlier because of a very low p-value, and hence we did find a relationship between these two variables.

The data suggest that the higher people’s level of education, the greater their confidence in finding another equally good job. This is plausible, because someone with deep knowledge of their working field would be more confident than someone with no basic education in a specific field.

In the future, one possible idea for research is to check whether other factors may also affect people’s confidence in finding a job. One such factor could be years of experience in the job, which I expect plays a role in their confidence in finding a new job elsewhere.