Data Portfolio
PEW Research Science Issues Analysis using IBM Watson Analytics
Identification of population subsets based on science knowledge and political views
Data Source and Description
The original PEW Science Issues dataset consists of 2,002 observations (survey responses) with 57 variables, most of which are categorical, plus a case ID. For this assignment, cases were kept only if the respondent answered the question about country satisfaction, leaving 1,906 cases. Demographics (16) and survey responses (6 science, 7 country satisfaction, 11 favor/oppose) were selected for their relevance to science and country satisfaction; all of these variables are categorical. An additional discrete variable gives the number of correct answers to the science knowledge questions.
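The case selection and the derived count of correct answers can be sketched in a few lines of Python. This is only an illustration of the idea, not the steps taken in Watson Analytics: the column names (country_satisfaction, q1 to q3), the answer key, and the three toy rows are all hypothetical and do not come from the survey.

```python
# Hypothetical answer key and toy rows; none of these values come from the survey.
ANSWER_KEY = {"q1": "a", "q2": "c", "q3": "b"}

responses = [
    {"case_id": 1, "country_satisfaction": "Satisfied", "q1": "a", "q2": "c", "q3": "d"},
    {"case_id": 2, "country_satisfaction": None, "q1": "a", "q2": "c", "q3": "b"},
    {"case_id": 3, "country_satisfaction": "Dissatisfied", "q1": "b", "q2": "c", "q3": "b"},
]


def select_cases(rows, required="country_satisfaction"):
    """Keep only rows with a non-missing answer to the required question."""
    return [row for row in rows if row.get(required) not in (None, "")]


def add_correct_count(rows, answer_key):
    """Return copies of the rows with a discrete 'num_correct' variable added."""
    scored = []
    for row in rows:
        num_correct = sum(row.get(q) == right for q, right in answer_key.items())
        scored.append({**row, "num_correct": num_correct})
    return scored


selected = add_correct_count(select_cases(responses), ANSWER_KEY)
for row in selected:
    print(row["case_id"], row["country_satisfaction"], row["num_correct"])
```

Case 2 is dropped because it has no country satisfaction response, which mirrors how the full dataset shrinks from 2,002 to 1,906 cases.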
Data Exploration
The highest average number of correct answers by region was 4.42 in the Midwest; the lowest was 4.07 in the Northeast. By state, North Dakota ranked highest and Hawaii lowest.
The top drivers for number of correct answers are religion and education level followed by marital status and income group.
The number of correct answers is nearly 2 higher for respondents indicating frequent internet and email use.
The most important factors for predicting the number of correct answers are age, religion, and sex (34% predictive strength).
The top drivers of country satisfaction are race and political views.
The top predictors of country satisfaction are age, religion and sex (75% predictive strength).
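As a rough illustration of the kind of summary behind these findings, the sketch below computes the average number of correct answers per group and the share of variance in the score that the grouping explains (eta squared). Eta squared is a stand-in chosen here for illustration; it is not claimed to be how Watson Analytics calculates its driver or predictive strength figures. The column names and the four toy rows are assumptions and are not survey data.

```python
from collections import defaultdict
from statistics import fmean


def group_values(rows, group_key, value_key):
    """Collect the numeric values for each category of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return groups


def group_means(rows, group_key, value_key):
    """Average of value_key within each category of group_key."""
    groups = group_values(rows, group_key, value_key)
    return {name: fmean(values) for name, values in groups.items()}


def eta_squared(rows, group_key, value_key):
    """Between-group sum of squares divided by total sum of squares."""
    values = [row[value_key] for row in rows]
    grand_mean = fmean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # no variation in the score, so nothing to explain
    groups = group_values(rows, group_key, value_key)
    between_ss = sum(
        len(vals) * (fmean(vals) - grand_mean) ** 2 for vals in groups.values()
    )
    return between_ss / total_ss


# Toy rows only; these are not the survey's values.
toy_rows = [
    {"region": "Midwest", "num_correct": 5},
    {"region": "Midwest", "num_correct": 4},
    {"region": "Northeast", "num_correct": 3},
    {"region": "Northeast", "num_correct": 4},
]

means = group_means(toy_rows, "region", "num_correct")
strength = eta_squared(toy_rows, "region", "num_correct")
print(means, strength)
```

Running the same two functions once per candidate variable (religion, education level, and so on) and sorting by eta squared gives a simple ranking of single-variable drivers.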
Predicting Country Satisfaction
The most important predictor is political party (0.36), followed by the comparison of the US economy to other countries (0.19). Thirteen other factors also aid prediction, primarily variables comparing the US to other countries and views on environmental and science issues, although two of them (registered to vote and religious attendance) have predictor importance below 0.01. The importance of political party in determining country satisfaction is consistent with the partisan differences in the US, both at the time of the survey and continuing to the present.

Predicting Science Knowledge
The predictive model for the number of correct answers has a much lower predictive strength (34%) than Model 1, the country satisfaction model (75%).
The top predictors by importance are Income Group and Level of Education followed by Religion.
The highest scorers are predicted to be top earners ($75,000 or more) who have a positive view on scientific achievements made in the US.
Respondents with education beyond a college degree are also predicted to score high. In this group, respondents under age 65 are predicted to score higher than those 65 and over. Among the respondents under age 65, those in favor of altering DNA in babies for medical advances are predicted to score higher than those who oppose it.
Non-Christians are predicted to average between 0.5 and 1 more correct answers than Christians.
