Data Portfolio
Census Income Prediction
using SAS EM
Model comparison for data sets varied by treatment of imbalanced target variables
In this project, salary data recorded in the US Census were used to predict income. The predictor variables were a mix of categorical and interval types. Missing values were imputed in SAS EM. The classification target was income, with two levels: above and below $50,000.

To account for the imbalanced target, false positives were penalized (cost of 3.18) in data set 1. Data set 2 was balanced using combined over- and undersampling with ROSE's ovun.sample function.
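The two treatments can be illustrated with a minimal Python sketch. It is not the SAS EM or ROSE implementation used in the project. The helper names (`cost_threshold`, `over_under_sample`), the false negative cost of 1, the zero cost for correct decisions, and the toy records are all assumptions made for illustration. Under those assumptions, a false positive cost of 3.18 corresponds to a probability cutoff of about 0.76 for predicting the positive class.

```python
import random
from collections import Counter


def cost_threshold(cost_fp, cost_fn=1.0):
    """Cutoff that minimizes expected cost when correct decisions cost nothing.

    Predict the positive class only when P(positive) exceeds this value.
    cost_fn=1.0 is an assumption; the project only reports the false positive cost.
    """
    return cost_fp / (cost_fp + cost_fn)


def over_under_sample(rows, target, n_total, p_minority=0.5, seed=0):
    """Return n_total rows with roughly a p_minority share of the minority class.

    The minority class is drawn with replacement (oversampling) and the
    majority class without replacement (undersampling).
    """
    rng = random.Random(seed)
    counts = Counter(row[target] for row in rows)
    minority = min(counts, key=counts.get)
    minority_rows = [row for row in rows if row[target] == minority]
    majority_rows = [row for row in rows if row[target] != minority]

    n_minority = round(n_total * p_minority)
    sample = rng.choices(minority_rows, k=n_minority)
    sample += rng.sample(majority_rows, k=n_total - n_minority)
    rng.shuffle(sample)
    return sample


# Toy records (hypothetical, not the census data): 24 high earners, 76 others.
toy_rows = [{"income": ">50K"}] * 24 + [{"income": "<=50K"}] * 76
balanced = over_under_sample(toy_rows, target="income", n_total=100)
balanced_counts = Counter(row["income"] for row in balanced)
cutoff = cost_threshold(3.18)
```

Cost weighting leaves the training data untouched and shifts the decision rule, while resampling changes the class mix the model sees. That difference is one reason the two data sets can trade off sensitivity and specificity differently.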
A summary of model performance is shown in the table below, followed by model scores for the RBF models. The models compared are linear and polynomial SVMs, RBF models, decision trees, and a neural network.

SAS Enterprise Miner: Image
Radial basis function models performed best, with the highest accuracy. The neural network models also performed well. The method used to balance the target variable influenced sensitivity and specificity.
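For reference, sensitivity and specificity can be computed from predicted and actual labels as in the sketch below. The function name and the toy labels are hypothetical and are not taken from the project's results.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    # Sensitivity: share of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels (hypothetical): 1 = income above $50,000, 0 = below.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

With an imbalanced target, accuracy alone can look strong even when one of these two rates is poor, which is why both are worth reporting alongside it.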

SAS Enterprise Miner: Image