
Census Income Prediction
using SAS EM

Model comparison for data sets varied by treatment of imbalanced target variables

In this project, data recorded in the US Census were used to predict income. The input variables were both categorical and interval. Missing values were imputed using SAS EM's imputation mode. The classification target was income, which had two levels: above and below $50,000.
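As a rough illustration of imputing mixed variable types, the sketch below fills missing categorical values with the most frequent level and missing interval values with the mean. The text does not say which imputation methods were selected in SAS EM, so this choice is an assumption, and the function name, column names, and values are hypothetical toy data rather than the census data.

```python
from statistics import mean, mode

def impute_column(values, kind):
    """Fill None entries: mode for categorical columns, mean for interval columns.

    Assumed scheme for illustration only; not taken from the SAS EM project.
    """
    observed = [v for v in values if v is not None]
    fill = mode(observed) if kind == "categorical" else mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical toy columns, not the census data
workclass = ["Private", None, "Private", "State-gov"]
hours_per_week = [40, 50, None, 30]

print(impute_column(workclass, "categorical"))
print(impute_column(hours_per_week, "interval"))
```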

[Figure: SAS Enterprise Miner SVM model diagram]

To account for the imbalanced target, false positives were penalized (cost of 3.18) in data set 1. Data set 2 used over- and undersampling with ROSE's ovun.sample function.
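The two treatments can be sketched in a few lines of Python. The first function shows one common way a false-positive cost translates into a decision threshold, assuming a false-negative cost of 1 and no cost for correct predictions (neither is stated in the text, and this is not necessarily how SAS EM applies the cost). The second is a simplified combination of undersampling the majority class and oversampling the minority class. It is not ROSE's algorithm, and the function names and toy labels are hypothetical.

```python
import random

def cost_threshold(cost_fp, cost_fn=1.0):
    """Probability above which predicting the positive class has lower expected cost.

    Predicting positive costs (1 - p) * cost_fp and predicting negative costs
    p * cost_fn, so positive is cheaper when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

def balance_both(rows, label_of, minority, seed=0):
    """Return a roughly 50/50 sample of the same size as `rows`."""
    rng = random.Random(seed)
    minor = [r for r in rows if label_of(r) == minority]
    major = [r for r in rows if label_of(r) != minority]
    half = len(rows) // 2
    # Undersample the majority class without replacement
    major_sample = rng.sample(major, min(half, len(major)))
    # Oversample the minority class with replacement
    minor_sample = rng.choices(minor, k=len(rows) - len(major_sample))
    return major_sample + minor_sample

# Cost of 3.18 from the text; the false-negative cost of 1 is an assumption
print(round(cost_threshold(3.18), 4))

# Hypothetical toy labels: 8 majority rows and 2 minority rows
toy = ["<=50K"] * 8 + [">50K"] * 2
balanced = balance_both(toy, label_of=lambda r: r, minority=">50K")
print(balanced.count("<=50K"), balanced.count(">50K"))
```

Under these assumptions, a case would need a predicted positive-class probability of about 0.76 before a positive prediction becomes the lower-cost choice.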

A summary of model performance is shown in the table below, followed by model scores for the RBF models. The models compared are linear and polynomial SVMs, RBF models, decision trees, and a neural network.

[Figure: SAS Enterprise Miner model comparison]

The radial basis function (RBF) models performed best, with the highest accuracy. The neural network models also performed well. The method used to balance the target variable influenced sensitivity and specificity.
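For reference, sensitivity and specificity can be computed from confusion-matrix counts as in the minimal sketch below. The function name is illustrative and the counts are hypothetical, not results from this project.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of actual positives correctly identified
    specificity = tn / (tn + fp)  # share of actual negatives correctly identified
    return sensitivity, specificity

# Hypothetical counts for illustration only
sens, spec = sensitivity_specificity(tp=80, fn=20, tn=150, fp=50)
print(sens, spec)
```

Penalizing false positives tends to raise specificity at the expense of sensitivity, while rebalancing the classes tends to shift predictions toward the minority class, which is why the two data sets can trade these measures off differently.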

[Figure: SAS Enterprise Miner SVM model scores]

617-751-2800

©2021 by Laura Ellis. Proudly created with Wix.com
