SMOTE-N algorithm to handle imbalanced data

Posted by: Kalia

Date: June 02, 2014 08:10AM

Hi all,

I have a two-class classification problem on nominal data, and I want to apply SMOTE-N to deal with the class imbalance. However, it is not clear to me how to use SMOTE-N to generate N synthetic samples for each feature vector in the minority class.

As I understand it, SMOTE-N uses a modified version of the value difference metric (VDM) to find the k nearest neighbors (k-NN) of each feature vector in the minority class. A new minority-class feature vector is then created by taking, for each feature, the majority vote over the feature vector under consideration and its k nearest neighbors. But how is this process repeated to generate multiple synthetic feature vectors for each minority-class feature vector? The way the algorithm is stated, it seems that one minority-class feature vector can generate only one synthetic feature vector (using its k-NN).
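To make my reading of the algorithm concrete, here is a minimal Python sketch of how I understand it. The function names (`vdm_tables`, `vdm_distance`, `smote_n_one`) and the toy data are my own and purely illustrative. I am also assuming the simplest form of the modified VDM: no instance weights, both exponents set to 1, and neighbors searched only among the other minority samples.

```python
from collections import Counter, defaultdict


def vdm_tables(X, y):
    """For each feature, map value -> list of per-class relative frequencies."""
    classes = sorted(set(y))
    tables = []
    for j in range(len(X[0])):
        counts = defaultdict(Counter)
        for row, label in zip(X, y):
            counts[row[j]][label] += 1
        table = {}
        for value, per_class in counts.items():
            total = sum(per_class.values())
            table[value] = [per_class[c] / total for c in classes]
        tables.append(table)
    return tables


def vdm_distance(a, b, tables):
    """Sum over features of the per-class frequency differences (exponents = 1)."""
    return sum(
        sum(abs(p - q) for p, q in zip(tables[j][a[j]], tables[j][b[j]]))
        for j in range(len(a))
    )


def smote_n_one(i, minority, tables, k):
    """Majority vote, per feature, over minority[i] and its k nearest neighbors."""
    sample = minority[i]
    others = [m for idx, m in enumerate(minority) if idx != i]
    neighbors = sorted(others, key=lambda m: vdm_distance(sample, m, tables))[:k]
    group = [sample] + neighbors
    # Counter.most_common breaks ties by first occurrence, so the sample wins ties.
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*group))


# Toy data (assumed): two nominal features, label 1 is the minority class of interest.
X = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
y = [1, 1, 1, 0, 0, 0]
minority = [row for row, label in zip(X, y) if label == 1]
tables = vdm_tables(X, y)
print(smote_n_one(0, minority, tables, k=2))
```

For a fixed sample and a fixed k, the neighbors and the vote are always the same, so calling `smote_n_one` again returns the same vector. That is exactly why I do not see how N distinct synthetic vectors are produced.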

Thank you in advance,

Kalia