SMOTE-N algorithm to handle imbalanced data

The Data Mining Forum



Posted by: Kalia

Date: June 02, 2014 08:10AM

Hi all,

I have a two-class classification problem on nominal data, and I want to apply SMOTE-N to deal with the class imbalance. However, it is not clear to me how SMOTE-N generates N synthetic samples for each feature vector in the minority class. SMOTE-N uses a modified version of the value difference metric (VDM) to find the k nearest neighbors of each minority-class feature vector. A new minority-class feature vector is then created by taking, for each feature, the majority vote over the feature vector under consideration and its k nearest neighbors. But how is this process repeated to generate multiple synthetic feature vectors for each minority-class feature vector? The way the algorithm is stated, it seems that one minority-class feature vector can generate only one synthetic feature vector (using its k nearest neighbors).
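
To make the question concrete, here is a minimal Python sketch of the procedure as I read it. It is not a reference implementation. The helper names (`value_distance_tables`, `vdm_distance`, `smote_n_one`) are hypothetical, the toy data is made up, and as simplifying assumptions the VDM weights are dropped and both exponents are set to 1.

```python
from collections import Counter, defaultdict


def value_distance_tables(X, y):
    """For each feature, map value -> Counter of class labels (the VDM statistics)."""
    tables = [defaultdict(Counter) for _ in range(len(X[0]))]
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            tables[j][value][label] += 1
    return tables


def vdm_distance(a, b, tables, classes):
    """Unweighted VDM distance between two nominal vectors.

    Assumption: weights are ignored and both exponents are 1.
    """
    total = 0.0
    for j, (va, vb) in enumerate(zip(a, b)):
        ca, cb = tables[j][va], tables[j][vb]
        na, nb = sum(ca.values()), sum(cb.values())
        # Sum over classes of the difference in class-conditional frequencies.
        total += sum(abs(ca[c] / na - cb[c] / nb) for c in classes)
    return total


def smote_n_one(i, minority, tables, classes, k):
    """Majority-vote vector built from minority[i] and its k nearest minority neighbors."""
    x = minority[i]
    others = [m for idx, m in enumerate(minority) if idx != i]
    others.sort(key=lambda m: vdm_distance(x, m, tables, classes))
    group = [x] + others[:k]
    # Counter.most_common orders ties by first occurrence, so ties favor x itself.
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*group))


# Made-up toy data: class 1 is treated as the minority class.
X = [("red", "s"), ("red", "m"), ("blue", "s"),
     ("blue", "l"), ("green", "l"), ("green", "m")]
y = [1, 1, 1, 0, 0, 0]

tables = value_distance_tables(X, y)
classes = sorted(set(y))
minority = [row for row, label in zip(X, y) if label == 1]

first = smote_n_one(0, minority, tables, classes, k=2)
second = smote_n_one(0, minority, tables, classes, k=2)
print(first, second)
```

Since both the neighbor search and the vote are deterministic in this sketch, calling it again for the same feature vector returns the same synthetic vector, which is exactly the point I am unsure about.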

Thank you in advance,

Kalia
