Retaining the class label during classification in R and Python
Posted by: immahin
Date: September 09, 2018 07:45AM

I need an explanation about a certain matter. When classifying data in Python, before training the classifier, the data is split so that the class label (the column we want to predict) is removed from the features. For example, for classifying the yeast data, where "class" is the label, I do things this way:

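import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

headers = ["name", "mcg", "gvh", "alm", "mit", "erl", "pox", "vac", "nuc", "class"]
df = pd.read_csv("yeast.data", header=None, names=headers, na_values="?")
# Features: drop the label and the non-numeric "name" identifier, which cannot enter a Euclidean distance
X = np.array(df.drop(columns=["name", "class"]))
y = np.array(df["class"])
knn = NearestNeighbors(n_neighbors=6, algorithm='ball_tree', metric='euclidean')
knn.fit(X)  # unsupervised neighbor search: y is not passed in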


However, in R, the label class seems to be considered in the distance calculation as well. For example, for the same dataset, the distance calculation in R goes this way:

df <- read.table(file="~/yeast.txt",header=T, sep=",")
names(df) <- c("name", "mcg", "gvh", "alm", "mit","erl", "pox", "vac", "nuc","class")
dist <- distances("class",df, "Euclidean")


Here, we have to pass the label class too. Could someone explain the reason to me? Am I doing something wrong?

Re: Retaining the class label during classification in R and Python
Posted by: Dang Nguyen
Date: September 11, 2018 03:12PM

Given a data point, if you just want to find its k nearest neighbors, then you don't need to use the labels.
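
To make that concrete, here is a minimal sketch of a neighbor search that never sees the labels. It uses synthetic data as a stand-in for the numeric yeast columns, and the label values and variable names are illustrative assumptions, not taken from the real file. The labels are only looked up after the neighbors have been found.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Synthetic stand-in for the 8 numeric feature columns (NOT the real yeast data).
rng = np.random.default_rng(0)
X = rng.random((20, 8))
# Illustrative labels; they are never passed to fit() or kneighbors().
y = rng.choice(["CYT", "NUC", "MIT"], size=20)

knn = NearestNeighbors(n_neighbors=6, algorithm="ball_tree", metric="euclidean")
knn.fit(X)  # only the features are indexed

# Query with the first row; it is its own nearest neighbor at distance 0.
query = X[:1]
distances, indices = knn.kneighbors(query)

# Labels come into play only afterwards, e.g. for a majority vote.
neighbor_labels = y[indices[0]]
print(indices[0], neighbor_labels)
```

If the goal is classification rather than just neighbor search, the labels of the returned neighbors can then be combined (for example by majority vote) to predict the class of the query point.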

In the case of your R code, I think the result would be the same as with the Python code if you removed the labels from your training data.
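
I can't tell from the post which R package `distances` comes from or what its first argument does, so the following is only a sketch of the general point, written in Python with made-up numbers. It assumes, hypothetically, that the label were integer-encoded and left in as an extra column. In that case the Euclidean distance changes; with the label column removed, the distance depends on the features alone.

```python
import numpy as np

# Two hypothetical rows with 8 numeric features each (values are made up).
a = np.array([0.58, 0.61, 0.47, 0.13, 0.50, 0.00, 0.48, 0.22])
b = np.array([0.43, 0.67, 0.48, 0.27, 0.50, 0.00, 0.53, 0.22])

# Distance on the features only, as in the Python code from the question.
d_features = np.linalg.norm(a - b)

# Assumption for illustration: the class label is encoded as an integer
# (0 and 3 here) and appended as a ninth column before computing distances.
a_with_label = np.append(a, 0)
b_with_label = np.append(b, 3)
d_with_label = np.linalg.norm(a_with_label - b_with_label)

# The label difference adds (3 - 0)**2 to the squared distance.
print(d_features, d_with_label)
```

So whether the two languages agree comes down to which columns actually enter the distance, not to the language itself.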
