The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
Retaining class label while classification in r and python
Posted by: immahin
Date: September 09, 2018 07:45AM

I would like an explanation of the following. When classifying data in Python, the class label (the column we want to predict) is separated from the features before the classifier is trained. For example, for the yeast data, where "class" is the label, I do it this way:

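import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

headers = ["name", "mcg", "gvh", "alm", "mit", "erl", "pox", "vac", "nuc", "class"]
df = pd.read_csv("yeast.data", header=None, names=headers, na_values="?")
# Features: every column except the label
X = np.array(df.drop(columns=['class']))
# Label: kept separately and not passed to fit() below
y = np.array(df['class'])
knn = NearestNeighbors(n_neighbors=6, algorithm='ball_tree', metric='euclidean')
knn.fit(X)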


However, in R, the class label is also passed in when the distances are calculated. For example, for the same dataset, the distance calculation in R goes like this:

df <- read.table(file="~/yeast.txt", header=T, sep=",")
names(df) <- c("name", "mcg", "gvh", "alm", "mit", "erl", "pox", "vac", "nuc", "class")
dist <- distances("class", df, "Euclidean")


Here, we need to pass the class label as well. Could someone explain the reason to me? Am I doing something wrong?

Re: Retaining class label while classification in r and python
Posted by: Dang Nguyen
Date: September 11, 2018 03:12PM

Given a data point, if you just want to find its k nearest neighbors, then you don't need the labels. The distances are computed from the feature columns only.
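
To make this concrete, here is a minimal sketch in Python with scikit-learn. It uses synthetic random data as a stand-in (not the yeast file), and the names X, y, nn and clf are only for illustration. It shows that a plain neighbor search is fitted on the features alone, while a k-NN classifier receives the labels but uses them only to vote among the neighbors it finds:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

# Synthetic stand-in for the feature columns (assumption: not the yeast data).
rng = np.random.default_rng(0)
X = rng.random((30, 4))
y = np.array(["A", "B", "C"] * 10)  # labels, needed only for classification

# Unsupervised neighbor search: fit() sees the features only.
nn = NearestNeighbors(n_neighbors=3, metric="euclidean").fit(X)
dist_nn, idx_nn = nn.kneighbors(X[:1])

# k-NN classification: labels are passed to fit(), but they are used to
# vote among the neighbors, not to compute the distances.
clf = KNeighborsClassifier(n_neighbors=3, metric="euclidean").fit(X, y)
dist_clf, idx_clf = clf.kneighbors(X[:1])

# Both models find the same neighbors at the same distances.
same_neighbors = np.array_equal(idx_nn, idx_clf)
print(same_neighbors)
```

So the labels only matter once you want a prediction (clf.predict), not for the neighbor search itself.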

In the case of your R code, I think the result would be the same as with the Python code if you removed the labels from your training data.
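
As a rough illustration of why the label column should stay out of the distance calculation, here is a small sketch with made-up values (not the yeast data). The helper pairwise_euclidean and the integer encoding of the label are assumptions for the example; I don't know how your R distances function actually treats the "class" argument.

```python
import numpy as np

# Made-up feature values for three points (assumption: toy data only).
features = np.array([
    [0.58, 0.61],
    [0.43, 0.67],
    [0.64, 0.62],
])
# Hypothetical integer encoding of the class label for the same points.
label_codes = np.array([0.0, 0.0, 1.0])

def pairwise_euclidean(a):
    """Return the matrix of Euclidean distances between all rows of a."""
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Distances on the features only: what the Python code in the question uses.
d_features = pairwise_euclidean(features)

# Distances when the encoded label is treated as one more coordinate.
with_label = np.column_stack([features, label_codes])
d_with_label = pairwise_euclidean(with_label)

print(d_features[0, 2], d_with_label[0, 2])
```

Points with the same label keep their distance, but points with different labels are pushed apart, so the two distance matrices no longer agree. With the label left out, a Euclidean distance on the same feature columns should give the same numbers in either language.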



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).