Re: Retaining class label while classification in r and python


Posted by: immahin

Date: September 09, 2018 07:45AM


I need an explanation of something. When classifying data in Python, the label column (the class we are trying to predict) is removed from the features before the classifier is trained. For example, to classify the yeast data, where "class" is the label, I do this:

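import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

headers = ["name", "mcg", "gvh", "alm", "mit", "erl", "pox", "vac", "nuc", "class"]
# If the file is the whitespace-delimited UCI original, add sep=r"\s+"
df = pd.read_csv("yeast.data", header=None, names=headers, na_values="?")
# Features only: drop the label and the non-numeric sequence name
X = np.array(df.drop(columns=["name", "class"]))
y = np.array(df["class"])  # labels kept aside; NearestNeighbors never uses them
knn = NearestNeighbors(n_neighbors=6, algorithm="ball_tree", metric="euclidean")
knn.fit(X)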

In R, however, the label column is also taken into account when calculating distances. For the same dataset, the distance calculation in R goes like this:

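# Read the yeast data and name the columns
df <- read.table(file = "~/yeast.txt", header = TRUE, sep = ",")
names(df) <- c("name", "mcg", "gvh", "alm", "mit", "erl", "pox", "vac", "nuc", "class")
# Euclidean distances; the label column "class" is passed in as well
dist <- distances("class", df, "Euclidean")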

Here the label column has to be passed in too. Could someone explain why? Am I doing something wrong?

Re: Retaining class label while classification in r and python

Posted by: Dang Nguyen

Date: September 11, 2018 03:12PM

Given a data point, if you just want to find its k nearest neighbors, you don't need the labels.
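As an illustration in Python, here is a minimal sketch with made-up toy data (the arrays `X`, `y` and the query point `q` are hypothetical, not taken from the yeast file). scikit-learn's `NearestNeighbors` is fit on the features alone, while `KNeighborsClassifier` also receives the labels, but finds its neighbors from the features and only uses the labels to vote among them:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors, KNeighborsClassifier

# Hypothetical toy features and labels, kept in separate arrays
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
y = np.array(["A", "A", "B", "B"])
q = np.array([[0.02, 0.0]])  # an unlabeled query point

# Pure neighbor search: the labels are never passed in
nn = NearestNeighbors(n_neighbors=2, metric="euclidean").fit(X)
dist, idx = nn.kneighbors(q)

# Classification: neighbors come from X; y is only used for the majority vote
clf = KNeighborsClassifier(n_neighbors=2, metric="euclidean").fit(X, y)
pred = clf.predict(q)

print(idx[0], y[idx[0]], pred[0])
```

Both objects find the same two neighbors (rows 0 and 1); the labels only come into play at the final vote.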

For your R code, I think the result would be the same as with the Python code if you removed the label column from the data before computing the distances.
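A small Python sketch of why it matters whether the label takes part in the distance, assuming (hypothetically) that the label were integer-encoded and left in as an extra feature. The three rows are invented values, and whether R's `distances()` actually treats its `"class"` argument this way depends on which package it comes from, which the question does not say:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import pairwise_distances

# Hypothetical rows: rows 0 and 1 are close in feature space but differ in class
df = pd.DataFrame({
    "mcg": [0.50, 0.52, 0.90],
    "gvh": [0.40, 0.41, 0.80],
    "class": ["CYT", "MIT", "CYT"],
})

# Distances on the features only, with the label column removed
X = df.drop(columns=["class"]).to_numpy()
d_features = pairwise_distances(X, metric="euclidean")

# Distances with the label left in as an integer code (CYT -> 0, MIT -> 1)
codes = pd.factorize(df["class"])[0]
X_with_label = np.column_stack([X, codes])
d_with_label = pairwise_distances(X_with_label, metric="euclidean")

# Nearest other row to row 0 under each choice (index 0 of argsort is row 0 itself)
print(np.argsort(d_features[0])[1], np.argsort(d_with_label[0])[1])  # → 1 2
```

With the label dropped, row 0's nearest neighbor is row 1, which is close in feature space; with the encoded label left in, the class difference dominates the distance and row 2 becomes the nearest.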