The Data Mining Forum
 
K mean clustering and classifying instances without class labels
Posted by: Sam
Date: July 17, 2014 08:21AM

I'm trying to classify my data into two classes (introvert / extrovert). I first thought of using a decision tree, but I don't have any known outcomes (class labels) from which to build a decision tree model. So I decided to use the k-means clustering algorithm with k = 2.

Since the clustering algorithm accepts only numeric values, can I first use a decision tree (based on rules I define within the tree) to transform some of my values into numeric ones before I start clustering?

Let's suppose that at the end of the algorithm I get my 2 clusters: cluster 1 and cluster 2. How can I map these two clusters to my 2 classes? Am I supposed to use supervised or semi-supervised clustering? (I don't know how supervised and semi-supervised clustering work.)

Is there any other simple and efficient classification technique that can satisfy my needs?

P.S. I'm new to this domain and all your advice and remarks are appreciated.



Edited 1 time(s). Last edit at 07/17/2014 04:45PM by webmasterphilfv.

Re: K mean clustering
Date: July 17, 2014 09:09AM

Clustering is used to automatically build groups of individuals that are similar. I assume that you have a training set and that you know who in it is an introvert and who is an extrovert.

If it is known, then you don't need clustering, since you already know who is an introvert and who is not.

If it is unknown, then it will be very hard to predict anything, since you don't have information about what makes a person an introvert or an extrovert.
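To illustrate the point, here is a minimal sketch of k-means with k = 2 in plain Python. The data and the two features (hours spent alone per day, social events per month) are hypothetical, invented only for this example. Note that the algorithm returns cluster ids 0 and 1 with no meaning attached: nothing in the output says which cluster is "introvert", so a person still has to inspect the centroids and decide.

```python
import random

def kmeans(points, k=2, iterations=20, seed=0):
    """Plain k-means on lists of numbers; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Start from k randomly chosen points (copied so the data is not modified).
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid becomes the mean of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments

# Hypothetical data: [hours alone per day, social events per month]
people = [[6.0, 1.0], [5.5, 2.0], [7.0, 0.0], [1.0, 9.0], [2.0, 8.0], [1.5, 10.0]]
centroids, labels = kmeans(people, k=2)
print(labels)     # cluster ids only; which id means "introvert" is not known
print(centroids)  # inspecting these is how a human would name the clusters
```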

Now, to make the prediction, you could use a decision tree or any other classifier, such as a k-nearest neighbor approach. If you use a decision tree, you just need to feed your training data to the decision tree learning algorithm. When you do that, you indicate that your target attribute is "introvert/extrovert". The learning algorithm will then automatically build the tree using the other attributes.
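As a rough sketch of what "feeding the training data with a target attribute" means, here is a small ID3-style learner in plain Python. It assumes categorical attributes and a labeled training set; the attribute names (`likes_parties`, `talks_first`, `personality`) and the rows are hypothetical, and a real project would more likely use an existing implementation.

```python
import math
from collections import Counter

def entropy(rows, target):
    """Entropy of the target attribute over a list of rows."""
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def best_attribute(rows, attributes, target):
    """Pick the attribute whose split yields the lowest weighted entropy."""
    def split_entropy(attr):
        groups = {}
        for r in rows:
            groups.setdefault(r[attr], []).append(r)
        return sum(len(g) / len(rows) * entropy(g, target) for g in groups.values())
    return min(attributes, key=split_entropy)

def build_tree(rows, attributes, target):
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority  # leaf: a class label
    attr = best_attribute(rows, attributes, target)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    for value in set(r[attr] for r in rows):
        subset = [r for r in rows if r[attr] == value]
        branches[value] = build_tree(subset, remaining, target)
    return {"attribute": attr, "branches": branches, "default": majority}

def predict(tree, instance):
    # Walk down the tree; fall back to the majority class for unseen values.
    while isinstance(tree, dict):
        tree = tree["branches"].get(instance[tree["attribute"]], tree["default"])
    return tree

# Hypothetical labeled training set; "personality" is the target attribute.
training = [
    {"likes_parties": "yes", "talks_first": "yes", "personality": "extrovert"},
    {"likes_parties": "yes", "talks_first": "no", "personality": "extrovert"},
    {"likes_parties": "no", "talks_first": "yes", "personality": "introvert"},
    {"likes_parties": "no", "talks_first": "no", "personality": "introvert"},
]
tree = build_tree(training, ["likes_parties", "talks_first"], "personality")
print(predict(tree, {"likes_parties": "yes", "talks_first": "no"}))
```

Without the `personality` column, `entropy` has nothing to measure, which is why the learner cannot run on unlabeled data.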

Re: K mean clustering
Posted by: Sam
Date: July 17, 2014 11:38AM

Thank you for your answer. I don't know who is an introvert or an extrovert in my training set. However, I know a set of rules, collected from an in-depth study of introverts and extroverts, which can help me with the classification.

My question is: how can I build a decision tree model that will classify my training set based on the rules I have? Are there any steps I should follow (prioritizing variables, for example)?

Thanks in advance for all your time and help.

Re: K mean clustering
Date: July 17, 2014 04:45PM

A typical decision tree learning algorithm needs the class labels to create the decision tree. If you understand what a decision tree is, you will know that each level of the tree tests an attribute that discriminates between instances according to their class labels. So if you don't have class labels, it does not make much sense to build a decision tree.

So here are the options that I see:

(1) If your training set is not too large, you could manually label each instance of your training set using the rules. The advantage of doing it by hand is that it may be more accurate. Then you would have the class labels, and you could train your decision tree so that it can classify new instances.

(2) Another idea is to forget about the decision tree and write your own program to classify instances according to your rules. You could create a kind of rule-based system that checks which rules apply and gives them more or less importance. This program could be used as it is. (2b) Or, if you really want to build a decision tree, you could use the labels given by your program to train the decision tree (but I don't like this idea very much).

In my opinion, option (1) is the best, and option (2) could work. But I think that option (2b) is not that good: if you want to calculate the accuracy of the decision tree predictions, it may not make much sense, because you would be comparing the decision tree results with labels produced by another computer program.
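Option (2) could look something like the following minimal sketch. Everything in it is hypothetical: the three rules, their weights, the attribute names and the threshold are placeholders standing in for the rules from your study, not recommendations.

```python
# Hypothetical rules: (description, test on an instance, weight).
# Positive weights push toward "extrovert", negative ones toward "introvert".
RULES = [
    ("attends many social events", lambda p: p["events_per_month"] >= 4, 2.0),
    ("prefers working alone", lambda p: p["prefers_working_alone"], -1.5),
    ("starts conversations", lambda p: p["starts_conversations"], 1.0),
]

def classify(person, threshold=0.0):
    """Sum the weights of the rules that apply and compare to a threshold."""
    score = sum(weight for _, test, weight in RULES if test(person))
    label = "extrovert" if score > threshold else "introvert"
    return label, score

# Two hypothetical instances.
alice = {"events_per_month": 6, "prefers_working_alone": False, "starts_conversations": True}
bob = {"events_per_month": 1, "prefers_working_alone": True, "starts_conversations": False}

print(classify(alice))
print(classify(bob))
```

The labels produced this way could also be used for option (2b), with the caveat already mentioned: a tree trained on them can at best reproduce this program.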



Edited 2 time(s). Last edit at 07/17/2014 04:48PM by webmasterphilfv.

Re: K mean clustering and classifying instances without class labels
Posted by: Sam
Date: July 17, 2014 05:02PM

I understand. I was googling this subject and found some papers that describe steps for creating decision trees from a set of rules, for example the paper "Converting Declarative Rules into Decision Trees", but honestly I couldn't fully understand the steps. If you don't mind, would it be possible for you to have a look at it and give me your feedback? Thank you for all your time and guidance.

Re: K mean clustering and classifying instances without class labels
Date: July 18, 2014 02:42AM

I had a quick look, and I think that their method needs class labels, since they seem to use a rule induction algorithm to learn the rules automatically. If the algorithm doesn't know what the classes are, how could the rule induction algorithm know which criterion to use (introvert/extrovert instead of something else)? I don't think that it solves your problem.

Re: K mean clustering and classifying instances without class labels
Posted by: Sam
Date: July 17, 2014 05:19PM

By the way, if I use a k-nearest neighbor approach, does it solve my problem, or does it work like decision trees, where I need an already labeled data set? Thanks.

Re: K mean clustering and classifying instances without class labels
Date: July 18, 2014 02:46AM

In simple terms, if you want to classify an instance, k-nn will look for the k instances most similar to your instance and choose the most frequent class among them. But if you don't have class labels, k-nn cannot be applied.
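The description above can be sketched in a few lines of Python. This is only an illustration: it assumes numeric features compared with Euclidean distance, and the labeled examples are hypothetical.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """training: list of (features, label) pairs; query: a feature list."""
    # Keep the k labeled instances closest to the query.
    neighbors = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote among the neighbors' labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labeled data: [hours alone per day, social events per month]
labeled = [
    ([6.0, 1.0], "introvert"),
    ([5.5, 2.0], "introvert"),
    ([7.0, 0.0], "introvert"),
    ([1.0, 9.0], "extrovert"),
    ([2.0, 8.0], "extrovert"),
    ([1.5, 10.0], "extrovert"),
]

print(knn_predict(labeled, [6.5, 1.5]))
print(knn_predict(labeled, [1.0, 8.5]))
```

The vote is taken over the labels stored next to each training instance, so the method has nothing to return when those labels are missing.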



Edited 1 time(s). Last edit at 07/18/2014 02:47AM by webmasterphilfv.

Re: K mean clustering and classifying instances without class labels
Posted by: Sam
Date: July 18, 2014 11:46AM

I was thinking: what if I turn the rules I have into data set examples with known classes, create a decision tree from them, and then apply it to my unknown data? Would that work? If not, is there a simpler supervised algorithm for classification? My problem is that I am required to develop a classification system, and to be honest it would be better to work with a known technique (not just a system I developed based on my rules, as you mentioned in option 2).

Thank you very much, Sir, for all your time and guidance. I really appreciate it, and I'm sorry if I ask too much. As I said, this is not my domain, but it happens that I need this step in my work. Best regards

Re: K mean clustering and classifying instances without class labels
Date: July 18, 2014 05:55PM

Normally, a decision tree learning algorithm takes as input a set of instances that are described by a set of attributes, and one attribute has to be selected as the target attribute that you want to predict (extrovert/introvert).

If you gave the rules as input to the decision tree learning algorithm, then your rules would need to be described using the same attributes as the instances that you want to classify, which I don't think you can do if they are rules. Thus, I don't think it can work.

A decision tree is one of the simplest ways to classify data. Other simple approaches are the Naive Bayes classifier and k-nn. I forget the details of Naive Bayes, but it is very simple and can be implemented easily, if I remember well. But in any case, you will face the same problem: you need class labels.
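For illustration, a Naive Bayes classifier for categorical attributes can be sketched as below. It picks the class that maximizes the class prior multiplied by the per-attribute value frequencies within that class, here with add-one (Laplace) smoothing and log probabilities. The attribute names and rows are hypothetical, and the training rows must already carry the class label.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, target):
    """Count classes and, per (attribute, class), how often each value occurs."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value -> count
    values_seen = defaultdict(set)       # attribute -> distinct values
    for r in rows:
        for attr, value in r.items():
            if attr != target:
                value_counts[(attr, r[target])][value] += 1
                values_seen[attr].add(value)
    return class_counts, value_counts, values_seen

def predict_naive_bayes(model, instance):
    """instance: dict of attribute values, without the target attribute."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior of the class
        for attr, value in instance.items():
            count = value_counts[(attr, label)][value]
            # Add-one smoothing so unseen values do not give log(0).
            score += math.log((count + 1) / (n + len(values_seen[attr])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled training set.
rows = [
    {"likes_parties": "yes", "talks_first": "yes", "personality": "extrovert"},
    {"likes_parties": "yes", "talks_first": "no", "personality": "extrovert"},
    {"likes_parties": "yes", "talks_first": "yes", "personality": "extrovert"},
    {"likes_parties": "no", "talks_first": "no", "personality": "introvert"},
    {"likes_parties": "no", "talks_first": "yes", "personality": "introvert"},
    {"likes_parties": "no", "talks_first": "no", "personality": "introvert"},
]
model = train_naive_bayes(rows, "personality")
print(predict_naive_bayes(model, {"likes_parties": "yes", "talks_first": "yes"}))
```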



Edited 1 time(s). Last edit at 07/18/2014 05:57PM by webmasterphilfv.

Re: K mean clustering and classifying instances without class labels
Posted by: Sam
Date: July 18, 2014 06:19PM

I have just represented my set of rules as a training set with known classes, in the same format as the test data set. I guess we have now overcome one of the problems. How can we use Naive Bayes for classification? (Again, I googled it but didn't fully understand the steps.) Thank you very much, Sir, for all your time, support and guidance. I'm sorry if I asked too much.



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).