The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum:
Categorization Project
Posted by: Hossam
Date: April 12, 2014 03:44AM

I have two files: an uncategorized data file and a category list file.

1- I want to categorize the data file; each record can be related to more than one category from the categories file.
2- Then I want to count how many data records are related to each category in the category list file.
3- I want to determine which categories have no data records related to them (gap analysis).
4- I want to visualize the results, if possible.

I need help identifying the techniques to use for this project, and I would also like to know whether there are datasets and examples close to my work.

Many Thanks.

Re: Categorization Project
Date: April 12, 2014 04:25AM

If you want to categorize records and you have some training data (data that is already categorized), you can train a classifier such as a neural network or a decision tree. You can then use the trained classifier to classify the uncategorized records.
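
A minimal sketch of the train-then-classify idea, assuming text records and a small hand-labeled training set. It uses k-nearest neighbours (swapped in here instead of a neural network or decision tree because it needs no external library); the training examples and category names are hypothetical. Since a record may belong to several categories, it returns every category carried by at least half of the k most similar training records.

```python
from collections import Counter
import math

def tokenize(text):
    # Lowercase and split on whitespace; a real project would also strip punctuation.
    return text.lower().split()

def cosine(a, b):
    # Cosine similarity between two word-count vectors (Counters).
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_multilabel(train, record, k=3):
    """train: list of (text, set_of_categories) pairs (hypothetical labeled data).
    Returns the categories carried by at least half of the k most similar records."""
    vec = Counter(tokenize(record))
    ranked = sorted(train, key=lambda item: cosine(vec, Counter(tokenize(item[0]))), reverse=True)
    neighbours = ranked[:k]
    votes = Counter(cat for _, cats in neighbours for cat in cats)
    return {cat for cat, n in votes.items() if n * 2 >= len(neighbours)}

# Hypothetical manually categorized training records.
train = [
    ("invoice payment overdue", {"finance"}),
    ("payment refund invoice error", {"finance", "support"}),
    ("login password reset error", {"support", "it"}),
    ("server outage login failure", {"it"}),
]
print(knn_multilabel(train, "invoice payment error"))  # {'finance', 'support'} (set order may vary)
```

With only a few labeled records per category, the voting rule matters a lot; the "at least half" threshold is an arbitrary choice for illustration.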

If you don't have training data, you could apply a clustering algorithm to automatically create new categories from your data.
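
As a rough illustration of clustering, here is a minimal k-means sketch. It assumes the records have already been turned into numeric vectors (for example, word counts); the 2-D points below are hypothetical stand-ins. k-means is only one of many clustering algorithms, and it produces anonymous cluster numbers, not named categories.

```python
import math

def kmeans(points, k, iterations=100):
    # Deterministic start: use the first k points as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append([sum(c) / len(cluster) for c in zip(*cluster)])
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's old centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return labels, centroids

# Hypothetical record vectors forming two obvious groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.9, 1.1), (4.8, 5.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

You would still need to inspect each cluster and give it a meaningful name yourself.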

If you don't have training data and you want to categorize using the categories in your category file, then you would need a description of each category, and you could try to match each record with the most similar category. You would need to define a similarity measure to indicate how similar a record is to each category.
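
A minimal sketch of this matching idea, assuming plain-text records and one short text description per category. Jaccard similarity on word sets is just one possible similarity measure, picked here for simplicity; the threshold of 0.2 and the example categories are hypothetical. Because each record can belong to several categories, the sketch keeps every category at or above the threshold rather than only the single most similar one.

```python
def words(text):
    # Very simple normalisation: lowercase, split on whitespace, drop very short words.
    return {w for w in text.lower().split() if len(w) > 2}

def jaccard(a, b):
    # Jaccard similarity: number of shared words divided by number of distinct words.
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match_categories(record, categories, threshold=0.2):
    """categories: dict of category name -> description text.
    Returns the names whose similarity to the record is >= threshold, best first."""
    rec = words(record)
    scores = {name: jaccard(rec, words(desc)) for name, desc in categories.items()}
    return sorted((n for n, s in scores.items() if s >= threshold), key=lambda n: -scores[n])

# Hypothetical category file: name -> description.
categories = {
    "finance": "invoice payment billing refund",
    "support": "customer complaint error help",
    "hr": "employee salary leave hiring",
}
print(match_categories("customer invoice payment error", categories))  # ['finance', 'support']
```

A record that matches no category returns an empty list, which is useful input for the counting and gap analysis in steps 2 and 3.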

Steps 2 and 3 are simple programming problems: count the data records in each category, then find the categories that received no records (and the records that were not classified).
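
A minimal sketch of steps 2 and 3, assuming the categorization step has already produced a mapping from record id to a list of category names. The record ids and category names below are hypothetical.

```python
from collections import Counter

def category_report(assignments, all_categories):
    """assignments: dict of record id -> iterable of category names.
    Returns (records per category, categories with no records, unclassified records)."""
    counts = Counter({cat: 0 for cat in all_categories})  # start every category at zero
    for cats in assignments.values():
        counts.update(set(cats))  # set(): count each record at most once per category
    gaps = sorted(cat for cat in all_categories if counts[cat] == 0)
    unclassified = sorted(rec for rec, cats in assignments.items() if not cats)
    return counts, gaps, unclassified

# Hypothetical output of the categorization step.
assignments = {"r1": ["finance", "support"], "r2": ["support"], "r3": []}
counts, gaps, unclassified = category_report(assignments, ["finance", "support", "hr", "it"])
print(dict(counts))   # {'finance': 1, 'support': 2, 'hr': 0, 'it': 0}
print(gaps)           # ['hr', 'it']
print(unclassified)   # ['r3']
```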

For step 4, you would need to find a visualization tool or write your own.
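
If no plotting tool is at hand, a plain-text bar chart is a quick stand-in. The sketch below is only an illustration (a spreadsheet or a plotting library would give nicer output); it takes per-category counts such as those from step 2, and the example counts are hypothetical. Categories with zero records show an empty bar, so the gaps from step 3 stand out.

```python
def text_bar_chart(counts, width=20):
    """Render category counts as a plain-text horizontal bar chart, largest first.
    Returns the chart as a list of lines; zero-count categories get an empty bar."""
    largest = max(counts.values(), default=0)
    label_width = max((len(name) for name in counts), default=0)
    lines = []
    for name, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        bar = "#" * round(width * n / largest) if largest else ""
        lines.append(f"{name.ljust(label_width)} | {bar} {n}")
    return lines

# Hypothetical per-category counts.
for line in text_bar_chart({"finance": 1, "support": 2, "hr": 0}, width=10):
    print(line)
# support | ########## 2
# finance | ##### 1
# hr      |  0
```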

Re: Categorization Project
Posted by: Hossam
Date: April 12, 2014 05:59AM

Thank you for your advice. I can manually categorize some training data, but it cannot cover all possible categories.

I would prefer to categorize using the category file, based on the descriptions of the data and of the categories, but each record can have more than one category. I think this is a text mining problem?

Could you please give me guidelines for doing that in simple steps, along with previous experiments, if available?

Many thanks for your cooperation

Re: Categorization Project
Date: April 12, 2014 09:52AM

If your records and your category descriptions are text, then yes, you could use text mining algorithms to automatically extract information from your records, compare it with your categories, and perform the categorization (classification).
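
One common way to do the "extract information, then compare" step is TF-IDF weighting followed by cosine similarity. The sketch below is a minimal version under stated assumptions: the tiny stop-word list, the example records, and the category descriptions are all hypothetical, and TF-IDF is one text mining technique among many. It ranks every category for each record; to allow several categories per record, you could keep all categories whose score is above a chosen threshold.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "for", "in", "on", "is"}  # illustrative only

def terms(text):
    # Extract lowercase alphabetic words and drop stop words.
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def tfidf_vectors(docs):
    """Build a TF-IDF weight dict (term -> weight) for each document in docs."""
    tokenised = [Counter(terms(d)) for d in docs]
    n = len(docs)
    df = Counter(w for tf in tokenised for w in tf)  # number of documents containing each term
    return [{w: c * math.log(n / df[w]) for w, c in tf.items()} for tf in tokenised]

def cosine(a, b):
    # Cosine similarity between two sparse weight dicts.
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_categories(records, categories):
    """categories: dict of name -> description. Returns, for each record,
    the category names sorted by TF-IDF cosine similarity, best first."""
    names = list(categories)
    vectors = tfidf_vectors(list(records) + [categories[n] for n in names])
    rec_vecs, cat_vecs = vectors[:len(records)], vectors[len(records):]
    return [sorted(names, key=lambda n: -cosine(r, cat_vecs[names.index(n)])) for r in rec_vecs]

records = ["The invoice payment is overdue", "Customer reports a login error"]
categories = {"finance": "invoice payment billing",
              "support": "customer error help",
              "it": "login server network"}
print(rank_categories(records, categories))
# [['finance', 'support', 'it'], ['support', 'it', 'finance']]
```

The IDF factor lowers the weight of words that appear in many records or descriptions, so the comparison focuses on the more distinctive words.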

No, I cannot give detailed steps, since I don't work on this topic and I'm already very busy with my own research projects.


This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).