Re: Categorization Project
Date: April 12, 2014 04:25AM
If you want to categorize records and you have some training data (records that are already categorized), you can train a classifier such as a neural network or a decision tree. You can then use the trained classifier to categorize the records that are not yet categorized.
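To make this concrete, here is a rough sketch of the train-then-classify idea. It swaps in a small naive Bayes text classifier instead of a neural network or decision tree, only because that fits in a few lines of standard-library Python. The function names (tokenize, train, classify) and the sample records are made up for illustration, and it assumes each record is a short piece of text.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Very naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def train(labeled_records):
    """labeled_records: list of (text, category) pairs (the training data)."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    category_counts = Counter()         # category -> number of records
    vocabulary = set()
    for text, category in labeled_records:
        category_counts[category] += 1
        for word in tokenize(text):
            word_counts[category][word] += 1
            vocabulary.add(word)
    return word_counts, category_counts, vocabulary

def classify(text, model):
    word_counts, category_counts, vocabulary = model
    total_records = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, n_records in category_counts.items():
        # Log prior plus log likelihood of each known word (add-one smoothing).
        score = math.log(n_records / total_records)
        total_words = sum(word_counts[category].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            count = word_counts[category][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical training data: records that are already categorized.
training_data = [
    ("cheap flights to paris", "travel"),
    ("hotel booking in rome", "travel"),
    ("new laptop with fast processor", "electronics"),
    ("phone with large screen", "electronics"),
]
model = train(training_data)
print(classify("flights and hotel deals", model))
print(classify("laptop screen repair", model))
```

With real data you would probably use an existing machine learning library rather than writing the classifier yourself, but the workflow is the same: train on the categorized records, then classify the rest.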
If you don't have training data, then you may apply a clustering algorithm to automatically create some new categories from your data.
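As an illustration of the clustering option, here is a minimal k-means sketch in plain Python. It assumes your records can be represented as numeric vectors and that you choose the number of clusters k yourself. The function name kmeans, the sample points, and the choice of the first k points as starting centroids are all my own simplifications, not something from your project.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    # Simplification: use the first k points as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids

# Hypothetical records, already converted to numeric features.
records = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.5, 8.2), (0.9, 1.1), (8.1, 7.9)]
assignments, centroids = kmeans(records, k=2)
print(assignments)  # cluster index for each record
```

Each cluster index then acts as a new, automatically created category. You still have to look at the clusters yourself to decide what they mean.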
If you don't have training data and you want to categorize using the categories defined in your own category file, then you would need a description of each category, and you could try to match each record with the most similar category. You would need to define a similarity measure to indicate how close a record is to a category description.
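Here is one possible way to sketch that matching, assuming both the records and the category descriptions are short texts. It uses Jaccard similarity on word sets as the similarity measure, which is just one choice among many. The category names, descriptions, and the min_score threshold are invented for the example.

```python
def tokens(text):
    # Represent a text as the set of its lowercase words.
    return set(text.lower().split())

def jaccard(a, b):
    # Jaccard similarity: size of the intersection over size of the union.
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_category(record, category_descriptions, min_score=0.0):
    """Return (category, score), or (None, score) if nothing scores above min_score."""
    record_tokens = tokens(record)
    scores = {
        name: jaccard(record_tokens, tokens(description))
        for name, description in category_descriptions.items()
    }
    name = max(scores, key=scores.get)
    if scores[name] <= min_score:
        return None, scores[name]  # leave the record uncategorized
    return name, scores[name]

# Hypothetical category file content: category name -> description.
categories = {
    "travel": "flights hotel booking trip vacation",
    "electronics": "laptop phone screen processor battery",
}
print(best_category("cheap hotel and flights", categories))
print(best_category("fresh garden vegetables", categories))
```

Records that match no category come back as None, so they can be counted as not classified afterwards.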
For steps 2 and 3, it is a simple programming problem: count the data records in each category and count which ones are not classified.
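The counting could look something like this, assuming you end up with one assigned category per record and use None for records that could not be classified. The list named assigned is made-up sample data.

```python
from collections import Counter

# Hypothetical result of the categorization: one entry per record,
# None meaning the record was not classified.
assigned = ["travel", None, "electronics", "travel", None]

# Number of records in each category.
counts = Counter(category for category in assigned if category is not None)

# Number of records that were not classified.
unclassified = sum(1 for category in assigned if category is None)

for category, n in counts.most_common():
    print(category, n)
print("not classified:", unclassified)
```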
For step 4, you would need to find some tools to do the visualizations, or write your own.