So I have a model, how do I get other people to use it? AKA what are some easy-to-use scoring tools?
Posted by:
johan
Date: August 27, 2013 11:56AM
Hello there,
I am a novice at data mining. I have completed one of my first projects, but I am having trouble making the results usable.
I work for a small visitor-serving company (think zoos, museums, aquariums, ...) that wanted a model to predict attendance.
I was able to create a model from historical attendance figures, using the Decision Tree module in SPSS. The independent variables were month, day of week, etc.
Model performance on cross-validated data is OK (73% correct predictions).
SPSS can export the model as PMML and as a SQL script.
After creating the model, I wanted a simple, easy-to-use scoring tool so that laymen (aka managers) within my company could use it.
What I did was create a simple GUI in Excel, with drop-down boxes where people can fill in month, day of week, etc. After hitting the "run" button, the worksheet comes up with the prediction. (This is what I call a 'scoring tool', although I am not sure whether this is the correct term in data mining lingo.)
On the back end of the Excel sheet I have written VBA code that contains the 'SQL rules' of the model, as created by SPSS. Hitting the run button makes the VBA code apply those 'SQL rules' to the input the user has given.
However, the VBA code is not the actual SQL code, but code that I have manually rewritten from SQL to VBA. Rewriting it takes a very long time, so I was forced to use a relatively simple tree model. Otherwise it would have taken several days to rewrite the code.
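To illustrate the kind of scoring described above, here is a minimal sketch in Python (not the actual VBA or SPSS output). It keeps the tree's rules as data and evaluates them with one generic function, so a larger tree would only mean more rows rather than more hand-written code. The field names, rule values and class labels (`month`, `day_of_week`, `"high"`, ...) are invented for illustration and are not taken from the real model.

```python
import operator

# Supported comparison operators, keyed by the symbol used in the rule table.
OPS = {
    "=": operator.eq,
    "<>": operator.ne,
    "<=": operator.le,
    ">": operator.gt,
    "in": lambda value, allowed: value in allowed,
}

# Hypothetical rules: (list of conditions, predicted attendance class).
# Rules are checked top to bottom; the first one that fully matches wins.
RULES = [
    ([("month", "in", {7, 8}), ("day_of_week", "in", {"Sat", "Sun"})], "high"),
    ([("month", "in", {7, 8})], "medium"),
    ([("day_of_week", "in", {"Sat", "Sun"})], "medium"),
]
DEFAULT = "low"  # assumed fallback when no rule matches


def score(record, rules=RULES, default=DEFAULT):
    """Return the prediction of the first rule whose conditions all hold."""
    for conditions, prediction in rules:
        if all(OPS[op](record[field], value) for field, op, value in conditions):
            return prediction
    return default


print(score({"month": 7, "day_of_week": "Sat"}))
```

The same idea would carry over to a worksheet: the rules could live in a table on a hidden sheet, with a short generic loop replacing the hand-translated branches.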
However, I have found that my model does not perform very well on real-life data, because the model is too simple and too general.
So what I am looking for is some help or pointers towards a freely available method for creating a scoring tool based on the PMML or SQL models that SPSS exports.
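As a rough idea of what scoring straight from PMML could look like, the sketch below walks a PMML `TreeModel` using only Python's standard library. It is a sketch under stated assumptions, not a full PMML evaluator: the embedded PMML fragment is hand-written for illustration (it is not real SPSS output), only `True`, `False` and `SimplePredicate` are handled, the root node's predicate is assumed to be true, and when no child matches it returns the current node's score (the behaviour PMML calls `returnLastPrediction`, which is not the PMML default).

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical PMML fragment; field names and scores are invented.
PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <TreeModel functionName="classification">
    <Node score="low">
      <True/>
      <Node score="medium">
        <SimplePredicate field="day_of_week" operator="equal" value="Sat"/>
        <Node score="high">
          <SimplePredicate field="month" operator="greaterOrEqual" value="7"/>
        </Node>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""

PREDICATES = {"True", "False", "SimplePredicate",
              "SimpleSetPredicate", "CompoundPredicate"}

COMPARE = {
    "equal": lambda a, b: a == b,
    "notEqual": lambda a, b: a != b,
    "lessThan": lambda a, b: a < b,
    "lessOrEqual": lambda a, b: a <= b,
    "greaterThan": lambda a, b: a > b,
    "greaterOrEqual": lambda a, b: a >= b,
}


def local(tag):
    """Strip the XML namespace, e.g. '{ns}Node' -> 'Node'."""
    return tag.rsplit("}", 1)[-1]


def as_number(value):
    """Return value as float, or None if it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def holds(pred, record):
    """Evaluate one predicate element against the input record."""
    kind = local(pred.tag)
    if kind == "True":
        return True
    if kind == "False":
        return False
    if kind == "SimplePredicate":
        left, right = record[pred.get("field")], pred.get("value")
        left_num, right_num = as_number(left), as_number(right)
        if left_num is not None and right_num is not None:
            left, right = left_num, right_num   # numeric comparison
        else:
            left, right = str(left), str(right)  # fall back to text
        return COMPARE[pred.get("operator")](left, right)
    raise NotImplementedError(f"predicate type not handled: {kind}")


def score_node(node, record):
    """Descend into the first child whose predicate holds; else stop here."""
    for child in node:
        if local(child.tag) != "Node":
            continue
        pred = next(c for c in child if local(c.tag) in PREDICATES)
        if holds(pred, record):
            return score_node(child, record)
    return node.get("score")


def score(pmml_text, record):
    """Score one record against the first TreeModel found in the PMML text."""
    root = ET.fromstring(pmml_text)
    tree = next(e for e in root.iter() if local(e.tag) == "TreeModel")
    top = next(c for c in tree if local(c.tag) == "Node")
    return score_node(top, record)


print(score(PMML, {"day_of_week": "Sat", "month": 8}))
```

Because the tree is read from the exported file, swapping in a deeper tree would not require rewriting any rules by hand; predicate types beyond the ones listed would need to be added to `holds`.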
Has anyone had the same situation? How did you solve it?
FYI, I know a little bit of R, Weka and RapidMiner. If there is any way to create a simple 'scoring tool' using one of these, I am willing to recreate my model in that software package.
PS: I guess the above is a rather amateurish way of doing things, but the little non-profit company I work for has no budget for commercially available data mining software, so I am using the tools I have, which are SPSS and Excel.