The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
So I have a model, how do I get other people to use it? AKA what are easy to use scoring tools?
Posted by: johan
Date: August 27, 2013 11:56AM

Hello there,
I am a novice at data mining. I have completed one of my first projects, but I am having trouble making the results usable.

I work in a small visitor-serving company (think zoos, museums, aquariums, ...) that wanted a model to predict attendance.

I built a model from historical attendance figures using the Decision Tree module in SPSS. The independent variables were month, day of week, etc.
Model performance on cross-validated data is OK (73% correct predictions).
SPSS can export the model as PMML or as an SQL script.

After creating the model, I wanted a simple, easy-to-use scoring tool so that laymen (aka managers) within my company could use it.

What I did was create a simple GUI in Excel, with drop-down boxes where people can fill in the month, day of week, etc.; after hitting the "run" button, the worksheet shows the prediction. (This is what I call a 'scoring tool', although I am not sure whether this is the correct term in data mining lingo.)


On the back end of the Excel sheet I have written VBA code that contains the 'SQL rules' of the model created by SPSS. When the user hits the run button, the VBA code applies the 'SQL rules' to the input the user has given.

However, the VBA code is not the actual SQL code but code that I manually rewrote from SQL into VBA. This rewriting takes a long time, so I was forced to use a relatively simple tree model; otherwise it would have taken several days to rewrite the code.
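For readers wondering what "running the rules on the input" amounts to: a decision tree flattened into rules is an ordered list of conditions where the first match wins, with a default when nothing matches. Below is a minimal Python sketch of that idea. The field values, rules and predictions are made up for illustration and are not SPSS output. Keeping the rules as data rather than as hand-written branches means a bigger tree adds rows, not code.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical rules for illustration only (not real SPSS output).
# Each rule: ({field: allowed values}, prediction). Rules are checked in order
# and the first rule whose conditions all hold decides the prediction.
Rule = Tuple[Dict[str, Set[str]], str]
RULES: List[Rule] = [
    ({"month": {"Jul", "Aug"}, "day_of_week": {"Sat", "Sun"}}, "high"),
    ({"month": {"Jul", "Aug"}}, "medium"),
    ({"day_of_week": {"Sat", "Sun"}}, "medium"),
]
DEFAULT_PREDICTION = "low"  # used when no rule matches (like an ELSE branch)


def score(inputs: Dict[str, str]) -> str:
    """Return the prediction of the first rule matching the user's inputs."""
    for conditions, prediction in RULES:
        if all(inputs.get(field) in allowed for field, allowed in conditions.items()):
            return prediction
    return DEFAULT_PREDICTION


print(score({"month": "Jul", "day_of_week": "Sun"}))  # high
print(score({"month": "Aug", "day_of_week": "Mon"}))  # medium
print(score({"month": "Mar", "day_of_week": "Tue"}))  # low
```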

Unfortunately, I have found that my model does not perform very well on real-life data, because the model is too simple and too general.


So what I am looking for is some help or directions on a freely available way to create a scoring tool based on the PMML or SQL models that SPSS produces.

Has anyone had the same situation? How did you solve it?

FYI, I know a little bit of R, Weka and RapidMiner. If there is any way to create a simple 'scoring tool' with any of these, I am willing to recreate my model in one of those software packages.


PS: I guess the above is a rather amateurish way of doing things, but the small non-profit company I work for has no budget for commercial data mining software, so I am using the tools I have, which are SPSS and Excel.

Re: So I have a model, how do I get other people to use it? AKA what are easy to use scoring tools?
Date: August 27, 2013 04:11PM

What about using Microsoft Access? It can execute SQL statements (although its syntax may not be exactly the same as standard SQL), and you can also create forms for novice users.

Philippe
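The suggestion above boils down to executing the model's SQL as-is instead of translating it by hand. The same idea can be sketched without Access by using SQLite through Python's standard sqlite3 module (a swapped-in tool, not one mentioned in this thread). The table name, column names and CASE rule set below are hypothetical stand-ins for the SPSS export, which would likely need small syntax edits to run in SQLite.

```python
import sqlite3

# Hypothetical rule set written as one SQL CASE expression; the real SPSS
# export will look different and may need dialect tweaks for SQLite.
SCORING_SQL = """
SELECT CASE
         WHEN month IN ('Jul', 'Aug') AND day_of_week IN ('Sat', 'Sun') THEN 'high'
         WHEN month IN ('Jul', 'Aug') THEN 'medium'
         WHEN day_of_week IN ('Sat', 'Sun') THEN 'medium'
         ELSE 'low'
       END AS predicted_attendance
FROM visit_input
"""


def score(month: str, day_of_week: str) -> str:
    # In-memory database: nothing to install beyond Python itself.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE visit_input (month TEXT, day_of_week TEXT)")
        # Store the user's choices as a one-row table, as a form would.
        conn.execute("INSERT INTO visit_input VALUES (?, ?)", (month, day_of_week))
        (prediction,) = conn.execute(SCORING_SQL).fetchone()
        return prediction
    finally:
        conn.close()


print(score("Aug", "Sat"))  # high
print(score("Nov", "Wed"))  # low
```

Because the SQL is only a string, swapping in a larger tree means pasting the new export rather than rewriting code.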

Re: So I have a model, how do I get other people to use it? AKA what are easy to use scoring tools?
Posted by: johan
Date: August 28, 2013 02:17PM

Thanks Philippe, your idea is very helpful.
I checked with our IT department, and they prefer not to install Access on my computer because it's Access 2003 and they want to avoid conflicts with Office 2010. But I'll try to talk them into it :-).

However, although Access might be a good option in this particular situation, it isn't very flexible: relying on Access limits which statistical modelling techniques I can use. AFAIK the only program that outputs SQL scripts is SPSS, and it only does so for decision trees. Other methods in SPSS, and other software packages, rely on PMML for exporting the scoring rules (or maybe on other formats that I'm not aware of).

Is there any other method/package that can "interpret" PMML rules so I can use this as a backend for a simple scoring tool? How do other people do this?
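To show what "interpreting" PMML involves for a decision tree, here is a minimal sketch using only Python's standard library: parse the XML, start at the root Node, and repeatedly move to the first child Node whose predicate is true. The PMML document is hand-written and hypothetical. The sketch supports only True/False, SimplePredicate, SimpleSetPredicate and and/or CompoundPredicate, treats input values as strings, and ignores the model's missingValueStrategy and noTrueChildStrategy settings, so it is a starting point rather than a complete PMML scoring engine.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hand-written, hypothetical PMML for illustration; a real SPSS export contains
# more elements (Header, DataDictionary, MiningSchema, ScoreDistribution, ...).
PMML_TEXT = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <TreeModel functionName="classification">
    <Node score="low">
      <True/>
      <Node score="high">
        <SimpleSetPredicate field="month" booleanOperator="isIn">
          <Array type="string" n="2">Jul Aug</Array>
        </SimpleSetPredicate>
      </Node>
      <Node score="medium">
        <SimplePredicate field="day_of_week" operator="equal" value="Sun"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""

PREDICATES = {"True", "False", "SimplePredicate", "SimpleSetPredicate", "CompoundPredicate"}
COMPARISONS = {
    "lessThan": lambda x, t: x < t,
    "lessOrEqual": lambda x, t: x <= t,
    "greaterThan": lambda x, t: x > t,
    "greaterOrEqual": lambda x, t: x >= t,
}


def local(tag: str) -> str:
    """Drop the XML namespace: '{http://...}Node' -> 'Node'."""
    return tag.rsplit("}", 1)[-1]


def predicate_of(node: ET.Element) -> ET.Element:
    """Return the predicate element that guards a <Node>."""
    return next(c for c in node if local(c.tag) in PREDICATES)


def holds(pred: ET.Element, row: Dict[str, str]) -> bool:
    """Evaluate the subset of PMML predicates this sketch supports."""
    kind = local(pred.tag)
    if kind in ("True", "False"):
        return kind == "True"
    value = row.get(pred.get("field", ""))
    if kind == "SimplePredicate":
        op, target = pred.get("operator"), pred.get("value")
        if op in ("isMissing", "isNotMissing"):
            return (value is None) == (op == "isMissing")
        if value is None:
            return False  # simplification: missing values fail the test
        if op == "equal":
            return value == target  # compared as strings
        if op == "notEqual":
            return value != target
        return COMPARISONS[op](float(value), float(target))
    if kind == "SimpleSetPredicate":
        array = next(c for c in pred if local(c.tag) == "Array")
        inside = value in (array.text or "").split()  # no quoted multi-word values
        return inside if pred.get("booleanOperator") == "isIn" else not inside
    if kind == "CompoundPredicate":
        results = [holds(c, row) for c in pred if local(c.tag) in PREDICATES]
        op = pred.get("booleanOperator")
        if op == "and":
            return all(results)
        if op == "or":
            return any(results)
    raise ValueError(f"unsupported predicate: {kind} {pred.attrib}")


def score(pmml_root: ET.Element, row: Dict[str, str]) -> Optional[str]:
    """Walk the tree, always taking the first child whose predicate is true."""
    tree = next(e for e in pmml_root.iter() if local(e.tag) == "TreeModel")
    node = next(c for c in tree if local(c.tag) == "Node")
    if not holds(predicate_of(node), row):
        return None
    while True:
        for child in node:
            if local(child.tag) == "Node" and holds(predicate_of(child), row):
                node = child
                break
        else:
            # Leaf reached or no child matched: use this node's score,
            # which corresponds to PMML's "returnLastPrediction" behaviour.
            return node.get("score")


root = ET.fromstring(PMML_TEXT)
print(score(root, {"month": "Jul", "day_of_week": "Sun"}))  # high
print(score(root, {"month": "Mar", "day_of_week": "Sun"}))  # medium
print(score(root, {"month": "Mar", "day_of_week": "Tue"}))  # low
```

A scorer like this could sit behind any simple front end, since it only needs a dictionary of field values per request.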



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).