The Data Mining Forum
doubt in input format
Posted by: Vin
Date: January 03, 2015 12:33AM

I have a small doubt about data mining.

Suppose I have a dataset as follows:

PrdId P1 P2 Target
1 1.2 456 H
1 1.23 400 L
1 1.0 412 L
1 0.8 424 N
1 0.6 400 L
2 4.6 520 N
2 4.7 550 H
2 4.68 550 H
2 4.0 530 N
2 4.0 500 L
3 3.3 345 H
3 3.3 340 L
3 3.6 345 H
3 3.6 340 L


Now, I want to feed this to a mining algorithm (Random Forest), but for the current example let's take a decision tree.
If I feed this to a decision tree, the rules that are formed will be of the form:

if PrdId = 1 and P2 < 410 then Target = L

This means the rules will be PrdId-specific, so if a new data series is given as input, the model will not understand it, because the new ID was not known to it during training.
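
To make the concern concrete, here is a minimal sketch in plain Python. It only encodes the single example rule quoted above as a function; `rule_matches` and the record with PrdId 4 are hypothetical names and values made up for illustration, not output from any real tree.

```python
# The example rule from the post, written as a plain predicate:
#   if PrdId = 1 and P2 < 410 then Target = L
def rule_matches(record):
    """Return True if the PrdId-specific example rule fires for this record."""
    return record["PrdId"] == 1 and record["P2"] < 410

# A row taken from the training table above.
seen = {"PrdId": 1, "P1": 1.23, "P2": 400}

# Hypothetical new product with identical readings but an ID never seen in training.
unseen = {"PrdId": 4, "P1": 1.23, "P2": 400}

seen_fires = rule_matches(seen)      # rule applies to the known product
unseen_fires = rule_matches(unseen)  # same readings, but the rule does not apply

print(seen_fires, unseen_fires)  # prints: True False
```

Because the rule tests PrdId directly, the second record is not covered even though its P1 and P2 values are the same.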

Problem 1: Please tell me how I can specify a group ID in a model so that the rules are generated for groups. I can't find any way to do this, except in association rule mining, which cannot be applied to this kind of dataset.


So, I am calculating the coefficient of variation for the two columns (P1 and P2), so that I get a single reading for each PrdId, together with the most frequent value of Target.
PrdId CV1 CV2 Target
1 0.3 0.4 L
2 0.4 0.3 H

Then I give this as input to the model and generate rules; now I can ignore PrdId in the input.
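
The aggregation described above could be sketched as follows, using only the Python standard library. This assumes "coefficient of variation" means the sample standard deviation divided by the mean, computed separately for P1 and P2 within each PrdId; the names `coefficient_of_variation` and `summary` are made up for this sketch, and the CV values shown in the small table above are not reproduced by it.

```python
from collections import Counter, defaultdict
from statistics import mean, stdev

# (PrdId, P1, P2, Target) rows copied from the dataset in the post.
rows = [
    (1, 1.2, 456, "H"), (1, 1.23, 400, "L"), (1, 1.0, 412, "L"),
    (1, 0.8, 424, "N"), (1, 0.6, 400, "L"),
    (2, 4.6, 520, "N"), (2, 4.7, 550, "H"), (2, 4.68, 550, "H"),
    (2, 4.0, 530, "N"), (2, 4.0, 500, "L"),
    (3, 3.3, 345, "H"), (3, 3.3, 340, "L"), (3, 3.6, 345, "H"),
    (3, 3.6, 340, "L"),
]

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return stdev(values) / mean(values)

# Collect the rows belonging to each PrdId.
groups = defaultdict(list)
for prd_id, p1, p2, target in rows:
    groups[prd_id].append((p1, p2, target))

# One summary record per PrdId: CV of P1, CV of P2, most frequent Target.
summary = {}
for prd_id, members in groups.items():
    summary[prd_id] = {
        "CV1": coefficient_of_variation([m[0] for m in members]),
        "CV2": coefficient_of_variation([m[1] for m in members]),
        # On a tie, most_common returns the label encountered first.
        "Target": Counter(m[2] for m in members).most_common(1)[0][0],
    }

for prd_id, record in summary.items():
    print(prd_id, round(record["CV1"], 3), round(record["CV2"], 3), record["Target"])
```

One thing this makes visible: in the data above, PrdId 2 has N and H tied (two rows each) and PrdId 3 has H and L tied, so "max frequency in target" needs some tie-breaking rule. The sketch simply takes whichever label appears first.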

Problem 2: Is this approach valid, that is, computing statistical values for the grouped data and using them as input to the mining algorithm?
