The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
Why is the rule output of the ERMiner algorithm so strange?
Date: January 26, 2022 01:43AM

Dear Professor Philippe Fournier-Viger,

I'm Xiaowei. I am using the ERMiner algorithm to find sequential rules. However, although my sequences often start with 85 (SYDNEY), in most of the rules found (X ==> Y), 85 is Y rather than X. I'm confused. Could you tell me why? Thanks a lot.

Best regards,

Xiaowei


My sequence data is as follows:

@CONVERTED_FROM_TEXT
@ITEM=1=ALBURY CITY
@ITEM=2=ARMIDALE REGIONAL
@ITEM=3=Australian Capital Territory
@ITEM=4=BALLINA
@ITEM=5=BATHURST REGIONAL
@ITEM=6=BAYSIDE
@ITEM=7=BEGA VALLEY
@ITEM=8=BELLINGEN
@ITEM=9=BLACKTOWN
@ITEM=10=BLAYNEY
@ITEM=11=BLUE MOUNTAINS
@ITEM=12=BOGAN
@ITEM=13=BOURKE
@ITEM=14=BROKEN HILL
@ITEM=15=BURWOOD
@ITEM=16=BYRON
@ITEM=17=CABONNE
@ITEM=18=CAMPBELLTOWN
@ITEM=19=CANADA BAY
@ITEM=20=CANTERBURY-BANKSTOWN
@ITEM=21=CARRATHOOL
@ITEM=22=CENTRAL COAST
@ITEM=23=CENTRAL DARLING
@ITEM=24=CESSNOCK
@ITEM=25=CITY OF PARRAMATTA
@ITEM=26=CLARENCE VALLEY
@ITEM=27=COBAR
@ITEM=28=COFFS HARBOUR
@ITEM=29=COOTAMUNDRA-GUNDAGAI REGIONAL
@ITEM=30=COWRA
@ITEM=31=CUMBERLAND
@ITEM=32=DUBBO REGIONAL
@ITEM=33=EUROBODALLA
@ITEM=34=FAIRFIELD
@ITEM=35=GEORGES RIVER
@ITEM=36=GLEN INNES SEVERN
@ITEM=37=GOULBURN MULWAREE
@ITEM=38=GREATER HUME SHIRE
@ITEM=39=GUNNEDAH
@ITEM=40=HAWKESBURY
@ITEM=41=HILLTOPS
@ITEM=42=HORNSBY
@ITEM=43=HUNTERS HILL
@ITEM=44=INNER WEST
@ITEM=45=JUNEE
@ITEM=46=KEMPSEY
@ITEM=47=KIAMA
@ITEM=48=KU-RING-GAI
@ITEM=49=KYOGLE
@ITEM=50=LAKE MACQUARIE
@ITEM=51=LANE COVE
@ITEM=52=LISMORE
@ITEM=53=LITHGOW CITY
@ITEM=54=LIVERPOOL
@ITEM=55=LOCKHART
@ITEM=56=MAITLAND
@ITEM=57=MID-COAST
@ITEM=58=MID-WESTERN REGIONAL
@ITEM=59=MOSMAN
@ITEM=60=MUTAWINTJI
@ITEM=61=NAMBUCCA VALLEY
@ITEM=62=NARRABRI
@ITEM=63=NARROMINE
@ITEM=64=NEWCASTLE
@ITEM=65=NORTH SYDNEY
@ITEM=66=NORTHERN BEACHES
@ITEM=67=OBERON
@ITEM=68=ORANGE
@ITEM=69=PENRITH
@ITEM=70=PORT MACQUARIE-HASTINGS
@ITEM=71=PORT STEPHENS
@ITEM=72=QUEANBEYAN-PALERANG REGIONAL
@ITEM=73=Queensland
@ITEM=74=RANDWICK
@ITEM=75=RICHMOND VALLEY
@ITEM=76=RYDE
@ITEM=77=SHELLHARBOUR
@ITEM=78=SHOALHAVEN
@ITEM=79=SILVERTON
@ITEM=80=SINGLETON
@ITEM=81=SNOWY MONARO REGIONAL
@ITEM=82=SNOWY VALLEYS
@ITEM=83=STRATHFIELD
@ITEM=84=SUTHERLAND SHIRE
@ITEM=85=SYDNEY
@ITEM=86=TAMWORTH REGIONAL
@ITEM=87=TENTERFIELD
@ITEM=88=THE HILLS SHIRE
@ITEM=89=TWEED
@ITEM=90=Victoria
@ITEM=91=WAGGA WAGGA
@ITEM=92=WARRUMBUNGLE
@ITEM=93=WAVERLEY
@ITEM=94=WENTWORTH
@ITEM=95=WILLOUGHBY
@ITEM=96=WINGECARRIBEE
@ITEM=97=WOLLONGONG
@ITEM=98=WOOLLAHRA
@ITEM=99=YASS VALLEY
@ITEM=-1=|
16 -1 85 -1 -2
6 -1 52 -1 97 -1 -2
84 -1 85 -1 11 -1 85 -1 -2
85 -1 78 -1 65 -1 -2
85 -1 71 -1 85 -1 98 -1 -2
6 -1 85 -1 65 -1 85 -1 65 -1 85 -1 89 -1 -2
64 -1 11 -1 -2
85 -1 65 -1 -2
85 -1 25 -1 85 -1 -2
72 -1 3 -1 72 -1 3 -1 72 -1 3 -1 72 -1 3 -1 72 -1 3 -1 72 -1 3 -1 72 -1 3 -1 -2
85 -1 65 -1 85 -1 16 -1 -2
85 -1 59 -1 -2
11 -1 44 -1 65 -1 25 -1 85 -1 93 -1 85 -1 93 -1 85 -1 65 -1 -2
42 -1 6 -1 78 -1 85 -1 -2
11 -1 9 -1 85 -1 66 -1 -2
66 -1 65 -1 85 -1 65 -1 85 -1 65 -1 85 -1 65 -1 85 -1 -2
85 -1 34 -1 85 -1 6 -1 -2
85 -1 65 -1 -2
97 -1 11 -1 77 -1 97 -1 77 -1 85 -1 -2
85 -1 11 -1 44 -1 -2
44 -1 59 -1 -2
71 -1 65 -1 85 -1 65 -1 2 -1 87 -1 71 -1 -2
85 -1 11 -1 19 -1 85 -1 65 -1 85 -1 -2
64 -1 66 -1 85 -1 9 -1 11 -1 85 -1 11 -1 93 -1 11 -1 85 -1 65 -1 93 -1 85 -1 98 -1 93 -1 11 -1 -2
97 -1 71 -1 74 -1 47 -1 93 -1 97 -1 85 -1 22 -1 84 -1 66 -1 40 -1 25 -1 47 -1 84 -1 53 -1 85 -1 97 -1 84 -1 6 -1 97 -1 30 -1 41 -1 84 -1 97 -1 84 -1 47 -1 84 -1 93 -1 84 -1 97 -1 84 -1 65 -1 97 -1 84 -1 97 -1 85 -1 97 -1 85 -1 37 -1 97 -1 84 -1 85 -1 78 -1 97 -1 3 -1 85 -1 84 -1 66 -1 -2
85 -1 44 -1 -2
85 -1 6 -1 85 -1 -2
85 -1 65 -1 85 -1 65 -1 85 -1 -2
44 -1 85 -1 66 -1 85 -1 -2
89 -1 22 -1 85 -1 42 -1 66 -1 -2
98 -1 65 -1 -2
85 -1 22 -1 -2
85 -1 65 -1 85 -1 -2
89 -1 64 -1 85 -1 11 -1 85 -1 6 -1 64 -1 24 -1 64 -1 85 -1 64 -1 65 -1 64 -1 22 -1 59 -1 -2
9 -1 25 -1 97 -1 -2
93 -1 85 -1 -2
16 -1 85 -1 -2
85 -1 6 -1 85 -1 65 -1 85 -1 65 -1 85 -1 11 -1 85 -1 22 -1 71 -1 57 -1 85 -1 26 -1 46 -1 70 -1 46 -1 4 -1 26 -1 70 -1 46 -1 70 -1 26 -1 46 -1 26 -1 89 -1 26 -1 16 -1 4 -1 89 -1 26 -1 16 -1 70 -1 57 -1 -2
93 -1 85 -1 -2
3 -1 85 -1 66 -1 85 -1 -2
9 -1 24 -1 40 -1 9 -1 40 -1 9 -1 11 -1 40 -1 11 -1 40 -1 11 -1 40 -1 88 -1 11 -1 40 -1 -2
25 -1 11 -1 98 -1 93 -1 76 -1 85 -1 -2
56 -1 70 -1 -2
85 -1 74 -1 -2
84 -1 66 -1 22 -1 70 -1 66 -1 44 -1 65 -1 66 -1 -2
93 -1 85 -1 -2
85 -1 93 -1 98 -1 93 -1 85 -1 -2
66 -1 74 -1 11 -1 85 -1 44 -1 25 -1 76 -1 25 -1 85 -1 19 -1 44 -1 85 -1 65 -1 85 -1 98 -1 85 -1 98 -1 85 -1 66 -1 85 -1 65 -1 85 -1 11 -1 85 -1 93 -1 98 -1 74 -1 93 -1 74 -1 85 -1 65 -1 85 -1 59 -1 66 -1 85 -1 6 -1 66 -1 -2
3 -1 85 -1 -2
97 -1 85 -1 -2
85 -1 66 -1 -2
85 -1 65 -1 85 -1 66 -1 85 -1 98 -1 65 -1 85 -1 65 -1 85 -1 65 -1 85 -1 65 -1 59 -1 85 -1 65 -1 85 -1 65 -1 85 -1 66 -1 65 -1 85 -1 65 -1 93 -1 65 -1 93 -1 65 -1 59 -1 95 -1 65 -1 85 -1 65 -1 59 -1 65 -1 85 -1 59 -1 65 -1 59 -1 65 -1 59 -1 85 -1 66 -1 88 -1 9 -1 65 -1 85 -1 65 -1 42 -1 65 -1 42 -1 84 -1 6 -1 85 -1 84 -1 40 -1 11 -1 44 -1 85 -1 11 -1 65 -1 85 -1 65 -1 85 -1 65 -1 6 -1 -2
74 -1 85 -1 98 -1 85 -1 74 -1 85 -1 -2
6 -1 66 -1 85 -1 -2
85 -1 93 -1 85 -1 -2
85 -1 65 -1 59 -1 85 -1 66 -1 85 -1 -2
85 -1 44 -1 -2
85 -1 98 -1 85 -1 84 -1 -2
85 -1 65 -1 -2
98 -1 85 -1 93 -1 85 -1 42 -1 24 -1 95 -1 85 -1 74 -1 85 -1 74 -1 -2
85 -1 48 -1 93 -1 85 -1 98 -1 85 -1 48 -1 85 -1 11 -1 98 -1 85 -1 20 -1 85 -1 48 -1 6 -1 85 -1 48 -1 -2
85 -1 93 -1 85 -1 -2
6 -1 85 -1 11 -1 42 -1 85 -1 93 -1 6 -1 -2
85 -1 93 -1 84 -1 97 -1 -2
85 -1 66 -1 85 -1 -2
85 -1 98 -1 85 -1 59 -1 -2
59 -1 85 -1 -2
65 -1 85 -1 65 -1 85 -1 11 -1 -2
95 -1 76 -1 95 -1 85 -1 95 -1 11 -1 85 -1 42 -1 76 -1 85 -1 98 -1 66 -1 85 -1 65 -1 85 -1 31 -1 85 -1 19 -1 76 -1 95 -1 93 -1 76 -1 9 -1 -2
85 -1 93 -1 85 -1 65 -1 -2
80 -1 57 -1 16 -1 -2
85 -1 65 -1 85 -1 65 -1 -2
86 -1 85 -1 86 -1 -2
24 -1 64 -1 11 -1 -2
85 -1 11 -1 93 -1 -2
85 -1 22 -1 85 -1 44 -1 64 -1 -2
65 -1 85 -1 -2
85 -1 11 -1 3 -1 -2
85 -1 98 -1 85 -1 6 -1 -2
85 -1 65 -1 85 -1 65 -1 85 -1 65 -1 11 -1 25 -1 44 -1 11 -1 85 -1 -2
85 -1 20 -1 85 -1 93 -1 85 -1 31 -1 85 -1 54 -1 85 -1 11 -1 85 -1 95 -1 93 -1 85 -1 44 -1 3 -1 -2
65 -1 85 -1 6 -1 -2
85 -1 83 -1 25 -1 15 -1 65 -1 15 -1 85 -1 25 -1 15 -1 85 -1 93 -1 85 -1 84 -1 15 -1 84 -1 74 -1 85 -1 15 -1 72 -1 3 -1 18 -1 11 -1 85 -1 74 -1 6 -1 97 -1 84 -1 66 -1 85 -1 66 -1 85 -1 66 -1 85 -1 66 -1 35 -1 85 -1 98 -1 85 -1 37 -1 85 -1 65 -1 66 -1 85 -1 66 -1 84 -1 85 -1 74 -1 31 -1 84 -1 -2
44 -1 85 -1 65 -1 85 -1 65 -1 -2
64 -1 97 -1 64 -1 -2
24 -1 85 -1 24 -1 93 -1 24 -1 -2
85 -1 19 -1 -2
93 -1 97 -1 7 -1 19 -1 -2
85 -1 65 -1 85 -1 65 -1 85 -1 65 -1 85 -1 -2
78 -1 77 -1 85 -1 -2
85 -1 93 -1 85 -1 -2


Also, my parameters are as follows: minsup = 0.02, minconf = 0.6, max antecedent size = 1, max consequent size = 1.
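As a rough illustration of what these thresholds mean, the sketch below converts the relative minsup into a minimum number of sequences and checks a rule against both thresholds. It assumes that the relative minsup is rounded up to an integer count (the exact rounding in a given implementation may differ), and the function names are hypothetical. With the 91 sequences above, minsup = 0.02 would mean that a rule only needs to appear in 2 sequences.

```python
import math


def min_sequence_count(minsup: float, n_sequences: int) -> int:
    """Smallest number of sequences a rule must appear in.

    Assumption: the relative threshold is rounded up; a real
    implementation may round differently.
    """
    return math.ceil(minsup * n_sequences)


def passes_thresholds(rule_count: int, antecedent_count: int,
                      n_sequences: int, minsup: float, minconf: float) -> bool:
    """Check a rule X ==> Y against minsup and minconf.

    rule_count: number of sequences where X is followed by Y
    antecedent_count: number of sequences that contain X
    """
    support = rule_count / n_sequences
    confidence = rule_count / antecedent_count
    return support >= minsup and confidence >= minconf


print(min_sequence_count(0.02, 91))  # 2
# A hypothetical rule found in 2 of 91 sequences, whose antecedent appears in 3:
print(passes_thresholds(2, 3, 91, 0.02, 0.6))  # True: sup ~0.022, conf ~0.667
```

With such a low minsup, rare items can easily produce rules, as long as their confidence reaches 0.6.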


The rules identified are below:

NEWCASTLE ==> BLUE MOUNTAINS
HAWKESBURY ==> NORTH SYDNEY
GOULBURN MULWAREE ==> NORTHERN BEACHES
QUEANBEYAN-PALERANG REGIONAL ==> Australian Capital Territory
RYDE ==> NORTH SYDNEY
RYDE ==> NORTHERN BEACHES
RYDE ==> CANADA BAY
RYDE ==> BLUE MOUNTAINS
GOULBURN MULWAREE ==> SUTHERLAND SHIRE
NORTH SYDNEY ==> SYDNEY
NORTHERN BEACHES ==> SYDNEY
BAYSIDE ==> SYDNEY
PORT STEPHENS ==> SYDNEY
RANDWICK ==> SYDNEY
BLUE MOUNTAINS ==> SYDNEY
RYDE ==> SYDNEY
SHELLHARBOUR ==> SYDNEY
SHOALHAVEN ==> SYDNEY
CANTERBURY-BANKSTOWN ==> SYDNEY
CESSNOCK ==> SYDNEY
CITY OF PARRAMATTA ==> SYDNEY
CUMBERLAND ==> SYDNEY
GOULBURN MULWAREE ==> SYDNEY
HAWKESBURY ==> SYDNEY
HORNSBY ==> SYDNEY
THE HILLS SHIRE ==> HAWKESBURY
THE HILLS SHIRE ==> BLUE MOUNTAINS
RYDE ==> WAVERLEY
WAVERLEY ==> SYDNEY
CITY OF PARRAMATTA ==> WAVERLEY
CUMBERLAND ==> WAVERLEY
WILLOUGHBY ==> SYDNEY
CUMBERLAND ==> WILLOUGHBY
RYDE ==> WOOLLAHRA
WOOLLAHRA ==> SYDNEY
HAWKESBURY ==> BAYSIDE
HAWKESBURY ==> BLUE MOUNTAINS
MID-COAST ==> BYRON

Re: Why is the rule output of the ERMiner algorithm so strange?
Date: February 03, 2022 07:07AM

Good evening,

Sorry for the late answer. I have been busy with the Chinese New Year, and sometimes I forget to check the forum. If that happens, you can also email me directly and ask me to look at it, and I will answer faster.

I had a look at your data, and I see that many sequences contain 85.
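The data in the question uses the SPMF sequence format: each number is an item, -1 ends an itemset, -2 ends a sequence, and @ITEM lines map item numbers to names. A quick way to see how frequent 85 is, is to count in how many sequences each item appears. Here is a minimal sketch in Python; the helper names are hypothetical (not part of SPMF), and it assumes that other lines starting with @, # or % are metadata or comments. Running it on all 91 sequences from the question would show how many of them contain 85.

```python
from collections import Counter


def parse_spmf(lines):
    """Parse SPMF-style sequence text into (item_names, sequences).

    Each sequence is returned as a list of itemsets (lists of ints):
    -1 closes an itemset and -2 closes a sequence.
    """
    names = {}
    sequences = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("@ITEM="):
            # e.g. "@ITEM=85=SYDNEY" -> names[85] = "SYDNEY"
            _, item_id, name = line.split("=", 2)
            names[int(item_id)] = name
            continue
        if line[0] in "@#%":
            continue  # other metadata lines such as @CONVERTED_FROM_TEXT
        sequence, itemset = [], []
        for token in line.split():
            value = int(token)
            if value == -1:      # end of the current itemset
                sequence.append(itemset)
                itemset = []
            elif value == -2:    # end of the sequence
                break
            else:
                itemset.append(value)
        sequences.append(sequence)
    return names, sequences


def sequence_counts(sequences):
    """Count in how many sequences each item appears (at most once per sequence)."""
    counts = Counter()
    for sequence in sequences:
        counts.update({item for itemset in sequence for item in itemset})
    return counts


sample = [
    "@ITEM=85=SYDNEY",
    "@ITEM=98=WOOLLAHRA",
    "16 -1 85 -1 -2",
    "85 -1 65 -1 -2",
    "98 -1 65 -1 -2",
    "85 -1 98 -1 85 -1 -2",
]
names, sequences = parse_spmf(sample)
counts = sequence_counts(sequences)
print(names[85], counts[85], "of", len(sequences))  # SYDNEY 3 of 4
```

Counting each item once per sequence matches the way support and confidence are computed for sequential rules, where what matters is the number of sequences, not the number of occurrences.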

The main reason why 85 often appears in the consequent (right side of a rule) rather than in the antecedent (left side of a rule) is that 85 appears in almost all sequences, and confidence is not a symmetric measure.

Let me explain.

The confidence of a rule X ==> Y is the number of sequences where X is followed by Y, divided by the number of sequences that contain X.
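In formula form, following the sequence-based definitions used for RuleGrowth and ERMiner (here |S| is the number of sequences in the database and sids(·) is the set of sequences in which an itemset or a rule occurs), support and confidence are:

$$
\mathrm{sup}(X \Rightarrow Y) = \frac{|\mathrm{sids}(X \Rightarrow Y)|}{|S|},
\qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{|\mathrm{sids}(X \Rightarrow Y)|}{|\mathrm{sids}(X)|}
$$

Because the denominator of the confidence is the number of sequences containing X, a very frequent item such as 85 gives a large denominator whenever it is on the left side of a rule.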

Let's take this rule as an example:
SYDNEY ==> WOOLLAHRA
85 ==> 98

The confidence of that rule will be very low because it is divided by the number of sequences containing 85, and 85 appears in almost all sequences. This is why this rule is not output.

On the other hand, this rule:

WOOLLAHRA ==> SYDNEY
98 ==> 85

might have a high confidence because 98 does not appear very often, and usually when it appears it is followed by 85.

The first rule (SYDNEY ==> WOOLLAHRA) has a low confidence because 85 appears in almost all sequences but is not often followed by 98.
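To check this asymmetry concretely, here is a small Python sketch using eight sequences copied from the question. This is a hand-picked subset, so the numbers are only illustrative and are not the confidences on the full data. Since each itemset in the data contains a single item, sequences are written as plain lists of items; the helper names are hypothetical.

```python
def rule_occurs(sequence, x, y):
    """True if item x appears before item y in the sequence.

    Assumes each itemset holds a single item, as in the data above,
    so a sequence can be written as a flat list of items.
    """
    if x not in sequence:
        return False
    first_x = sequence.index(x)
    return y in sequence[first_x + 1:]


def confidence(sequences, x, y):
    """conf(x ==> y) = #sequences where x is followed by y / #sequences containing x."""
    with_x = [s for s in sequences if x in s]
    if not with_x:
        return 0.0
    hits = sum(rule_occurs(s, x, y) for s in with_x)
    return hits / len(with_x)


# Eight sequences copied from the question (85 = SYDNEY, 98 = WOOLLAHRA).
sequences = [
    [85, 98, 85, 59],
    [85, 98, 85, 6],
    [74, 85, 98, 85, 74, 85],
    [85, 65],
    [93, 85],
    [85, 44],
    [16, 85],
    [85, 22],
]
print(confidence(sequences, 98, 85))  # 1.0   (3 of the 3 sequences with 98)
print(confidence(sequences, 85, 98))  # 0.375 (3 of the 8 sequences with 85)
```

Here the same three sequences support both rules, but dividing by the 3 sequences that contain 98 instead of the 8 that contain 85 changes the confidence from 1.0 to 0.375.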

I hope this explains it and helps you understand the confidence better ;-)

Best regards,

Re: Why is the rule output of the ERMiner algorithm so strange?
Date: February 03, 2022 07:08AM

By the way, it is nice to see all these city names from Canada ;-) This reminds me of my home country.

Best regards
Philippe

Re: Why is the rule output of the ERMiner algorithm so strange?
Date: February 16, 2022 07:34PM

Dear Professor Philippe Fournier-Viger,


Happy Chinese New Year! Thank you for your reply, which helped me a lot.

I also want to ask you another question, about sample size. What is the minimum number of observations (sequences) required for sequential rule mining? Is there a standard or index to define the sample size? In addition to support and confidence, are there other indicators to measure accuracy or efficiency?

One reviewer commented on our data: "I think it would be valuable for you to elaborate more precisely how you addressed the sampling bias issue for this study and what methods others researchers could adopt to ensure that a sample of approximately 200 users is representative of the population of interest". Do you have any good suggestions for responding to this comment?

Best regards,

Xiaowei

Re: Why is the rule output of the ERMiner algorithm so strange?
Date: February 16, 2022 10:36PM

Hi

> I also want to ask you another question, about sample size. What is the minimum number of observations (sequences) required for sequential rule mining? Is there a standard or index to define the sample size? In addition to support and confidence, are there other indicators to measure accuracy or efficiency?

> One reviewer commented on our data: "I think it would be valuable for you to elaborate more precisely how you addressed the sampling bias issue for this study and what methods others researchers could adopt to ensure that a sample of approximately 200 users is representative of the population of interest". Do you have any good suggestions for responding to this comment?


I am not sure how we could calculate the required sample size or address the sampling bias issue.

There are some pattern mining algorithms that are designed to find statistically significant patterns, such as OPUS-Miner and Skopus. These algorithms do not find sequential rules; I only mention them as examples. They apply a statistical test, such as Fisher's exact test, to each pattern to determine whether it is significant. Moreover, they also apply a correction for multiple testing, because they must evaluate many patterns, each with its own statistical test. So this is not trivial: an algorithm must be specifically designed for it if we want to use statistical testing.
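To give an idea of what such a test involves, here is a minimal Python sketch of a one-sided Fisher's exact test on a 2x2 table of sequence counts, followed by a Bonferroni correction for the number of tested rules. This is only an illustration under simple assumptions (the table counts sequences that contain X and/or Y, and Bonferroni is the simplest possible correction); it is not the procedure used by OPUS-Miner or Skopus, and the function names and counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Tests for positive association: the probability, under the
    hypergeometric null with fixed margins, of a top-left count >= a.
    """
    n = a + b + c + d
    row1 = a + b   # e.g. sequences containing X
    col1 = a + c   # e.g. sequences containing Y
    total = comb(n, row1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        p += comb(col1, k) * comb(n - col1, row1 - k) / total
    return p


def bonferroni(p_value, n_tests):
    """Bonferroni correction: multiply by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical counts over 20 sequences:
#              contains Y   no Y
# contains X        5         1
# no X              2        12
p = fisher_exact_greater(5, 1, 2, 12)
print(p)
print(bonferroni(p, n_tests=50))
```

When many candidate rules are evaluated, n_tests becomes large and the corrected p-values grow quickly, which is one reason why this is not trivial to add to a rule mining algorithm.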

The ERMiner algorithm is not designed to find patterns that are statistically significant in the strict sense used by the above algorithms. Adding statistical tests to ERMiner could be a whole research project.

If you want, you could say something like that. As for the sample size, I am not sure there is an easy answer. Maybe you could check your statistics books? Or you might try asking about this on https://stats.stackexchange.com/ , a good website for questions about statistics.

Re: Why is the rule output of the ERMiner algorithm so strange?
Date: February 17, 2022 01:29AM

Thanks a lot, Professor Philippe Fournier-Viger. What I mean is: what is the minimum number of observations (sequences) required for sequential rule mining? In general, rules can be identified as long as there is one sequence, that is, the minimum number of sequences is 1. However, I would like to know what minimum number of sequences is commonly used.

Re: Why is the rule output of the ERMiner algorithm so strange?
Date: February 17, 2022 03:05AM

I see. Yes, as long as you have at least one sequence, you can find rules. That is correct.

And yes, in some papers, we do not use many sequences.

For example, during my PhD thesis, over 10 years ago, I used sequential pattern mining in e-learning, where we only had about 30 sequences (http://www.philippe-fournier-viger.com/TLT-2012_FournierViger_preprint.pdf). This was probably not statistically significant, but it still helped the students who were using the e-learning system.

If you do a quick search for papers on sequential rule mining that have cited the ERMiner and RuleGrowth algorithms, you should quickly see how many sequences they used. You could then tell the reviewer that other studies, like X, Y and Z, have used that many sequences.

For example, here are some papers about applications of sequential rule mining:


Quality control

Bogon, T., Timm, I. J., Lattner, A. D., Paraskevopoulos, D., Jessen, U., Schmitz,
M., Wenzel, S., Spieckermann, S.: Towards Assisted Input and Output Data Analysis
in Manufacturing Simulation: The EDASIM Approach. In: Proc. 2012 Winter
Simulation Conference, pp. 257–269 (2012)

Web page prefetching

Fournier-Viger, P., Gueniche, T., Tseng, V.S.: Using Partially-Ordered Sequential
Rules to Generate More Accurate Sequence Prediction. Proc. 8th International Conference
on Advanced Data Mining and Applications, pp. 431-442, Springer (2012)

Anti-pattern detection in service-based systems


Nayrolles, M., Moha, N., Valtchev, P.: Improving SOA antipatterns detection in
Service Based Systems by mining execution traces. In: Proc. 20th IEEE Working
Conference on Reverse Engineering, pp. 321-330 (2013)



Recommendation

Jannach, Dietmar, and Simon Fischer. “Recommendation-based modeling support for data mining processes.” Proceedings of the 8th ACM Conference on Recommender systems. ACM, 2014.

Interestingly, the above work found that sequential rules found by CMRules provided better results than the other patterns it compared, found using FPGrowth and other algorithms.

Jannach, D., Jugovac, M., & Lerche, L. (2015, March). Adaptive Recommendation-based Modeling Support for Data Analysis Workflows. In Proceedings of the 20th International Conference on Intelligent User Interfaces (pp. 252-262). ACM.

Restaurant recommendation

Han, M., Wang, Z., Yuan, J.: Mining Constraint Based Sequential Patterns and
Rules on Restaurant Recommendation System. Journal of Computational Information
Systems 9(10), 3901-3908 (2013)

Customer behavior analysis

Noughabi, Elham Akhond Zadeh, Amir Albadvi, and Behrouz Homayoun Far. “How Can We Explore Patterns of Customer Segments’ Structural Changes? A Sequential Rule Mining Approach.” Information Reuse and Integration (IRI), 2015 IEEE International Conference on. IEEE, 2015.



E-learning

Fournier-Viger, P., Faghihi, U., Nkambou, R., Mephu Nguifo, E.: CMRules: Mining
Sequential Rules Common to Several Sequences. Knowledge-based Systems, Elsevier,
25(1): 63-76 (2012)

Toussaint, Ben-Manson, and Vanda Luengo. “Mining surgery phase-related sequential rules from vertebroplasty simulations traces.” Artificial Intelligence in Medicine. Springer International Publishing, 2015. 35-46.

Faghihi, Usef, Philippe Fournier-Viger, and Roger Nkambou. “CELTS: A Cognitive Tutoring Agent with Human-Like Learning Capabilities and Emotions.” Intelligent and Adaptive Educational-Learning Systems. Springer Berlin Heidelberg, 2013. 339-365.



Embedded systems

Leneve, O., Berges, M., Noh, H. Y.: Exploring Sequential and Association Rule
Mining for Pattern-based Energy Demand Characterization. In: Proc. 5th ACM
Workshop on Embedded Systems For Energy-Efficient Buildings. ACM, pp. 1–2
(2013)

Alarm sequence analysis

Celebi, O.F., Zeydan, E., Ari, I., Ileri, O., Ergut, S.: Alarm Sequence Rule Mining
Extended With A Time Confidence Parameter. In: Proc. 14th Industrial Conference
on Data Mining (2014)

Ileri, Omer, and Salih Ergüt. “Alarm Sequence Rule Mining Extended With A Time Confidence Parameter.” (2014).

Manufacturing simulation

Kamsu-Foguem, B., Rigal, F., Mauget, F.: Mining association rules for the quality
improvement of the production process. Expert Systems with Applications 40(4),
1034-1045 (2012)


There are certainly others...



Edited 3 time(s). Last edit at 02/17/2022 03:08AM by webmasterphilfv.

Re: Why is the rule output of the ERMiner algorithm so strange?
Date: February 17, 2022 04:42AM

Thanks a lot, Professor Philippe Fournier-Viger. I will check these papers.


Best regards,

Xiaowei



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).
Terms of use.