how to understand memory scalability?
Date: January 17, 2022 11:30PM

Dear Professor Philippe Fournier-Viger,

I'm Xiaowei, a lecturer at Zhejiang Gongshang University. Recently, I saw the expression "TRuleGrowth has better memory scalability" in your article named “Mining Sequential Rules Common to Several Sequences with the Window Size Constraint”. However, I didn't understand the term memory scalability. Could you define and explain this term for me?

Thanks a lot.

Best wishes,
Xiaowei

Re: how to understand memory scalability?
Date: January 17, 2022 11:48PM

Another question is, what is the difference between memory scalability and memory consumption/usage?

Re: how to understand memory scalability?
Date: January 18, 2022 01:13AM

Good afternoon Dr. Xiaowei,

Thanks for your interest in the paper. And happy to receive a message from someone in China! I have been to Zhejiang and it is a nice place ;-)

This is a good question and I understand that terms can be confusing.

I will explain these terms and how I use them to make this clearer.

- Memory usage or memory consumption means the amount of RAM that is used by an algorithm at a specific time. For example, if I run an algorithm, I could check now (at 5:50 PM) and see that the algorithm is currently using 200 MB.

- Maximum memory usage or peak memory usage: Since the memory usage of an algorithm can vary quite a lot over time, we are often interested in the maximum memory usage. That means that I will run an algorithm and perform many measurements of the memory usage over time, and then I will just keep the maximum value. For example, I can run an algorithm from 5:50 PM to 5:52 PM. Let's say that at 5:50 PM it uses 200 MB, then at 5:51 PM it uses 350 MB, and then at 5:52 PM it uses 100 MB. Then the maximum memory usage is 350 MB.

In general, for the papers where the algorithms are implemented in Java, we use the maximum memory usage to measure memory because in Java, memory measurements are quite inaccurate. The reason is that Java uses a garbage collector (GC) to free memory and we never know when the GC will clean the memory. By calculating the maximum memory usage, we can get results that are quite stable.
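As an illustration, here is a minimal sketch of how peak memory can be tracked in Java by sampling the used memory at several points and keeping the maximum. This is a hypothetical helper written for this explanation, not the actual SPMF code (SPMF has its own utility class for this):

```java
// Minimal sketch of peak memory tracking in Java (hypothetical helper,
// not the actual SPMF implementation). The idea: sample the currently
// used memory at several points during the algorithm's execution and
// keep only the maximum value observed.
class PeakMemoryTracker {

    private double maxUsedMb = 0;

    // Sample the memory currently used by the JVM and update the peak.
    public void checkMemory() {
        Runtime rt = Runtime.getRuntime();
        double usedMb = (rt.totalMemory() - rt.freeMemory()) / 1024.0 / 1024.0;
        if (usedMb > maxUsedMb) {
            maxUsedMb = usedMb;
        }
    }

    // The peak (maximum) memory usage observed so far, in megabytes.
    public double getMaxUsedMb() {
        return maxUsedMb;
    }
}
```

An algorithm would call checkMemory() at key points (for example, inside its main loop) and report getMaxUsedMb() at the end of its execution.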

- Memory scalability: This refers to how much the (maximum) memory usage will increase when we increase the amount of data that is given as input to an algorithm. For example, when we increase the amount of data, the memory could increase a little, increase linearly, increase exponentially, etc. So to evaluate the memory scalability, we will vary the size of the dataset that is given to an algorithm and check how much memory is used. Then we can draw a chart or make a table with these results. If we find that the memory usage increases exponentially, this is very bad... :-) Ideally, we want to find that the memory usage increases linearly with the size of the data.
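For example, one simple way to summarize such an experiment (assuming you have already measured the peak memory for several dataset sizes) is to look at the ratio of memory to data size: roughly constant ratios suggest linear growth, while growing ratios suggest super-linear growth. A small sketch:

```java
// Sketch: summarize memory scalability from experimental measurements.
// sizes[i] is the number of sequences in the i-th dataset and peakMb[i]
// is the peak memory (in MB) measured when running on it. Roughly
// constant ratios suggest linear scalability; increasing ratios suggest
// super-linear (e.g. exponential) growth.
class ScalabilityCheck {

    public static double[] memoryPerRecord(double[] sizes, double[] peakMb) {
        double[] ratios = new double[sizes.length];
        for (int i = 0; i < sizes.length; i++) {
            ratios[i] = peakMb[i] / sizes[i];
        }
        return ratios;
    }
}
```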

So to go back to that paper, when it is said that algorithm X has better memory scalability than another algorithm Y, it means that if we give bigger and bigger datasets to X, its memory usage will increase more slowly than if we give bigger and bigger datasets to Y.

To make this clearer, you can think about memory scalability as the space complexity of an algorithm. But when we talk about complexity, we generally try to analyze the complexity of an algorithm formally, while when we talk about scalability, we often talk from the perspective of what we observed in some experiments!

Related to that, if you are using Java, you may be interested to read my blog post where I explain how to measure accurately the memory usage in Java:
How to measure the memory usage of data mining algorithms in Java?

Also, I should say that when we evaluate scalability, we can evaluate it with respect to different aspects of the data. For example, let's say that I make an algorithm to analyze graph data. I could check the scalability when we increase the number of edges, or I could check the scalability when we increase the number of labels per node, etc. So basically, we can vary different aspects of the data to evaluate the scalability.

Also, we can evaluate the scalability not just in terms of memory but also in terms of runtime ;-)

Hope this is clear!

Best regards,

Philippe

Edited 1 time(s). Last edit at 01/18/2022 01:14AM by webmasterphilfv.

Re: how to understand memory scalability?
Date: January 18, 2022 02:47AM

In fact, I am using the algorithms you proposed to investigate tourist movement, and I have received some comments from reviewers, but I don't know how to respond. Could you give me some advice?

Our sequence database was input into the TopSeqRules, TNS, TRuleGrowth, and ERMiner algorithms, respectively, with different parameters set for each. The TopSeqRules, TNS, TRuleGrowth, and ERMiner algorithms generated 41, 35, 29, and 41 sequential rules, respectively. Twenty-two sequential rules were found to be common to all of them.

Comment 1: When you describe the choice of thresholds on P14, it seems rather opaque and that only after repeated trial and error do you arrive at "meaningful rules" (L54). Could you please address this concern that the results are based on arbitrary parameters.

Comment 2: Are you proposing a method whereby only the common 22 sequences are considered meaningful, and that the unique sequences should be disregarded? Related to this, I think you need to discuss the validity of your findings. You show that there are 22 sequences that all 4 programs found, but this is evidence of reliability. Please demonstrate the validity of your findings.

Comment 3: I would also suggest that the authors discuss the practical significance of the results achieved. If I interpret your results correctly (Table 5), the "strongest" sequence r1 had a frequency of 3.8%, with many other rules having a frequency of only 0.9%. Could you elaborate on whether a confidence level of 0.667 would be considered statistically significant? Are these low values still a meaningful basis for making strategic decisions?

Best regards,

Xiaowei

Re: how to understand memory scalability?
Date: January 18, 2022 05:56AM

Dear Dr. Xiaowei,

1) The first comment is about how to set the parameters.

Generally, how to set the minsup parameter is dataset-dependent. This is because the number of rules depends on characteristics of the dataset such as the length of sequences, the number of distinct items, how similar the sequences are, etc. On some datasets, minsup = 0.1 may generate millions of rules, while on another dataset, minsup = 0.01 may generate no rules at all.
So setting minsup is generally done by trial and error, and is mostly used to filter out the infrequent rules that may be the result of noise.

The minconf parameter is the most important parameter, as it represents the conditional probability that some events will follow some other events. How to set the minconf parameter?
I think that a possible explanation that would be convincing is to draw a chart like Fig. 3 in this paper:
http://www.philippe-fournier-viger.com/2020_AIOPS_Alarm_correlation_rules.pdf
In that paper, we wanted to find a good value for the correlation of rules, but we also didn't know what a good minimum correlation threshold would be. So what we did is that we decreased the minimum correlation parameter and checked how many rules are found for different values. Then we drew a chart (Fig. 3). Then we used a heuristic method called the "elbow method", often used in data mining. It consists of looking at the chart to find the "elbow", which is at 0.135. What is an elbow? It is the point where the curve starts increasing very quickly. So we used this as an argument to justify the number of rules that we found. You can see the text that we wrote about this in the paper:

In this study, we varied the minimum correlation threshold and noted the number of rules found for each value to draw a chart representing the frequency distribution of rules w.r.t. correlation (shown in Fig. ). We then observed that a large increase in the number of rules occurs for correlation values below 0.135. Assuming that those could be spurious rules, we set the minimum correlation to 0.135. For this parameter value, about 500 alarm correlation rules were discovered, including 113 found by AABD.

So I think you could try to draw a chart like this for the confidence, to analyze how the number of rules varies when you decrease the minconf threshold, and try to find the elbow. The elbow will be the point where, as you decrease minconf, the number of rules increases a lot! I am not sure whether it will give a good result, but you may try this as an idea and see what the result is.
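As a rough illustration (a heuristic sketch written for this message, not code from the paper), one could locate the elbow programmatically by scanning for the largest jump in the number of rules as the threshold decreases:

```java
// Heuristic sketch of the elbow method. thresholds[] is sorted in
// decreasing order and ruleCounts[i] is the number of rules found at
// thresholds[i]. We take the elbow to be the last threshold before the
// largest jump in the number of rules (where the curve "explodes").
class ElbowFinder {

    public static double findElbow(double[] thresholds, int[] ruleCounts) {
        int bestIndex = 1;
        int bestJump = ruleCounts[1] - ruleCounts[0];
        for (int i = 2; i < ruleCounts.length; i++) {
            int jump = ruleCounts[i] - ruleCounts[i - 1];
            if (jump > bestJump) {
                bestJump = jump;
                bestIndex = i;
            }
        }
        // Return the threshold just before the curve starts increasing quickly.
        return thresholds[bestIndex - 1];
    }
}
```

For example, if decreasing the threshold from 0.5 to 0.1 gives rule counts of 10, 12, 15, 80, 300, the largest jump occurs below 0.2, so 0.2 would be chosen as the elbow. In practice, of course, looking at the chart visually (as in the paper) is often enough.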

Besides this approach, I also sometimes ask domain experts to validate the rules that I find. For example, in that paper, we had rules found in computer network data. Then, to verify whether these rules make sense, we asked a telecommunication network expert to look at the rules and validate which rules are really meaningful and which ones are not. I also did that for a paper where we found rules in e-learning data. In that case, we also asked an expert to check our rules to validate them. You could also ask someone to verify your rules and then report that a domain expert has verified them, along with what percentage of the rules are good and what percentage are not.

(...)

I will continue my answer in another message below.

Re: how to understand memory scalability?
Date: January 18, 2022 06:08AM

(...)

2)

Comment 2: Are you proposing a method whereby only the common 22 sequences are considered meaningful, and that the unique sequences should be disregarded? Related to this, I think you need to discuss the validity of your findings. You show that there are 22 sequences that all 4 programs found, but this is evidence of reliability. Please demonstrate the validity of your findings.

I see that you use TNS, TRuleGrowth, ERMiner and TopSeqRules.
These algorithms do not have exactly the same goal, so yes, finding the common rules is maybe not the best way to analyze the results.

I think you can maybe explain and analyze it like this:
- ERMiner is the basic algorithm to find sequential rules with support and confidence.
- TRuleGrowth is an algorithm to find sequential rules. It will produce the same result as ERMiner, but it adds some time constraints. Using these time constraints, we can reduce the number of rules by eliminating rules that are, for example, too long in terms of time. So the question that would be interesting to analyze is: how many rules are filtered by the time constraint? Maybe you could vary the time constraint and draw a chart of the number of rules that you get for different window sizes. This would show the distribution and would be interesting.

- For TopSeqRules... This algorithm is not very special. If you set the parameter k properly, it will give the same result as ERMiner. So there is not much to say about it. The advantage of TopSeqRules is just that it is usually more convenient for users to set the parameter k than to set minsup.

- For TNS, the goal is to keep only the non-redundant rules... So you can think about this as if you modified ERMiner or TopSeqRules to filter out rules that are redundant. So a key question that you could analyze is: how many of the rules found by ERMiner are actually redundant? What percentage? You could analyze that... But a drawback of TNS is that it is an approximate algorithm, I think. Besides that, another drawback is that it does not have the time constraints of TRuleGrowth. So you could say that there is some trade-off.

So overall, I think I would try to explain something like that to have more discussion about the results, and maybe the reviewers would like that. ;-)

As for taking the intersection of the results, you could try to justify it by saying that there is no algorithm that combines time constraints and the constraint of non-redundancy. So by taking the intersection, it would be as if you applied the two constraints together.

Edited 1 time(s). Last edit at 01/18/2022 06:09AM by webmasterphilfv.

Re: how to understand memory scalability?
Date: January 18, 2022 06:15AM

(...)

Comment 3: I would also suggest that the authors discuss the practical significance of the results achieved. If I interpret your results correctly (Table 5), the "strongest" sequence r1 had a frequency of 3.8%, with many other rules having a frequency of only 0.9%. Could you elaborate on whether a confidence level of 0.667 would be considered statistically significant? Are these low values still a meaningful basis for making strategic decisions?

For this question, I think it is a little tricky. There is indeed no statistical test in these algorithms. There do exist some pattern mining algorithms that can find statistically significant patterns, such as Skopus for sequential patterns. You can find an implementation in SPMF. So a possibility could be to also apply that algorithm on your data and include this in your results. Maybe the reviewer would be satisfied with that.

Or another possibility is to do some additional statistical tests on the rules that you have found. I am not sure which statistical test to use. I am not a statistician, to be honest ;-) But maybe some test could be appropriate to verify the rules? It is a good question...

Best regards,

Philippe

Re: how to understand memory scalability?
Date: January 18, 2022 10:37PM

Hi Professor Philippe Fournier-Viger, thank you for your time and effort in addressing those comments. Your insights will contribute to our revision. Thank you again.

Best regards,

Xiaowei

Re: how to understand memory scalability?
Date: January 19, 2022 05:31AM

Hi Professor Philippe Fournier-Viger,

Following your suggestion, I used the Skopus algorithm, but I get sequential patterns, not sequential rules. Sequential patterns are not the same thing as sequential rules, are they? Therefore, it seems that this algorithm cannot give me sequential rules. Do you have any other algorithm suggestions?

Best regards,

Xiaowei