The Data Mining Forum                             open-source data mining software data mining conferences Data Science for Social and Behavioral Analytics DSSBA 2022 data science journal
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
Why algorithms do not always have the same execution time / memory usage?
Date: May 28, 2014 08:31PM

Today I received a good question in my e-mail about the SPMF data mining library or any data mining algorithms implemented in Java or other languages in general and so I will share the answer.

QUESTION: Why algorithms do not always show the same execution time and memory usage if we run them with the same parameters on the same data? For example, an algorithm may take 5000 ms and if I run it again it may take 5120 ms.



Edited 3 time(s). Last edit at 05/28/2014 08:43PM by webmasterphilfv.

Options: ReplyQuote
Re: Why algorithms do not always have the same execution time / memory usage?
Date: May 28, 2014 08:40PM

ANSWER:

It is normal that the memory and execution times are not always the same if you run the same algorithm with the same parameter on the same data.

The reason is that on your computer there are several processes running at the same time, including the operating system, which influences the execution time.

Furthermore, since the algorithms read a file from the disk and output to a disk (if you are outputting to a disk), then the execution time also depends on how your hard drive behave (whether other processes are reading/writing at the same time, whether you are writing on contiguous space on the hard drive, etc.

For the memory, it depends on how the Java Virtual Machine handle garbage collection because in Java, there is a garbage collector that is responsible of clearning the free memory. Depending on his behavior, the memory usage may change a little bit.

But in any cases, these are not really problems. The reason is that although the execution time may vary, it will never vary A LOT. It may vary by 200 ms perhaps in some case, but it will never vary by much more than that. Therefore, when comparing algorithms, you should always use low minsup values so that the execution time is long (for example 100 s). Therefore, the variation caused by the hard drive and other external factors will not be significant.

Moreover, usually when comparing algorithms, what we want to compare is the trends of how the execution times grows with respect to the parameter. What I means is that although two algorithms may have the same execution time for the same parameter, as the parameters will vary, the execution time will not increase at the same rate and this is ultimately what we want to see. Therefore, when observing a trend, the small variation in execution time will not be important.

So, you should not worry about that. Just use low minsup values so that the difference will not be significant for your result because the execution times will be long.

Finally, for more advanced programmers, here is a blog post that I wrote about the challenges of measuring memory usage in Java (because of the Garbage collector):
How to measure the memory usage of data mining algorithms in Java?

Best,



Edited 1 time(s). Last edit at 05/28/2014 08:43PM by webmasterphilfv.

Options: ReplyQuote
Re: Why algorithms do not always have the same execution time / memory usage?
Posted by: lycaphe8x
Date: May 28, 2014 10:51PM

Thanks for your sharing with us.



Edited 1 time(s). Last edit at 05/28/2014 10:51PM by lycaphe8x.

Options: ReplyQuote


This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).
Terms of use.