Show all posts by user

The Data Mining Forum

IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php

Current Page: 1 of 67

Results 1 - 30 of 2010

2 years ago

webmasterphilfv

1. !!!! !!!!! IMPORTANT! THE FORUM WILL MOVE TO A NEW ADDRESS !!!! !!!!!

Good morning all, I am announcing today that the Data Mining Forum will move to a new address: https://forum2.philippe-fournier-viger.com This is because the original forum was built more than 12 years ago on Phorum, an old PHP script that has not been supported for about 10 years. The website is thus becoming harder and harder to maintain as it is incompatible with new PHP versi
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

2. Re: Up to date Data Mining Conference List -2022

Good evening, I have indeed not updated the list in a long time! Thanks for reminding me. I will try to make a new list soon. Thanks for your feedback. Philippe
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

3. Re: Real-valued utilities

Good evening, Yes, the simplest solution is to multiply the values by 100, because most algorithms in SPMF take integers as utility values. However, as you noticed, some algorithms have been adapted to accept floats, such as FHM(float). I think that it may be the only one, but it is possible that there are others. If you need a specific algorithm to work with float and it is very impo
Forum: The Data Mining / Big Data Forum
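
To illustrate the "multiply by 100" idea from the post above, here is a minimal sketch in Java. It is not code from SPMF: the class name UtilityScaler and the method toIntegerUtility are hypothetical, and it assumes that the real-valued utilities are read as text with at most two decimal places. It uses BigDecimal because multiplying a double such as 0.29 by 100 can give 28.999999999999996, which would truncate to 28.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper, not part of SPMF.
class UtilityScaler {

    // Converts a real-valued utility (given as text) to an integer
    // by multiplying it by the scale factor and rounding half up.
    static int toIntegerUtility(String realValue, int scale) {
        return new BigDecimal(realValue)
                .multiply(BigDecimal.valueOf(scale))
                .setScale(0, RoundingMode.HALF_UP)
                .intValueExact();
    }

    public static void main(String[] args) {
        String[] utilities = {"1.25", "0.10", "0.29"};
        for (String u : utilities) {
            System.out.println(u + " -> " + toIntegerUtility(u, 100));
        }
    }
}
```

If the values are scaled this way, the minimum utility threshold would need to be multiplied by the same factor, and the utilities in the output divided by it again.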

2 years ago

webmasterphilfv

4. Re: UCI repository datasets transformation

Good evening, Here is a piece of Java code that I use for converting a CSV file into another format.
The input is like this:
1,2,3,4
5,6,7,8
5,6,7
1,2,3
The output is like this:
1 2 3 4
5 6 7 8
5 6 7
1 2 3
The Java code: import java.io.BufferedReader; import java.io.BufferedWriter; import java.io.File; import java.io.FileInputStream; import java.io
Forum: The Data Mining / Big Data Forum
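
The code in the post above is cut off in this listing, so the following is only a minimal sketch of the same kind of conversion, not the original code. The class name CsvToSpaceSeparated and its methods are hypothetical. It assumes a plain CSV file with one transaction per line, no header row and no quoted fields.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical converter, written for illustration only.
class CsvToSpaceSeparated {

    // Turns one comma-separated line into a space-separated line.
    // Empty fields (for example from a trailing comma) are skipped.
    static String convertLine(String csvLine) {
        StringBuilder out = new StringBuilder();
        for (String field : csvLine.split(",")) {
            String item = field.trim();
            if (item.isEmpty()) {
                continue;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(item);
        }
        return out.toString();
    }

    // Converts a whole file line by line, skipping blank lines.
    static void convertFile(Path input, Path output) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String converted = convertLine(line);
                if (!converted.isEmpty()) {
                    writer.write(converted);
                    writer.newLine();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java CsvToSpaceSeparated.java <input.csv> <output.txt>");
            return;
        }
        convertFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

With the sample input from the post, the line 1,2,3,4 becomes 1 2 3 4.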

2 years ago

webmasterphilfv

5. Re: How to download other version smpf?

Good evening, Thanks for using SPMF. I did not keep all the versions, but there are some very old versions of SPMF that you can download here: FOR THE SOURCE CODE VERSION: http://philippe-fournier-viger.com/spmf/oldversions/spmf2.36.zip http://philippe-fournier-viger.com/spmf/oldversions/spmfv_2.34.zip http://philippe-fournier-viger.com/spmf/oldversions/spmf2.26.zip
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

6. Re: Association rules and itemsets

Hi, That would take long to explain. You can watch some YouTube videos that I recorded recently, which explain all these concepts and how itemsets and association rules are related to each other. Frequent itemsets https://www.youtube.com/watch?v=Ken5_GTZySM Association rules https://www.youtube.com/watch?v=idQEwXWcQfM Best, Philippe
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

7. How to set the minutil threshold?

I received this question today: I have a doubt about high utility itemset mining. How can we define the minimum utility? I have read many papers and they said that the minimum utility value is based on the users' preferences. Actually, I do trial and error to choose the minimum utility. But is there any logic that can be used to define the good/optimal/correct minimum utility? My answer: I
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

8. Re: UCI repository datasets transformation

Hi, Glad the website is useful and thanks for posting. On UCI there are many datasets, and many of them have different formats. So the best way to convert a dataset depends on its format. To convert a dataset for FIM, you need to think about what will be the transactions and what will be the items. For example, if in a dataset the data is numerical, then you may have to discretize it to obtain items.
Forum: The Data Mining / Big Data Forum
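
As an illustration of the discretization step mentioned in the post above, here is a minimal sketch using equal-width bins. This is only one possible choice of discretization, not a method recommended in the post. The class EqualWidthDiscretizer, its methods, the number of bins and the value ranges are all assumptions made for the example. Each (attribute, bin) pair is mapped to a distinct positive integer so that a numerical record becomes a transaction of items.

```java
// Hypothetical example of equal-width discretization.
class EqualWidthDiscretizer {

    // Returns the bin index in [0, bins - 1] of a value, using
    // equal-width intervals over [min, max]. Values outside the
    // range are clamped to the first or last bin.
    static int binIndex(double value, double min, double max, int bins) {
        if (bins <= 0 || max <= min) {
            throw new IllegalArgumentException("need bins > 0 and max > min");
        }
        double width = (max - min) / bins;
        int index = (int) Math.floor((value - min) / width);
        return Math.max(0, Math.min(bins - 1, index));
    }

    // Builds a positive item id that is unique for each
    // (attribute, bin) pair.
    static int itemId(int attributeIndex, int bin, int bins) {
        return attributeIndex * bins + bin + 1;
    }

    // Converts one numerical record into a space-separated transaction.
    static String toTransaction(double[] record, double[] mins, double[] maxs, int bins) {
        StringBuilder sb = new StringBuilder();
        for (int a = 0; a < record.length; a++) {
            if (a > 0) {
                sb.append(' ');
            }
            int bin = binIndex(record[a], mins[a], maxs[a], bins);
            sb.append(itemId(a, bin, bins));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[] mins = {0.0, 0.0};     // assumed minimum of each attribute
        double[] maxs = {10.0, 100.0};  // assumed maximum of each attribute
        double[] record = {2.5, 99.0};
        System.out.println(toTransaction(record, mins, maxs, 4)); // prints "2 8"
    }
}
```

Because the item ids grow with the attribute index, the items of each transaction come out in increasing order.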

2 years ago

webmasterphilfv

9. CFP : 5th Int. Conf. on Smart Technologies in Data Science and Communication (SMART DSC-2022) (India/Online)

Fifth International Conference on Smart Technologies in Data Science and Communication (SMART DSC-2022) Hosted by: K.L. Deemed to be University, KLEF Dates: July 21-22, 2022 Venue: Vijayawada/Guntur, Country: India Website: http://smartdsc2022.azurewebsites.net/ https://ocs.springer.com/misc/home/SMART-DSC2022 Special Theme: “Data Science: Unseen patterns and Decisio
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

10. Re: CFP DSSBA @ IEEE DSAA (Data Science for Social and Behavioral Analytics)

Hope to see your papers :-)
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

11. CFP DSSBA @ IEEE DSAA (Data Science for Social and Behavioral Analytics)

Hi all, I am organizing a new special session at IEEE DSAA 2022 called Data Science for Social and Behavioral Analytics. All accepted papers will be published in the regular proceedings of IEEE DSAA 2022. Please see this website for more details: http://philippe-fournier-viger.com/DSSBA_2022/index.html
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

12. Re: PFPM algorithm for periodic itemset mining

For PFPM, you can find the code in SPMF. For flight data, I think you need to collect the data by yourself, or maybe check some repository like Kaggle. Flight data like date, time, arrival and departure should be public.
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

13. Re: How do you check memory for multiple algorithms?

Hi, From a Java perspective, the programs run in a Java virtual machine that should have more or less the same behavior on different operating systems. But it is not guaranteed. Regards,
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

14. Re: why are there two different trees for the same data set

Just by looking at these numbers, I don't know. I would recommend looking up the definitions. Maybe someone else can answer.
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

15. Re: why are there two different trees for the same data set

Hi, There are different algorithms to build decision trees. A typical algorithm for building decision trees will, for example, build a tree from the top to the bottom, node by node. To decide which attribute to use in a node, the algorithm will use some criterion to compare the attributes. There exist various criteria, such as the Gini index, information gain, etc. For example, the ID3 algorithm will
Forum: The Data Mining / Big Data Forum
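
To make the two criteria named in the post above more concrete, here is a minimal sketch that computes the Gini index, the entropy and the information gain of a candidate split from class counts. The class name SplitCriteria and the example counts are made up for illustration; this is not the code of any specific decision tree implementation.

```java
// Hypothetical helper computing common split criteria from class counts.
class SplitCriteria {

    static int sum(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    // Gini index: 1 - sum of squared class proportions.
    static double gini(int[] classCounts) {
        int total = sum(classCounts);
        if (total == 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (int c : classCounts) {
            double p = (double) c / total;
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }

    // Entropy in bits: -sum of p * log2(p) over non-empty classes.
    static double entropy(int[] classCounts) {
        int total = sum(classCounts);
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (int c : classCounts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * (Math.log(p) / Math.log(2));
            }
        }
        return h;
    }

    // Information gain: entropy of the parent node minus the
    // weighted average entropy of the child nodes.
    static double informationGain(int[] parent, int[][] children) {
        int total = sum(parent);
        if (total == 0) {
            return 0.0;
        }
        double weighted = 0.0;
        for (int[] child : children) {
            weighted += ((double) sum(child) / total) * entropy(child);
        }
        return entropy(parent) - weighted;
    }

    public static void main(String[] args) {
        int[] parent = {5, 5};               // 5 records of each class
        int[][] split = {{4, 1}, {1, 4}};    // class counts in two child nodes
        System.out.println("Gini of parent: " + gini(parent));
        System.out.println("Entropy of parent: " + entropy(parent));
        System.out.println("Information gain: " + informationGain(parent, split));
    }
}
```

Two algorithms that rank attributes with different criteria can pick different attributes at a node, which is one way the same dataset can lead to two different trees.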

2 years ago

webmasterphilfv

16. Re: Why is the rule output of the ERMiner algorithm so strange?

I see. Yes, as long as you have at least one sequence, you can find rules. That is correct. Yes, in some papers, we don't use a lot of sequences. For example, during my PhD thesis, over 10 years ago, I was using sequential pattern mining in e-learning, where we only had about 30 sequences ( http://www.philippe-fournier-viger.com/TLT-2012_FournierViger_preprint.pdf ). This was not statistical
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

17. Re: Why is the rule output of the ERMiner algorithm so strange?

Hi > I also want to ask you another question about the sample size. What is the minimum number of observations (sequences) required for sequential rule mining? Is there a standard or index to define the sample size? In addition to support and confidence, are there other indicators to measure the accuracy or efficiency? > one reviewer comments our data "I think it would be valuable f
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

18. Re: How do you check memory for multiple algorithms?

Good evening, The reason is that Java is an interpreted language and it uses the garbage collector to free memory. Basically, there is no guarantee about when the Java garbage collector will free the memory. But in general, the garbage collector will only free memory when there is no more memory available. So this is the reason why the memory just adds up... It means the GC did not clean the memor
Forum: The Data Mining / Big Data Forum
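
As an illustration of the measurement problem described in the post above, here is a minimal sketch of a peak memory tracker based on the standard Runtime class. The class PeakMemoryTracker is hypothetical and is not the class used in SPMF. Note that System.gc() is only a request to the JVM, so the numbers remain approximate. A cleaner comparison would probably be to run each algorithm in a separate JVM.

```java
// Hypothetical tracker that records the highest heap usage observed.
class PeakMemoryTracker {

    private double peakMb = 0;

    // Heap memory currently in use, in megabytes.
    static double usedMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024.0 * 1024.0);
    }

    // Samples the current usage and updates the peak if needed.
    double checkMemory() {
        double current = usedMb();
        if (current > peakMb) {
            peakMb = current;
        }
        return current;
    }

    double getPeakMb() {
        return peakMb;
    }

    // Forgets the peak so that the next run starts from zero.
    void reset() {
        peakMb = 0;
    }

    public static void main(String[] args) {
        PeakMemoryTracker tracker = new PeakMemoryTracker();
        for (int run = 1; run <= 2; run++) {
            System.gc();      // a hint only, the JVM may ignore it
            tracker.reset();
            // Stand-in workload: allocate some arrays and sample memory.
            int[][] data = new int[200][];
            for (int i = 0; i < data.length; i++) {
                data[i] = new int[50_000];
                tracker.checkMemory();
            }
            System.out.printf("Run %d peak: %.1f MB%n", run, tracker.getPeakMb());
        }
    }
}
```

Without the reset and the garbage collection request between runs, the second measurement would include whatever the first run left on the heap.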

2 years ago

webmasterphilfv

19. SPMF 2.52 is released ! (new algorithms etc.)

Good afternoon all, This is to announce the new release of SPMF (v. 2.52). The main modification is the addition of three new algorithms: - The TKU-CE algorithm for heuristically mining the top-k high-utility itemsets with cross-entropy (thanks to Wei Song, Lu Liu, Chuanlong Zheng et al., for the original code) - The TKU-CE+ algorithm for heuristically mining the top-k high-utility itemse
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

20. Re: A real health dataset

Good morning, The problem with health data is that it is often difficult to find due to privacy issues. I know some researchers who had health data about what medicines and treatments some patients took at a hospital, but such data could not be released. Maybe you can check some websites like Kaggle to see if there is some data available that you could then transform to use for your purpose.
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

21. CFP UBISEQ 2022 - Guangzhou, China Deadline: June 15, 2022

Call for Papers for the Second International Conference on Ubiquitous Security (UbiSec 2022) Venue & Dates: Zhangjiajie, China, November 15 - 18, 2022 Website: http://ubisecurity.org/2022/ Introduction The Second International Conference on Ubiquitous Security (UbiSec 2022) stems from three conference/symposium/workshop series: (1) The well-established S
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

22. Re: Why is the rule output of the ERMiner algorithm so strange?

By the way, nice to see all these city names from Canada ;-) This reminds me of my home country. Best regards, Philippe
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

23. Re: Why is the rule output of the ERMiner algorithm so strange?

Good evening, I am sorry to answer late. I have been busy with the Chinese New Year, and sometimes I forget to look at the forum. If I forget, you can also send me an email directly to ask me to check it and I will answer faster. I had a look at your data, and I see that many sequences contain 85. The main reason why 85 would appear often in the consequent (right side of a rule) rather th
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

24. Re: CFP : MEDI 2022 : 11th International Conference on Model and Data Engineering

I am PC Chair of this conference and encourage you to submit your papers! It is a good conference. Philippe
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

25. CFP : MEDI 2022 : 11th International Conference on Model and Data Engineering Deadline: July 4th 2022

The Eleventh International Conference on Model & Data Engineering (MEDI) will be held from 21 to 24 November 2022 in Cairo, Egypt. Its main objective is to provide a forum for the dissemination of research accomplishments and to promote the interaction and collaboration between the models and data research communities. MEDI2022 provides an international platform for the presentation of researc
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

26. Re: CALL FOR PAPERS : IEA AIE 2022 (Final extension : 31st January)

I am the PC chair of the conference this year and I hope to see your papers ;-) Philippe
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

27. SPMF 2.51 is released (2 new algorithms)

Good afternoon all! This is to let you know that a new version of SPMF has been released. There are two new algorithms: The SFU-CE algorithm for mining skyline frequent high utility itemsets using the cross-entropy method (thanks to Wei Song, Chuanlong Zheng et al., for the original code) The POERMH algorithm for mining partially ordered episode rules in a sequence of events, using the he
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

28. Re: how to understand memory scalability?

Yes, Skopus is for sequential patterns instead of rules. A sequential pattern is different from a sequential rule but also slightly similar. I think that is the closest algorithm that I can think of. I think there is no algorithm for sequential rules with a statistical test. So I don't know if Skopus can be relevant... even if it is different, maybe there is some way to compare. That is why I mentioned this as a
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

29. Re: how to understand memory scalability?

(...) Comment 3: I would also suggest that authors discuss the practical significance of the results achieved. If I interpret your results correctly (Table 5) the "strongest" sequence r1 had a frequency of 3.8% and many other rules having a frequency of only .9%. Could you elaborate on if a confidence level of .667 would be considered statistically significant? Are these low
Forum: The Data Mining / Big Data Forum

2 years ago

webmasterphilfv

30. Re: how to understand memory scalability?

(...) 2) Comment 2: Are you proposing a method whereby only the common 22 sequences are considered meaningful, and that the unique sequences should be disregarded? Related to this, I think you need to discuss the validity of your findings. You show that there are 22 sequences that all 4 programs found, but this is evidence of reliability. Please demonstrate the validity of your finding
Forum: The Data Mining / Big Data Forum
