The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 

Current Page: 65 of 67
Results 1921 - 1950 of 2010
12 years ago
webmasterphilfv
Hello, Today I received a question by e-mail: "What are the applications of sequential pattern mining?" Here is my answer. Feel free to add other applications by replying. Mining transactional data: It is possible to mine sequential patterns in sequences of transactions from a store. In this case, each sequence represents the transactions of one customer at the store. F
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Hello everyone, For those of you interested in social network mining and social network analysis, I have found two websites for getting live social network data from social network websites like Twitter, Facebook, Hulu, Digg, Myspace, Google+, YouTube. I want to share this with you: GNIP http://gnip.com/ This website offers live social network data for many networks like Twitter, Faceb
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
The top 5 data mining articles according to this website: (www.dataminingblog.com/top-five-articles-in-data-mining/ ) - An Introduction to Variable and Feature Selection. Isabelle Guyon and André Elisseeff - Data Clustering: A Review. A.K. Jain, M.N. Murty and P.J. Flynn - From Data Mining to Knowledge Discovery in Databases. Usama Fayyad, Gregory Piatetsky-Shapiro and Padhraic Smyth - N
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Hello everyone, I would like to know your opinions on the top books or top articles on data mining. Here are my favorites: Top data mining books Han & Kamber (2011) Data Mining Concepts and Techniques, 3rd Edition. Comments: A very good book covering many topics. It is like an encyclopedia of data mining. Not a lot of details on each algorithm. But it covers many sub
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
An interesting article about the most popular programming languages right now (2012): http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html Here is the top 10: 1- Java 2- C 3- C# 4- C++ 5- Objective-C 6- PHP 7- VB 8- JavaScript 9- Python 10- Perl It is not directly related to data mining. But I thought that it is an interesting article to share! Philippe
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Hello everyone, I was looking at the Oracle Data Mining page and I just want to share my thoughts about it. The webpage: http://www.oracle.com/technetwork/database/options/advanced-analytics/odm/odm-techniques-algorithms-097163.html The webpage lists the data mining tasks/algorithms that Oracle has implemented: - Apriori - SVM - Matrix factorization - KMeans - Orthogonal partiti
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
I agree with Tisonet. Programming data mining algorithms in PHP may not be the best solution. For algorithms like C4.5, I would rather use something like C++, Java or C# for example.
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
I think it is because of BIT_PER_SECTION. In my implementation of SPAM, I use the same amount of memory for each sequence, and this may be a problem because there are some very long sequences in BMS and Kosarak. I am sending you some smaller datasets from which I have removed the longest sequences. BMS_10k_smaller_than30.txt : the first 10k lines of BMS without sequences longer than 30 itemsets. BM
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Thanks! I will add this to the FAQ on the website.
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
To run the test for SPAM, yes, it is the class MainTestSPAM. If you still run out of memory with 1 GB, it might be because the dataset is too large for SPAM with the value that you set for BIT_PER_SECTION. If you have enough memory, you could try 2 GB. Or you could try loading only half of the dataset (to do that, you would need to modify the code a little). Another reason may be that
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Yes... that is right. - researchers do the research - researchers write the article - other researchers review the papers and.... the publisher makes the money by selling your article to other researchers!
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
I don't have NetBeans on my computer. But I have searched on Google and some people say: "In NetBeans, you can add command line options using the Properties of the Project, the Run option. There is an option for the JVM command line there. Look at the -Xms and -Xmx options." More information here about how to do it: http://www.codemiles.com/java/here-how-to-increase-java-heap-si
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Yes. I think that it is ridiculous.
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Hi, For memory, there are two ways: - change the parameters of the algorithm - increase the memory that the Java virtual machine can use (see here for instructions about how to do this: http://www.philippe-fournier-viger.com/spmf/index.php?link=FAQ.php#memory )
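For the second option, a small check can confirm that the JVM actually received the larger heap. This is only a minimal sketch: the class name HeapCheck is made up for illustration and is not part of SPMF. It relies on the standard Runtime.maxMemory() call.

```java
// Hypothetical helper (not part of SPMF): reports the maximum heap size
// that the running JVM is allowed to use.
final class HeapCheck {
    private HeapCheck() {}

    // Maximum heap in megabytes, as reported by the JVM itself.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

For example, running it with "java -Xmx1g HeapCheck" should report a value close to 1024 MB. The exact number can be a little lower than the value requested, depending on the JVM.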
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Ok. I will explain. Elsevier is a publisher. It publishes several journals. But to access the articles, users need to pay, or your university needs to pay so that you can access the articles for free. Now, consider that you are an author who publishes an article in an Elsevier journal. Elsevier offers that you pay 3000 $ US so that your article becomes free to everyone (users don't n
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
I received a newsletter from the publisher Elsevier today. In the e-mail there is a link to an article on their website: "Open Access The choice is yours - Open Access options at Elsevier". Basically, when I click on the subpage "information", it says that when you publish an article at Elsevier, you can pay 3000 $ US so that it becomes available for free to everyone. S
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Thanks tisonet. Very interesting article. I agree BIDE is probably the fastest for closed sequential patterns. There is an article about an algorithm named COBRA that is claimed to be faster than BIDE. But it was published in a small conference so I'm not sure if it is true or not. By the way, if some of you are interested, I have a set of papers on sequential pattern mining that I have collecte
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Thanks tisonet. I tried to make some binary versions for you: BMS, Kosarak70K (the first 70 000 lines of Kosarak), and Snake (I will send it to you by private message). I think it should work, but I did not have time to test it. Philippe
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Hello, There are no changes to make for PrefixSpan. The only change to make is for SPAM. By default, the implementation assumes that there are no sequences longer than 32 itemsets. In BMS there are 59601 sequences. The longest sequence is 267 itemsets. In the part of Kosarak that I gave to you, there are 70 000 sequences and the longest sequence is 796 itemsets. So for BMS, BIT_PER_
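To find the longest sequence in your own dataset, something like the following sketch may help. It assumes the text format where each line is one sequence, items are separated by spaces, -1 closes an itemset and -2 closes the sequence. If your file uses another format, the counting has to be adapted. The class SequenceLengths and its methods are hypothetical and not part of SPMF.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper (not part of SPMF): finds the longest sequence,
// measured in itemsets, in a sequence database stored as text.
final class SequenceLengths {
    private SequenceLengths() {}

    // Assumed format: "-1" marks the end of an itemset, so the number of
    // "-1" tokens on a line is the number of itemsets in that sequence.
    static int countItemsets(String line) {
        int count = 0;
        for (String token : line.trim().split("\\s+")) {
            if (token.equals("-1")) {
                count++;
            }
        }
        return count;
    }

    // Largest itemset count over all lines; lines without "-1" count as 0.
    static int longestSequence(List<String> lines) {
        int longest = 0;
        for (String line : lines) {
            longest = Math.max(longest, countItemsets(line));
        }
        return longest;
    }

    public static void main(String[] args) throws IOException {
        // The path of the dataset is given as the first command line argument.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("Longest sequence: " + longestSequence(lines) + " itemsets");
    }
}
```

If the reported length is above 32 itemsets, the default assumption of the SPAM implementation does not hold for that dataset.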
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Then try again with a lower minimum support like 0.01 and you will get some patterns. For 0.01, I get 78 patterns with PrefixSpan. 0.01 means 1%. If you use a lower support, you will get even more patterns.
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Hello Vivek, I converted 3 datasets for you: - BMS - Kosarak (contains only the first 70 000 sequences) - Snake For Snake, I will tell you by private message because it is not a public dataset so I don't want to post it here. Philippe
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Very good answer. In addition to this, there are some experiments in the papers. For example, in the SPAM paper, they compare SPAM with PrefixSpan. Sometimes experiments in papers are not fair because the authors choose some datasets where their algorithm performs better. But it can give a rough idea about their relative performance.
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Good analysis. I agree with you. Yes. There are many possibilities for improving it and adding more features. By the way, I have read your email. I will answer you later today or tomorrow.
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
vivek basati Wrote: ------------------------------------------------------- > hi > > My favorite too is prefix span but in SPMF frame > work its implemented only to small file with > integers and strings. > > can you upload a source code for some large data > bases like transactional data??? Hello, What is the problem with large files? Did you run out of
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Hello, Thanks for asking the question in the forum. OK. I will explain the difference between absolute minimum support and relative minimum support. Consider that you have a sequence database containing 1000 sequences. If you say minsup = 50 % it is the same thing as saying minsup = 500 sequences. 500 is called an absolute minimum support. 0.5 is called a relative minimum suppor
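The conversion between the two can be sketched in a few lines. This is only an illustration: the class MinSupport is hypothetical, and rounding up to a whole number of sequences is my assumption, since an implementation may round differently.

```java
// Hypothetical helper (not part of SPMF): converts between relative and
// absolute minimum support for a database with a known number of sequences.
final class MinSupport {
    private MinSupport() {}

    // Relative support (a fraction between 0 and 1) to a number of sequences.
    // Rounding up is an assumption made for this sketch.
    static int toAbsolute(double relative, int sequenceCount) {
        if (relative < 0.0 || relative > 1.0) {
            throw new IllegalArgumentException("relative support must be in [0, 1]");
        }
        return (int) Math.ceil(relative * sequenceCount);
    }

    // Number of sequences to a relative support (a fraction between 0 and 1).
    static double toRelative(int absolute, int sequenceCount) {
        if (sequenceCount <= 0) {
            throw new IllegalArgumentException("sequenceCount must be positive");
        }
        return (double) absolute / sequenceCount;
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute(0.5, 1000));  // 50 % of 1000 sequences
        System.out.println(toRelative(500, 1000));  // 500 sequences out of 1000
    }
}
```

With the database of 1000 sequences from the example above, toAbsolute(0.5, 1000) gives 500 and toRelative(500, 1000) gives 0.5.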
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
tisonet Wrote: ------------------------------------------------------- > Hi, I love the idea of PrefixSpan algorithm, too. > > I dont know why it is not in TOP 10 DATA MINING > ALGORITHM. I think that it is because PrefixSpan is not as famous as some other algorithms. The list of top 10 algorithms was established in 2006. If the same list was done again today, it would p
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Hi Dvijesh, Yes. They should at least say "No." or "I lost the source code or the implementation" or something like that. By the way, I found this journal: "Machine Learning Open Source Software". They offer to publish source code of algorithms with a short article. It is possible to submit a paper describing an algorithm implementation and to give them the s
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Hello everyone, I would like to ask you: What is your favorite data mining algorithm? Personally, I have a few favorites: - the PrefixSpan algorithm for sequential pattern mining. This algorithm is relatively simple. There are also some good ideas in its design, such as pseudo-projection and pattern-growth, that make it very efficient. - the Apriori algorithm for frequent itemset mining.
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Hello, I think that there are a few options: You could write a simple program that converts the file generated by the IBM generator to the format that you want. You could change the method for reading the file in your software so that it reads the format generated by the IBM generator. For example, you can check the code that I wrote on this page for reading binary sequence databases generat
Forum: The Data Mining / Big Data Forum
12 years ago
webmasterphilfv
Hello everyone, Here is an interesting article about how to use data mining in game design: http://www.gamasutra.com/view/feature/2816/better_game_design_through_data_.php The authors claim that data mining can help MMOG design in four ways: 1. To balance the economy 2. To catch cheaters 3. To cut production costs 4. To increase customer renewal Besides that, I found that some people pro
Forum: The Data Mining / Big Data Forum

This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).