The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 

Current Page: 63 of 67
Results 1861 - 1890 of 2010
11 years ago
webmasterphilfv
I received this in my e-mail. ========== A post-doctoral position in the area of data and text mining is open at AgroParisTech (Paris, France) Title: "Learning to classify text when labels are taken in ontologies. Search for strategies to optimize the uncertainty of the classification" Location: AgroParisTech, Paris, France Research unit: UMR AgroParisTech/INRA MIA-518 Du
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Hi, I want to share a link for open-source machine learning implementations in Java, C++, C#, etc. It is the "Machine Learning Open-Source Software" repository: http://mloss.org It offers a search engine to search for source code by programming language and other criteria. It does not contain a lot of entries. But there are some interesting projects that may be usefu
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Hi everyone, I just want to share this link with you. It is a book about data mining that you can download for free. The content is: Chapter 1 Data Mining Chapter 2 Large-Scale File Systems and Map-Reduce <== This chapter looks interesting. Large-scale data mining is a very popular topic now. Chapter 3 Finding Similar Items Chapter 4 Mining Data Streams Chapter 5 Link Analysis
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Hi Fendi, You are welcome. I'm also happy about your suggestion about the lift because it helps me to improve the software. For visualizing the association rules, I don't have anything for that. Currently, the output is just the console or a file. There are some researchers that have worked on how to display the result of association rule mining. For example, here is some Java code
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Hello everyone, This is just to let you know that I have added a new version of the SPMF open-source data mining tool (0.83) so that it is now possible to mine association rules with the lift measure. It is therefore now possible to set minsup, minconf and minlift if needed. By the way, there was a problem with the download of the version 0.82. The link was wrong. Now it has been replace
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
I have posted a new version of SPMF with the lift on the website (0.83). You can download it and try it out! Best, Philippe
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Hi Fendi, I will explain to you how to add the lift measure to SPMF. It is not very complicated. The lift is calculated as lift(X --> Y) = sup(X U Y) / (sup(X) * sup(Y)). Because the confidence is confidence(X --> Y) = sup(X U Y) / sup(X), we can calculate the lift as follows: lift(X --> Y) = confidence(X --> Y) / sup(Y). So I will use this formula to calculate the lift. What
Forum: The Data Mining / Big Data Forum
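The lift formula from the post above can be sketched in Java as follows. This is only an illustration, not the code used in SPMF: the class and method names are hypothetical, and the supports are assumed to be relative frequencies between 0 and 1 (with absolute counts, the number of transactions would also be needed).

```java
// Illustrative sketch only: names are hypothetical and are not part of SPMF.
final class LiftSketch {

    /** confidence(X --> Y) = sup(X U Y) / sup(X), supports given as relative frequencies. */
    static double confidence(double supXY, double supX) {
        return supXY / supX;
    }

    /** lift(X --> Y) = sup(X U Y) / (sup(X) * sup(Y)) = confidence(X --> Y) / sup(Y). */
    static double lift(double supXY, double supX, double supY) {
        return confidence(supXY, supX) / supY;
    }

    public static void main(String[] args) {
        // Made-up supports: X in 50% of transactions, Y in 25%, both together in 25%.
        double supXY = 0.25, supX = 0.5, supY = 0.25;
        System.out.println("confidence = " + confidence(supXY, supX));
        System.out.println("lift       = " + lift(supXY, supX, supY));
    }
}
```

A lift above 1 suggests that X and Y occur together more often than would be expected if they were independent, which is why a minlift threshold can be useful next to minsup and minconf.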
11 years ago
webmasterphilfv
Hi Fendi, Welcome to the forum. You can download the new version from the download page of the website: http://www.philippe-fournier-viger.com/spmf/index.php?link=download.php Now I need to go to a meeting. I will answer your question about the lift later today. Best, Philippe
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
1869. Re: sort
Hi Setya, Welcome to the forum. I assume that you are using the MainTestAllAssociationRules_FPGrowth_version test file. To sort the result, you could add this code to the class RulesAgrawal.java in the package ca.pfv.spmf.associationrules.agrawal_FPGrowth_version: public void sortByConfidence(){ Collections.sort(rules, new Comparator<RuleAgrawal>() { public int compare
Forum: The Data Mining / Big Data Forum
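Since the code in the post above is cut off, here is a self-contained sketch of the same idea: sorting a list of rules by confidence with a Comparator. It does not use the SPMF classes. The Rule class below is a hypothetical stand-in for RuleAgrawal, and sorting from highest to lowest confidence is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: Rule is a hypothetical stand-in, not SPMF's RuleAgrawal.
final class RuleSortSketch {

    static final class Rule {
        final String text;
        final double confidence;

        Rule(String text, double confidence) {
            this.text = text;
            this.confidence = confidence;
        }
    }

    /** Returns a new list with the rules ordered from highest to lowest confidence. */
    static List<Rule> sortByConfidence(List<Rule> rules) {
        List<Rule> sorted = new ArrayList<>(rules);
        sorted.sort(Comparator.comparingDouble((Rule r) -> r.confidence).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("a ==> b", 0.60));
        rules.add(new Rule("b ==> c", 0.90));
        rules.add(new Rule("a ==> c", 0.75));
        for (Rule r : sortByConfidence(rules)) {
            System.out.println(r.text + "  conf: " + r.confidence);
        }
    }
}
```

Copying the list before sorting leaves the original order untouched, which may be convenient if the rules are also written to a file in their original order.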
11 years ago
webmasterphilfv
Hi, A few good books: Introduction to Data Mining by Kumar et al. (easy to understand, good introduction) Data Mining: Concepts and Techniques (covers many topics, but does not go into detail). There are also many books on specialized topics like social network mining, recommender systems, etc. Philippe
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Hello everyone, This is to let you know that I have updated the SPMF data mining software to version 0.82. This is a minor version. The main difference with 0.81 is that I have fixed a bug in my SPAM algorithm implementation. The bug occurred when the minsup parameter was set to 0. I'm thankful to D. Bhatt who has reported the bug to me. To fix the bug, you can download the new version.
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
I think that it may depend on the datasets. But, in my opinion, PrefixSpan is a better algorithm because it uses a pattern-growth approach. It only generates sequential patterns that are in the database. On the other hand, SPAM can generate a lot of candidates that do not appear in the database. But SPAM can still be fast because it uses bit vectors, and it is very efficient to calculate the s
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Hi Dvijesh, Hope you are doing well. As you probably know, SPAM uses bit vectors to represent the set of itemsets and sequences that contain a sequential pattern. A bit vector used by SPAM contains a bit for each itemset in each sequence of the sequence database. In my previous implementation, I used a fixed number of bits to represent each sequence in the bit vector. Each sequenc
Forum: The Data Mining / Big Data Forum
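To make the bit vector idea from the post above more concrete, here is a small sketch using java.util.BitSet. It is not the SPMF implementation. It assumes a layout where each sequence gets exactly as many bits as it has itemsets, and it assumes that the support of a pattern is the number of sequences with at least one bit set. All names are hypothetical.

```java
import java.util.BitSet;

// Illustrative sketch only: an assumed bit layout, not the layout used by SPMF's SPAM.
final class BitVectorSketch {

    /** offsets[i] is the index of the first bit of sequence i; the last entry is the total size. */
    static int[] offsets(int[] itemsetsPerSequence) {
        int[] offsets = new int[itemsetsPerSequence.length + 1];
        for (int i = 0; i < itemsetsPerSequence.length; i++) {
            offsets[i + 1] = offsets[i] + itemsetsPerSequence[i];
        }
        return offsets;
    }

    /** Counts the sequences that have at least one bit set inside their own range of bits. */
    static int support(BitSet bits, int[] offsets) {
        int count = 0;
        for (int s = 0; s + 1 < offsets.length; s++) {
            int first = bits.nextSetBit(offsets[s]);
            if (first >= 0 && first < offsets[s + 1]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up database: three sequences with 2, 3 and 1 itemsets.
        int[] offsets = offsets(new int[] {2, 3, 1});
        BitSet bits = new BitSet(offsets[offsets.length - 1]);
        bits.set(offsets[0] + 1); // pattern occurs in the 2nd itemset of sequence 0
        bits.set(offsets[2]);     // pattern occurs in the 1st itemset of sequence 2
        System.out.println("support = " + support(bits, offsets));
    }
}
```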
11 years ago
webmasterphilfv
Hi, Elsevier provides a free sample issue of its Knowledge-Based Systems journal. It is a special issue about "new trends in data mining". It is available for free until 2013. The link is here: http://www.sciencedirect.com/science/journal/09507051/25 Philippe
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Hello Alessandra, It seems like a good approach. Here is another approach that could work. To predict the vector of a sequence, you could search for the most similar sequence(s) (like "k-nearest neighbor"). But to do that, you would need to define a measure of similarity to compare how similar two sequences are. A measure of similarity that you could use is the longest common subse
Forum: The Data Mining / Big Data Forum
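The post above is cut off, but assuming that the measure it refers to is the longest common subsequence (LCS), a similarity between two sequences could be sketched as below. Sequences are simplified here to arrays of integer items, and dividing the LCS length by the length of the longer sequence is only one possible normalization. All names are hypothetical.

```java
// Illustrative sketch only: assumes the measure meant in the post is the longest common subsequence.
final class LcsSketch {

    /** Length of the longest common subsequence of a and b (classic dynamic programming). */
    static int lcsLength(int[] a, int[] b) {
        int[][] dp = new int[a.length + 1][b.length + 1];
        for (int i = 1; i <= a.length; i++) {
            for (int j = 1; j <= b.length; j++) {
                if (a[i - 1] == b[j - 1]) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.length][b.length];
    }

    /** Similarity in [0, 1]: LCS length divided by the length of the longer sequence. */
    static double similarity(int[] a, int[] b) {
        int longest = Math.max(a.length, b.length);
        return longest == 0 ? 1.0 : (double) lcsLength(a, b) / longest;
    }

    public static void main(String[] args) {
        int[] s1 = {1, 2, 3, 4};
        int[] s2 = {2, 4, 5};
        System.out.println("LCS length = " + lcsLength(s1, s2));
        System.out.println("similarity = " + similarity(s1, s2));
    }
}
```

With such a function, the k most similar known sequences can be found by computing the similarity to each of them and keeping the k highest values.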
11 years ago
webmasterphilfv
Hi, welcome to the forum. You can download the IBM generator here: http://forum.ai-directory.com/read.php?5,33 as well as several sequence datasets. Best, Philippe
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Hi again, Another idea: instead of the average, you could use the sum. For example: if member 1 has 5 for skill 1 and member 2 has 7 for skill 1, then the sum would be 12 for skill 1. But I think that you may also not like this idea. I still think that the problem that you want to address can be viewed from many different angles. Another way would be to consider different group
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
You are right. Sorry. I made a mistake. The order of members is relevant and will change the result. That is what I wanted to say. But this is not what I wrote ;-) To avoid the problem of the order of members in a group, a solution is to use the group average, as you have mentioned previously (not consider each member). Average value for skill 1 Average value for skill 2 ... Average value for
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Hi, For a classifier in general, you have a set of attributes and you want to predict the value of a target attribute. In your case, the attributes would be the skills of a group and the target attribute would be the performance of the group. To train the decision tree, you need to use some instances where the value of the target attribute is known. Then, after the training, you can show a new in
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Also, you may search on Google Scholar to see what other people have done to predict group performance. I did not check. But it is always a good idea to check what other people have done before. Philippe
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Hello, Welcome to the forum! For me, your problem is a classification problem. You want to predict the performance of a group from a set of attributes that are the skills of each member. I think that there are several things to consider to predict group performance such as (1) the personality of each group member (can they work well together?), (2) the skills that each member possesses with
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Hello, Welcome to the forum. I did not reply to your message earlier because I am currently traveling in Greece. But some other users have probably read your message. I think that your project is quite interesting. Good project topic! In data mining in general, an important step is preprocessing. It consists of preparing the data in a proper format before applying some algorithms like d
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Hello Lluis, If you are talking about evaluating the performance of the algorithm implementations in SPMF, the articles describing the algorithms are available in the Documentation section of the website: http://www.philippe-fournier-viger.com/spmf/index.php?link=documentation.php Several of these articles present some performance comparisons or experiments to evaluate things like executio
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Hi Dvijesh, These options are parameters of the IBM Generator to generate synthetic datasets. They are NOT parameters of the sequential pattern mining algorithm like PrefixSpan. To see how to use the parameters of the IBM Generator you can use the parameter -help. It prints the description of the parameters: -ncust number_of_customers_in_000s (default: 100) -slen avg_trans_per_custom
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Thanks Dvijesh! I'm glad that you like it. ;-)
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Hello Lluis, Glad that you found something. :-) Your project seems interesting. May I ask you what kind of data you want to predict? Best, Philippe
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Hello, Today I received the following question by e-mail. What is the difference between sequential patterns and sequential rules? I will share my answer with everyone. The difference is simple. Consider the following sequence database containing four sequences: s1: {a}, {b}, {c}, {e} s2: {a}, {b}, {c}, {e} s3: {a, f}, {b, g}, {c, h} s4: {a}, {b}, {g}, {h} A sequential pa
Forum: The Data Mining / Big Data Forum
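As an illustration for the post above, the sketch below counts in how many sequences of a database a sequential pattern occurs, using the four sequences as they are shown in the post. It relies on the usual definition (each itemset of the pattern must be included in a different itemset of the sequence, in the same order), which is an assumption here because the post is cut off before the definition. All names are hypothetical and this is not code from SPMF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: names are hypothetical and this is not SPMF code.
final class SequenceSupportSketch {

    static Set<String> items(String... xs) {
        return new HashSet<>(Arrays.asList(xs));
    }

    @SafeVarargs
    static List<Set<String>> seq(Set<String>... itemsets) {
        return Arrays.asList(itemsets);
    }

    /** True if the itemsets of the pattern appear, in order, inside itemsets of the sequence. */
    static boolean contains(List<Set<String>> sequence, List<Set<String>> pattern) {
        int matched = 0;
        for (Set<String> itemset : sequence) {
            if (matched < pattern.size() && itemset.containsAll(pattern.get(matched))) {
                matched++;
            }
        }
        return matched == pattern.size();
    }

    /** Number of sequences of the database that contain the pattern. */
    static int support(List<List<Set<String>>> database, List<Set<String>> pattern) {
        int count = 0;
        for (List<Set<String>> sequence : database) {
            if (contains(sequence, pattern)) {
                count++;
            }
        }
        return count;
    }

    /** The four sequences as shown in the post. */
    static List<List<Set<String>>> exampleDatabase() {
        List<List<Set<String>>> db = new ArrayList<>();
        db.add(seq(items("a"), items("b"), items("c"), items("e")));
        db.add(seq(items("a"), items("b"), items("c"), items("e")));
        db.add(seq(items("a", "f"), items("b", "g"), items("c", "h")));
        db.add(seq(items("a"), items("b"), items("g"), items("h")));
        return db;
    }

    public static void main(String[] args) {
        List<Set<String>> pattern = seq(items("a"), items("b"), items("c"));
        System.out.println("support = " + support(exampleDatabase(), pattern));
    }
}
```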
11 years ago
webmasterphilfv
Hello Dvijesh, I've searched a little bit and they say that the meaning is : D: number of sequences in the dataset C: average number of itemsets per sequence T: average number of items per itemset S: average number of itemsets in potentially frequent sequences. I: average size of itemsets in potentially frequent sequences N: number of different items in the dataset It is not very cl
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Hello Luis, Welcome to the forum! If you are interested in the problem of classification, you could have a look at some reference data mining books like the book of Han & Kamber titled "Data Mining: Concepts and Techniques". It is a general book about data mining. But there is a chapter about classification and it gives an overview of the main problems and techniques with some c
Forum: The Data Mining / Big Data Forum
11 years ago
webmasterphilfv
Hello Dilek, You are welcome! I did not find how to generate a single item in each itemset with the IBM dataset generator. I think that maybe it is not possible with the IBM dataset generator. A solution would be to modify the IBM generator. Otherwise, an alternative is to use a simple sequence database generator in Java that I wrote. The website: http://www.philippe-fournier-viger.com/s
Forum: The Data Mining / Big Data Forum

This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).