Show all posts by user

The Data Mining Forum

IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php


Current Page: 63 of 67

Results 1861 - 1890 of 2010

11 years ago

webmasterphilfv

1861. Post-doc positions in data mining

I received this in my e-mail. ========== A post-doctoral position in the area of data and text mining is open at AgroParisTech (Paris, France) Title: "Learning to classify text when labels are taken in ontologies. Search for strategies to optimize the uncertainty of the classification" Location: AgroParisTech, Paris, France Research unit: UMR AgroParisTech/INRA MIA-518 Du
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1862. Machine Learning Repository

Hi, I want to share a link for open-source machine learning implementations in Java, C++, C#, etc. It is the "Machine Learning Open-Source Software" repository: http://mloss.org It offers a search engine to search for source code by programming language and other criteria. It does not contain a lot of entries. But there are some interesting projects that may be usefu
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1863. Free data mining book: "Mining massive datasets"

Hi everyone, I just want to share this link with you. It is a book about data mining that you can download for free. The content is: Chapter 1 Data Mining Chapter 2 Large-Scale File Systems and Map-Reduce <== This chapter looks interesting. Large-scale data mining is a very popular topic now. Chapter 3 Finding Similar Items Chapter 4 Mining Data Streams Chapter 5 Link Analysis
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1864. Re: Lift Ratio

Hi Fendi, You are welcome. I'm also happy about your suggestion about the lift because it helps me to improve the software. For visualizing the association rules, I don't have anything for that. Currently, the output is just the console or a file. There are some researchers who have worked on how to display the result of association rule mining. For example, here there is some Java code
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1865. SPMF 0.83 is released - 2012-07-04 - mine association rules with the lift

Hello everyone, This is just to let you know that I have added a new version of the SPMF open-source data mining tool (0.83) so that it is now possible to mine association rules with the lift measure. It is therefore now possible to set minsup, minconf and minlift if needed. By the way, there was a problem with the download of version 0.82. The link was wrong. Now it has been replace
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1866. Re: Lift Ratio

I have posted a new version of SPMF with the lift on the website (0.83). You can download it and try it out! Best, Philippe
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1867. Re: Lift Ratio

Hi Fendi, I will explain to you how to add the lift measure to SPMF. It is not very complicated. The lift is calculated as lift(X --> Y) = sup(X U Y) / (sup(X) * sup(Y)). Because the confidence is confidence(X --> Y) = sup(X U Y) / sup(X), we can calculate the lift as follows: lift(X --> Y) = confidence(X --> Y) / sup(Y). So I will use this formula to calculate the lift. What
Forum: The Data Mining / Big Data Forum
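
The formula in the post above can be sketched in a few lines of Java. This is only an illustration, not code from SPMF: the class and method names are hypothetical. It also assumes that supports are given as absolute counts, so sup(Y) is first divided by the number of transactions to obtain a relative support, which is what the formula lift = confidence / sup(Y) requires.

```java
// Hypothetical helper for illustration only; not part of the SPMF API.
final class LiftExample {

    /**
     * Lift of a rule X --> Y computed from absolute support counts.
     *
     * @param supXY number of transactions containing X U Y
     * @param supX  number of transactions containing X
     * @param supY  number of transactions containing Y
     * @param n     total number of transactions in the database
     */
    static double lift(int supXY, int supX, int supY, int n) {
        double confidence = (double) supXY / supX; // sup(X U Y) / sup(X)
        double relativeSupY = (double) supY / n;   // sup(Y) as a fraction of the database
        return confidence / relativeSupY;          // confidence(X --> Y) / sup(Y)
    }

    public static void main(String[] args) {
        // 10 transactions: X appears in 4, Y in 5, X and Y together in 3.
        System.out.println(lift(3, 4, 5, 10)); // prints 1.5
    }
}
```

A lift above 1 suggests that X and Y occur together more often than expected if they were independent, which is why a minlift threshold can be used to filter rules.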

11 years ago

webmasterphilfv

1868. Re: SPMF 0.82 is released! - 30-06-12 - bug fix in SPAM implementation

Hi Fendi, Welcome to the forum. You can download the new version from the download page of the website: http://www.philippe-fournier-viger.com/spmf/index.php?link=download.php Now I need to go to a meeting. I will answer your question about the lift later today. Best, Philippe
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1869. Re: sort

Hi Setya, Welcome to the forum. I assume that you are using the MainTestAllAssociationRules_FPGrowth_version test file. To sort the result, you could add this code to the class RulesAgrawal.java in the package ca.pfv.spmf.associationrules.agrawal_FPGrowth_version: public void sortByConfidence(){ Collections.sort(rules, new Comparator<RuleAgrawal>() { public int compare
Forum: The Data Mining / Big Data Forum
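
The excerpt above is cut off before the comparator body, so here is a self-contained sketch of the same idea: sorting a list of rules by decreasing confidence. The Rule class below is a hypothetical stand-in, not the RuleAgrawal class from SPMF, whose members may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortRulesExample {

    // Hypothetical stand-in for a rule class; the real SPMF class may expose different members.
    static final class Rule {
        final String text;
        final double confidence;

        Rule(String text, double confidence) {
            this.text = text;
            this.confidence = confidence;
        }

        double getConfidence() {
            return confidence;
        }
    }

    // Sorts the list in place, highest confidence first.
    static void sortByConfidence(List<Rule> rules) {
        rules.sort(Comparator.comparingDouble(Rule::getConfidence).reversed());
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("a ==> b", 0.60));
        rules.add(new Rule("c ==> d", 0.95));
        rules.add(new Rule("a ==> c", 0.75));

        sortByConfidence(rules);
        for (Rule rule : rules) {
            System.out.println(rule.text + " conf=" + rule.confidence);
        }
    }
}
```

On older Java versions, the same ordering can be obtained with Collections.sort and an anonymous Comparator, as in the post.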

11 years ago

webmasterphilfv

1870. Re: Where can I find good DATA MINING BOOKS?

Hi, A few good books: Introduction to Data Mining by Kumar et al. (easy to understand, good introduction) Data Mining: Concepts and Techniques (covers many topics, but does not go into detail). There are also many books on specialized topics like social network mining, recommender systems, etc. Philippe
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1871. SPMF 0.82 is released! - 30-06-12 - bug fix in SPAM implementation

Hello everyone, This is to let you know that I have updated the SPMF data mining software to version 0.82. This is a minor version. The main difference with 0.81 is that I have fixed a bug in my SPAM algorithm implementation. The bug occurred when the minsup parameter was set to 0. I'm thankful to D. Bhatt, who reported the bug to me. To get the fix, you can download the new version.
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1872. Re: SPAM vs PrefixSpan

I think that it may depend on the datasets. But, in my opinion, PrefixSpan is a better algorithm because it uses a pattern-growth approach. It only generates sequential patterns that are in the database. On the other hand, SPAM can generate a lot of candidates that do not appear in the database. But SPAM can still be fast because it uses bit vectors, and it is very efficient to calculate the s
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1873. Re: SPMF 0.81 is released! - Improved SPAM implementation

Hi Dvijesh, Hope you are doing well. As you probably know, SPAM uses bit vectors to represent the set of itemsets and sequences that contain a sequential pattern. A bit vector used by SPAM contains a bit for each itemset in each sequence of the sequence database. In my previous implementation, I used a fixed number of bits to represent each sequence in the bit vector. Each sequenc
Forum: The Data Mining / Big Data Forum
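
To make the fixed-width layout described above more concrete, here is a minimal sketch using java.util.BitSet. It is not the SPMF implementation: the class name, the method names and the way support is counted (one per sequence that has at least one bit set) are assumptions made for illustration.

```java
import java.util.BitSet;

// Hypothetical illustration of a bit vector with a fixed number of bits per sequence.
final class FixedWidthBitmap {
    private final BitSet bits = new BitSet();
    private final int sequenceCount;
    private final int bitsPerSequence;

    FixedWidthBitmap(int sequenceCount, int bitsPerSequence) {
        this.sequenceCount = sequenceCount;
        this.bitsPerSequence = bitsPerSequence;
    }

    // Records that the pattern occurs at the given itemset of the given sequence.
    void mark(int sequenceIndex, int itemsetIndex) {
        if (sequenceIndex < 0 || sequenceIndex >= sequenceCount
                || itemsetIndex < 0 || itemsetIndex >= bitsPerSequence) {
            throw new IllegalArgumentException("position outside the bitmap");
        }
        bits.set(sequenceIndex * bitsPerSequence + itemsetIndex);
    }

    // Counts the sequences that have at least one bit set in their section.
    int support() {
        int support = 0;
        for (int s = 0; s < sequenceCount; s++) {
            int start = s * bitsPerSequence;
            int firstSet = bits.nextSetBit(start); // -1 when no bit is set from start onward
            if (firstSet >= 0 && firstSet < start + bitsPerSequence) {
                support++;
            }
        }
        return support;
    }

    public static void main(String[] args) {
        FixedWidthBitmap bitmap = new FixedWidthBitmap(3, 4);
        bitmap.mark(0, 1);
        bitmap.mark(0, 3);
        bitmap.mark(2, 0);
        System.out.println(bitmap.support()); // prints 2
    }
}
```

With a fixed width, every sequence needs as many bits as the longest one, so short sequences leave part of their section unused.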

11 years ago

webmasterphilfv

1874. Free issue: "New trends in data mining"

Hi, Elsevier provides a free sample issue of its Knowledge-Based Systems journal. It is a special issue about "new trends in data mining". It is available for free until 2013. The link is here: http://www.sciencedirect.com/science/journal/09507051/25 Philippe
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1875. Re: Frequent subsequences for prediction

Hello Alessandra, It seems like a good approach. Here is another approach that could work. To predict the vector of a sequence, you could search for the most similar sequence(s) (like "k-nearest neighbor"). But to do that, you would need to define a measure of similarity to compare how similar two sequences are. A measure of similarity that you could use is the longest common subse
Forum: The Data Mining / Big Data Forum
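
As an illustration of the similarity measure mentioned above, the following sketch computes the length of the longest common subsequence of two sequences of item identifiers with the classic dynamic programming table. Dividing by the length of the longer sequence to obtain a score between 0 and 1 is an assumption made here; the post does not say how the measure should be normalized.

```java
// Hypothetical helper for illustration; sequences are arrays of item identifiers.
final class LcsSimilarity {

    // Length of the longest common subsequence of a and b.
    static int lcsLength(int[] a, int[] b) {
        int[][] dp = new int[a.length + 1][b.length + 1];
        for (int i = 1; i <= a.length; i++) {
            for (int j = 1; j <= b.length; j++) {
                if (a[i - 1] == b[j - 1]) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;   // extend a common subsequence
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.length][b.length];
    }

    // Similarity in [0, 1]: LCS length divided by the length of the longer sequence (assumed normalization).
    static double similarity(int[] a, int[] b) {
        int longest = Math.max(a.length, b.length);
        return longest == 0 ? 1.0 : (double) lcsLength(a, b) / longest;
    }

    public static void main(String[] args) {
        int[] s1 = {1, 2, 3, 4};
        int[] s2 = {2, 4, 5};
        System.out.println(lcsLength(s1, s2));  // prints 2 (the common subsequence 2, 4)
        System.out.println(similarity(s1, s2)); // prints 0.5
    }
}
```

For a nearest-neighbor search, the sequences with the highest similarity to the query sequence would be kept as its neighbors.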

11 years ago

webmasterphilfv

1876. Re: May i want u to send me IBM generator exe file ??

Hi, welcome to the forum. You can download the IBM generator here: http://forum.ai-directory.com/read.php?5,33 as well as several sequence datasets. Best, Philippe
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1877. Re: Data Mining of Groups

Hi again, Another idea: instead of average or mean, you could use the sum. For example: if member 1 has 5 for skill1 , member 2 has 7 for skill1... then the sum would be 13 for skill 1. But I think that you may also not like this idea. I still think that the problem that you want to address can be viewed from many different angles. Another way would be to consider different group
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1878. Re: Data Mining of Groups

You are right. Sorry. I made a mistake. The order of members is relevant and will change the result. That is what I wanted to say. But this is not what I wrote ;-) To avoid the problem of the order of members in a group, a solution is to use group average, as you have mentioned previously (not consider each member). Average value for skill 1 Average value for skill2 ... Average value for
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1879. Re: Data Mining of Groups

Hi, For a classifier in general, you have a set of attributes and you want to predict the value of a target attribute. In your case, the attributes would be the skills of a group and the target attribute would be the performance of the group. To train the decision tree, you need to use some instances where the value of the target attribute is known. Then, after the training, you can show a new in
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1880. Re: Data Mining of Groups

Also, you may search on Google Scholar to see what other people have done to predict group performance. I did not check. But it is always a good idea to check what other people have done before. Philippe
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1881. Re: Data Mining of Groups

Hello, Welcome to the forum! For me, your problem is a classification problem. You want to predict the performance of a group from a set of attributes that are the skills of each member. I think that there are several things to consider to predict group performance such as (1) the personality of each group member (can they work well together?), (2) the skill that each member possesses with
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1882. Re: Data model for predicting successful music bands [X-Post kdnuggets.com Forum]

Hello, Welcome to the forum. I did not reply to your message earlier because I am currently traveling in Greece. But some other users have probably read your message. I think that your project is quite interesting. Good project topic! In data mining in general, an important step is preprocessing. It consists in preparing the data in a proper format before applying some algorithms like d
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1883. Re: Comparative Analysis

Hello Lluis, If you are talking about evaluating the performance of the algorithm implementations in SPMF, the articles describing the algorithms are available in the Documentation section of the website: http://www.philippe-fournier-viger.com/spmf/index.php?link=documentation.php Several of these articles present some performance comparisons or experiments to evaluate things like executio
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1884. Re: Sequential pattern mining datasets

Hi Dvijesh, These options are parameters of the IBM Generator to generate synthetic datasets. They are NOT parameters of the sequential pattern mining algorithm like PrefixSpan. To see how to use the parameters of the IBM Generator you can use the parameter -help. It prints the description of the parameters: -ncust number_of_customers_in_000s (default: 100) -slen avg_trans_per_custom
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1885. Re: The difference between sequential patterns and sequential rules

Thanks Dvijesh! I'm glad that you like it. ;-)
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1886. Re: Comparative Analysis

Hello Lluis, Glad that you found something. :-) Your project seems interesting. May I ask you what kind of data you want to predict? Best, Philippe
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1887. The difference between sequential patterns and sequential rules

Hello, Today I received the following question by e-mail: What is the difference between sequential patterns and sequential rules? I will share my answer with everyone. The difference is simple. Consider the following sequence database containing four sequences: s1: {a}, {b}, {c}, {e} s2: {a}, {b}, {c}, {e} s3: {a, f}, {b, g}, {c, h} s4: {a}, {b}, {g}, {h} A sequential pa
Forum: The Data Mining / Big Data Forum
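
The excerpt above is cut off before the definitions, so the following sketch relies on the standard definition rather than on the post itself: the support of a sequential pattern is the number of sequences in which the itemsets of the pattern appear in the same order, each one included in a distinct itemset of the sequence. The class and method names are hypothetical, and the database is the four-sequence example from the post.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration of counting the support of a sequential pattern.
final class SequenceSupport {

    // The four sequences of the example database from the post.
    static final List<List<Set<String>>> POST_DATABASE = List.of(
            List.of(Set.of("a"), Set.of("b"), Set.of("c"), Set.of("e")),
            List.of(Set.of("a"), Set.of("b"), Set.of("c"), Set.of("e")),
            List.of(Set.of("a", "f"), Set.of("b", "g"), Set.of("c", "h")),
            List.of(Set.of("a"), Set.of("b"), Set.of("g"), Set.of("h")));

    // True if the pattern's itemsets occur in order, each inside a distinct itemset of the sequence.
    static boolean contains(List<Set<String>> sequence, List<Set<String>> pattern) {
        int matched = 0;
        for (Set<String> itemset : sequence) {
            if (matched < pattern.size() && itemset.containsAll(pattern.get(matched))) {
                matched++; // greedy matching: take the earliest itemset that fits
            }
        }
        return matched == pattern.size();
    }

    // Number of sequences of the database that contain the pattern.
    static int support(List<List<Set<String>>> database, List<Set<String>> pattern) {
        int count = 0;
        for (List<Set<String>> sequence : database) {
            if (contains(sequence, pattern)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(support(POST_DATABASE, List.of(Set.of("a"), Set.of("b")))); // prints 4
        System.out.println(support(POST_DATABASE, List.of(Set.of("a"), Set.of("c")))); // prints 3
    }
}
```

A pattern is then called frequent when its support is at least the minsup threshold chosen by the user.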

11 years ago

webmasterphilfv

1888. Re: Sequential pattern mining datasets

Hello Dvijesh, I've searched a little bit and they say that the meaning is: D: number of sequences in the dataset C: average number of itemsets per sequence T: average number of items per itemset S: average number of itemsets in potentially frequent sequences I: average size of itemsets in potentially frequent sequences N: number of different items in the dataset It is not very cl
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1889. Re: Comparative Analysis

Hello Luis, Welcome to the forum! If you are interested in the problem of classification, you could have a look at some reference data mining books like the book of Han & Kamber titled "Data Mining: Concepts and Techniques". It is a general book about data mining. But there is a chapter about classification and it gives an overview of the main problems and techniques with some c
Forum: The Data Mining / Big Data Forum

11 years ago

webmasterphilfv

1890. Re: Sequential pattern mining datasets

Hello Dilek, You are welcome! I did not find how to generate a single item in each itemset with the IBM dataset generator. I think that maybe it is not possible with the IBM dataset generator. A solution would be to modify the IBM generator. Otherwise, an alternative is to use a simple sequence database generator in Java that I wrote. The website: http://www.philippe-fournier-viger.com/s
Forum: The Data Mining / Big Data Forum
