The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
big data
Posted by: david
Date: November 15, 2018 10:36AM

Hello,

We are living in the era of big data, so the size of the dataset is the main issue with big data.

Are there any other challenges to consider besides size?

Thanks for this helpful forum.



Best regards,
David

Re: big data
Date: November 15, 2018 05:05PM

Hi,

About big data, some researchers speak of the five V's of big data as the important challenges: Volume, Velocity, Variety, Veracity, and Value.

But besides that, I would like to point out that some problems are easy even if you have big data, and some problems are difficult even if you don't have a lot of data. Actually, the difficulty of a computing problem is sometimes influenced more by the parameters of the problem than by the data size. This is the case, for example, in itemset mining, where lowering the "minsup" parameter can exponentially increase the difficulty of the problem (because the number of frequent itemsets can grow exponentially), while some algorithms scale linearly as the data size increases.
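To make the effect of "minsup" concrete, here is a minimal sketch in Python. It is a brute-force count on a made-up toy database, not an efficient algorithm, and the function name `count_frequent_itemsets` and the transactions are only assumptions for illustration. Here minsup is an absolute number of transactions.

```python
from itertools import combinations

def count_frequent_itemsets(transactions, minsup):
    """Count itemsets contained in at least `minsup` transactions (brute force)."""
    items = sorted(set().union(*transactions))
    frequent = 0
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            # Support = number of transactions containing every item of the candidate.
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= minsup:
                frequent += 1
                found_any = True
        if not found_any:
            # Anti-monotonicity: if no itemset of this size is frequent,
            # no larger itemset can be frequent either.
            break
    return frequent

# Hypothetical toy database of 6 transactions over 4 items.
transactions = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"b", "c", "d"},
    {"a", "b"},
]

for minsup in (5, 4, 3, 2, 1):
    print(minsup, count_frequent_itemsets(transactions, minsup))
```

The data stays the same in every run, but the number of patterns to find grows quickly as minsup decreases. With n items there can be up to 2^n - 1 itemsets, which is why the parameter can matter more than the number of transactions.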

There are also many interesting problems related to big data such as stream data mining, where the data is potentially infinite. Or mining complex data like social graphs, etc.

Re: big data
Posted by: david
Date: November 15, 2018 11:49PM

Thanks,

It is really helpful.
I totally agree, and that was my question: apart from the 5 V's, I am looking for other big data challenges to consider when using frequent itemset mining.

So please let me know which interesting big data challenges can be addressed using frequent itemset mining.


Best
Sadeq

Re: big data
Date: November 16, 2018 03:32AM

Hello again,

Specifically for itemset mining, there are several challenges related to big data:
- You can design parallel algorithms that run on big data architectures like Hadoop, Spark, etc. A few already exist, but perhaps they can be improved, or you can design algorithms for other pattern mining problems or variations of the itemset mining problem.
- You can design algorithms for mining itemsets in data streams. Some already exist as well, but you can work on a variation of the itemset mining problem, for example, or make something more efficient. There are various possibilities.
- You can work on a topic related to big data, like preserving privacy when doing itemset mining. Privacy-preserving data mining is relevant for big data because the data is potentially distributed and processed on multiple servers, so we want to protect privacy. If you are interested in this, you can check our PSPF software for privacy-preserving pattern mining. It is not designed for big data, but it can help you get an idea of this type of problem.
- ...
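
To give a rough idea of the data stream setting from the list above, here is a minimal sketch in Python. It assumes a sliding window over the most recent transactions and counts only pairs of items, so it is a simplified illustration and not one of the published stream mining algorithms. The class name `SlidingWindowPairCounter` and its methods are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations

class SlidingWindowPairCounter:
    """Keep support counts of item pairs over the last `window_size` transactions."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()    # the most recent transactions, oldest first
        self.counts = Counter()  # pair -> support inside the current window

    @staticmethod
    def _pairs(transaction):
        # Sort the items so that each pair has a single canonical form.
        return combinations(sorted(transaction), 2)

    def add(self, transaction):
        """Process one new transaction from the stream."""
        transaction = frozenset(transaction)
        self.window.append(transaction)
        self.counts.update(self._pairs(transaction))
        if len(self.window) > self.window_size:
            # The oldest transaction leaves the window: undo its counts.
            expired = self.window.popleft()
            for pair in self._pairs(expired):
                self.counts[pair] -= 1
                if self.counts[pair] == 0:
                    del self.counts[pair]

    def frequent_pairs(self, minsup):
        """Return the pairs whose support in the current window is >= minsup."""
        return {pair: n for pair, n in self.counts.items() if n >= minsup}

# Hypothetical stream of four transactions, with a window of size 3.
counter = SlidingWindowPairCounter(window_size=3)
for t in [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"c", "d"}]:
    counter.add(t)
    print(counter.frequent_pairs(minsup=2))
```

The point is that each transaction is seen once and memory is bounded by the window, since the full stream can never be stored. Handling larger itemsets within a memory budget is where the real research difficulty lies.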

Those are the ideas that come to my mind.

Best regards

Philippe

Re: big data
Posted by: David
Date: November 16, 2018 09:06AM

Thanks so much,

I deeply appreciate your helpful comments.

These ideas are related to the size of datasets.

I am looking for big data challenges with respect to FIM (frequent itemset mining) that do not involve parallelism.
What I mean is challenges other than size.

Kind regards

Re: big data
Date: November 17, 2018 04:11AM

I see. Actually, the challenge of privacy is not just for large datasets. As long as you have to transfer data over a network or between different organizations, the challenge of privacy will occur. I am not sure about other ideas. Those are all the ideas that I have for now ;-) Maybe others have some other ideas to share.



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).