The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
Itemset mining in irregular data?
Posted by: Naalm
Date: November 01, 2018 07:25AM

I have been following the forum for a while. I wonder if we can use itemset mining on irregular data such as social network posts? Please guide me in exploring this idea.

Re: Itemset mining in irregular data?
Posted by: Rashid
Date: November 07, 2018 05:00PM

What does IRREGULAR DATA mean?

R.

Re: Itemset mining in irregular data?
Date: November 08, 2018 06:37AM

Hi, thanks for following the forum. :-) I think you mean "unstructured data". For example, a text document or a tweet does not have a clear structure. In that case, yes, we could do some pattern mining.

For example, you can treat the sentences of a text as sequences of symbols (items), and then apply sequential pattern mining to find subsequences of words that appear frequently in tweets or in a text document. In previous work, for example, I mined sequential patterns from books to analyze the writing styles of their authors, in this paper:

Pokou J. M., Fournier-Viger, P., Moghrabi, C. (2016). Authorship Attribution Using Small Sets of Frequent Part-of-Speech Skip-grams. Proc. 29th Intern. Florida Artificial Intelligence Research Society Conference (FLAIRS 29), AAAI Press, pp. 86-91

In that paper, what we call skip-grams are basically sequential patterns.
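
To make the idea concrete, here is a minimal Python sketch of mining frequent word subsequences from sentences. It is only an illustration, not the method of the paper above: it works on words rather than part-of-speech tags, it only finds patterns of length 2, and the function name `frequent_skip_bigrams`, the `max_gap` parameter, and the toy sentences are all assumptions made for this example. A full sequential pattern mining algorithm would find patterns of any length.

```python
from collections import Counter


def frequent_skip_bigrams(sequences, min_support, max_gap=2):
    """Return ordered word pairs (a, b) where b follows a with at most
    `max_gap` words in between, keeping pairs found in at least
    `min_support` sequences. Support counts sequences, not occurrences."""
    support = Counter()
    for seq in sequences:
        seen = set()  # each pair is counted once per sequence
        for i, first in enumerate(seq):
            last = min(i + 2 + max_gap, len(seq))
            for j in range(i + 1, last):
                seen.add((first, seq[j]))
        support.update(seen)
    return {pair: n for pair, n in support.items() if n >= min_support}


# Toy data (hypothetical): each sentence is a sequence of words.
sentences = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat slept on the mat".split(),
]

patterns = frequent_skip_bigrams(sentences, min_support=3, max_gap=1)
print(patterns)  # {('on', 'the'): 3}
```

Lowering `min_support` to 2 also returns pairs with a gap, such as ("cat", "on"), which appears in the first and third sentences with one word skipped.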

But it is probably more interesting to find patterns in a book than in short messages like tweets. A tweet is usually very short and people may not write it carefully, so in my opinion tweets are more challenging to analyze than a book.

Similarly, if we treat sentences as bags of words (words without order), then we can apply itemset mining. Each transaction is a sentence and each item is a word. This would find, for example, sets of words common to multiple sentences.
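
As a rough illustration of the bag-of-words idea, the following Python sketch turns each sentence into a set of words and runs a small Apriori-style search for frequent itemsets. The function name `frequent_itemsets` and the toy sentences are assumptions for this example, and the code favors clarity over speed; for real data an optimized implementation would be a better choice.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search. Returns a dict mapping each
    frequent itemset (a frozenset) to the number of transactions
    that contain it."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every distinct item on its own.
    candidates = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemset candidates.
        joined = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
    return result


# Toy data (hypothetical): each sentence becomes a set of words.
sentences = [
    "data mining finds patterns",
    "pattern mining needs data",
    "mining text data is fun",
]
transactions = [set(s.split()) for s in sentences]

itemsets = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(itemsets.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

On these three sentences, only "data", "mining", and the pair of both reach a support of 2. In practice, one would probably also lowercase the text and remove very common words before mining.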

I think there are various possibilities. I have just mentioned a few that come to mind.

Another possibility for social networks would be to analyze a matrix of "likes", such as on Facebook, where pages are items and each user is a transaction. Each transaction would thus indicate the pages that a user likes. Then we could use itemset mining to find correlations between sets of pages that people like together.
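
A small Python sketch of the "likes" idea is given below. It assumes a binary user-by-page matrix, converts each row into a transaction, and counts how many users like each pair of pages together. The page names, the matrix values, and the helper names `matrix_to_transactions` and `co_liked_pairs` are all made up for illustration; only pairs are counted here, whereas a general itemset miner would also find larger sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: rows are users, columns are pages, 1 means "likes".
pages = ["PageA", "PageB", "PageC", "PageD"]
likes = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]


def matrix_to_transactions(matrix, labels):
    """Turn each row (user) into the set of pages that user likes."""
    return [{labels[j] for j, v in enumerate(row) if v} for row in matrix]


def co_liked_pairs(transactions, min_support):
    """Count, for each pair of pages, how many users like both,
    and keep the pairs liked together by at least `min_support` users."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


transactions = matrix_to_transactions(likes, pages)
pairs = co_liked_pairs(transactions, min_support=2)
print(pairs)  # {('PageA', 'PageB'): 3, ('PageB', 'PageC'): 2}
```

Raw co-occurrence counts favor pages that are popular overall (here PageB is liked by every user), so a measure such as lift or confidence may be more informative than support alone when looking for correlations.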



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).
Terms of use.