Itemset mining in irregular data?

Itemset mining in irregular data?

Posted by: Naalm

Date: November 01, 2018 07:25AM

I have been following the forum for a while. I wonder if we can use itemset mining on irregular data such as social network posts? Please guide me in exploring this idea.


Re: Itemset mining in irregular data?

Posted by: Rashid

Date: November 07, 2018 05:00PM

What does "irregular data" mean?

R.


Re: Itemset mining in irregular data?

Posted by: webmasterphilfv

Date: November 08, 2018 06:37AM

Hi, thanks for following the forum. :-) I think you mean "unstructured data". For example, a text document or a tweet does not have a clear structure. In that case, yes, we could do some pattern mining.

For example, you can treat each sentence of a text as a sequence of symbols (items), and then apply sequential pattern mining to find subsequences of words that appear frequently in tweets or in a text document. In previous work, I mined sequential patterns from books to analyze the writing styles of their authors, in this paper:

Pokou J. M., Fournier-Viger, P., Moghrabi, C. (2016). Authorship Attribution Using Small Sets of Frequent Part-of-Speech Skip-grams. Proc. 29th Intern. Florida Artificial Intelligence Research Society Conference (FLAIRS 29), AAAI Press, pp. 86-91

In that paper, what we call a skip-gram is basically a sequential pattern.

But it is probably more interesting to find patterns in a book than in short messages like tweets. A tweet is usually very short and may not be written carefully, so in my opinion tweets are more challenging to analyze than a book.

Similarly, if we treat sentences as bags of words (words without order), then we can apply itemset mining. Each transaction is a sentence and each item is a word. This would find, for example, sets of words that are common to multiple sentences.
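To make the idea of a skip-gram as a sequential pattern concrete, here is a minimal sketch in Python. It is not the method from the paper above (which works on part-of-speech tags): it simply counts ordered word pairs that occur within a small gap and keeps those found in enough sentences. The function name `frequent_skip_bigrams`, the `max_gap` parameter and the sample sentences are made up for illustration.

```python
from collections import Counter

def frequent_skip_bigrams(sentences, min_support, max_gap=2):
    """Return ordered word pairs (a, b) where b follows a with at most
    max_gap words in between, keeping pairs found in >= min_support sentences."""
    support = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        seen = set()  # count each pair at most once per sentence
        for i, first in enumerate(words):
            last = min(i + 2 + max_gap, len(words))
            for j in range(i + 1, last):
                seen.add((first, words[j]))
        support.update(seen)
    return {pair: count for pair, count in support.items() if count >= min_support}

# Hypothetical toy input: each sentence is one sequence.
sentences = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat slept on the mat",
]
patterns = frequent_skip_bigrams(sentences, min_support=3)
contiguous = frequent_skip_bigrams(sentences, min_support=2, max_gap=0)
```

With `max_gap=0` this reduces to counting ordinary bigrams; larger gaps allow words to be skipped, which is what makes the pattern a subsequence rather than a substring. A real sequential pattern mining algorithm would also find longer patterns and prune the search space.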
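As a rough illustration of the bag-of-words view, the sketch below turns each sentence into a transaction and counts, by brute force, every word set up to a small size. This is only meant to show the data representation; in practice a dedicated algorithm such as Apriori or FP-Growth would be used instead of enumerating all combinations. The function name `frequent_itemsets`, the `max_size` limit and the sentences are assumptions for the example.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force frequent itemset counting: the support of an itemset
    is the number of transactions that contain all of its items."""
    support = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # word order and duplicates are ignored
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                support[itemset] += 1
    return {itemset: count for itemset, count in support.items() if count >= min_support}

# Hypothetical toy input: one transaction per sentence, one item per word.
sentences = [
    "data mining finds patterns",
    "pattern mining finds frequent patterns in data",
    "mining data is fun",
]
transactions = [s.lower().split() for s in sentences]
itemsets = frequent_itemsets(transactions, min_support=3)
```

Here the words "data" and "mining" appear in all three sentences, in different orders, so the itemset {data, mining} is frequent even though no fixed word sequence is shared.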

I think there are various possibilities. I have just mentioned a few that come to mind.

Another possibility for social networks would be to analyze a matrix of "likes", such as on Facebook, where each page is an item and each user is a transaction. Thus, each transaction would indicate the pages that a user likes. Then we could use itemset mining to find correlations between sets of pages that people like together.
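The "likes" idea could be sketched as follows, assuming the data is available as a mapping from users to the set of pages they like. The user names, page names and the function `co_liked_pairs` are hypothetical. The sketch only looks at pairs of pages and reports their support together with lift, one common measure of how much more often two items co-occur than expected if they were independent.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each user is a transaction, each liked page is an item.
likes = {
    "user1": {"PageA", "PageB", "PageC"},
    "user2": {"PageA", "PageB"},
    "user3": {"PageA", "PageB", "PageD"},
    "user4": {"PageC", "PageD"},
}

def co_liked_pairs(likes, min_support):
    """Return {(page_a, page_b): (support_count, lift)} for page pairs
    liked together by at least min_support users."""
    n = len(likes)
    single = Counter()
    pair = Counter()
    for pages in likes.values():
        single.update(pages)
        pair.update(combinations(sorted(pages), 2))
    result = {}
    for (a, b), count in pair.items():
        if count >= min_support:
            # lift > 1 means the pages are liked together more often than
            # expected if the two likes were independent
            lift = (count / n) / ((single[a] / n) * (single[b] / n))
            result[(a, b)] = (count, lift)
    return result

pairs = co_liked_pairs(likes, min_support=2)
```

On this toy data only PageA and PageB are liked together by at least two users. Extending the idea to larger sets of pages is exactly what an itemset mining algorithm does.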

