The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Data model for predicting successful music bands [X-Post Forum]
Posted by: Cryptkeeper
Date: June 04, 2012 02:33PM


Sorry for the X-post in the forum.

I am currently planning a data mining project for a university course. My idea is to predict the
success of music bands from their history of album releases, where I define "success" not as Billboard chart
success but as how long the band stays in business, i.e. one-hit wonders vs. bands that release several albums over
at least 5 years, say.

Some more specific questions might be "How important is the interval between a band's first and second album?" or "Is it necessary to stay with one music label over several albums?" etc.

My problem so far is how to model this data for further processing. My first approach to representing each band as a feature vector is:

Name, Begin (year of first album), End (year of last album), Country, Number of albums, Average interval between two albums

But now I have no clue how to add the information about the band's individual albums. For each album I would like to provide features like

- year of release
- season of this year
- country of the first release
- music label
- and maybe some more information

If I just append these columns sequentially for each album, the feature vector would have a different length for each band. Maybe it would be sufficient to simply fill the missing columns of "short vectors" (i.e. bands which haven't released many albums) with a defined "blank symbol"?
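
A minimal sketch of the padding idea described above, just to make it concrete. The band data, the field order, the function name flatten_band, and the "?" blank symbol are all made up for illustration; some tools use "?" for missing values, but the right marker depends on the software you end up using.

```python
# Hypothetical illustration: all names, values, and the "?" marker are assumptions.
BLANK = "?"
ALBUM_FIELDS = ("year", "season", "country", "label")


def flatten_band(albums, max_albums):
    """Return one fixed-length row with the fields of up to max_albums albums.

    Album slots that the band never filled are padded with BLANK.
    """
    row = []
    for i in range(max_albums):
        if i < len(albums):
            # Copy the album's fields in a fixed order; unknown fields become BLANK.
            row.extend(albums[i].get(field, BLANK) for field in ALBUM_FIELDS)
        else:
            # The band has fewer albums than the longest discography: pad.
            row.extend([BLANK] * len(ALBUM_FIELDS))
    return row


# Made-up example data, not real bands.
bands = {
    "Band A": [
        {"year": 1990, "season": "spring", "country": "UK", "label": "Label X"},
        {"year": 1992, "season": "autumn", "country": "UK", "label": "Label Y"},
    ],
    "Band B": [
        {"year": 2001, "season": "winter", "country": "DE", "label": "Label Z"},
    ],
}

# Every row gets as many album slots as the longest discography has albums.
max_albums = max(len(albums) for albums in bands.values())
rows = {name: flatten_band(albums, max_albums) for name, albums in bands.items()}
```

Here both rows end up with 8 columns, and the second album slot of "Band B" holds only blank symbols. Passing a smaller max_albums would instead keep only the first few albums of every band, which keeps the vectors short when a few bands have very long discographies.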

My idea was to build a decision tree that indicates which features predict that a band will remain in the business over a longer time period.

Maybe somebody has an idea or some advice for me on how I can represent the data for my project. I hope my problem with the data has become clear.

Big thanks in advance!

Re: Data model for predicting successful music bands [X-Post Forum]
Date: June 05, 2012 09:27PM


Welcome to the forum.

I did not reply to your message earlier because I am currently traveling in Greece. But some other users have probably read your message.

I think that your project is quite interesting. Good project topic! :)

In data mining in general, an important step is preprocessing. It consists of preparing the data in a proper format before applying algorithms such as decision tree learning algorithms. That is what you are doing now! It also consists of removing data that is not relevant for your task and putting the rest in the most appropriate format.

In my opinion, you should first decide which algorithm you want to apply, because that determines the data format. If you use a decision tree algorithm like C4.5, you can have vectors with numeric or symbolic attributes. The exact format depends on the software that you use for performing the data mining. Most likely, it will be some kind of text file format. You would need to check the documentation of that software for the details of the format.

But more importantly, you need to choose the attributes that you will use for learning decision trees. Which attributes are important? Should they be numeric values? Should they be normalized? Should they be symbolic values (i.e. strings)? ...
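
One common way to put a number on "which attributes are important" is information gain, the split criterion used by ID3 (C4.5 uses a normalized variant, the gain ratio). The sketch below is only an illustration under assumptions: the toy records, the attribute values, and the "long_lived" class label are invented, and it handles symbolic attributes only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    # Group the class labels by the value of the attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted average entropy of the groups produced by the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Invented toy data: "long_lived" is a hypothetical class label.
bands = [
    {"label": "major", "season": "winter", "long_lived": "yes"},
    {"label": "major", "season": "summer", "long_lived": "yes"},
    {"label": "indie", "season": "winter", "long_lived": "no"},
    {"label": "indie", "season": "summer", "long_lived": "no"},
]

gain_label = information_gain(bands, "label", "long_lived")
gain_season = information_gain(bands, "season", "long_lived")
```

In this toy data the label attribute separates the classes perfectly (gain of 1 bit), while season tells nothing (gain of 0). A decision tree learner makes this kind of comparison at every node, so looking at the gains is a quick check of which attributes are worth keeping. Numeric attributes would additionally need a threshold search, which is left out here.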

Here is my opinion:
- year of release (I think that this attribute may not be meaningful. Perhaps you could use the number of years since the last album, or the number of years since the first album, instead. It could be represented as an integer.)
- season of this year (ok, could be represented as a symbolic value like "autumn", "winter" ...)
- country of the first release (ok, could be represented as a symbolic value)
- music label (ok, could be represented as a symbolic value)
- Name (not relevant for data mining),
- Begin (may not be meaningful, as I said before; perhaps you could use decades instead, like 1990s, 1980s... which would be more general),
- End (year of last Album),
- Number of Albums, (interesting)
- Average interval between 2 albums (interesting)
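
The derived attributes suggested in the list above could be computed from a band's release years roughly like this. It is a sketch under assumptions: the function name derive_features, the output keys, and the example years are invented, and the function expects at least one release year.

```python
def derive_features(release_years):
    """Derive band-level attributes from a non-empty list of album release years."""
    years = sorted(release_years)
    first, last = years[0], years[-1]
    # Gap in years between each album and the one before it.
    gaps = [later - earlier for earlier, later in zip(years, years[1:])]
    return {
        # Symbolic attribute: decade of the first album, e.g. "1990s".
        "begin_decade": f"{first // 10 * 10}s",
        # Integer attributes.
        "years_active": last - first,
        "num_albums": len(years),
        # Undefined for a band with a single album.
        "avg_interval": sum(gaps) / len(gaps) if gaps else None,
        # Per-album attribute: years since the previous album (None for the first).
        "years_since_previous": [None] + gaps,
    }


# Made-up release years; the input order does not matter.
features = derive_features([1994, 1991, 1999])
```

For these years the band starts in the "1990s", is active for 8 years, and has an average interval of 4.0 years. Note that the span between the first and last album is very close to the definition of "success" in the question, so it is probably better treated as the class to predict than as an input attribute.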

Maybe you could also use:
- genre (rock, metal, pop...)
- ...

I think that the topic is good. You may need to run some experiments to find out which attributes are best and what the best representation is for each attribute.

Hope this helps,

Edited 2 time(s). Last edit at 06/05/2012 09:31PM by webmasterphilfv.


This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).