The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
DATA Mining with XML dataset
Posted by: Rohit
Date: September 10, 2014 09:42PM

I have to mine data from an XML file of over 400,000 lines. Should I convert it into a structured form first? But if I convert it into a structured form, won't that require parsing the entire table?



Edited 1 time(s). Last edit at 09/16/2015 07:23AM by webmasterphilfv.

Re: DATA Mining
Date: September 11, 2014 06:06AM

Hi,

I think that it depends on the data mining tool or algorithm implementation that you will use to perform the data mining.

If the data mining tool can read XML, that may be fine. More likely, though, you will need to convert the file to another format before applying a data mining tool.

Also, in general, not all information is relevant when you perform data mining. For example, if you have an XML file about students, some information about the students may not be relevant to your data mining task. So, while converting the file to the proper format, you can also filter out irrelevant information.
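As a rough illustration, here is a minimal Python sketch of converting and filtering in the same step. It assumes a hypothetical layout with one <student> element per record and treats only the <course> tags as relevant; each student becomes one transaction of integer item IDs, a format that many association rule miners accept. The tag names, RELEVANT_TAGS and to_transactions are made up for the example, not part of any particular tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout: one <student> element per record.
SAMPLE_XML = """<students>
  <student>
    <name>Alice</name>
    <address>12 Main St</address>
    <course>databases</course>
    <course>statistics</course>
  </student>
  <student>
    <name>Bob</name>
    <address>34 Oak Ave</address>
    <course>statistics</course>
    <course>machine-learning</course>
  </student>
</students>"""

# Fields assumed relevant to the mining task; every other tag is dropped.
RELEVANT_TAGS = {"course"}


def to_transactions(xml_text, relevant_tags):
    """Turn each <student> into a sorted list of item IDs, keeping only relevant tags."""
    root = ET.fromstring(xml_text)
    item_ids = {}  # maps "tag=value" strings to consecutive integer IDs starting at 1
    transactions = []
    for student in root.iter("student"):
        items = set()
        for child in student:
            if child.tag in relevant_tags and child.text:  # filter out irrelevant fields
                key = f"{child.tag}={child.text.strip()}"
                items.add(item_ids.setdefault(key, len(item_ids) + 1))
        transactions.append(sorted(items))
    return transactions, item_ids


transactions, item_ids = to_transactions(SAMPLE_XML, RELEVANT_TAGS)
for t in transactions:
    print(" ".join(map(str, t)))  # one transaction per output line
```

Names and addresses never reach the output, so the mining step only sees the courses. The item_ids dictionary is kept so that discovered patterns can be translated back to readable values.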

If you are writing your own data mining algorithm, you could instead write code that reads your XML file directly, without converting it.
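A rough sketch of that idea, assuming a hypothetical file with one <transaction> element per record and one <item> per item: the generator read_transactions parses the XML incrementally with Python's xml.etree.ElementTree.iterparse and hands each transaction straight to the mining code. Here the "mining code" is only the single-item counting pass of an Apriori-style algorithm; the function names and the minsup value are illustrative.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical input: one <transaction> per record, one <item> per item in it.
XML_BYTES = b"""<transactions>
  <transaction><item>bread</item><item>milk</item></transaction>
  <transaction><item>bread</item><item>butter</item></transaction>
  <transaction><item>milk</item><item>bread</item><item>eggs</item></transaction>
</transactions>"""


def read_transactions(source):
    """Yield one set of items per <transaction>, parsing the XML incrementally."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "transaction":
            yield {item.text for item in elem.iter("item")}
            elem.clear()  # drop the children of the record once it has been consumed


def frequent_items(transactions, minsup):
    """Apriori-style first pass: count single items and keep those with support >= minsup."""
    counts = Counter()
    for t in transactions:
        counts.update(t)
    return {item: c for item, c in counts.items() if c >= minsup}


result = frequent_items(read_transactions(io.BytesIO(XML_BYTES)), minsup=2)
print(result)
```

Because read_transactions is a generator, no intermediate file is written and the whole tree is never materialized at once.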

If you want to convert your file, you can write code that reads the XML file line by line and writes the result line by line to an output file. This way, you probably don't need to load the whole file into memory.
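XML records do not always sit on one line each, so a variant of this approach is to use an incremental parser rather than reading raw lines. The sketch below uses Python's xml.etree.ElementTree.iterparse, which reads the file piece by piece, and writes one output line per record; calling clear() on each processed element keeps memory use low. The record and item tag names are assumptions, and items are assumed to contain no spaces, since the output is space-separated.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def convert_xml_to_lines(xml_path, out_path, record_tag, item_tag):
    """Stream records from xml_path and write one space-separated line per record to out_path."""
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        # iterparse reads the input incrementally instead of loading it all first
        for _, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag == record_tag:
                items = sorted({i.text.strip() for i in elem.iter(item_tag) if i.text})
                out.write(" ".join(items) + "\n")
                written += 1
                elem.clear()  # discard the processed record's children
    return written


# Small self-contained demo using temporary files.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.xml")
    dst = os.path.join(tmp, "output.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("<records><record><item>b</item><item>a</item></record>"
                "<record><item>c</item></record></records>")
    n = convert_xml_to_lines(src, dst, record_tag="record", item_tag="item")
    with open(dst, encoding="utf-8") as f:
        lines = f.read().splitlines()

print(n, lines)
```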

Hope this helps,

Re: DATA Mining
Posted by: Pooja jardosh
Date: December 18, 2014 03:16AM

Hello Rohit,
I am also working on research that takes XML documents as input and mines association rules from them.
I would like to know which solution you chose for your problem.
If possible, could you please share your implemented code for XML data retrieval? You can email me at pooja2011er@gmail.com

Looking forward to your reply.

Thanking You,
Pooja

Re: DATA Mining
Posted by: Pooja jardosh
Date: December 22, 2014 11:57PM

Which parameters should be considered when sampling a dataset before running a mining algorithm on it? Here are some questions I would like answers to:

1. What is a sampling function? How can a dataset be sampled so that efficiency improves without a drop in accuracy?

2. What parameters can be considered when sampling a dataset?

3. Among the various sampling techniques, which is the best one for sampling a dataset for association rule mining?

4. How can XML documents stored in a Microsoft database repository be sampled?

Please share anything you know about this, no matter how much or how certain.
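One simple option relevant to questions 1 and 3 is uniform random sampling of transactions. The sketch below, offered only as an illustration, uses reservoir sampling (Algorithm R), which draws a uniform sample of k transactions in a single pass without knowing the dataset size in advance, then estimates single-item supports on the sample. Because supports measured on a sample deviate from the true values, a sample is often mined with a somewhat lowered minimum support, as in Toivonen's sampling approach. The toy data, the sample size and the 0.9 lowering factor are hypothetical choices, not recommendations.

```python
import random
from collections import Counter


def reservoir_sample(transactions, k, rng):
    """Algorithm R: uniform random sample of k transactions in one pass,
    without knowing the total number of transactions in advance."""
    sample = []
    for n, t in enumerate(transactions):
        if n < k:
            sample.append(t)  # fill the reservoir with the first k transactions
        else:
            j = rng.randint(0, n)  # uniform over 0..n inclusive
            if j < k:
                sample[j] = t  # transaction n is kept with probability k / (n + 1)
    return sample


def estimated_relative_support(sample):
    """Relative support of each single item in the sample (an estimate of full-data support)."""
    counts = Counter()
    for t in sample:
        counts.update(t)
    return {item: c / len(sample) for item, c in counts.items()}


# Toy data, purely illustrative: 1,000 transactions.
data = [{"bread", "milk"}] * 600 + [{"bread"}] * 300 + [{"eggs"}] * 100
rng = random.Random(42)  # fixed seed so the run is reproducible
sample = reservoir_sample(data, k=100, rng=rng)
support = estimated_relative_support(sample)

minsup_full = 0.5                  # threshold intended for the full dataset
minsup_sample = 0.9 * minsup_full  # hypothetical lowered threshold for the sample
candidates = {item for item, s in support.items() if s >= minsup_sample}
print(sorted(candidates))
```

The itemsets found on the sample are only candidates; a second pass over the full data (or a verification step) is still needed to confirm their exact supports.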



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).