Best approach to guess missing fields using incomplete datasets

The Data Mining Forum

IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php




Posted by: S Nath

Date: March 08, 2013 05:24PM

Hi,
Disclaimer: I’m a health informatics expert with very limited data mining knowledge, so my apologies for any obviously stupid comments that I make.
I’m developing an open-source health application for underdeveloped countries. As part of this, I’m required to ‘guess’ missing data fields from the existing data fields.
Now, I’ve heard that there are many ways to do this, but I’m afraid that I’ve failed to identify the best one for my scenario.
I’ve heard that clustering can be a good solution to this problem. However, my records can be very incomplete, which may affect the success of this approach.
It struck me that I could identify all association rules in my dataset, and then use those rules to guesstimate the missing data based on their support and confidence.
My questions are:

Is this a valid way to do this?
What method would you recommend as the best approach to solve this?


Re: Best approach to guess missing fields using incomplete datasets

Posted by: webmasterphilfv

Date: March 10, 2013 11:11AM

Hi,

Welcome to the forum.

There are different ways to fill incomplete data.

Using association rules is one solution. Since association rules represent associations and come with a kind of probability (their confidence), it would make some sense to use them.
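A minimal sketch of this idea in Python, assuming records are dicts with `None` marking a missing value. To stay short, it only mines rules with a single condition (`attr == value -> target == t`) from the records whose target is known, and fills a gap with the highest-confidence matching rule; a fuller implementation would use a real rule miner such as Apriori or FP-Growth. The field names and data are made up for illustration.

```python
from collections import Counter

def mine_rules(records, target, min_support=0.2):
    """Mine single-condition rules (attr == val) -> (target == t).

    Support and confidence are computed over the records whose target is known.
    Returns a list of (attr, val, t, support, confidence) tuples.
    """
    known = [r for r in records if r.get(target) is not None]
    antecedent_counts = Counter()   # how often attr == val appears
    rule_counts = Counter()         # how often attr == val appears together with target == t
    for r in known:
        for attr, val in r.items():
            if attr == target or val is None:
                continue
            antecedent_counts[(attr, val)] += 1
            rule_counts[(attr, val, r[target])] += 1
    rules = []
    for (attr, val, t), count in rule_counts.items():
        support = count / len(known)
        if support >= min_support:
            confidence = count / antecedent_counts[(attr, val)]
            rules.append((attr, val, t, support, confidence))
    return rules

def impute(record, target, rules):
    """Return (value, confidence) from the best matching rule, or None if no rule applies."""
    best = None
    for attr, val, t, support, confidence in rules:
        if record.get(attr) == val:
            if best is None or (confidence, support) > (best[1], best[2]):
                best = (t, confidence, support)
    return None if best is None else (best[0], best[1])

# Hypothetical toy data: None marks a missing field.
records = [
    {"fever": "yes", "cough": "yes", "diagnosis": "flu"},
    {"fever": "yes", "cough": "no",  "diagnosis": "flu"},
    {"fever": "no",  "cough": "yes", "diagnosis": "cold"},
    {"fever": "no",  "cough": "no",  "diagnosis": "healthy"},
    {"fever": "yes", "cough": None,  "diagnosis": "flu"},
]
rules = mine_rules(records, "diagnosis")
print(impute({"fever": "yes", "cough": None, "diagnosis": None}, "diagnosis", rules))  # ('flu', 1.0)
```

With very incomplete records, many records will match no rule at all; in that case the sketch returns `None` and the field stays empty rather than receiving a weak guess.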

Another way would be to train a neural network on the records where the attribute is known, and then use it to predict the value of that attribute for the other records.
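A minimal sketch, assuming the missing attribute is binary and the input fields are numeric and always present. It uses the smallest possible network, a single sigmoid neuron (equivalent to logistic regression) trained with plain gradient descent, so that it runs with the standard library only; a real application would more likely use a library such as scikit-learn or PyTorch and a network with hidden layers. Field names and data are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, b, x):
    """Probability that the missing attribute is 1 for input vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_neuron(X, y, epochs=2000, lr=0.5):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi   # gradient of log-loss w.r.t. the neuron's input
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

FEATURES = ["fever", "cough"]   # hypothetical inputs, assumed always present
TARGET = "flu"                  # hypothetical attribute to fill in

records = [
    {"fever": 1, "cough": 1, "flu": 1},
    {"fever": 1, "cough": 0, "flu": 1},
    {"fever": 0, "cough": 1, "flu": 0},
    {"fever": 0, "cough": 0, "flu": 0},
    {"fever": 1, "cough": 1, "flu": None},   # missing value to predict
]

# Train only on records where the target is known ...
train = [r for r in records if r[TARGET] is not None]
w, b = train_neuron([[r[f] for f in FEATURES] for r in train], [r[TARGET] for r in train])

# ... then fill the gaps with the thresholded prediction.
for r in records:
    if r[TARGET] is None:
        r[TARGET] = 1 if predict(w, b, [r[f] for f in FEATURES]) >= 0.5 else 0
```

With real data you would also hold back some of the complete records to check how often the predicted values are correct before trusting them.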

Another way is to use clustering, as you have mentioned. Given a record, find the records closest to it and then use their values to fill in its missing values.
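What this paragraph describes is essentially k-nearest-neighbour (kNN) imputation. A minimal sketch, assuming numeric fields on comparable scales (in practice you would normalize them first) and `None` for missing values. Because records can be very incomplete, the distance is computed only over the fields both records actually have. Field names and values are made up.

```python
def distance(a, b, fields):
    """Mean absolute difference over the fields that both records have.

    Returns None when the two records share no known field.
    """
    shared = [f for f in fields if a.get(f) is not None and b.get(f) is not None]
    if not shared:
        return None
    return sum(abs(a[f] - b[f]) for f in shared) / len(shared)

def knn_fill(record, others, target, fields, k=2):
    """Estimate record[target] as the mean target value of its k nearest neighbours."""
    inputs = [f for f in fields if f != target]
    candidates = []
    for o in others:
        if o is record or o.get(target) is None:
            continue  # neighbours must actually have the value we want to copy
        d = distance(record, o, inputs)
        if d is not None:
            candidates.append((d, o[target]))
    if not candidates:
        return None
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)

# Hypothetical toy data (age in years, weight in kg, systolic blood pressure in mmHg).
patients = [
    {"age": 30, "weight": 70, "sbp": 120},
    {"age": 32, "weight": 72, "sbp": 124},
    {"age": 65, "weight": 80, "sbp": 150},
    {"age": 70, "weight": None, "sbp": 160},
    {"age": 31, "weight": 71, "sbp": None},   # sbp is missing
]
filled = knn_fill(patients[4], patients, "sbp", ["age", "weight", "sbp"])
print(filled)  # 122.0
```

A larger `k` makes the estimate less sensitive to one unusual neighbour. For categorical fields you would take the most frequent value among the neighbours instead of the mean.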

Another way would be not to fill in the missing data at all, but to analyze your data with algorithms that are tolerant to missing data.
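As one example of a missing-tolerant method, a naive Bayes classifier can simply skip missing values: during training a `None` is not counted, and at prediction time a missing attribute contributes no evidence. A minimal sketch with Laplace smoothing, assuming categorical fields and `None` for missing values; field names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(records, target):
    """Count classes and per-class attribute values, ignoring missing (None) values."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)   # (attr, class) -> Counter of observed values
    domains = defaultdict(set)            # attr -> set of values seen for that attr
    for r in records:
        c = r.get(target)
        if c is None:
            continue                      # a record without a label cannot be used for training
        class_counts[c] += 1
        for attr, val in r.items():
            if attr != target and val is not None:
                value_counts[(attr, c)][val] += 1
                domains[attr].add(val)
    return class_counts, value_counts, domains

def classify_nb(record, target, model, alpha=1.0):
    """Return the most probable class using only the attributes present in record."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)     # log prior
        for attr, val in record.items():
            if attr == target or val is None or attr not in domains:
                continue                  # missing or unknown attribute adds no evidence
            counts = value_counts.get((attr, c), Counter())
            # Laplace-smoothed log likelihood of val given class c
            score += math.log((counts[val] + alpha) /
                              (sum(counts.values()) + alpha * len(domains[attr])))
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical toy data with gaps.
records = [
    {"fever": "yes", "cough": "yes", "diagnosis": "flu"},
    {"fever": "yes", "cough": None,  "diagnosis": "flu"},
    {"fever": "no",  "cough": "yes", "diagnosis": "cold"},
    {"fever": None,  "cough": "yes", "diagnosis": "cold"},
    {"fever": "no",  "cough": "no",  "diagnosis": "healthy"},
]
model = train_nb(records, "diagnosis")
print(classify_nb({"fever": "yes", "cough": None}, "diagnosis", model))  # flu
```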

Another would be to use a statistical approach. You may read this page, which has some links about filling in missing data with software such as SPSS: http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Missing_Data/Missing.html Perhaps you can find specialized software that can perform this task for you.
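One of the simplest statistical approaches is regression imputation: fit a regression of the incomplete field on a field that is present, using only the complete records, and fill each gap with the fitted value. It is only one of several statistical methods, not necessarily the one the linked page or SPSS uses. A minimal sketch with ordinary least squares and a single predictor; the field names and numbers are invented so the fit is easy to check by hand.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def regression_impute(rows, x_field, y_field):
    """Return copies of rows with missing y_field values predicted from x_field.

    The line is fitted on complete rows only; rows missing x_field are left as they are.
    """
    complete = [r for r in rows if r[x_field] is not None and r[y_field] is not None]
    intercept, slope = fit_line([r[x_field] for r in complete],
                                [r[y_field] for r in complete])
    filled = []
    for r in rows:
        r = dict(r)  # do not modify the caller's data
        if r[y_field] is None and r[x_field] is not None:
            r[y_field] = intercept + slope * r[x_field]
        filled.append(r)
    return filled

# Hypothetical toy data: weight is missing for the last two patients.
patients = [
    {"height_cm": 150, "weight_kg": 50},
    {"height_cm": 160, "weight_kg": 60},
    {"height_cm": 170, "weight_kg": 70},
    {"height_cm": 180, "weight_kg": 80},
    {"height_cm": 175, "weight_kg": None},
    {"height_cm": None, "weight_kg": None},
]
filled = regression_impute(patients, "height_cm", "weight_kg")
print(filled[4]["weight_kg"])  # 75.0
```

Because every gap receives a single best guess, this kind of imputation understates uncertainty; multiple imputation, which fills each gap several times with random draws and pools the results, is the usual remedy.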

This is what I know about this subject. Actually, I have never done it myself ;-) My only concern with filling in missing data would be not to distort the statistical significance of the data.
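To make this concern concrete: the simplest statistical fill, replacing every gap with the mean of the observed values, leaves the mean unchanged but shrinks the variance, so later standard errors and significance tests look more precise than the data justify. A small illustration with made-up numbers:

```python
from statistics import mean, variance

def mean_impute(values):
    """Replace every None with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]

weights = [50, None, 60, 70, None, 80, None, None]  # hypothetical: half the values missing
observed = [v for v in weights if v is not None]
filled = mean_impute(weights)

print(mean(observed), mean(filled))          # the mean is unchanged
print(variance(observed), variance(filled))  # the sample variance drops
```

Here the sample variance falls from about 166.7 to about 71.4 only because four identical values were added at the mean, not because the data became less spread out.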

Philippe


Re: Best approach to guess missing fields using incomplete datasets

Posted by: suranga

Date: March 11, 2013 04:17AM

Thank you, Philippe,

No worries, this information alone was quite helpful to me. Hopefully, someone else with more knowledge of the subject will drop me some further tips.

But thank you for all your help :-)
