Best approach to guess missing fields using incomplete datasets
Posted by: S Nath
Date: March 08, 2013 05:24PM
Hi,
Disclaimer: I’m a health informatics expert with very limited data mining knowledge, so my apologies for any obviously stupid comments that I make.
I’m developing an open source health application for underdeveloped countries. As part of this, I’m required to ‘guess’ missing data fields using existing data fields.
Now, I’ve heard that there are many ways to do this, but I’m afraid I’ve failed to identify the best one for my scenario.
I’ve heard that clustering can be a good solution to this problem. However, my records can be extremely incomplete, which may affect the success of this approach.
It struck me that I could mine all association rules in my dataset, and then use those rules to estimate the missing values based on their support and confidence.
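A minimal sketch of this idea in Python, to make the question concrete. It assumes categorical fields, None for a missing value, and only simple one-field-to-one-field rules. The function names (mine_rules, impute), the thresholds, and the sample records are all made up for illustration; they do not come from a real dataset or library.

```python
from collections import Counter
from itertools import permutations


def mine_rules(records, min_support=0.2, min_confidence=0.6):
    """Mine one-to-one rules (field_a=value_a) -> (field_b=value_b).

    Missing values (None) are skipped, so an incomplete record still
    contributes the fields it does have. Note that a record with a known
    left-hand side but a missing right-hand side still counts against
    confidence, which makes the estimate conservative.
    """
    n = len(records)
    item_counts = Counter()
    pair_counts = Counter()
    for rec in records:
        items = [(f, v) for f, v in rec.items() if v is not None]
        item_counts.update(items)
        # Ordered pairs of known items; each pair spans two different fields.
        pair_counts.update(permutations(items, 2))

    rules = {}
    for (lhs, rhs), count in pair_counts.items():
        support = count / n
        confidence = count / item_counts[lhs]
        if support >= min_support and confidence >= min_confidence:
            rules.setdefault(lhs, []).append((rhs, support, confidence))
    return rules


def impute(record, rules):
    """Fill each missing field using the most confident applicable rule."""
    filled = dict(record)
    known = [(f, v) for f, v in record.items() if v is not None]
    for field, value in record.items():
        if value is not None:
            continue
        best = None  # (confidence, support, value)
        for lhs in known:
            for (rhs_field, rhs_value), support, confidence in rules.get(lhs, []):
                if rhs_field != field:
                    continue
                if best is None or (confidence, support) > best[:2]:
                    best = (confidence, support, rhs_value)
        if best is not None:
            filled[field] = best[2]
    return filled


# Hypothetical toy records, invented purely for this example.
records = [
    {"fever": "yes", "region": "north", "diagnosis": "malaria"},
    {"fever": "yes", "region": "north", "diagnosis": "malaria"},
    {"fever": "yes", "region": None, "diagnosis": "malaria"},
    {"fever": "no", "region": "south", "diagnosis": "healthy"},
    {"fever": "no", "region": "south", "diagnosis": None},
    {"fever": "yes", "region": "north", "diagnosis": None},
]

rules = mine_rules(records)
for rec in records:
    print(impute(rec, rules))
```

On this toy data the last record gets its diagnosis filled in, while the fifth record stays incomplete because no rule reaching that field clears the support threshold. That is exactly my worry with very sparse records: the sparser the data, the fewer rules survive the thresholds.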
My questions are:
Is this a valid way to do this?
What method would you recommend as the best approach to solve this problem?