Re: HUI-Miner MapReduce
Date: June 21, 2014 02:37PM
The first two database scans are not costly compared to the mining phase that follows them. Each scan just reads every line of the database once. So even if you did this step on a single node, I think that it would not be a problem.
But the TWU calculation can be parallelized. The TWU of an item is the sum of the transaction utilities of the transactions where the item appears.
So, you could split the database into n smaller databases, then calculate the TWU of each item within each sub-database. Finally, a single node could receive the partial TWU values from each sub-database and add them up for each item. This would give the TWU of each item for the whole database.
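To make the idea concrete, here is a rough Python sketch of the split-then-sum approach. It is only an illustration, not code from HUI-Miner or from any MapReduce framework: the function names (partial_twu, merge_twu, split), the toy database, and the representation of a transaction as a dict mapping each item to its utility in that transaction are all assumptions I made up for the example. In a real job, partial_twu would play the role of the map step and merge_twu the role of the reduce step.

```python
from collections import Counter


def partial_twu(transactions):
    """Map step (hypothetical helper): TWU of each item within one sub-database.

    Each transaction is assumed to be a dict {item: utility of the item
    in that transaction}.
    """
    twu = Counter()
    for transaction in transactions:
        tu = sum(transaction.values())  # transaction utility
        for item in transaction:
            twu[item] += tu  # every item in the transaction receives the full TU
    return twu


def merge_twu(partials):
    """Reduce step (hypothetical helper): sum the partial TWUs item by item."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds the values together
    return total


def split(database, n):
    """Split the database into n sub-databases (simple round-robin split)."""
    return [database[i::n] for i in range(n)]


# Toy database, invented for this example.
database = [
    {"a": 5, "b": 10},  # TU = 15
    {"a": 2, "c": 8},   # TU = 10
    {"b": 4, "c": 6},   # TU = 10
    {"a": 1},           # TU = 1
]

twu = merge_twu(partial_twu(sub) for sub in split(database, 2))
print(dict(twu))
```

Because the TWU is a plain sum over transactions, the result does not depend on how the database is split or in which order the partial results are merged, so it should match what a single-node scan would give.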
Edited 4 time(s). Last edit at 06/21/2014 02:40PM by webmasterphilfv.