I am trying to implement the value iteration algorithm for a Markov Decision Process in Python. I have one implementation, but it is giving me many repeated values for the utilities. My transition matrix is quite sparse, and I suspect this is causing the problem, but I am not sure whether that assumption is correct. How should I fix this? The code may be pretty rough; I am very new to value iteration, so please help me identify problems with it. The reference code is this:
http://carlo-hamalainen.net/stuff/mdpnotes/. I have used the ipod_mdp.py code file. Here is the link to the snippet of my implementation:
http://stackoverflow.com/questions/27899682/repeating-utility-values-in-value-iteration-markov-decision-process

Thank you very much!
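To clarify what I am trying to do, here is a minimal sketch of the textbook value-iteration update (the Bellman optimality backup). The tiny 2-state, 2-action MDP below is made up purely for illustration; it is not my actual data and not the linked ipod_mdp.py code:

```python
GAMMA = 0.9      # discount factor
EPSILON = 1e-6   # convergence threshold

# P[s][a] is a list of (next_state, probability) pairs (sparse-friendly:
# only nonzero transitions are stored); R[s][a] is the immediate reward.
# This toy MDP is invented for illustration.
P = {
    0: {0: [(0, 0.5), (1, 0.5)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)],           1: [(1, 1.0)]},
}
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 2.0, 1: 0.0},
}

def value_iteration(P, R, gamma=GAMMA, eps=EPSILON):
    """Iterate V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    until the largest per-state change falls below eps."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

V = value_iteration(P, R)
print(V)
```

Note that with a sparse transition matrix, states that share the same reachable successors and rewards can legitimately converge to identical utilities, so some repetition is not necessarily a bug.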