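Instead of modeling the gradient of the log probability density with respect to the input (the score, ∇ₓ log p(x)), try modeling the log probability density directly, usually only up to an unknown normalizing constant, since that constant does not affect the gradient with respect to x.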
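A minimal sketch of the idea follows. It assumes a toy model whose output is a scalar unnormalized log-density, here a quadratic form chosen only for illustration. The score is then recovered by differentiating that scalar. Central finite differences stand in for automatic differentiation so that the example needs only NumPy. The function names (`log_density_unnormalized`, `score_analytic`, `score_numeric`) and the quadratic form are hypothetical and do not come from the original.

```python
import numpy as np

def log_density_unnormalized(x, mu, A):
    """Hypothetical model: unnormalized log-density -0.5 (x - mu)^T A (x - mu).

    The normalizing constant log Z is omitted because it does not depend on x.
    """
    d = x - mu
    return -0.5 * d @ A @ d

def score_analytic(x, mu, A):
    """Closed-form gradient of the log-density above with respect to x (A assumed symmetric)."""
    return -A @ (x - mu)

def score_numeric(log_density, x, eps=1e-5):
    """Central finite-difference gradient of any scalar log-density (stand-in for autodiff)."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (log_density(x + step) - log_density(x - step)) / (2 * eps)
    return grad

# Illustrative parameters (assumed, not from the original).
mu = np.array([1.0, -2.0])
A = np.array([[2.0, 0.5], [0.5, 1.0]])
x = np.array([0.3, 0.7])

f = lambda z: log_density_unnormalized(z, mu, A)
g = lambda z: f(z) + 3.0  # a different (unknown) log Z shifts the log-density by a constant

print(score_numeric(f, x))    # score obtained by differentiating the scalar model
print(score_analytic(x, mu, A))
print(score_numeric(g, x))    # same score: the constant offset drops out
```

One consequence of this design is that the resulting score field is a gradient of a scalar function by construction, whereas a model that outputs the score directly is not constrained to be one. The trade-off is that every score evaluation requires differentiating the model, which in practice means an extra backward pass.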