There are quite a few methods for estimating entropy directly, without first estimating the underlying distribution. This is particularly important for data that are not independent: the distribution then has to be estimated over blocks of consecutive samples, the number of possible blocks grows exponentially with block length, and you run out of data pretty quickly. Some discussion and a lot of references are given here: http://pages.cs.aueb.gr/users/yiannisk/PAPERS/neuro.pdf
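
To give a flavor of what "without estimating the distribution" can look like, here is a minimal sketch of one classical approach: estimating the entropy rate from the number of phrases in an LZ78 parse of the sequence. I picked it only because it is short and well known; it is not necessarily one of the methods covered in the reference above, and the function name `lz78_entropy_rate` is my own. For a stationary ergodic source, c·log2(c)/n (with c the phrase count and n the sequence length) converges to the entropy rate, but convergence is slow, so treat the numbers as rough.

```python
import math
import random


def lz78_entropy_rate(symbols):
    """Rough entropy-rate estimate (bits per symbol) from the LZ78 phrase count.

    The sequence is parsed into the shortest phrases not seen before;
    no probabilities or histograms are ever computed.
    """
    seen = set()      # phrases found so far
    phrase = ()       # phrase currently being built
    count = 0         # number of phrases in the parse
    n = 0             # sequence length
    for s in symbols:
        n += 1
        phrase += (s,)
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ()
    if phrase:        # count a trailing, incomplete phrase as well
        count += 1
    if n == 0:
        return 0.0
    return count * math.log2(count) / n


# Illustrative inputs (made up for this sketch, not real data).
rng = random.Random(0)
constant = "0" * 10000
coin = [rng.getrandbits(1) for _ in range(10000)]

print("constant sequence:", lz78_entropy_rate(constant))
print("fair coin flips:  ", lz78_entropy_rate(coin))
```

The constant sequence should come out close to 0 and the coin flips close to 1 bit per symbol. At lengths like these the estimate is still noticeably biased, which is part of why more refined estimators exist.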