Friday, July 12, 2013

Week 4: Calculus woes (and eventual solution)

I was less productive this week than I had hoped, in large part because of the math difficulties detailed below. Therefore I plan to put in some work this weekend to keep things on track.

The Jensen-Shannon divergence between probability distributions P and Q is defined as:
\[ D_{\mathrm{JS}}(P\|Q)=\frac{1}{2}\left(D_{\mathrm{KL}}(P\|M)+D_{\mathrm{KL}}(Q\|M)\right) \]

where M is the mixture distribution that gives half the weight to P and half to Q. For Gaussians, this mixture is quite different from the sum of the two distributions (which would itself be Gaussian); see this StackExchange answer for more detail.

To find an analytic solution for $D_{\mathrm{KL}}(P\|M)$, one ends up with an integral that looks like:

\[ \int_{-\infty}^{\infty}\ln\left(\frac{2p(x)}{p(x)+q(x)}\right)p(x)\,{\rm d}x \]

where $p(x)$ and $q(x)$ are the probability density functions of P and Q, and $m(x) = \frac{1}{2}\left(p(x)+q(x)\right)$ is the density of M.

This ultimately requires integrating the log of the sum of the two density functions, which doesn't appear to have a closed form, even though the covariance matrices are diagonal.
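
As a sanity check, the integral can at least be evaluated numerically. Here's a minimal Python sketch (illustration only, not project code; the parameter values are arbitrary) that computes $D_{\mathrm{KL}}(P\|M)$ by quadrature for two univariate Gaussians and averages the two terms to get the J-S divergence:

# Numerical J-S divergence between two univariate Gaussians via quadrature.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def js_divergence(mu_p, sigma_p, mu_q, sigma_q):
    p = norm(mu_p, sigma_p).pdf
    q = norm(mu_q, sigma_q).pdf
    m = lambda x: 0.5 * (p(x) + q(x))

    def kl_to_mixture(f):
        # D_KL(F || M); the integrand is taken as 0 where f(x) underflows to 0
        integrand = lambda x: f(x) * np.log(f(x) / m(x)) if f(x) > 0 else 0.0
        value, _ = quad(integrand, -np.inf, np.inf)
        return value

    return 0.5 * (kl_to_mixture(p) + kl_to_mixture(q))

# Always between 0 and ln(2) (about 0.693) when using natural logs.
print(js_divergence(0.0, 1.0, 2.0, 1.5))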

The J-S divergence can also be expressed as a difference of entropies, i.e.: $JS(P, Q) = H\left(m(x)\right)-\frac{H(p(x))+H(q(x))}{2}$. There are ways to approximate the entropy of a Gaussian mixture without resorting to numerical methods -- see "On Entropy Approximation for Gaussian Mixture Random Vectors" for an approach using a Taylor-series expansion.
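
For reference, this identity follows in one step from the definition: $D_{\mathrm{KL}}(P\|M)=-H(p(x))-\int p(x)\ln m(x)\,{\rm d}x$ (and likewise for Q), and averaging the two cross-terms gives $\int m(x)\ln m(x)\,{\rm d}x=-H(m(x))$, so

\[ \frac{1}{2}\left(D_{\mathrm{KL}}(P\|M)+D_{\mathrm{KL}}(Q\|M)\right)=H(m(x))-\frac{H(p(x))+H(q(x))}{2} \]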

The zeroth-order expansion gives the estimate

\[ H(m(x))=-\frac{1}{2}(\log m(\mu_{P})+\log m(\mu_{Q})) \]

which might work well enough, but still ends up rather elaborate when expanded.
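
For the record, the zeroth-order estimate is easy to compute when the covariances are diagonal (as they are for the timbre model). A quick Python sketch (illustration only, with variances passed as vectors):

# Zeroth-order Taylor estimate of the mixture entropy:
#   H(m) ~= -1/2 * (log m(mu_P) + log m(mu_Q))
# for a two-component, equal-weight Gaussian mixture with diagonal covariances.
import numpy as np

def gaussian_logpdf_diag(x, mu, var):
    # log N(x; mu, diag(var))
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def mixture_entropy_zeroth_order(mu_p, var_p, mu_q, var_q):
    def log_m(x):
        # log m(x) = log(0.5 * (p(x) + q(x))), via logaddexp for stability
        lp = gaussian_logpdf_diag(x, mu_p, var_p)
        lq = gaussian_logpdf_diag(x, mu_q, var_q)
        return np.logaddexp(lp, lq) + np.log(0.5)
    return -0.5 * (log_m(mu_p) + log_m(mu_q))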

To my great fortune, however, there are other metrics bounded on [0, 1] for which people have already worked out closed-form solutions for two multivariate Gaussians, freeing me from attempting any more calculus! In particular I went with one called the Hellinger distance, whose square has the following form:

\[ H^{2}(P,Q)=1-\frac{\left|\Sigma_{P}\right|^{1/4}\left|\Sigma_{Q}\right|^{1/4}}{\left|\frac{1}{2}\Sigma_{P}+\frac{1}{2}\Sigma_{Q}\right|^{1/2}}\exp\left\{ -\frac{1}{8}(\mu_{P}-\mu_{Q})^{T}\left(\frac{1}{2}\Sigma_{P}+\frac{1}{2}\Sigma_{Q}\right)^{-1}(\mu_{P}-\mu_{Q})\right\} \]

The article "Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions" (PDF) notes that the Hellinger distance behaves relatively "close" to the JS divergence as a metric, so it may well be good enough.
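
With diagonal covariances the determinants and the inverse reduce to products and element-wise division, so the squared Hellinger distance is only a few lines. A rough Python sketch (illustration only, with variances passed as vectors):

# Squared Hellinger distance between two multivariate Gaussians with
# diagonal covariances (var_p, var_q are vectors of per-dimension variances).
import numpy as np

def hellinger_squared(mu_p, var_p, mu_q, var_q):
    avg_var = 0.5 * (var_p + var_q)  # diagonal of (Sigma_P + Sigma_Q) / 2
    # |Sigma_P|^(1/4) * |Sigma_Q|^(1/4) / |avg|^(1/2), computed in log space
    log_coeff = (0.25 * np.sum(np.log(var_p)) + 0.25 * np.sum(np.log(var_q))
                 - 0.5 * np.sum(np.log(avg_var)))
    diff = mu_p - mu_q
    exponent = -0.125 * np.sum(diff ** 2 / avg_var)  # -(1/8) * diff^T * avg^-1 * diff
    return 1.0 - np.exp(log_coeff + exponent)

The result is in [0, 1]; taking the square root gives the Hellinger distance itself, which is also bounded on [0, 1].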

I also added the cosine similarity for the beat spectra, so the similarity scores are starting to look more reasonable.
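
For completeness, that comparison is just the standard cosine similarity between the two beat spectra treated as vectors; a trivial sketch (not the project code):

# Cosine similarity between two beat spectra of equal length.
import numpy as np

def cosine_similarity(a, b):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0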

Upon looking at the BBC Vamp plugin more carefully, I realize that its notion of "energy" really refers to the energy of the audio signal itself, not the intuitive sense of a track's energy that would be useful in putting together a set. Thus I think it's best to put that idea aside for now and focus on the genre comparison.

This weekend I'll polish off these distance calculations and also investigate an occasional crash that happens when the distance functions try to access the timbre model.

For next week, I aim to get the UI into a better state (especially the currently non-functional sliders) and to lay the groundwork for the Last.fm tags fetcher (in particular, creating the database tables where it will store its information, and the SimilarityDAO class to access that information).
