Friday, July 26, 2013

Week 6: Rethinking the Last.fm tags approach

Measuring the similarity of two tracks with the Jaccard index has proven problematic, because some tracks have far more tags than others. In such cases the similarity is close to zero even when one track's tags are a proper subset of the other's, which clearly will not work for comparison.
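As a made-up illustration: if track A has only three tags and all of them also appear among track B's forty tags, the Jaccard index is still

\[ J(A,B)=\frac{|A\cap B|}{|A\cup B|}=\frac{3}{40}\approx 0.08, \]

even though every one of A's tags matches B.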

Thus, a new approach is required. Exaile does this by querying artists similar to the current one, then selecting tracks at random from these other artists. This is a simple but promising approach, and one that can be adapted for Mixxx.

For now, I've changed the Last.fm fetcher to obtain similar artists as "tags," so songs can be compared that way. With my tiny sample library, it seems not to work terribly well, as none of the similar artists are also found in my library, but perhaps this is just a fluke. I'll try to incorporate more of my library to see whether it can be made to work.

As for testing the timbral similarity quantitatively, the test using the ISMIR '04 data is described on p. 55 of Dominik Schnitzer's thesis about Mirage. As he states:
"1. All pieces in a music collection are assigned an adequate genre label [ed note: these are given as a CSV file in the ISMIR data].
2. The files in the collection are analyzed and the full similarity matrix is computed for all pieces in the collection.
3. Then a genre classification confusion matrix is computed so that each song is assigned the genre of its most similar song.
4. In the confusion matrix the predicted genre of a song is plotted against its actual membership.
5. The confusion matrix diagonal shows the classification accuracies for each genre, which is the number of correctly classified songs divided by the total number of songs in the class."
I spent some time looking at the Google Test framework, but I'm not sure whether this process can be completely automated with it. At the very least I can write up clear instructions or a shell script for the test, and automate part of it in the TimbreUtils class.
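The core of steps 2-5 is simple enough to sketch. The following is only an outline of the evaluation with hypothetical names (it is not Mixxx code): given a precomputed distance matrix and the genre labels from the CSV file, each song is assigned the genre of its nearest neighbour, and the diagonal of the confusion matrix gives the per-genre accuracy.

    // Sketch of the nearest-neighbour genre evaluation (steps 2-5); the
    // function and variable names here are hypothetical, not Mixxx code.
    #include <limits>
    #include <map>
    #include <string>
    #include <vector>

    // distance[i][j] is the timbral distance between songs i and j;
    // genres[i] is the genre label taken from the ISMIR '04 CSV file.
    std::map<std::string, double> genreAccuracies(
            const std::vector<std::vector<double> >& distance,
            const std::vector<std::string>& genres) {
        std::map<std::string, int> total, correct;
        for (size_t i = 0; i < genres.size(); ++i) {
            // Find the most similar other song (the nearest neighbour).
            size_t best = i;
            double bestDistance = std::numeric_limits<double>::max();
            for (size_t j = 0; j < genres.size(); ++j) {
                if (j != i && distance[i][j] < bestDistance) {
                    bestDistance = distance[i][j];
                    best = j;
                }
            }
            // Predicted genre = genre of the nearest neighbour.
            ++total[genres[i]];
            if (genres[best] == genres[i]) {
                ++correct[genres[i]];
            }
        }
        // Diagonal of the confusion matrix: per-genre classification accuracy.
        std::map<std::string, double> accuracy;
        for (std::map<std::string, int>::const_iterator it = total.begin();
             it != total.end(); ++it) {
            accuracy[it->first] = double(correct[it->first]) / it->second;
        }
        return accuracy;
    }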

Next week is the midterm, so I'll want to hammer out the rest of this artist similarity function and write out instructions for the selector UI so that others can give it a spin.

Friday, July 19, 2013

Week 5: Last.fm integration

This week, I worked on implementing the Last.fm tag fetcher, to add an additional data source for determining song similarity. Since the official liblastfm library is licensed under GPL v3 while Mixxx uses GPL v2, I switched from liblastfm to talking to the Last.fm REST API directly with Qt's built-in networking classes.
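For reference, the request itself is straightforward with QNetworkAccessManager. This is only a minimal sketch of the idea, not the actual fetcher in my branch; the query parameters follow the Last.fm track.getTopTags web-service method, and YOUR_API_KEY is a placeholder.

    // Minimal sketch of a tag request with Qt 4's networking classes; this is
    // not the actual Mixxx fetcher, and YOUR_API_KEY is a placeholder.
    #include <QNetworkAccessManager>
    #include <QNetworkRequest>
    #include <QString>
    #include <QUrl>

    void fetchTags(QNetworkAccessManager* manager,
                   const QString& artist, const QString& title) {
        QUrl url("http://ws.audioscrobbler.com/2.0/");
        url.addQueryItem("method", "track.getTopTags");
        url.addQueryItem("artist", artist);
        url.addQueryItem("track", title);
        url.addQueryItem("api_key", "YOUR_API_KEY");
        // The reply arrives asynchronously: connect the manager's finished()
        // signal to a slot that parses the XML response and stores the tags.
        manager->get(QNetworkRequest(url));
    }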

In order to fetch tags, you can currently use the "Fetch tags..." item in the context menu; later this will happen automatically during analysis, if it's enabled in the Selector preferences (not yet implemented). In addition, I added a menu item to "Get follow-up tracks", which will set the currently selected track as a seed for the selector and switch over to Selector view.

The fetcher obtains results from Last.fm fine, but if a track has no tags or could not be found at all (as was the case for several tracks in my test), I want to make sure we store that information too, so that Last.fm is not queried for those tracks again and again (at least as long as the stored record was updated recently).

Max notes that upon opening Mixxx with a newly created database, the Selector view is empty. I can confirm this is happening, but I'm still not sure why; this is something important to look into further.

As for my solution of using the Hellinger distance for timbral similarity, Max has suggested that I implement a test to compare how well different distance functions perform. For this purpose we can use the ISMIR '04 dataset, which is freely available online, to assess how well each metric groups similar tracks together.

Next week I'll work on putting such a test in place, as well as implementing the tag similarity functionality and isolating a "get best follow-up" function so it can be called from e.g. the auto-dj.

Friday, July 12, 2013

Week 4: Calculus woes (and eventual solution)

I was less productive this week than I had hoped, in large part because of the math difficulties detailed below. Therefore I plan to put in some work this weekend to make sure things remain on track.

The Jensen-Shannon divergence between probability distributions P and Q is defined as:
\[ D_{\mathrm{JS}}(P\|Q)=\frac{1}{2}\left(D_{\mathrm{KL}}(P\|M)+D_{\mathrm{KL}}(Q\|M)\right) \]

where M is a mixture distribution that gives half the weight to P and half to Q. For Gaussians, this turns out to be quite different from the sum of the two distributions; see this StackExchange answer for more detail.

To find an analytic solution for $D_{\mathrm{KL}}(P\|M)$, one ends up with an integral that looks like:

\[ \int_{-\infty}^{\infty}\ln\left(\frac{2p(x)}{p(x)+q(x)}\right)p(x)\,{\rm d}x \]

where $p(x)$ and $q(x)$ are the probability density functions of P and Q, and $m(x) = \frac{1}{2}\left(p(x)+q(x)\right)$ is the density of M.

This ultimately requires integrating the log of the sum of the two density functions, which does not appear to have a closed-form solution, even though the covariance matrices are diagonal.

The J-S divergence can also be written as a difference of entropies, i.e. $JS(P, Q) = H\left(m(x)\right)-\frac{H(p(x))+H(q(x))}{2}$. There are ways to approximate the entropy of a Gaussian mixture without resorting to numerical methods -- see "On Entropy Approximation for Gaussian Mixture Random Vectors" for an approach using a Taylor-series expansion.

The zeroth-order Taylor expansion approximates $H(m) = -\mathrm{E}\left[\log m(x)\right]$ by evaluating $\log m$ at each component mean, which for equal weights gives the estimate

\[ H(m(x))\approx-\frac{1}{2}\left(\log m(\mu_{P})+\log m(\mu_{Q})\right) \]

which might work well enough, but still ends up rather elaborate when expanded.

To my great fortune, however, there are other metrics bounded on [0, 1] for which people have already worked out closed-form solutions for two multivariate Gaussians, freeing me from attempting any more calculus! In particular I went with the Hellinger distance, which has the following form:

\[ H^{2}(P,Q)=1-\frac{\left|\Sigma_{P}\right|^{1/4}\left|\Sigma_{Q}\right|^{1/4}}{\left|\frac{1}{2}\Sigma_{P}+\frac{1}{2}\Sigma_{Q}\right|^{1/2}}\exp\left\{ -\frac{1}{8}(\mu_{P}-\mu_{Q})^{T}\left(\frac{1}{2}\Sigma_{P}+\frac{1}{2}\Sigma_{Q}\right)^{-1}(\mu_{P}-\mu_{Q})\right\} \]

The survey article "Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions" (PDF) notes that the Hellinger distance behaves relatively "close" to the JS divergence as a metric, so it may well be good enough.
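Since the timbre models use diagonal covariance matrices, the determinants and the quadratic form in the formula reduce to per-dimension products and sums. A rough sketch of what that computation looks like (not the actual Mixxx implementation; the function name is mine):

    // Squared Hellinger distance between two Gaussians with diagonal
    // covariance; a sketch, not the actual Mixxx implementation.
    #include <cmath>
    #include <vector>

    double hellingerSquared(const std::vector<double>& meanP,
                            const std::vector<double>& varP,
                            const std::vector<double>& meanQ,
                            const std::vector<double>& varQ) {
        // Work in log space to avoid overflow when multiplying many variances.
        double logDetP = 0.0, logDetQ = 0.0, logDetM = 0.0, quad = 0.0;
        for (size_t i = 0; i < meanP.size(); ++i) {
            const double varM = 0.5 * (varP[i] + varQ[i]); // diagonal of (Sigma_P + Sigma_Q)/2
            logDetP += std::log(varP[i]);
            logDetQ += std::log(varQ[i]);
            logDetM += std::log(varM);
            const double d = meanP[i] - meanQ[i];
            quad += d * d / varM; // (mu_P - mu_Q)^T M^{-1} (mu_P - mu_Q)
        }
        const double logCoeff = 0.25 * logDetP + 0.25 * logDetQ - 0.5 * logDetM;
        return 1.0 - std::exp(logCoeff - 0.125 * quad);
    }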

I also added the cosine similarity for the beat spectra, so the similarity scores are starting to look more reasonable.
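The beat-spectrum comparison is just the standard cosine similarity of the two vectors; for completeness, a minimal version (again only a sketch):

    // Cosine similarity between two beat spectra of equal length (sketch).
    #include <cmath>
    #include <vector>

    double cosineSimilarity(const std::vector<double>& a,
                            const std::vector<double>& b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (size_t i = 0; i < a.size(); ++i) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (std::sqrt(normA) * std::sqrt(normB));
    }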

Upon looking at the BBC Vamp plugin more carefully, I realize that its notion of "energy" really just reflects the amount of sound output, not the intuitive sense of energy that would be useful when putting together a set. Thus I think it's best to put that idea aside for now and focus on the genre comparison.

This weekend I'll polish off these distance calculations and also investigate an occasional crash that happens when the distance functions try to access the timbre model.

For next week, I aim to get the UI into a better state (especially the currently non-functional sliders) and to lay the groundwork for the Last.fm tags fetcher (in particular, creating the database tables where it will store its information, and the SimilarityDAO class to access that information).

Friday, July 5, 2013

Week 3: Adding timbral similarity

This week, I incorporated the timbral modeling plugin from Queen Mary and created some additional classes to deal with timbral similarity calculations.

I had to modify the Queen Mary SimilarityPlugin to output a vector<float> containing the Gaussian timbral model and beat spectrum. The vector's first three elements give the length of the mean, variance, and beat spectrum vectors respectively, followed by the concatenation of those three vectors.
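In other words, a consumer of the plugin output unpacks it roughly like this (the struct and function names below are mine, for illustration only):

    // Sketch of unpacking the modified SimilarityPlugin output described
    // above; the names here are illustrative, not the actual Mixxx classes.
    #include <vector>

    struct TimbreData {
        std::vector<float> mean;
        std::vector<float> variance;
        std::vector<float> beatSpectrum;
    };

    TimbreData unpack(const std::vector<float>& output) {
        // First three elements: lengths of the mean, variance and
        // beat-spectrum vectors; the rest is their concatenation.
        const size_t nMean = static_cast<size_t>(output[0]);
        const size_t nVar = static_cast<size_t>(output[1]);
        const size_t nBeat = static_cast<size_t>(output[2]);
        std::vector<float>::const_iterator it = output.begin() + 3;
        TimbreData data;
        data.mean.assign(it, it + nMean);
        it += nMean;
        data.variance.assign(it, it + nVar);
        it += nVar;
        data.beatSpectrum.assign(it, it + nBeat);
        return data;
    }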

It turns out that the symmetrised KL divergence has no finite upper bound, so on its own it's not suitable for combining with other metrics. However, the related Jensen-Shannon divergence is bounded on [0, 1], and might be a better fit if it can be calculated efficiently.

As an aside, it seems there is some opportunity for refactoring among the Analyser classes, since AnalyserBeats, AnalyserKey, and AnalyserTimbre all have similar tasks related to serializing protobuf messages to the database and retrieving them. Today I ran into a problem where the TimbreModel was not actually serializing as I had thought, because I bound the data to the query in TrackDAO::bindTrackToLibraryInsert but not in updateTrack or addTracksPrepare (which required almost identical code each time). If these commands were factored out into a separate function, they could easily be applied to queries in all three places.
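For example, the duplicated binding code could be pulled into a small helper that each of those query-building functions calls. The sketch below is hypothetical (the function name and the ":timbre" placeholder are mine, not existing Mixxx code):

    // Hypothetical shared helper for binding the serialized timbre model to a
    // prepared query; the placeholder and function names are illustrative.
    #include <QByteArray>
    #include <QSqlQuery>
    #include <string>

    void bindTimbreModel(QSqlQuery* query, const std::string& serializedTimbre) {
        // Serialize the protobuf message once, then bind it wherever the
        // insert/update statement expects a ":timbre" placeholder.
        query->bindValue(":timbre",
                         QByteArray(serializedTimbre.data(),
                                    static_cast<int>(serializedTimbre.size())));
    }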

For next week, I'll investigate the possibilities for such a refactoring, populate the new preference panes with options for ranking the different similarity functions, implement the J-S divergence, and add in the track energy measure as another source of data.

Monday, July 1, 2013

How to add a computed column

My current goal is to add an extra column to the SelectorLibraryTableModel class that contains a "similarity score" for each row in the model. This score will be calculated on-the-fly in relation to a seed track.

My first approach was to override the columnCount, flags, headerData, and data functions inherited from LibraryTableModel, to create the illusion of an additional column in the model. (See this Qt forum post for a minimal example.) This new column displayed correctly in the WTrackTableView, but because it did not exist in the underlying SQL table, the BaseSqlTableModel could not use it to sort.
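To give an idea of what this looks like, here's a stripped-down illustration of the "virtual column" idea. It subclasses QSqlTableModel directly rather than Mixxx's LibraryTableModel, and scoreForRow() is just a placeholder, so treat it as a sketch rather than the real code:

    // Stripped-down illustration of the "virtual column" approach; this is a
    // sketch built on QSqlTableModel, not the actual Mixxx model classes.
    #include <QModelIndex>
    #include <QSqlTableModel>
    #include <QVariant>

    class ScoredTableModel : public QSqlTableModel {
      public:
        int columnCount(const QModelIndex& parent = QModelIndex()) const {
            // Pretend there is one more column than the SQL table actually has.
            return QSqlTableModel::columnCount(parent) + 1;
        }

        QVariant data(const QModelIndex& index, int role = Qt::DisplayRole) const {
            if (index.column() == QSqlTableModel::columnCount()
                    && role == Qt::DisplayRole) {
                return scoreForRow(index.row()); // similarity score, computed on the fly
            }
            return QSqlTableModel::data(index, role);
        }

      private:
        double scoreForRow(int row) const {
            // Placeholder: compare the track in this row against the seed track.
            Q_UNUSED(row);
            return 0.0;
        }
    };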

It might be possible to use that approach if I additionally overrode select and orderByClause, but it seems messy. One alternative might be to use a temporary table, following a procedure like this:

  1. Do setTableModel as usual to create the "library_view," without the score column.
  2. As filters are added and the selected set gets smaller, check in rowCount whether the number of rows has dropped below a certain critical value (say, 100 rows).
  3. Once rowCount drops below the critical value, create a temporary table based on the current SQL query, and add the "score" column to it, i.e.:

    CREATE TEMP TABLE selector AS SELECT * FROM library_view;
    ALTER TABLE selector ADD COLUMN score REAL;
  4. Calculate the similarity scores and update the temporary table with them (a rough sketch of steps 3 and 4 appears at the end of this post).

  5. Set the table model to this temporary table.

This has the benefit that sorting would work as expected and only rowCount and setTableModel would have to be overridden. Potential problems with this approach:

  • If the filters change, the old temporary table has to be dropped and a new one created. That could get expensive.
  • Others?
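As a concrete starting point, steps 3 and 4 might look roughly like the following with QSqlQuery. This is only a sketch: error handling is omitted, it assumes the view exposes the track id column, and computeScoreAgainstSeed() is a placeholder for whatever combination of distance functions the selector ends up using.

    // Rough sketch of steps 3 and 4 using QSqlQuery; error handling is
    // omitted and computeScoreAgainstSeed() is a placeholder.
    #include <QSqlDatabase>
    #include <QSqlQuery>
    #include <QString>
    #include <QVariant>

    // Placeholder: similarity of the given track to the current seed track.
    double computeScoreAgainstSeed(int trackId);

    void buildSelectorTable(QSqlDatabase db, const QString& currentQuery) {
        QSqlQuery query(db);
        // Step 3: snapshot the current result set into a temporary table
        // and add the score column.
        query.exec("DROP TABLE IF EXISTS selector");
        query.exec(QString("CREATE TEMP TABLE selector AS %1").arg(currentQuery));
        query.exec("ALTER TABLE selector ADD COLUMN score REAL");

        // Step 4: fill in the similarity scores row by row.
        QSqlQuery update(db);
        update.prepare("UPDATE selector SET score = :score WHERE id = :id");
        QSqlQuery rows(db);
        rows.exec("SELECT id FROM selector");
        while (rows.next()) {
            const int id = rows.value(0).toInt();
            update.bindValue(":score", computeScoreAgainstSeed(id));
            update.bindValue(":id", id);
            update.exec();
        }
        // Step 5 would then point the table model at "selector".
    }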