Tutorial :Implementing shuffle on the celestial jukebox



Question:

How would one implement shuffle for the "Celestial Jukebox"?

More precisely, at each time t, return an uniform random number between 0..n(t), such that there are no repeats in the entire sequence, with n() increasing over time.

For the concrete example, assume a flat-rate music service which allows playing any song in the catalog by a 0 based index number. Every so often, new songs are added which increase range of index numbers. The goal is to play a new song each time (assuming no duplicates in the catalog).

an ideal solution would be feasible on existing hardware - how would I shoehorn a list of six million songs in 8MB of DRAM? Similarly, the high song count exacerbates O(n) selection timings.

-- For an LCG generator, given a partially exhausted LCG on 0..N0, can that be translated to a different LCG on 0..N1 (where N1 > N0), that doen't repeat the exhausted sequence.
-- Checking if a particular song has already been played seems to rapidly grow out of hand, although this might be the only way ? Is there an efficient data structure for this?


Solution:1

The way that I like to do that kind of non-repeating random selection is to have a list, and each time I select an item at random between [0-N), I remove it from that list. In your case, as new items get added to the catalog, it would also be added to the not-yet-selected list. Once you get to the end, simply reload all the songs back to the list.

EDIT:

If you take v3's suggestion into account, this can be done in basically O(1) time after the O(N) initialization step. It guarantees non-repeating random selection.

Here is the recap:

  1. Add the initial items to a list
  2. Pick index i at random (from set of [0,N))
  3. Remove item at index i
  4. Replace the hole at i with the Nth item (or null if i == Nth) and decrement N
  5. For new items, simply append to the end of the list and increment N as necessary
  6. If you ever get to playing through all the songs (which I doubt if you have 6M songs), then add all the songs back to the list, lather, rinse, and repeat.

Since you are trying to deal with rather large sets, I would recommend the use of a DB. A simple table with basically two fields: id and "pointer" (where "pointer" is what tells you the song to play which could be a GUID, FileName, etc, depending on how you want to do it). Have an index on id and you should get very decent performance with persistence between application runs.

EDIT for 8MB limit:

Umm, this does make it a bit harder... In 8 MB, you can store a maximum of ~2M entries using 32-bit keys.

So what I would recommend is to pre-select the next 2M entries. If the user plays through 2M songs in a lifetime, damn! To pre-select them, do a pre-init step using the above algorithm. The one change I would make is that as you add new songs, roll the dice and see if you want to randomly add that song to the mix. If yes, then pick a random index and replace it with the new song's index.


Solution:2

With a limit of 8MB for 6 million songs, there's plainly not room to store even a single 32 bit integer for each song. Unless you're prepared to store the list on disk (in which case, see below).

If you're prepared to drop the requirement that new items be immediately added to the shuffle, you can generate an LCG over the current set of songs, then when that is exhausted, generate a new LCG over only the songs that were added since you began. Rinse and repeat until you no longer have any new songs. You can also use this rather cool algorithm that generates an unguessable permutation over an arbitrary range without storing it.

If you're prepared to relax the requirement of 8MB ram for 6 million songs, or to go to disk (for example, by memory mapping), you could generate the sequence from 1..n at the beginning, shuffle it with fisher-yates, and whenever a new song is added, pick a random element from the so-far-unplayed section, insert the new ID there, and append the original ID to the end of the list.

If you don't care much about computational efficiency, you could store a bitmap of all songs, and repeatedly pick IDs uniformly at random until you find one you haven't played yet. This would take 6 million tries to find the last song (on average), which is still damn fast on a modern CPU.


Solution:3

While Erich's solution is probably better for your specific use case, checking if a song has already been played is very fast (amortized O(1)) with a hash-based structure, such as a set in Python or a hashset<int> in C++.


Solution:4

You could simply generate the sequence of numbers from 1 to n and then shuffle it using a Fisher-Yates shuffle. That way you can guarantee that the sequence won't repeat, regardless of n.


Solution:5

You could use a linked list inside an array: To build the initial playlist, use an array containing a something like this:

 struct playlistNode{    songLocator* song;    playlistNode  *next;  };  struct playlistNode arr[N];  

Also keep a 'head' and 'freelist' pointer;

Populate it in 2 passes:
1. fill in arr with all the songs in the catalog in order 0..N.
2. randomly iterate through all the indexes, filling in the next pointer;

Deletion of songs played is O(1):

head=cur->next;  cur->song=NULL;  freelist->next = freelist;  cur->next=freelist;  freelist=cur;  

Insertion of new songs is O(1) also: pick an array index at random, and patch a new node.

node = freelist;  freelist=freelist->next;  do {  i=rand(N);     } while (!arr[i].song);  //make sure you didn't hit a played node  node->next = arr[i].next;  arr[i].next=node;  

Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »