Tutorial :Calculating “Kevin Bacon” Numbers



Question:

I've been playing around with some things and thought up the idea of trying to figure out Kevin Bacon numbers. I have data for a site that for this purpose we can consider a social network. Let's pretend that it's Facebook (for simplification of discussion). I have people and I have a list of their friends, so I have the connections between them. How can I calculate the distance from one person to another (basically, a Kevin Bacon number)?

My best idea is a Bidirectional search, with a depth limit (to limit computational complexity and avoid the problem of people who simply can't be connected in the graph), but I realize this is rather brute force.

Could it be better to make little sub-graphs (say something equivalent to groups on Facebook), calculate the shortest distances between them (ahead of time, perhaps) and then try to use THOSE to find a link? While this requires pre-calculation, it could make it possible to search many fewer nodes (nodes could be groups instead of individuals, making the graph much smaller). This would still be a bidirectional search though.

I could also pre-calculate the number of people an individual is connected to, searching the nodes for "popular" people first since they could have the best chance of connecting to the given destination individual. I realize this would be a trade-off of speed for possible shortest path. I'd think I'd also want to use a depth-first search instead of the breadth-first search I'd plan to use in the other cases.

Can someone think of a simpler/faster way of doing this? I'd like to be able to find the shortest length between two people, so it's not as easy as always having the same end point (such as in the Kevin Bacon problem).

I realize that there are problems like I could get chains of 200 people and such, but that can be solved my having a limit to the depth I'm willing to search.


Solution:1

This is a standard shortest path problem. There are lots of solutions, including Dijkstra's algorithm and Bellman-Ford. You may be particularly interested in looking at the A* algorithm and seeing how it would perform with the cost function relative to the inverse of any particular node's degree. The idea would be to visit more popular nodes (those with higher degree) first.


Solution:2

Sounds like a job for Dijkstra's algorithm.

ED: Eh, I shouldn't have pulled the trigger so fast. Dijkstra's (and Bellman-Ford) reduces to a breadth-first search when the weights are 1, so this isn't too useful. Oh well.

The A* algorithm, mentioned by tvanfosson, may be ideal for this. The idea is that instead of searching and recursing in whatever order the elements are in each level of the tree (rooted on your start- or end-point), you use some heuristic to determine which element you are going to try first. In your case a good bet would probably be the degree of a node (number of "friends"), but you could possibly want to use the number of people within some arbitrary number of degrees of a given person (i.e., the guy who has has three friends who each have 100 friends is likely to be a better node than the guy who has 20 friends in a clique that shuns outsiders). There's all sorts of other things you could use as a heuristic (friends get 2 points, friends-of-friends get 1 point; whatever, experiment).

Combine this with a depth limit (cut off after 6 degrees of separation, or whatever), and you can vastly improve your average case (worst case is still the same as basic BFS).


Solution:3

run a breadth-first search in both directions (from each endpoint) and stop when you have a connection or reach your depth limit


Solution:4

This one might be better overall Floyd-Warshall the all pairs shortest distance.


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »