What is the fastest algorithm for ranking a set of sets in order of the number of intersections for each individual set in the array. For example, suppose you are given a set composed of N sets, from Array number 1 to Array number “N” (so the set is basically composed of multiple arrays).
Also, assume that the sets have an average size of “k”. And, for each Array in the set we want to return n-1 sets that are ranked in order of how many element that they have in common with the other arrays. What is the best way to do this, and what is the Big O time for your solution?
If you were able to read the entire question without falling asleep – congratulations! You’ve already made it quite far in solving this problem. Now, let’s come up with some solutions.
First off, we can do this:
- The first thing to do is to sort all of the individual sets. The computation cost of this activity is N * k log k. Where k is the average size of the set, and N is the number of sets in the “big” set.
- Then, do a linear comparison of the current set with all of the other sets, and at each comparison insert the set in sorted list of scores at its correct location computation cost: N * k (assuming worst case were all elements of set 1 intersect with all other elements of other sets) + N * N log k (cost to find correct location of score in score list). So, the total cost would be: N * k log k + N * k + N * k log k.
Alternative solution to the problem
Another possible solution to this problem is this:
- Take a map with each bucket a member of set under question (assuming size of m of the set, thus m buckets)
- Setup an array of size n (where each element is score of each set)
- Go through all sets, where for each set’s each member – check for bucket’s existence in map – if bucket exists there is a hit / intersection so increment the corresponding set’s score in array
computation cost: k * N - Sort this array, cost: k log k
Total cost: N * k + l log l
And there we have our 2 solutions to the problem!