Reposted from: http://arstechnica.com/science/news/2010/02/recommendation-algorithm-wants-to-show-you-something-new.ars
When it comes to recommendation systems, everybody’s looking to increase accuracy: the Netflix Prize was awarded last July for an algorithm that improved the accuracy of the service’s recommendation algorithm by 10 percent. However, computer scientists are finding a new metric to improve upon: recommendation diversity. In a paper that will be released by PNAS, a group of scientists pushes the limits of recommendation systems, creating new algorithms that make more tangential recommendations to users. Such recommendations can help expand users’ interests, which in turn increases the longevity and utility of the recommendation system itself.
Accuracy has long been the most prized measurement in recommending content, like movies, links, or music. However, computer scientists note that this type of system can narrow the field of interest for each user the more it is used. Improved accuracy can result in a strong filtering based on a user’s interests, until the system can only recommend a small subset of all the content it has to offer.
The authors of the paper also note that accurate recommendations are not always useful. For example, suggesting one generic romantic comedy after another (say What Happens in Vegas and Just Married) just because a user rated When Harry Met Sally five stars is not helpful. Systems that base recommendations on correlations between users can miss niche items that a user might like, but would never find on his own. Research indicates that the most interesting recommendations and information originate from “weak ties” in a system, between users that are somewhat similar but disparate enough that they can introduce novelty to each other.
To widen the potential field of user interest, the authors developed a hybrid of two algorithms. One based its recommendations on random walks between highly connected users and material; the other mirrored the process of heat diffusion, spreading ratings at a decreasing level of potency the further a recommendation had to travel. The heat diffusion algorithm can be thought of as a network in which users are connected to the objects they have interacted with and evaluated, and values are passed among the items in this network to develop ratings.
The heat diffusion model uses values of 1 or 0 for the material to be recommended (either a user liked something or he didn’t) and takes the average of the values a user had assigned to objects to give that user a value. For example, if a user liked two things and disliked two others, the value assigned to the user would be one-half.
The algorithm then averages these values over all users connected to an object, and the result becomes the object’s value in the system (for example, if two users were attached to an object and one had a value of one-half and the other had zero, the new value assigned to the object would be one quarter). All of this can be done using a small set of data, meaning the heat diffusion algorithm can make diverse yet relevant recommendations based on sparse data in one pass.
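The two averaging steps described above can be sketched in a few lines of Python. This is only an illustration of the article's worked example, not the authors' implementation: the function name `heat_diffusion_pass`, the dictionary layout, and the toy users and objects are all assumptions made for the sketch.

```python
def heat_diffusion_pass(object_values, user_links):
    """One pass of the two-step averaging described in the article.

    object_values: hypothetical dict mapping object -> 1 (liked) or 0 (not liked)
    user_links:    hypothetical dict mapping user -> list of objects they are linked to
    """
    # Step 1: each user gets the average of the 1/0 values on their objects.
    user_values = {
        user: sum(object_values[o] for o in objs) / len(objs)
        for user, objs in user_links.items()
        if objs
    }

    # Invert the links so we know which users are attached to each object.
    object_links = {}
    for user, objs in user_links.items():
        for o in objs:
            object_links.setdefault(o, []).append(user)

    # Step 2: each object gets the average of the values of its attached users.
    new_object_values = {
        o: sum(user_values[u] for u in users) / len(users)
        for o, users in object_links.items()
    }
    return user_values, new_object_values


# Toy data (assumed): u1 is linked to two liked and two non-liked objects,
# u2 only to non-liked ones; object "d" is shared by both users.
object_values = {"a": 1, "b": 1, "c": 0, "d": 0, "e": 0}
user_links = {"u1": ["a", "b", "c", "d"], "u2": ["d", "e"]}

user_values, new_object_values = heat_diffusion_pass(object_values, user_links)
print(user_values["u1"], user_values["u2"])  # 0.5 0.0
print(new_object_values["d"])                # 0.25
```

Here `u1` ends up with one-half and the shared object `d` with one quarter, matching the numbers in the example above. Because each object's score is an average rather than a sum, an object attached to only a few users is not penalized for being obscure, which is one plausible reading of why this approach surfaces niche items.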
To test the algorithms individually and in hybrid form, the scientists used data sets from Netflix, Rate Your Music and del.icio.us, reducing ratings of various numbers of stars to likes or dislikes (at least three stars out of five and at least six out of ten qualified as a “like” in Netflix and RYM, respectively). They removed 10 percent of the selections from the data sets, and then applied the algorithms to test how much of the deleted data they could recover, as well as how many new and relevant selections the algorithms could make.
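A rough sketch of that test setup, under stated assumptions: ratings at or above the threshold count as likes and everything else is simply dropped, the 10 percent is chosen at random, and recovery is measured as the share of hidden selections that reappear among the recommendations. The helper names (`binarize`, `hold_out`, `recovered_fraction`) and the synthetic ratings are hypothetical, not taken from the paper.

```python
import random


def binarize(ratings, threshold):
    """Keep only (user, item) pairs whose star rating meets the threshold."""
    return {pair for pair, stars in ratings.items() if stars >= threshold}


def hold_out(likes, fraction=0.1, seed=0):
    """Randomly hide a fraction of the liked pairs; return (training, hidden)."""
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    pairs = sorted(likes)       # sort first so the shuffle is deterministic
    rng.shuffle(pairs)
    n_hidden = round(len(pairs) * fraction)
    return set(pairs[n_hidden:]), set(pairs[:n_hidden])


def recovered_fraction(recommended, hidden):
    """Share of hidden pairs that show up among the recommended pairs."""
    if not hidden:
        return 0.0
    return len(recommended & hidden) / len(hidden)


# Synthetic five-star ratings (assumed data): 4 users x 5 items.
ratings = {
    (f"u{i}", f"m{j}"): (i + j) % 5 + 1
    for i in range(4)
    for j in range(5)
}

likes = binarize(ratings, threshold=3)   # three stars or more counts as a like
training, hidden = hold_out(likes, fraction=0.1)
print(len(likes), len(training), len(hidden))  # 12 11 1
```

An algorithm under test would be given only `training`, and its output passed to `recovered_fraction` together with `hidden`.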
By combining the heat diffusion approach with the safer and more accurate random walk, the researchers found that they could create a body of recommendations that mixed novel items with safer, more accurate picks. More importantly, using both allowed for more accurate recommendations than using either alone.
The hybrid took the form of a linear combination of the random walk and the heat diffusion algorithms, and the influence of each could be tuned by adjusting their coefficients to create more novelty or more accuracy as needed. This might allow for a system where a user could adjust the recommendations according to how interested they are in seeing something that may be outside of their normal content. The authors also noted that adding a global ranking algorithm that recommends items based on overall popularity could improve accuracy when little is known about the user.
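As a minimal sketch of such a tunable blend, assume each algorithm has already produced a score per item for one user. The single weight `lam`, the rescaling of each score set by its maximum, and the item names below are illustrative assumptions; the article does not give the exact form of the coefficients.

```python
def normalize(scores):
    """Rescale scores so the largest is 1.0 (assumed, to make the two comparable)."""
    top = max(scores.values(), default=0)
    return {item: (value / top if top else 0.0) for item, value in scores.items()}


def hybrid_scores(walk, heat, lam):
    """Linear blend: lam weights the random walk, (1 - lam) the heat diffusion."""
    walk_n, heat_n = normalize(walk), normalize(heat)
    items = set(walk_n) | set(heat_n)
    return {
        item: lam * walk_n.get(item, 0.0) + (1 - lam) * heat_n.get(item, 0.0)
        for item in items
    }


def top_n(scores, n):
    """Items with the highest scores, ties broken alphabetically."""
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


# Made-up scores: the random walk favors the popular title,
# the heat diffusion favors the obscure one.
walk = {"blockbuster": 0.9, "indie": 0.3, "cult": 0.1}
heat = {"blockbuster": 0.1, "indie": 0.4, "cult": 0.5}

print(top_n(hybrid_scores(walk, heat, lam=1.0), 1))  # ['blockbuster']
print(top_n(hybrid_scores(walk, heat, lam=0.0), 1))  # ['cult']
```

Exposing `lam` as a user-facing slider is one way to picture the adjustable system described above: values near 1 lean toward the safer picks, values near 0 toward novelty.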
While the accuracy of recommendations has been the prized focus (literally) in these systems, diversity and novelty are prized measures too (think of all those friends who boast about liking bands or movies before they were popular). The algorithms are still largely experimental, and the authors note that there is a significantly higher computational cost associated with using a hybrid algorithm. Nonetheless, diversity of suggestions seems to be the next horizon in refining recommendation systems.
PNAS, 2010. DOI: 10.1073/pnas.1000488107