Collaborative Recommendation Algorithms in Practice: An Introduction to Slope One (repost)

Slope One (1): A Simple and Efficient Collaborative Filtering Algorithm (repost)
Original: http://blog.sina.com.cn/s/blog_4d9a06000100am1d.html

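A project I am working on needs a recommendation algorithm, so I looked around online. Beyond Search introduced a collaborative filtering algorithm called Slope One. Compared with similar algorithms, its biggest advantages are that it is very simple, easy to implement, and fast to run, while its recommendations remain relatively accurate.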

Basic concept
Slope One's basic idea is simple. Example 1: users X, Y, and A have all rated Item1. Users X and Y have also rated Item2. What rating is user A likely to give Item2?

User   Rating of Item1   Rating of Item2
X      5                 3
Y      4                 3
A      4                 ?

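According to Slope One, the answer is: 4 - ((5-3) + (4-3))/2 = 2.5.
To explain: user X rated Item1 5 and Item2 3, so X presumably thinks Item2 deserves 2 points less than Item1. Likewise, user Y thinks Item2 deserves 1 point less than Item1. So, on average, the users who rated both Item1 and Item2 think Item2 is worth 1.5 points less than Item1, and we have reason to guess that user A would give Item2 a rating of (4-1.5) = 2.5.

Simple, isn't it? Find the users who rated both Item1 and Item2, compute the average rating difference, and we can then estimate the Item2 rating of user A, who has rated only Item1, and recommend new items to A accordingly.
This shows one big advantage of Slope One: it can produce a reasonably accurate recommendation even from very little data, which helps with the cold start problem.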
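
The arithmetic above can be reproduced in a few lines of Python. This is only a minimal sketch of the unweighted single-pair case; the function names `average_difference` and `predict` are made up for illustration and do not come from any library.

```python
# Ratings from the example above: user -> {item: rating}.
ratings = {
    "X": {"Item1": 5, "Item2": 3},
    "Y": {"Item1": 4, "Item2": 3},
    "A": {"Item1": 4},
}

def average_difference(ratings, target, source):
    """Average of (target rating - source rating) over users who rated both."""
    diffs = [r[target] - r[source]
             for r in ratings.values()
             if target in r and source in r]
    return sum(diffs) / len(diffs)

def predict(ratings, user, target, source):
    """Estimate the user's rating of target from their rating of source."""
    return ratings[user][source] + average_difference(ratings, target, source)

prediction = predict(ratings, "A", "Item2", "Item1")
print(prediction)  # 2.5
```

User A is skipped when averaging because A has not rated Item2, so only X and Y contribute to the difference of -1.5.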

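Weighted algorithm
Next, let's look at the weighted algorithm (Weighted Slope One). Suppose 100 users have rated both Item1 and Item2, and 1000 users have rated both Item3 and Item2. Clearly these two rating differences should not carry the same weight. So the calculation becomes
(100 * (Rating 1 to 2) + 1000 * (Rating 3 to 2)) / (100 + 1000), where (Rating i to 2) is the Item2 rating estimated from the user's rating of Item i. For a more detailed worked example of the weighted algorithm, see part three of this series.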
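
As a rough sketch of that formula in Python: each per-item estimate is weighted by the number of users who rated both items. The helper name `weighted_slope_one` and the two estimates (3.0 and 4.1) are invented for illustration; only the weights 100 and 1000 come from the text.

```python
def weighted_slope_one(estimates):
    """Combine (estimate, support) pairs into one support-weighted prediction.

    estimate: the target rating predicted from one already-rated item
    support:  how many users rated both that item and the target
    """
    total_support = sum(support for _, support in estimates)
    if total_support == 0:
        raise ValueError("no co-rated items to predict from")
    return sum(est * support for est, support in estimates) / total_support

# Hypothetical estimates: 3.0 via Item1 (100 co-raters), 4.1 via Item3 (1000 co-raters).
combined = weighted_slope_one([(3.0, 100), (4.1, 1000)])
print(round(combined, 2))  # 4.0
```

The result sits much closer to the Item3-based estimate, because ten times as many users back it.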

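So far, users have only rated how much they like an item. In another scenario, users can also rate how much they dislike an item. For that case there is the Bi-Polar Slope One algorithm. I am still studying the paper and will write about it once I understand it, heh.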

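Slope One was proposed by Professor Daniel Lemire in 2005. The original paper is available as a PDF, and its page also lists several reference implementations. So far there are Python, Java, and Erlang versions, but no C# one. The "Tutorial about how to implement Slope One in Python" is a good example of how to implement Slope One and use it to make recommendations.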

Slope One (3): A Weighted-Average Example
Original: http://blog.sina.com.cn/s/blog_4d9a06000100am69.html

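Example:
First compute the average difference between item1 and item2, ((5-3)+(3-4))/2 = 0.5, and the average difference between item1 and item3, which is just 5-2 = 3. Then estimate lucy's rating of item1: going by the item1/item2 average difference it would be 2+0.5 = 2.5, and going by the item1/item3 average difference it would be 5+3 = 8.
How do we choose between the two? A weighted average is a reasonable approach, because 2.5 was derived from two values and 8 from only one: (2*2.5 + 1*8)/(2+1) = 13/3 ≈ 4.33.
That really is about all there is to Slope One!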
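
The same weighted average can be checked with a short Python sketch. The ratings table below is an assumption reconstructed from the arithmetic in the example; the user names `u1` and `u2` are placeholders (only lucy is named in the text), and `predict_weighted` is a made-up helper.

```python
# Assumed ratings, chosen to be consistent with the numbers in the example.
ratings = {
    "u1": {"item1": 5, "item2": 3, "item3": 2},
    "u2": {"item1": 3, "item2": 4},
    "lucy": {"item2": 2, "item3": 5},
}

def predict_weighted(ratings, user, target):
    """Weighted Slope One estimate of the user's rating of target."""
    numerator, total_support = 0.0, 0
    for source, user_rating in ratings[user].items():
        # Differences (target - source) from every user who rated both items.
        diffs = [r[target] - r[source]
                 for r in ratings.values()
                 if target in r and source in r]
        if not diffs:
            continue
        avg_diff = sum(diffs) / len(diffs)
        numerator += len(diffs) * (user_rating + avg_diff)
        total_support += len(diffs)
    return numerator / total_support if total_support else None

lucy_item1 = predict_weighted(ratings, "lucy", "item1")
print(round(lucy_item1, 2))  # 4.33
```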
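Taste, an open-source Java project, includes a complete Slope One implementation, with both the code and a sample program (or rather, a validation program...) that runs on GroupLens data.
Personally I think Slope One is great: it is simple enough, and its recommendation accuracy holds up against other, more complex recommendation algorithms (though of course this depends largely on the data sample). The Taste code is also well written and should be usable with minor modifications.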

Slope One (2): A C# Implementation
Original: http://blog.sina.com.cn/s/blog_4d9a06000100am69.html

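The previous post briefly introduced the concept of Slope One; this one covers a C# implementation.
A recommender based on Slope One needs the following data:
1. A set of users
2. A set of items (articles, products, etc.)
3. Ratings that users give to some of those items to express their preferences
The problem Slope One solves is: for a given user whose ratings of some items are known, recommend items the user has not rated yet, to increase the chance of a sale. :-)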

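A recommender system implementation consists of three steps:
1. Compute the rating difference between every pair of items
2. Take one user's rating records as input and estimate that user's likely ratings of the other items
3. Sort by estimated rating and return the top items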

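Step 1: suppose we have three users and four items, rated as in the table below.

Ratings   User1   User2   User3
Item1     5       4       4
Item2     4       5       4
Item3     4       3       N/A
Item4     N/A     5       5

Our job in step 1 is to compute the pairwise rating differences between items, that is, the following matrix. Each cell is the sum of (row item rating - column item rating) over the users who rated both items, followed by the number of such users:

        Item1   Item2   Item3   Item4
Item1   N/A     0/3     2/2     -2/2
Item2   0/3     N/A     2/2     -1/2
Item3   -2/2    -2/2    N/A     -2/1
Item4   2/2     1/2     2/1     N/A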
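
As a quick cross-check in Python (not the post's C#), the matrix can be rebuilt from the ratings table. This is a sketch only; `build_diff_matrix` is a hypothetical helper and item ids 1 to 4 stand for Item1 to Item4.

```python
from itertools import permutations

# Ratings table from the text: user -> {item id: rating}.
ratings = {
    "User1": {1: 5, 2: 4, 3: 4},
    "User2": {1: 4, 2: 5, 3: 3, 4: 5},
    "User3": {1: 4, 2: 4, 4: 5},
}

def build_diff_matrix(ratings):
    """Map (row item, column item) -> [sum of (row - column), co-rater count]."""
    matrix = {}
    for user_ratings in ratings.values():
        # Every ordered pair of distinct items this user has rated.
        for a, b in permutations(user_ratings, 2):
            cell = matrix.setdefault((a, b), [0, 0])
            cell[0] += user_ratings[a] - user_ratings[b]
            cell[1] += 1
    return matrix

matrix = build_diff_matrix(ratings)
for row in range(1, 5):
    cells = []
    for col in range(1, 5):
        if row == col:
            cells.append("N/A")
        else:
            total, freq = matrix[(row, col)]
            cells.append(f"{total}/{freq}")
    print(f"Item{row}", *cells)
```

Each printed row should match the corresponding row of the matrix, for example `Item1 N/A 0/3 2/2 -2/2`.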

For the weighted algorithm we also need to record how many people rated both items (Freq), so we first define a class to hold a rating:
    public class Rating
    {
        public float Value { get; set; }
        public int Freq { get; set; }

        public float AverageValue
        {
            get {return Value / Freq;}
        }
    }
I decided to use a Dictionary to store the result matrix:
    public class RatingDifferenceCollection : Dictionary<string, Rating>
    {
        private string GetKey(int Item1Id, int Item2Id)
        {
            return Item1Id + "/" + Item2Id;
        }

        public bool Contains(int Item1Id, int Item2Id)
        {
            // ContainsKey is a hash lookup; Keys.Contains would scan every key
            return this.ContainsKey(GetKey(Item1Id, Item2Id));
        }

        public Rating this[int Item1Id, int Item2Id]
        {
            get {
                    return this[this.GetKey(Item1Id, Item2Id)];
            }
            set { this[this.GetKey(Item1Id, Item2Id)] = value; }
        }
    }

Next comes the SlopeOne class. First create a RatingDifferenceCollection to hold the matrix, plus a HashSet to keep track of which items exist in the system:
    public class SlopeOne
    {
        public RatingDifferenceCollection _DiffMarix = new RatingDifferenceCollection(); // The dictionary to keep the diff matrix
        public HashSet<int> _Items = new HashSet<int>(); // Tracking how many items totally

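The AddUserRatings method takes one user's rating records (Item-Rating):
    public void AddUserRatings(IDictionary<int, float> userRatings)
AddUserRatings contains two nested loops. The outer loop iterates over all items in the input and the inner loop iterates over them again, computing the rating difference for each pair of items and storing it in _DiffMarix. Remember to add 1 to Freq to record that we have seen this pair of items once more: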
    Rating ratingDiff = _DiffMarix[item1Id, item2Id];
    ratingDiff.Value += item1Rating - item2Rating;
    ratingDiff.Freq += 1;

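After calling AddUserRatings for every user, the matrix is built. Our matrix, though, is stored as a flat table (Rating Dif here is the average difference, Value / Freq):

          Rating Dif   Freq
Item1-2   0            3
Item1-3   1            2
Item2-1   0            3
Item2-3   1            2
Item3-1   -1           2
Item3-2   -1           2
Item1-4   -1           2
Item2-4   -0.5         2
Item3-4   -2           1
Item4-1   1            2
Item4-2   0.5          2
Item4-3   2            1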

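Step 2: take one user's rating records as input and estimate the likely ratings of the other items:
    public IDictionary<int, float> Predict(IDictionary<int, float> userRatings)
Again there are two nested loops. The outer loop iterates over every item in _Items; the inner loop iterates over userRatings, combining this user's ratings with the matrix from step 1 to estimate the user's rating of each item in the system: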
    Rating itemRating = new Rating(); // Prediction of this user's rating
    ...
    Rating diff = _DiffMarix[itemId, inputItemId];
    itemRating.Value += diff.Freq * (diff.AverageValue + userRating.Value);
    itemRating.Freq += diff.Freq;
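
To see what the inner loop accumulates, here is a small Python sketch of the same weighted prediction (a cross-check, not the post's C#). The `diffs` dictionary copies the flat table from step 1, keyed as (target item, source item); the function name `predict` and this layout are assumptions made for illustration.

```python
# (target, source) -> (average of target - source, freq), copied from the flat table.
diffs = {
    (1, 2): (0.0, 3), (1, 3): (1.0, 2), (1, 4): (-1.0, 2),
    (2, 1): (0.0, 3), (2, 3): (1.0, 2), (2, 4): (-0.5, 2),
    (3, 1): (-1.0, 2), (3, 2): (-1.0, 2), (3, 4): (-2.0, 1),
    (4, 1): (1.0, 2), (4, 2): (0.5, 2), (4, 3): (2.0, 1),
}

def predict(diffs, items, user_ratings):
    """Weighted Slope One estimate for every item the user has not rated."""
    predictions = {}
    for target in sorted(items):
        if target in user_ratings:
            continue  # already rated, nothing to predict
        weighted_sum, total_freq = 0.0, 0
        for source, rating in user_ratings.items():
            if (target, source) not in diffs:
                continue  # nobody rated both items
            avg_diff, freq = diffs[(target, source)]
            weighted_sum += freq * (rating + avg_diff)
            total_freq += freq
        if total_freq > 0:
            predictions[target] = weighted_sum / total_freq
    return predictions

predictions = predict(diffs, {1, 2, 3, 4}, {1: 5.0, 3: 4.0})
print(predictions)  # {2: 5.0, 4: 6.0}
```

For Item 2 this works out to (3 * (5 + 0) + 2 * (4 + 1)) / (3 + 2) = 5, and for Item 4 to (2 * (5 + 1) + 1 * (4 + 2)) / (2 + 1) = 6.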

Step 3: once we have the user's predicted ratings, we can sort by rating and make recommendations. A quick test:
    Dictionary<int, float> userRating = new Dictionary<int, float>();
    userRating.Add(1, 5);
    userRating.Add(3, 4);
    IDictionary<int, float> Predictions = test.Predict(userRating);
    foreach (var rating in Predictions)
    {
        Console.WriteLine("Item " + rating.Key + " Rating: " + rating.Value);
    }
Output:
Item 2 Rating: 5
Item 4 Rating: 6
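
The test above prints the predictions without ranking them. The sorting part of step 3 could look like the Python sketch below; `top_items` is a hypothetical helper name.

```python
def top_items(predictions, n):
    """Return the n (item id, predicted rating) pairs with the highest ratings."""
    return sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)[:n]

predictions = {2: 5.0, 4: 6.0}  # predicted ratings from the test run above
ranked = top_items(predictions, 1)
print(ranked)  # [(4, 6.0)]
```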

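Improvements:
Looking at the matrix produced earlier, a lot of space is wasted. For example, the diagonal never holds a value. Since we store the matrix values in a flat table, we have already avoided that problem.
The values below the diagonal mirror those above it: each lower value equals the corresponding upper value multiplied by -1. With a large data set that is a lot of waste. We can fix this by modifying RatingDifferenceCollection: change the GetKey method so that the item pair, regardless of order, serves as the key: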
    private string GetKey(int Item1Id, int Item2Id) {
        return (Item1Id < Item2Id) ? Item1Id + "/" + Item2Id : Item2Id + "/" + Item1Id;
    }
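
Once both orders of a pair share one key, the stored difference has a fixed orientation and the reader has to flip the sign when asking for the other direction. A minimal Python sketch of that idea follows, assuming differences are stored as (lower id rating - higher id rating); the class `PairDiffs` is invented for illustration.

```python
class PairDiffs:
    """One entry per unordered item pair, oriented as (low id - high id)."""

    def __init__(self):
        self._cells = {}  # (low id, high id) -> [sum of differences, freq]

    def add(self, item_a, rating_a, item_b, rating_b):
        if item_a == item_b:
            return  # the diagonal carries no information
        if item_a > item_b:
            # Normalize the order so both call orders hit the same cell.
            item_a, rating_a, item_b, rating_b = item_b, rating_b, item_a, rating_a
        cell = self._cells.setdefault((item_a, item_b), [0.0, 0])
        cell[0] += rating_a - rating_b
        cell[1] += 1

    def average(self, target, source):
        """Average of (target - source), flipping the sign when needed."""
        low, high = min(target, source), max(target, source)
        total, freq = self._cells[(low, high)]
        avg = total / freq
        return avg if target < source else -avg

pairs = PairDiffs()
pairs.add(1, 5, 3, 4)  # User1: Item1 = 5, Item3 = 4
pairs.add(3, 3, 1, 4)  # User2, passed in the opposite order: Item3 = 3, Item1 = 4
print(pairs.average(1, 3), pairs.average(3, 1))  # 1.0 -1.0
```

Only one cell is stored for the pair, yet both directions from the flat table (Item1-3 = 1, Item3-1 = -1) can be recovered.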
The complete code is listed below; it was debugged on .NET 3.5.

Reference:
Tutorial about how to implement Slope One in Python
Slope One Predictors for Online Rating-Based Collaborative Filtering
Recommender Systems: Slope One

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace SlopeOne
{
    public class Rating
    {
        public float Value { get; set; }
        public int Freq { get; set; }

        public float AverageValue
        {
            get { return Value / Freq; }
        }
    }

    public class RatingDifferenceCollection : Dictionary<string, Rating>
    {
        private string GetKey(int Item1Id, int Item2Id)
        {
            return (Item1Id < Item2Id) ? Item1Id + "/" + Item2Id : Item2Id + "/" + Item1Id ;
        }

        public bool Contains(int Item1Id, int Item2Id)
        {
            // ContainsKey is a hash lookup; Keys.Contains would scan every key
            return this.ContainsKey(GetKey(Item1Id, Item2Id));
        }

        public Rating this[int Item1Id, int Item2Id]
        {
            get {
                    return this[this.GetKey(Item1Id, Item2Id)];
            }
            set { this[this.GetKey(Item1Id, Item2Id)] = value; }
        }
    }

    public class SlopeOne
    {
        public RatingDifferenceCollection _DiffMarix = new RatingDifferenceCollection();  // The dictionary to keep the diff matrix
        public HashSet<int> _Items = new HashSet<int>();  // Tracking how many items totally

        public void AddUserRatings(IDictionary<int, float> userRatings)
        {
            foreach (var item1 in userRatings)
            {
                int item1Id = item1.Key;
                float item1Rating = item1.Value;
                _Items.Add(item1.Key);

                foreach (var item2 in userRatings)
                {
                    if (item2.Key <= item1Id) continue; // Eliminate redundancy
                    int item2Id = item2.Key;
                    float item2Rating = item2.Value;

                    Rating ratingDiff;
                    if (_DiffMarix.Contains(item1Id, item2Id))
                    {
                        ratingDiff = _DiffMarix[item1Id, item2Id];
                    }
                    else
                    {
                        ratingDiff = new Rating();
                        _DiffMarix[item1Id, item2Id] = ratingDiff;
                    }

                    ratingDiff.Value += item1Rating - item2Rating;
                    ratingDiff.Freq += 1;
                }
            }
        }

        // Input ratings of all users (overload taking one dictionary per user)
        public void AddUserRatings(IList<IDictionary<int, float>> Ratings)
        {
            foreach (var userRatings in Ratings)
            {
                AddUserRatings(userRatings);
            }
        }

        public IDictionary<int, float> Predict(IDictionary<int, float> userRatings)
        {
            Dictionary<int, float> Predictions = new Dictionary<int, float>();
            foreach (var itemId in this._Items)
            {
                if (userRatings.ContainsKey(itemId)) continue; // User has rated this item, just skip it

                Rating itemRating = new Rating();

                foreach (var userRating in userRatings)
                {
                    if (userRating.Key == itemId) continue;
                    int inputItemId = userRating.Key;
                    if (_DiffMarix.Contains(itemId, inputItemId))
                    {
                        Rating diff = _DiffMarix[itemId, inputItemId];
                        itemRating.Value += diff.Freq * (userRating.Value + diff.AverageValue * ((itemId < inputItemId) ? 1 : -1));
                        itemRating.Freq += diff.Freq;
                    }
                }
                // Skip items that share no raters with the input; 0 / 0 would yield NaN
                if (itemRating.Freq > 0) Predictions.Add(itemId, itemRating.AverageValue);
            }
            return Predictions;
        }

        public static void Test()
        {
            SlopeOne test = new SlopeOne();

            Dictionary<int, float> userRating = new Dictionary<int, float>();
            userRating.Add(1, 5);
            userRating.Add(2, 4);
            userRating.Add(3, 4);
            test.AddUserRatings(userRating);

            userRating = new Dictionary<int, float>();
            userRating.Add(1, 4);
            userRating.Add(2, 5);
            userRating.Add(3, 3);
            userRating.Add(4, 5);
            test.AddUserRatings(userRating);

            userRating = new Dictionary<int, float>();
            userRating.Add(1, 4);
            userRating.Add(2, 4);
            userRating.Add(4, 5);
            test.AddUserRatings(userRating);

            userRating = new Dictionary<int, float>();
            userRating.Add(1, 5);
            userRating.Add(3, 4);

            IDictionary<int, float> Predictions = test.Predict(userRating);
            foreach (var rating in Predictions)
            {
                Console.WriteLine("Item " + rating.Key + " Rating: " + rating.Value);
            }
        }
    }
}

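Unfortunately the code was written in VS2008 and my project uses VS2005, so I adapted it and made the adapted version available for download.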

Posted on 2010-07-19 17:49 by 漂漂. Category: Algorithms
