This is the dataset used in the famous Netflix Prize, the million-dollar recommendation competition. Because the competition has closed, it can no longer be downloaded from the official Netflix site.

Netflix provided a training data set of 100,480,507 ratings that 480,189 users gave to 17,770 movies. Each training rating is a quadruplet of the form <user, movie, date of grade, grade>. The user and movie fields are integer IDs, while grades are from 1 to 5 (integral) stars.[3]

The qualifying data set contains over 2,817,131 triplets of the form <user, movie, date of grade>, with grades known only to the jury. A participating team's algorithm must predict grades on the entire qualifying set, but they are only informed of the score for half of the data, the quiz set of 1,408,342 ratings. The other half is the test set of 1,408,789, and performance on this is used by the jury to determine potential prize winners. Only the judges know which ratings are in the quiz set and which are in the test set; this arrangement is intended to make it difficult to hill climb on the test set.

Submitted predictions are scored against the true grades in terms of root mean squared error (RMSE), and the goal is to reduce this error as much as possible. Note that while the actual grades are integers in the range 1 to 5, submitted predictions need not be.

Netflix also identified a probe subset of 1,408,395 ratings within the training data set. The probe, quiz, and test data sets were chosen to have similar statistical properties.

In summary, the data used in the Netflix Prize looks as follows:

Training set (99,072,112 ratings not including the probe set, 100,480,507 including the probe set)
    Probe set (1,408,395 ratings)
Qualifying set (2,817,131 ratings) consisting of:
    Test set (1,408,789 ratings), used to determine winners
    Quiz set (1,408,342 ratings), used to calculate leaderboard scores

For each movie, title and year of release are provided in a separate data set. No information at all is provided about users. In order to protect the privacy of customers, "some of the rating data for some customers in the training and qualifyin
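
As a rough illustration of the scoring rule described above, the following Python sketch computes RMSE for a handful of ratings and checks that the quoted set sizes add up. The function name `rmse` and the toy grades are made up for this example; they are not part of the dataset or of any official Netflix scoring code.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and true grades."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one rating is required")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))


# Hypothetical toy data: true grades are integers from 1 to 5,
# while predictions may be any real number.
true_grades = [4, 3, 5, 1]
predictions = [3.5, 3.0, 4.5, 2.0]
score = rmse(predictions, true_grades)
print(f"RMSE = {score:.4f}")

# Set sizes quoted in the description above.
TRAINING_WITHOUT_PROBE = 99_072_112
PROBE = 1_408_395
TRAINING_WITH_PROBE = 100_480_507
TEST = 1_408_789
QUIZ = 1_408_342
QUALIFYING = 2_817_131

# The probe set is part of the training set; quiz and test split the qualifying set.
sizes_consistent = (
    TRAINING_WITHOUT_PROBE + PROBE == TRAINING_WITH_PROBE
    and TEST + QUIZ == QUALIFYING
)
print("set sizes consistent:", sizes_consistent)
```

Because the error is squared before averaging, one prediction that is off by a full star costs as much as four predictions that are each off by half a star.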
                                    
                                    
                                        