The original paper describing AlphaGo Zero, published in Nature.
A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo’s own move selections and also the winner of AlphaGo’s games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
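As a rough illustration of the training signal described above, the network's two predictions (move selections and game winner) can be scored against the search's move probabilities and the actual game outcome. The sketch below is an assumption for illustration only: the abstract does not state the loss, so the squared-error plus cross-entropy form, the function name `policy_value_loss`, and all parameter names are hypothetical, and any regularization is omitted.

```python
import math

def policy_value_loss(search_probs, predicted_probs, winner, predicted_value):
    """Hypothetical combined loss: squared error on the game outcome
    plus cross-entropy between search and predicted move probabilities."""
    eps = 1e-12  # guards against log(0)
    # How far the predicted winner is from the actual result (+1 win, -1 loss).
    value_loss = (winner - predicted_value) ** 2
    # How far the predicted move distribution is from the tree search's selections.
    policy_loss = -sum(
        target * math.log(pred + eps)
        for target, pred in zip(search_probs, predicted_probs)
    )
    return value_loss + policy_loss

# Toy position with three legal moves; the numbers are made up.
search_probs = [0.7, 0.2, 0.1]
good = policy_value_loss(search_probs, [0.7, 0.2, 0.1], winner=1.0, predicted_value=0.9)
bad = policy_value_loss(search_probs, [0.1, 0.2, 0.7], winner=1.0, predicted_value=-0.5)
print(good < bad)
```

A prediction that agrees with the search and with the eventual winner gets the lower loss, which is the sense in which the program acts as its own teacher.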