Main author: Szepesvári, Csaba.
Title: Algorithms for reinforcement learning [electronic resource] / Csaba Szepesvári.
Publication: San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA): Morgan & Claypool, c2010.
Abstract: Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long-term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms' merits and limitations. Reinforcement learning is of great interest because of the large number of practical applications that it can be used to address, ranging from problems in artificial intelligence to operations research or control engineering. In this book, we focus on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming. We give a fairly comprehensive catalog of learning problems, describe the core ideas, note a large number of state-of-the-art algorithms, followed by the discussion of their theoretical properties and limitations.
                                    
                                    
Reinforcement learning