Uploader: weixin_43960172 | Upload time: 2025/3/16 20:38:20 | File size: 19.91 MB | File type: PDF
Data Analytics with Spark Using Python
Data Analytics with Spark Using Python (Addison-Wesley Data & Analytics Series)
By: Jeffrey Aven
ISBN-10: 013484601X
ISBN-13: 9780134846019
Edition: 1
Publication date: 2018-06-16
Pages: 851

Solve Data Analytics Problems with Spark, PySpark, and Related Open Source Tools

Spark is at the heart of today's Big Data revolution, helping data professionals supercharge efficiency and performance in a wide range of data processing and analytics tasks. In this guide, Big Data expert Jeffrey Aven covers all you need to know to leverage Spark, together with its extensions, subprojects, and wider ecosystem.

Aven combines a language-agnostic introduction to foundational Spark concepts with extensive programming examples using the popular and intuitive PySpark development environment. This guide's focus on Python makes it accessible to a wide audience of data professionals, analysts, and developers, even those with little Hadoop or Spark experience.

Aven's broad coverage ranges from basic to advanced Spark programming, and from Spark SQL to machine learning. You'll learn how to efficiently manage all forms of data with Spark: streaming, structured, semi-structured, and unstructured. Throughout, concise topic overviews quickly get you up to speed, and extensive hands-on exercises prepare you to solve real problems.

Coverage includes:
- Understand Spark's evolving role in the Big Data and Hadoop ecosystems
- Create Spark clusters using various deployment modes
- Control and optimize the operation of Spark clusters and applications
- Master Spark Core RDD API programming techniques
- Extend, accelerate, and optimize Spark routines with advanced API platform constructs, including shared variables, RDD storage, and partitioning
- Efficiently integrate Spark with both SQL and nonrelational data stores
- Perform stream processing and messaging with Spark Streaming and Apache Kafka
- Implement predictive modeling with SparkR and Spark MLlib

Part I: Spark Foundations
1 Introducing Big Data, Hadoop, an