spark01: Warning: Permanently added 'spark01,192.168.244.147' (ECDSA) to the list of known hosts.
spark@spark01's password:
spark01: starting org.apache.spark.deploy.worker.Worker, logging to /home/spark/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-spark01.out
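When prompted, enter the password of the spark user on spark01.
You can check the worker's log to confirm that the worker started correctly. The log is long, so it is not reproduced here.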
[spark@spark01 spark-1.4.0-bin-hadoop2.6]$ cd logs/
[spark@spark01 logs]$ cat spark-spark-org.apache.spark.deploy.worker.Worker-1-spark01.out
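Instead of reading the whole log, you can search it for the registration message. The sketch below is a minimal example under two assumptions. First, it assumes the Spark 1.x Worker writes a line containing "Successfully registered with master" once it has joined the master. Second, the function name worker_registered is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) if a Spark standalone Worker log
# contains the registration message. The exact message text is an assumption
# based on Spark 1.x Worker logging.
worker_registered() {
  local log_file="$1"
  # -q: print nothing, only set the exit status (0 = found, non-zero = not found)
  grep -q "Successfully registered with master" "$log_file"
}

# Example, run from the logs/ directory shown above:
# if worker_registered spark-spark-org.apache.spark.deploy.worker.Worker-1-spark01.out; then
#   echo "worker registered"
# else
#   echo "worker not registered yet"
# fi
```

Because grep -q only sets an exit status, the function can be used directly in if statements or chained with && in other scripts.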
Start the Spark shell:
[spark@spark01 spark-1.4.0-bin-hadoop2.6]$ bin/spark-shell --master spark://spark01:7077
16/01/16 15:33:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/16 15:33:18 INFO spark.SecurityManager: Changing view acls to: spark
16/01/16 15:33:18 INFO spark.SecurityManager: Changing modify acls to: spark
16/01/16 15:33:18 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark)
16/01/16 15:33:18 INFO spark.HttpServer: Starting HTTP Server
16/01/16 15:33:18 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/01/16 15:33:18 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:42300
16/01/16 15:33:18 INFO util.Utils: Successfully started service 'HTTP class server' on port 42300.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_79)
Type in expressions to have them evaluated.
Type :help for more information.
16/01/16 15:33:30 INFO spark.SparkContext: Running Spark version 1.4.0
16/01/16 15:33:30 INFO spark.SecurityManager: Changing view acls to: spark
16/01/16 15:33:30 INFO spark.SecurityManager: Changing modify acls to: spark
16/01/16 15:33:30 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark)
16/01/16 15:33:31 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/01/16 15:33:31 INFO Remoting: Starting remoting
16/01/16 15:33:31 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.244.147:43850]
16/01/16 15:33:31 INFO util.Utils: Successfully started service 'sparkDriver' on port 43850.
16/01/16 15:33:31 INFO spark.SparkEnv: Registering MapOutputTracker
16/01/16 15:33:31 INFO spark.SparkEnv: Registering BlockManagerMaster
16/01/16 15:33:31 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-7b7bd4bd-ff20-4e3d-a354-61a4ca7c4b2f/blockmgr-0e855210-3609-4204-b5e3-151e0c096c15
16/01/16 15:33:31 INFO storage.MemoryStore: MemoryStore started with capacity 265.4 MB
16/01/16 15:33:31 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-7b7bd4bd-ff20-4e3d-a354-61a4ca7c4b2f/httpd-56ac16d2-dd82-41cb-99d7-4d11ef36b42e
16/01/16 15:33:31 INFO spark.HttpServer: Starting HTTP Server
16/01/16 15:33:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/01/16 15:33:31 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:47633
16/01/16 15:33:31 INFO util.Utils: Successfully started service 'HTTP file server' on port 47633.
16/01/16 15:33:31 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/01/16 15:33:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/01/16 15:33:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/01/16 15:33:31 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/01/16 15:33:31 INFO ui.SparkUI: Started SparkUI at :4040
16/01/16 15:33:32 INFO client.AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@spark01:7077/user/Master...
16/01/16 15:33:33 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160116153332-0000
16/01/16 15:33:33 INFO client.AppClient$ClientActor: Executor added: app-20160116153332-0000/0 on worker-20160116152314-192.168.244.147-58914 (192.168.244.147:58914) with 2 cores
16/01/16 15:33:33 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160116153332-0000/0 on hostPort 192.168.244.147:58914 with 2 cores, 512.0 MB RAM
16/01/16 15:33:33 INFO client.AppClient$ClientActor: Executor updated: app-20160116153332-0000/0 is now LOADING
16/01/16 15:33:33 INFO client.AppClient$ClientActor: Executor updated: app-20160116153332-0000/0 is now RUNNING
16/01/16 15:33:34 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33146.
16/01/16 15:33:34 INFO netty.NettyBlockTransferService: Server created on 33146
16/01/16 15:33:34 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/01/16 15:33:34 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.244.147:33146 with 265.4 MB RAM, BlockManagerId(driver, 192.168.244.147, 33146)
16/01/16 15:33:34 INFO storage.BlockManagerMaster: Registered BlockManager
16/01/16 15:33:34 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/01/16 15:33:34 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.
16/01/16 15:33:38 INFO hive.HiveContext: Initializing execution hive, version 0.13.1
16/01/16 15:33:43 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/01/16 15:33:43 INFO metastore.ObjectStore: ObjectStore, initialize called
16/01/16 15:33:44 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/01/16 15:33:44 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/01/16 15:33:44 INFO cluster.SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@192.168.244.147:46741/user/Executor#-2043358626]) with ID 0
16/01/16 15:33:44 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/01/16 15:33:45 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.244.147:33017 with 265.4 MB RAM, BlockManagerId(0, 192.168.244.147, 33017)
16/01/16 15:33:46 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/01/16 15:33:48 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/01/16 15:33:48 INFO metastore.MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "@" (64), after : "".
16/01/16 15:33:52 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/01/16 15:33:52 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/01/16 15:33:54 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/01/16 15:33:54 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/01/16 15:33:54 INFO metastore.ObjectStore: Initialized ObjectStore
16/01/16 15:33:54 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
16/01/16 15:33:55 INFO metastore.HiveMetaStore: Added admin role in metastore
16/01/16 15:33:55 INFO metastore.HiveMetaStore: Added public role in metastore
16/01/16 15:33:56 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
16/01/16 15:33:56 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
16/01/16 15:33:56 INFO repl.SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala>
Once the Spark shell is open, you can write a simple program that says hello to the world:
scala> println("helloworld")
helloworld
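println runs only on the driver, so it does not involve the worker. To check that a job actually reaches the executors, you can feed a small RDD computation such as sc.parallelize(1 to 100).reduce(_ + _) to spark-shell on standard input. This is a minimal sketch based on two assumptions. First, spark-shell is assumed to evaluate piped input and exit when that input ends. Second, the helper name run_scala is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pipe Scala code into spark-shell connected to a
# standalone master. Assumes spark-shell reads its input from stdin when it is
# not a terminal and exits when the input ends.
run_scala() {
  local spark_home="$1"   # e.g. /home/spark/spark-1.4.0-bin-hadoop2.6
  local master_url="$2"   # e.g. spark://spark01:7077
  local scala_code="$3"   # Scala statement(s) to evaluate
  printf '%s\n' "$scala_code" | "$spark_home/bin/spark-shell" --master "$master_url"
}

# Example: sum 1..100 on the executors; the REPL should print 5050.
# run_scala /home/spark/spark-1.4.0-bin-hadoop2.6 spark://spark01:7077 \
#   'println(sc.parallelize(1 to 100).reduce(_ + _))'
```

While such a job runs, the application and its completed tasks should also appear in the web UI.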
Now look at Spark's web management UI again. It now shows entries under Workers and Running Applications.
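The same information can be checked from the command line. This minimal sketch rests on three assumptions:

- the standalone master's web UI listens on its default port 8080;
- the UI serves its state as JSON under /json;
- each worker entry there carries a "state" field that reads ALIVE for a live worker.

The function name count_alive_workers is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the master's JSON status from stdin and print how
# many entries report "state" : "ALIVE" (assumed to be the live workers).
# The JSON layout is an assumption; adjust the pattern for your Spark version.
count_alive_workers() {
  # -o prints each match on its own line, so wc -l counts matches, not lines.
  # grep exits non-zero when nothing matches; "|| true" keeps that from
  # failing the pipeline under "set -o pipefail", and the count is then 0.
  { grep -oE '"state"[[:space:]]*:[[:space:]]*"ALIVE"' || true; } | wc -l | tr -d '[:space:]'
}

# Example (URL and port are assumptions):
# echo "alive workers: $(curl -s http://spark01:8080/json | count_alive_workers)"
```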
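The Spark pseudo-distributed environment is now set up.
A few points to note:
1. The Maven and SBT installations mentioned earlier are optional. They are only needed for compiling the source code later, so if you just want a working Spark environment, you do not need to download Maven or SBT.
2. This pseudo-distributed environment is the foundation for a real cluster: change a few settings, then copy it to the slave nodes. For reasons of space, this will be covered in a later post.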