Recently I got curious about Spark's spark-shell: how does it start the Scala REPL, and how does it write the commonly used environment values (sc, sqlContext and so on) into it before handing over the prompt?
Digging through the Spark source, I found SparkILoop.scala:
import java.io.BufferedReader

import scala.tools.nsc.interpreter.{JPrintWriter, ILoop}

/**
 * A Spark-specific interactive shell.
 */
class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter)
    extends ILoop(in0, out) {

  def this(in0: BufferedReader, out: JPrintWriter) = this(Some(in0), out)
  def this() = this(None, new JPrintWriter(Console.out, true))

  def initializeSpark() {
    intp.beQuietDuring {
      processLine("""
        @transient val sc = {
          val _sc = org.apache.spark.repl.Main.createSparkContext()
          println("Spark context available as sc.")
          _sc
        }
        """)
      processLine("""
        @transient val sqlContext = {
          val _sqlContext = org.apache.spark.repl.Main.createSQLContext()
          println("SQL context available as sqlContext.")
          _sqlContext
        }
        """)
      processLine("import org.apache.spark.SparkContext._")
      processLine("import sqlContext.implicits._")
      processLine("import sqlContext.sql")
      processLine("import org.apache.spark.sql.functions._")
    }
  }

  ...
}
So SparkILoop extends scala.tools.nsc.interpreter.ILoop, and initializeSpark() simply feeds definitions to the interpreter through processLine, wrapped in intp.beQuietDuring so they are evaluated silently before the user sees the prompt.
I then turned to the ILoop API doc and finally found how to start an ILoop yourself:
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.ILoop

val settings = new Settings
// Needed when embedding the REPL: reuse the JVM's classpath,
// otherwise the interpreter cannot find the Scala library and
// fails to initialize its compiler.
settings.usejavacp.value = true

val loop = new ILoop
loop.process(settings)
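Putting the two halves together, you can preload your own bindings exactly the way spark-shell preloads sc. The sketch below is my own minimal example rather than Spark code: MyILoop and greeting are made-up names, and it assumes the same Scala 2.10/2.11-era interpreter API as the excerpt above. It hooks the injection into loadFiles, the startup hook that, if I read the full SparkILoop source correctly, Spark itself overrides to call initializeSpark() before any user input is processed.

import java.io.BufferedReader

import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.{ILoop, JPrintWriter}

// Hypothetical SparkILoop-style shell: binds `greeting` before the prompt.
class MyILoop(in0: Option[BufferedReader], out: JPrintWriter)
    extends ILoop(in0, out) {

  def this() = this(None, new JPrintWriter(Console.out, true))

  // Same trick as initializeSpark(): feed code to the interpreter
  // silently, so the user never sees these lines being evaluated.
  def initializeGreeting(): Unit = {
    intp.beQuietDuring {
      processLine("""
        val greeting = {
          println("Value available as greeting.")
          "hello from the host application"
        }
        """)
    }
  }

  // loadFiles runs during REPL startup, after the interpreter exists
  // but before the user gets control, which makes it a good hook.
  override def loadFiles(settings: Settings): Unit = {
    initializeGreeting()
    super.loadFiles(settings)
  }
}

object MyILoop {
  def main(args: Array[String]): Unit = {
    val settings = new Settings
    settings.usejavacp.value = true // reuse the JVM classpath, as above
    new MyILoop().process(settings)
  }
}

Running main drops you into an ordinary REPL where greeting is already bound, which is precisely the effect spark-shell achieves for sc and sqlContext.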