步骤5:加入lib包:
切换到Libaries选项卡,“Add Library"->"IvyDE Managed Dependencies"->"Next",选择“Project”,选择ivy\ivy.xml文件。点 Ok。eclipse会自动下载依赖的jar包。
在这个过程中或许会报错,看到错误信息是因为org.restlet.jse包下载不到。解决方法是:ivy\ivy.xml中找到
<dependency org="org.restlet.jse" rev="2.0.5" conf="*->default" />
<dependency org="org.restlet.jse" rev="2.0.5"
conf="*->default" />
部分,注释掉。在网上手动找到这两个包,放在lib包下,加入到Libaries中。
接着加入plugin文件夹下各个插件的ivy.xml文件。手动一个一个加进去。
步骤6:在"Order and Export"选项卡,将 conf top
步骤7:数据库配置以及其他配置信息
打开/conf/gora.properties ,删除文件中所有内容,写入mysql配置:
###############################
# MySQL properties #
###############################
gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
gora.sqlstore.jdbc.url=jdbc:mysql://localhost:3306/nutch?createDatabaseIfNotExist=true
gora.sqlstore.jdbc.user=root
gora.sqlstore.jdbc.password=123456
在/conf/gora-sql-mapping.xml 修改 <primarykey column="id" length="240"/>
在 /conf/nutch-site.xml输入:
<property>
<name>http.agent.name</name>
<value>Your Nutch Spider</value>
</property>
<property>
<name>http.accept.language</name>
<value>ja-jp, en-us,en-gb,en;q=0.7,*;q=0.3</value>
<description>Value of the “Accept-Language” request header field.
This allows selecting non-English language as default one to retrieve.
It is a useful setting for search engines build for certain national group.
</description>
</property>
<property>
<name>parser.character.encoding.default</name>
<value>utf-8</value>
<description>The character encoding to fall back to when no other information
is available</description>
</property>
<property>
<name>plugin.includes</name>
<value>protocol-httpclient|protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
<description>Regular expression naming plugin directory names to
include. Any plugin not matching this expression is excluded.
In any case you need at least include the nutch-extensionpoints plugin. By
default Nutch includes crawling just HTML and plain text via HTTP,
and basic indexing and search plugins. In order to use HTTPS please enable
protocol-httpclient, but be aware of possible intermittent problems with the
underlying commons-httpclient library.
</description>
</property>
<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.sql.store.SqlStore</value>
<description>The Gora DataStore class for storing and retrieving data.
Currently the following stores are available: ….
</description>
</property>
<property>
<name>plugin.folders</name>
<value>./src/plugin</value>
<description>Directories where nutch plugins are located. Each
element may be a relative or absolute path. If absolute, it is used
as is. If relative, it is searched for on the classpath.</description>
</property>