Debugging Nutch 2.1 in Eclipse on Windows with MySQL Storage (2)


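Step 5: Add the library JARs:
      Switch to the Libraries tab, click "Add Library" -> "IvyDE Managed Dependencies" -> "Next", select "Project", then choose the ivy\ivy.xml file. Click OK. Eclipse will download the dependency JARs automatically.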

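This step may fail with an error saying the org.restlet.jse packages cannot be downloaded. To fix it, find the following section in ivy\ivy.xml:
    <dependency org="org.restlet.jse" rev="2.0.5" conf="*->default" />
    <dependency org="org.restlet.jse" rev="2.0.5"
      conf="*->default" />
and comment it out. Then find these two JARs online, download them manually, put them in the lib folder, and add them to Libraries.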
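
As a sketch of that change, the two entries can be wrapped in a single XML comment so that Ivy skips them. The entries are copied exactly as quoted above; match them against your own ivy.xml rather than pasting this in, since your copy may carry further attributes.

```xml
<!-- Disabled: these artifacts could not be downloaded, so the JARs are added manually from lib.
    <dependency org="org.restlet.jse" rev="2.0.5" conf="*->default" />
    <dependency org="org.restlet.jse" rev="2.0.5"
      conf="*->default" />
-->
```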


Next, add the ivy.xml file of each plugin under the plugin folder. These have to be added manually, one by one.


Step 6: On the "Order and Export" tab, select conf and click "Top" to move it to the top of the list.
Step 7: Database and other configuration
    Open /conf/gora.properties, delete everything in the file, and add the MySQL settings:
###############################
# MySQL properties            #
###############################
gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
gora.sqlstore.jdbc.url=jdbc:mysql://localhost:3306/nutch?createDatabaseIfNotExist=true
gora.sqlstore.jdbc.user=root
gora.sqlstore.jdbc.password=123456


    In /conf/gora-sql-mapping.xml, change the primary key definition to  <primarykey column="id" length="240"/>
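
For context, a small sketch of that edit. The "before" value of 512 is an assumption based on commonly cited copies of the Nutch 2.1 mapping file, so check your own file. A likely reason for the smaller length is MySQL's InnoDB index key limit of 767 bytes: with a 3-byte utf8 column, 240 characters need at most 720 bytes.

```xml
<!-- Value assumed to ship in conf/gora-sql-mapping.xml (verify in your copy): -->
<!-- <primarykey column="id" length="512"/> -->

<!-- Value used in this walkthrough; 240 chars x 3 bytes = 720 bytes, under the 767-byte limit: -->
<primarykey column="id" length="240"/>
```

If the mapping file defines a primarykey element in more than one class block, the same length probably needs to be applied to each.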
    In /conf/nutch-site.xml, add:
<property>
<name>http.agent.name</name>
<value>Your Nutch Spider</value>
</property>


<property>
<name>http.accept.language</name>
<value>ja-jp, en-us,en-gb,en;q=0.7,*;q=0.3</value>
<description>Value of the "Accept-Language" request header field.
This allows selecting a non-English language as the default one to retrieve.
It is a useful setting for search engines built for a particular national group.
</description>
</property>


<property>
<name>parser.character.encoding.default</name>
<value>utf-8</value>
<description>The character encoding to fall back to when no other information
is available</description>
</property>


<property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
  <description>Regular expression naming plugin directory names to
  include.  Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins. In order to use HTTPS please enable
  protocol-httpclient, but be aware of possible intermittent problems with the
  underlying commons-httpclient library.
  </description>
</property>


<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.sql.store.SqlStore</value>
<description>The Gora DataStore class for storing and retrieving data.
Currently the following stores are available: ….
</description>
</property>


<property>
  <name>plugin.folders</name>
  <value>./src/plugin</value>
  <description>Directories where nutch plugins are located.  Each
  element may be a relative or absolute path.  If absolute, it is used
  as is.  If relative, it is searched for on the classpath.</description>
</property>
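
These <property> elements only take effect inside the file's root <configuration> element, which is the standard layout for Hadoop-style configuration files such as nutch-site.xml. A minimal sketch of the overall file shape follows, showing only the first property from above; the others follow the same pattern. The XML declaration line is the usual one and is assumed here rather than taken from the original post.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- One <property> block per setting; add the remaining ones from above here. -->
  <property>
    <name>http.agent.name</name>
    <value>Your Nutch Spider</value>
  </property>
</configuration>
```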
