Configuring Nutch 1.7 and Solr 4.6 Integration on Ubuntu 13.10 (2)

CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2014-03-03 08:55:30, elapsed: 00:00:01
LinkDb: starting at 2014-03-03 08:55:30
LinkDb: linkdb: crawl/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: internal links will be ignored.
LinkDb: adding segment: file:/opt/nutch/crawl/segments/20140303085430
LinkDb: adding segment: file:/opt/nutch/crawl/segments/20140303085441
LinkDb: finished at 2014-03-03 08:55:31, elapsed: 00:00:01
Indexer: starting at 2014-03-03 08:55:31
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
SOLRIndexWriter
    solr.server.url : URL of the SOLR instance (mandatory)
    solr.commit.size : buffer size when sending to SOLR (default 1000)
    solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
    solr.auth : use authentication (default false)
    solr.auth.username : use authentication (default false)
    solr.auth : username for authentication
    solr.auth.password : password for authentication


Indexer: finished at 2014-03-03 08:55:35, elapsed: 00:00:03
SolrDeleteDuplicates: starting at 2014-03-03 08:55:35
SolrDeleteDuplicates: Solr url: :8983/solr/
SolrDeleteDuplicates: finished at 2014-03-03 08:55:36, elapsed: 00:00:01
crawl finished: crawl
To search the crawled content, open :8983/solr/#/collection1/query in a browser and click Execute Query.
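
The same check can also be run without the browser. Below is a minimal sketch that sends a match-all query to Solr's standard /select handler and prints the indexed documents. The host `localhost` is an assumption (the address above omits the host name), and the helper names `build_select_url` and `search` are hypothetical, introduced only for this illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Solr runs on the local machine; the original text omits the host.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(base, core="collection1", query="*:*", rows=10):
    """Build a URL for Solr's /select handler that returns JSON."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base.rstrip('/')}/{core}/select?{params}"


def search(base=SOLR_BASE, core="collection1", query="*:*", rows=10):
    """Run the query and return the list of matching documents."""
    with urlopen(build_select_url(base, core, query, rows), timeout=10) as resp:
        payload = json.load(resp)
    return payload["response"]["docs"]


if __name__ == "__main__":
    for doc in search():
        # "url" and "title" are assumed to be stored fields in the schema;
        # .get() avoids a KeyError if a document lacks them.
        print(doc.get("url"), doc.get("title"))
```

If the crawl above indexed anything, the loop should print one line per document. An empty result suggests the Indexer step did not reach Solr.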

