I-team Blog Full-Text Search: Elasticsearch in Practice

Date: 2021-05-08 Category: Programmer's Life

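I had long felt that the blog was missing something, and I recently worked out what: as the number of posts grows, finding one I wrote earlier becomes a real chore. As a back-end developer I decided to build the fix myself, which is how this hands-on Elasticsearch full-text search project came about. Thanks also to "Hui Ge" (辉哥) for sponsoring a server.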

Choosing a Full-Text Search Tool

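There are plenty of tools that support full-text search, such as Lucene, Solr, and Elasticsearch. Of these, Elasticsearch has the more active community, so problems tend to be easier to resolve, and its RESTful API is convenient to work with. Those were my main reasons for choosing it. The trade-off is that Elasticsearch uses a fair amount of memory, and my 2 GB cloud server struggles to run it.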

Data Migration: From MySQL to Elasticsearch

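The requirement here is simple: periodically sync data from MySQL into Elasticsearch. I first planned to write my own migration tool, then remembered DataX, which had served me well in an earlier migration. The official documentation says it supports this case, but I could not get it running: DataX needs more than 1 GB of memory just to start, which my 2 GB cloud server could not spare, so I had to look elsewhere. If you are interested in DataX, see my other post, "阿里离线数据同步工具 DataX 踩坑记录" (pitfalls with DataX, Alibaba's offline data synchronization tool).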

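When it comes to memory-frugal languages, Go, which has been popular lately, may come to mind, and it did for me too. I ended up using go-mysql-elasticsearch, a tool written in Go that syncs data from MySQL to Elasticsearch. I won't walk through the setup here; see the go-mysql-elasticsearch project for details. As for Elasticsearch itself, the one thing to watch is that the machine should have at least 2 GB of memory, otherwise it may fail to start. Installation is otherwise straightforward and easy to look up, so I won't cover it here either.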

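Also note that go-mysql-elasticsearch requires MySQL's binlog to be enabled (its README asks for the row binlog format). The tool works by attaching itself to MySQL as a slave, which makes it easy to sync changes to Elasticsearch in real time. The machine running go-mysql-elasticsearch needs at least the MySQL client tools installed, otherwise it fails on startup. I recommend deploying it on the same machine as MySQL: the Go process uses very little memory, so the impact is small. Here is the go-mysql-elasticsearch configuration file I use for syncing: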

# MySQL address, user and password
# user must have replication privilege in MySQL.
my_addr = "127.0.0.1:3306"
my_user = "root"
my_pass = "******"
my_charset = "utf8"

# Set true when elasticsearch use https
#es_https = false
# Elasticsearch address
es_addr = "127.0.0.1:9200"
# Elasticsearch user and password, maybe set by shield, nginx, or x-pack
es_user = ""
es_pass = ""

# Path to store data, like master.info, if not set or empty,
# we must use this to support breakpoint resume syncing.
# TODO: support other storage, like etcd.
data_dir = "./var"

# Inner Http status address
stat_addr = "127.0.0.1:12800"

# pseudo server id like a slave
server_id = 1001

# mysql or mariadb
flavor = "mysql"

# mysqldump execution path
# if not set or empty, ignore mysqldump.
mysqldump = "mysqldump"

# if we have no privilege to use mysqldump with --master-data,
# we must skip it.
#skip_master_data = false

# minimal items to be inserted in one bulk
bulk_size = 128

# force flush the pending requests if we don't have enough items >= bulk_size
flush_bulk_time = "200ms"

# Ignore table without primary key
skip_no_pk_table = false

# MySQL data source
[[source]]
schema = "billboard-blog"

# Only below tables will be synced into Elasticsearch.
tables = ["content"]

# Below is for special rule mapping
[[rule]]
schema = "billboard-blog"
table = "content"
index = "contentindex"
type = "content"

[rule.field]
title="title"
blog_desc="blog_desc"
content="content"

# Filter rule
[[rule]]
schema = "billboard-blog"
table = "content"
index = "contentindex"
type = "content"

# Only sync following columns
filter = ["title", "blog_desc", "content"]

# id rule
[[rule]]
schema = "billboard-blog"
table = "content"
index = "contentindex"
type = "content"
id = ["id"]

Building the Full-Text Search Service

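To offer full-text search as a service, a web service is essential. I built mine with Spring Boot; if you are interested in Spring Boot, see my other post, "使用Spring Boot实现博客统计服务" (implementing a blog statistics service with Spring Boot). Enough preamble, here is the code.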

The endpoint implementation is simple: it accepts the search parameter and calls the service layer.

@ApiOperation(value = "Full-text search endpoint", notes = "")
@ApiImplicitParam(name = "searchParam", value = "Blog search term (author, description, content, title)", required = true, dataType = "String")
@RequestMapping(value = "/get_content_list_from_es", method = RequestMethod.GET)
public ResultCode<List<ContentsWithBLOBs>> getContentListFromEs(String searchParam) {
    ResultCode<List<ContentsWithBLOBs>> resultCode = new ResultCode<>();
    try {
        // One placeholder for the single argument being logged.
        LOGGER.info(">>>>>> method getContentListFromEs request params : {}", searchParam);
        // Delegate the actual Elasticsearch query to the service layer.
        resultCode = contentService.getContentListFromEs(searchParam);
        LOGGER.info(">>>>>> method getContentListFromEs return value : {}", JSON.toJSONString(resultCode));
    } catch (Exception e) {
        // Log the failure and return a generic API error to the caller.
        LOGGER.error(">>>>>> method getContentListFromEs failed", e);
        resultCode.setCode(Messages.API_ERROR_CODE);
        resultCode.setMsg(Messages.API_ERROR_MSG);
    }
    return resultCode;
}
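The controller only forwards the request, and the body of contentService.getContentListFromEs is not shown here. As a rough idea of what such a service might send to Elasticsearch, here is a minimal sketch, not the actual implementation. It assumes a multi_match query over the three synced fields (title, blog_desc, content) against the contentindex index at 127.0.0.1:9200 from the configuration, and it uses the JDK 11+ java.net.http client. The class name EsSearchSketch and its method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch: builds and sends a multi_match search to Elasticsearch.
final class EsSearchSketch {

    // Escape a string so it can be embedded inside a JSON string literal.
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Build the search body: match the term against the three synced fields.
    static String buildQuery(String searchParam) {
        return "{\"query\":{\"multi_match\":{\"query\":\""
                + escapeJson(searchParam)
                + "\",\"fields\":[\"title\",\"blog_desc\",\"content\"]}}}";
    }

    // Build a POST request to <esAddr>/<index>/_search carrying the query body.
    static HttpRequest buildRequest(String esAddr, String index, String searchParam) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + esAddr + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildQuery(searchParam)))
                .build();
    }

    // Send the request and return the raw JSON response (needs a reachable cluster).
    static String search(String esAddr, String index, String searchParam) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(esAddr, index, searchParam),
                      HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Print the request target and body without contacting Elasticsearch.
        HttpRequest request = buildRequest("127.0.0.1:9200", "contentindex", "Elasticsearch");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(buildQuery("Elasticsearch"));
    }
}
```

Keeping the query construction in its own method makes it possible to inspect the request body without a running cluster; search(...) only works when Elasticsearch is reachable at the given address, and a real service would still need to map the returned hits onto ContentsWithBLOBs objects.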

Please cite the source when reposting: https://www.heiqu.com/wspygx.html
