NLP Primer (1): The Bag-of-Words Model and Sentence Similarity (2)

Date: 2021-06-20 Category: 程序人生 (Programmer's Life)

The output is as follows:

[['I', 'love', 'sky', ',', 'I', 'love', 'sea', '.'], ['I', 'like', 'running', ',', 'I', 'love', 'reading', '.']]
Similarity index with 2 documents in 0 shards (stored under -Similarity-index)
Similarity of the two sentences computed with gensim: 0.7303.
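As a cross-check on the 0.7303 figure, here is a minimal sketch that recomputes the similarity without gensim. It assumes the score is the cosine similarity between raw word-count (bag-of-words) vectors built from the two token lists printed above. The function name `cosine_similarity` is illustrative and does not come from the original code.

```python
import math
from collections import Counter

# Token lists copied from the printed output above.
tokens_1 = ['I', 'love', 'sky', ',', 'I', 'love', 'sea', '.']
tokens_2 = ['I', 'like', 'running', ',', 'I', 'love', 'reading', '.']


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two token lists.

    Assumes both lists are non-empty; an empty list would give a zero norm.
    """
    counts_a, counts_b = Counter(a), Counter(b)
    # A Counter returns 0 for words it has not seen, so only shared words add to the dot product.
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return dot / (norm_a * norm_b)


similarity = cosine_similarity(tokens_1, tokens_2)
print(f"{similarity:.4f}")  # 8 / sqrt(12 * 10), prints 0.7303
```

The shared tokens 'I', 'love', ',' and '.' contribute 4 + 2 + 1 + 1 = 8 to the dot product, and the squared norms are 12 and 10, so the result is 8/√120 ≈ 0.7303, which agrees with the gensim output.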

Note that the following warnings may appear when you run the code:

gensim\utils.py:1209: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
  warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
gensim\matutils.py:737: FutureWarning: Conversion of the second argument of issubdtype from `int` to `np.signedinteger` is deprecated. In future, it will be treated as `np.int32 == np.dtype(int).type`.
  if np.issubdtype(vec.dtype, np.int):

To suppress these warnings, add the following code before the line that imports the gensim module:

import warnings
warnings.filterwarnings(action='ignore',category=UserWarning,module='gensim')
warnings.filterwarnings(action='ignore',category=FutureWarning,module='gensim')
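If silencing warnings for the whole program feels too broad, one alternative is to scope the filter to a single call with the standard library's `warnings.catch_warnings` context manager. The sketch below is only an illustration: `noisy_step` is a hypothetical stand-in for a gensim call that emits the two warnings shown above, and `run_quietly` is a helper name invented here. Unlike the snippet above, it filters by category only and not by module.

```python
import warnings


def noisy_step():
    # Hypothetical stand-in for a library call that emits the warnings shown above.
    warnings.warn("detected Windows; aliasing chunkize to chunkize_serial", UserWarning)
    warnings.warn("Conversion of the second argument of issubdtype is deprecated", FutureWarning)
    return "done"


def run_quietly(func):
    """Call func with UserWarning and FutureWarning ignored, then restore the previous filters."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=UserWarning)
        warnings.simplefilter("ignore", category=FutureWarning)
        return func()


result = run_quietly(noisy_step)
print(result)
```

Because the filters are restored when the `with` block exits, warnings raised elsewhere in the program are still reported.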

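That concludes this article. Thank you for reading! If you find anything incorrect, please contact the author promptly. Discussion is welcome. Good luck!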

Note: I now have a WeChat official account, Python爬虫与算法 (WeChat ID: easy_web_scrape). You are welcome to follow it.

Please cite the source when reposting: https://www.heiqu.com/zyzygz.html
