Introduction to NLP (1): Bag-of-Words Model and Sentence Similarity (2)

The output is as follows:

[['I', 'love', 'sky', ',', 'I', 'love', 'sea', '.'], ['I', 'like', 'running', ',', 'I', 'love', 'reading', '.']]
Similarity index with 2 documents in 0 shards (stored under -Similarity-index)
Similarity of the two sentences computed with gensim: 0.7303.
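As a sanity check, the 0.7303 figure can be reproduced without gensim. The sketch below is not gensim's internal implementation; it assumes the similarity is the cosine between raw bag-of-words count vectors, with punctuation tokens kept as they appear in the tokenized output above. The function name `cosine_similarity` is chosen here for illustration.

```python
import math
from collections import Counter

# Token lists copied from the printed output above
sent1 = ['I', 'love', 'sky', ',', 'I', 'love', 'sea', '.']
sent2 = ['I', 'like', 'running', ',', 'I', 'love', 'reading', '.']


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words count vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Shared vocabulary: every token that occurs in either sentence
    vocab = sorted(set(counts_a) | set(counts_b))
    vec_a = [counts_a[word] for word in vocab]
    vec_b = [counts_b[word] for word in vocab]
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    return dot / (norm_a * norm_b)


similarity = cosine_similarity(sent1, sent2)
print(f"{similarity:.4f}")  # prints 0.7303
```

The dot product is 8 and the vector norms are sqrt(12) and sqrt(10), so the value is 8 / sqrt(120), which rounds to 0.7303 and agrees with the gensim result.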

Note: running the code may produce the following warnings:

gensim\utils.py:1209: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
  warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
gensim\matutils.py:737: FutureWarning: Conversion of the second argument of issubdtype from `int` to `np.signedinteger` is deprecated. In future, it will be treated as `np.int32 == np.dtype(int).type`.
  if np.issubdtype(vec.dtype, np.int):

To suppress these warnings, add the following code before the line that imports gensim:

import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')
warnings.filterwarnings(action='ignore', category=FutureWarning, module='gensim')
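To see what these filters do, here is a small sketch that uses only the standard library. It raises stand-in warnings from the script itself rather than from gensim, so the `module='gensim'` argument is left out; the helper name `visible_warning_categories` and the third warning message are made up for this demo.

```python
import warnings


def visible_warning_categories(ignore_noisy):
    """Return the category names of warnings that get past the filters."""
    with warnings.catch_warnings(record=True) as caught:
        # Start from a known state: report every warning
        warnings.simplefilter("always")
        if ignore_noisy:
            # Same calls as in the article, minus module='gensim',
            # because the demo warnings below come from this script
            warnings.filterwarnings(action='ignore', category=UserWarning)
            warnings.filterwarnings(action='ignore', category=FutureWarning)
        # Stand-ins for the two gensim warnings quoted above
        warnings.warn("detected Windows; aliasing chunkize to chunkize_serial",
                      UserWarning)
        warnings.warn("Conversion of the second argument of issubdtype "
                      "is deprecated", FutureWarning)
        # Control: a category that is not filtered
        warnings.warn("hypothetical unrelated notice", DeprecationWarning)
    return [w.category.__name__ for w in caught]


print(visible_warning_categories(ignore_noisy=False))
print(visible_warning_categories(ignore_noisy=True))
```

With the filters in place only the unfiltered category is reported. In the article's version, `module='gensim'` narrows the filter further, so that `UserWarning` and `FutureWarning` raised by other packages are still shown.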

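That concludes this article. Thanks for reading! If you spot anything incorrect, please contact the author promptly; discussion is welcome. Good luck!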

Note: I now have a WeChat official account, Python爬虫与算法 (WeChat ID: easy_web_scrape). You are welcome to follow it.
