The output is as follows:
[['I', 'love', 'sky', ',', 'I', 'love', 'sea', '.'], ['I', 'like', 'running', ',', 'I', 'love', 'reading', '.']]
Similarity index with 2 documents in 0 shards (stored under -Similarity-index)
Similarity of the two sentences computed with gensim: 0.7303.

Note that you may see the following warnings when running the code:
gensim\utils.py:1209: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
  warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
gensim\matutils.py:737: FutureWarning: Conversion of the second argument of issubdtype from `int` to `np.signedinteger` is deprecated. In future, it will be treated as `np.int32 == np.dtype(int).type`.
  if np.issubdtype(vec.dtype, np.int):

To suppress these warnings, add the following code before the line that imports the gensim module:
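import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')
warnings.filterwarnings(action='ignore', category=FutureWarning, module='gensim')

That concludes this article. Thanks for reading! If you spot anything incorrect, please contact the author promptly; discussion is always welcome. Good luck!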
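As a postscript, the 0.7303 figure can be sanity-checked without gensim. The sketch below assumes the similarity is a plain cosine similarity over raw bag-of-words counts of the two tokenized sentences; the helper name `cosine_similarity` is made up for this illustration and is not part of gensim. Under that assumption it produces the same value as the output above.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw term counts."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Only tokens present in both sentences contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm_a * norm_b)


# The two tokenized sentences, copied from the output above.
sent1 = ['I', 'love', 'sky', ',', 'I', 'love', 'sea', '.']
sent2 = ['I', 'like', 'running', ',', 'I', 'love', 'reading', '.']

similarity = cosine_similarity(sent1, sent2)
print('%.4f' % similarity)  # 0.7303
```

Here the dot product is 8 and the vector norms are sqrt(12) and sqrt(10), so the result is 8 / sqrt(120), which rounds to 0.7303. Punctuation tokens are counted like any other token, as in the tokenized lists above.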