Java中的String.hashCode()方法可能有问题？(2)

日期：2020-06-11 栏目：程序人生浏览：次

System.out.println("Elapsed time: "+elapsed+"ns");
System.out.println("Total unique lines: "+lines.size());
System.out.println("Time per hashcode: "+String.format("%.4f", 1.0*elapsed/lines.size())+"ns");
System.out.println("Total unique hashcodes: "+collisions.size());
System.out.println("Total collisions: "+(lines.size()-collisions.size()));
System.out.println("Collision rate: "+String.format("%.8f", 1.0*(lines.size()-collisions.size())/lines.size()));
if(maxsize != 0)
System.out.println("Max collisions: "+maxsize+" "+collisions.get(maxhc));
}
}

我们使用短字符串（words.txt，链接见文末）作为输入：

$ cat words.txt | java HashTest
Loading lines from stdin...
Warming up..........
Computing collisions...
Elapsed time: 49117411ns
Total unique lines: 466544
Time per hashcode: 105.2793ns
Total unique hashcodes: 466188
Total collisions: 356
Collision rate: 0.00076306
Max collisions: 3 [Jr, KS, L4]

在这些英文短字符串中，总共有466,544个哈希，出现356次冲突。从理论上讲，“公平”的哈希函数应该只会产生25.33次冲突。因此，String.hashCode()产生的冲突是公平哈希函数的14.05倍：
356.0 / 25.33 ≈ 14.05

不过，每10,000个哈希出现8次冲突的概率仍然是个不错的成绩。

那么长字符串值的结果怎样？使用莎士比亚全集中的句子（链接见文末）会产生以下输出：
$ cat shakespeare.txt | java HashTest
Loading lines from stdin...
Warming up..........
Computing collisions...
Elapsed time: 24106163ns
Total unique lines: 111385
Time per hashcode: 216.4220ns
Total unique hashcodes: 111384
Total collisions: 1
Collision rate: 0.00000897
0.00076306
Max collisions: 2 [ There's half a dozen sweets., PISANIO. He hath been search'd among the dead and living,]

在这些较长的英语字符串中，总共有111,385个哈希，出现1次冲突。“公平”哈希函数将在这些数据上产生1.44次冲突，因此String.hashCode()优于公平哈希函数，冲突可能性是公平哈希函数的69.4％：
1 / 1.44 ≈ 0.694

也就是说，每100,000个哈希产生不到1个冲突，这个成绩是极好的。

一点解释

显然，String.hashCode()不具备唯一性，它也不可能具备唯一性。对于短字符串，它与理论平均值差得比较远，但其实做得还算不错。对于长字符串，它可以轻松打败平均理论值。

总得来看，它对于预期字符串而言是具备唯一性的，可以将字符串很好地分布在哈希表中。

最后，我还是认为String.hashCode()是具备唯一性的，至少它足够“好”。

延伸阅读

如果你对这个问题感兴趣，我强烈建议你看一看Stack Overflow上的答案（），它深入探讨了哈希函数冲突的问题。

重要链接：

查看英文原文：

Linux公社的RSS地址：https://www.linuxidc.com/rssFeed.aspx

转载注明出处：https://www.heiqu.com/cd9bfb54e0b2db68ce30853782a9114b.html

Java中的String.hashCode()方法可能有问题？(2)

相关推荐