An ASP.NET Profanity Filtering Algorithm

Date: 2020-06-13  Category: Programmer's Life

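For the original article, see https://www.jb51.net/article/20575.htm
In my tests, however, RegEx turned out to be roughly twice as fast. I was still not satisfied, because our site relies heavily on profanity filtering and it had started to affect performance, so after some thought I wrote my own algorithm. On my machine, using the bad-word list from the original article, an input string of length 0x19c (412 characters), and 1000 iterations, plain text search took 1933.47 ms, RegEx took 1216.719 ms, and my algorithm took only 244.125 ms.
Update: I added a BitArray that records whether a given char can appear in a bad word (the code below marks the first character of each word), so most positions are skipped without a dictionary lookup. Total time dropped from 244 ms to 34 ms.
The core of the algorithm is shown in the code below.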


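// Requires: using System; using System.Collections; using System.Collections.Generic;
private static Dictionary<string, object> dic = new Dictionary<string, object>();
// One bit per possible char value; char.MaxValue + 1 so that '\uffff' is a valid index too.
private static BitArray fastcheck = new BitArray(char.MaxValue + 1);
// Length of the longest bad word; bounds the substring search at each position.
private static int maxlength = 0;

// badwords: the bad-word list, read from file by the caller.
static void Prepare(string[] badwords)
{
    foreach (string word in badwords)
    {
        if (word.Length == 0) continue; // word[0] below would throw on an empty entry
        if (!dic.ContainsKey(word))
        {
            dic.Add(word, null);
            maxlength = Math.Max(maxlength, word.Length);
            fastcheck[word[0]] = true; // mark the first character of each bad word
        }
    }
}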
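
To illustrate why the first-character table saves so much time, here is a minimal standalone sketch. The class and method names (FastCheckDemo, BuildFirstCharTable, CountCandidates) and the sample words are hypothetical and not part of the original code. It only counts how many positions in a text would still need a dictionary lookup after the BitArray check.

```csharp
using System;
using System.Collections;

static class FastCheckDemo
{
    // Builds a table with one bit per char value; a set bit means
    // "some word in the list starts with this character".
    public static BitArray BuildFirstCharTable(string[] words)
    {
        BitArray table = new BitArray(char.MaxValue + 1);
        foreach (string word in words)
        {
            if (word.Length > 0) table[word[0]] = true;
        }
        return table;
    }

    // Counts the positions where a dictionary lookup is still required.
    public static int CountCandidates(BitArray table, string text)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (table[c]) count++;
        }
        return count;
    }

    static void Main()
    {
        // Placeholder word list, assumed for illustration only.
        BitArray table = BuildFirstCharTable(new[] { "foo", "bar" });
        string text = "a plain sentence with one foo in it";
        Console.WriteLine(CountCandidates(table, text) + " of " + text.Length
            + " positions need a dictionary lookup");
    }
}
```

When bad words are rare in the input, almost every position fails the single-bit test and is skipped, which is consistent with the drop from 244 ms to 34 ms reported above.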

Usage:


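// target is the input text; sb starts as a copy of it and receives the replacements.
// Requires: using System.Text;
StringBuilder sb = new StringBuilder(target);
int offset = 0; // how far positions in sb have shifted relative to target
int index = 0;
while (index < target.Length)
{
    // Skip ahead to the next character that can start a bad word.
    if (!fastcheck[target[index]])
    {
        while (index < target.Length - 1 && !fastcheck[target[++index]]) ;
    }
    // Try every candidate length from 1 up to maxlength (inclusive).
    int limit = Math.Min(maxlength, target.Length - index);
    for (int j = 1; j <= limit; j++)
    {
        string sub = target.Substring(index, j);
        if (dic.ContainsKey(sub))
        {
            sb.Replace(sub, "***", index + offset, j);
            offset += 3 - j; // "***" may be longer or shorter than the match
            index += j - 1;  // the index++ below then lands just past the match
            break;
        }
    }
    index++;
}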
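
For readers who want something they can compile directly, the two fragments above can be combined into one self-contained class. This is only a sketch under a few assumptions: the class name BadWordFilter, the sample words, and the use of HashSet<string> in place of the original Dictionary are my own choices, and the result is built by appending to a new StringBuilder instead of replacing in place. The matching rule (shortest match at each position wins) follows the fragment above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;

sealed class BadWordFilter
{
    private readonly HashSet<string> words = new HashSet<string>();
    private readonly BitArray firstChars = new BitArray(char.MaxValue + 1);
    private int maxLength;

    public BadWordFilter(IEnumerable<string> badWords)
    {
        foreach (string word in badWords)
        {
            if (string.IsNullOrEmpty(word)) continue;
            if (words.Add(word)) // Add returns false for duplicates
            {
                maxLength = Math.Max(maxLength, word.Length);
                firstChars[word[0]] = true;
            }
        }
    }

    public string Filter(string target)
    {
        StringBuilder result = new StringBuilder(target.Length);
        int index = 0;
        while (index < target.Length)
        {
            int matched = 0;
            // Only positions whose character starts some bad word are searched.
            if (firstChars[target[index]])
            {
                int limit = Math.Min(maxLength, target.Length - index);
                for (int len = 1; len <= limit; len++)
                {
                    if (words.Contains(target.Substring(index, len)))
                    {
                        matched = len; // shortest match wins
                        break;
                    }
                }
            }
            if (matched > 0)
            {
                result.Append("***");
                index += matched;
            }
            else
            {
                result.Append(target[index]);
                index++;
            }
        }
        return result.ToString();
    }
}

static class Program
{
    static void Main()
    {
        // Placeholder word list, assumed for illustration only.
        BadWordFilter filter = new BadWordFilter(new[] { "darn", "heck" });
        Console.WriteLine(filter.Filter("well darn, what the heck"));
        // prints "well ***, what the ***"
    }
}
```

Appending to a fresh StringBuilder avoids having to track how earlier replacements shift later positions, at the cost of copying every unmatched character once.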


When reposting, please credit the source: https://www.heiqu.com/wjfygx.html
