I. Definition of Regular Expressions
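Regular expressions are commonly abbreviated in code as regex, regexp, or RE. A regular expression is a single string that describes, and matches, a set of strings conforming to some syntactic rule. Put simply, it is a way of matching strings: with a few special symbols you can quickly find, delete, or replace particular strings.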
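A regular expression is a text pattern made up of ordinary characters and metacharacters. The pattern describes one or more strings to match when searching text; the regular expression acts as a template that is matched against the string being searched. Ordinary characters include uppercase and lowercase letters, digits, punctuation, and some other symbols. Metacharacters are characters that carry a special meaning inside a regular expression; they can specify how the preceding character (the one immediately before the metacharacter) may occur in the target text.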
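Depending on strictness and feature set, regular expression syntax is divided into basic regular expressions (BRE) and extended regular expressions (ERE). Basic regular expressions are the most fundamental part of the commonly used regular expression syntax. Among the common Linux text-processing tools, grep and sed support basic regular expressions, while egrep and awk support extended regular expressions.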
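To make the basic/extended distinction concrete, here is a minimal sketch that runs the same pattern through both syntaxes. The three sample strings are made up for this illustration, and `grep -E` is used as the equivalent of egrep.

```shell
# Three sample strings, one per line (made up for this illustration).
sample=$(printf '%s\n' 'wd' 'wood' 'wo+d')

# Basic syntax (plain grep): "+" is an ordinary character,
# so only the literal text "wo+d" matches.
bre_hits=$(printf '%s\n' "$sample" | grep 'wo+d')

# Extended syntax (grep -E, equivalent to egrep): "+" means
# "one or more of the preceding character", so "wood" matches.
ere_hits=$(printf '%s\n' "$sample" | grep -E 'wo+d')

echo "basic:    $bre_hits"
echo "extended: $ere_hits"
```

The pattern text is identical in both calls; only the syntax chosen by the tool changes which characters count as metacharacters.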
Prepare a test file named test.txt in advance, with the following contents:
[root@CentOS01 ~]# vim test.txt
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
PI=3.14148223023840-2382924893980--2383892948
a wood cross!
Actions speak louder than words


#wooood #
#woooood #
AxyzxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.

1) Basic regular expression examples:
[root@centos01 ~]# grep -n 'the' test.txt  <!-- find a specific string; -n shows line numbers -->
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.
[root@centos01 ~]# grep -in 'the' test.txt  <!-- find a specific string; -in shows line numbers and ignores case -->
3:The home of Football on BBC Sport online.
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.
[root@centos01 ~]# grep -vn 'the' test.txt  <!-- find lines that do NOT contain the string, using -vn -->
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
7:PI=3.14148223023840-2382924893980--2383892948
8:a wood cross!
9:Actions speak louder than words
10:
11:
12:#wooood #
13:#woooood #
14:AxyzxyzxyzxyzxyzC
15:I bet this place is really spooky late at night!
16:Misfortunes never come alone/single.
17:I shouldn't have lett so tast.

2) Using square brackets "[]" with grep to match a character set
[root@centos01 ~]# grep -n 'sh[io]rt' test.txt  <!-- brackets match a character set; however many characters "[]" contains, it stands for exactly one character, so "[io]" matches "i" or "o" -->
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants.
[root@centos01 ~]# grep -n 'oo' test.txt  <!-- find a repeated single character -->
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
12:#wooood #
13:#woooood #
15:I bet this place is really spooky late at night!
[root@centos01 ~]# grep -n '[^w]oo' test.txt  <!-- find "oo" that is not preceded by "w", using "[^]" -->
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
12:#wooood #
13:#woooood #
15:I bet this place is really spooky late at night!
[root@centos01 ~]# grep -n '[^a-z]oo' test.txt  <!-- find "oo" that is not preceded by a lowercase letter -->
3:The home of Football on BBC Sport online.
[root@centos01 ~]# grep -n '[0-9]' test.txt  <!-- find lines that contain a digit -->
4:the tongue is boneless but it breaks bones.12!
7:PI=3.14148223023840-2382924893980--2383892948

3) Using grep to match the start of a line "^" and the end of a line "$"
[root@centos01 ~]# grep -n '^the' test.txt  <!-- find lines that begin with the string "the" -->
4:the tongue is boneless but it breaks bones.12!
[root@centos01 ~]# grep -n '^[a-z]' test.txt  <!-- find lines that begin with a lowercase letter -->
1:he was short and fat.
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
8:a wood cross!
[root@centos01 ~]# grep -n '^[A-Z]' test.txt  <!-- find lines that begin with an uppercase letter -->
2:He was wearing a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
6:The year ahead will test our political establishment to the limit.
7:PI=3.14148223023840-2382924893980--2383892948
9:Actions speak louder than words
14:AxyzxyzxyzxyzxyzC
15:I bet this place is really spooky late at night!
16:Misfortunes never come alone/single.
17:I shouldn't have lett so tast.
[root@centos01 ~]# grep -n '^[^a-zA-Z]' test.txt  <!-- find lines that do not begin with a letter -->
12:#wooood #
13:#woooood #
[root@centos01 ~]# grep -n 'w..d' test.txt  <!-- any single character "." and the repetition character "*": here "w", any two characters, then "d" -->
5:google is the best tools for search keyword.
8:a wood cross!
9:Actions speak louder than words
[root@centos01 ~]# grep -n 'ooo*' test.txt  <!-- find strings containing two or more consecutive "o" -->
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
12:#wooood #
13:#woooood #
15:I bet this place is really spooky late at night!
[root@centos01 ~]# grep -n 'woo*d' test.txt  <!-- find strings that start with "w", end with "d", and have at least one "o" in between -->
8:a wood cross!
12:#wooood #
13:#woooood #
[root@centos01 ~]# grep -n '[0-9][0-9]*' test.txt  <!-- find lines that contain any run of digits -->
4:the tongue is boneless but it breaks bones.12!
7:PI=3.14148223023840-2382924893980--2383892948
[root@centos01 ~]# grep -n 'o\{2\}' test.txt  <!-- find two consecutive "o" using the interval "{}" -->
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
12:#wooood #
13:#woooood #
15:I bet this place is really spooky late at night!

2. Metacharacter summary
<img src="https://s1.51cto.com/images/blog/201911/09/07a1db13ccef928a82d18582046e1a41.png" alt="Regular expressions in Shell scripts" />
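As a quick way to try the basic metacharacters without the test file, the sketch below checks one pattern against one string at a time. The helper name `matches` is hypothetical, and the `fat\.$` pattern is an extra illustration of "$" and an escaped dot that does not appear in the grep session above.

```shell
# Use the C locale so that bracket ranges behave predictably.
LC_ALL=C; export LC_ALL

# matches PATTERN TEXT: print "yes" if the basic-syntax pattern matches TEXT,
# otherwise "no". (Hypothetical helper for this sketch.)
matches() {
    if printf '%s\n' "$2" | grep -q -- "$1"; then
        echo yes
    else
        echo no
    fi
}

dot=$(matches 'w..d' 'keyword')                 # "." is any single character
star=$(matches 'woo*d' '#wooood #')             # "*" is zero or more of the preceding character
set_hit=$(matches 'sh[io]rt' 'shirt')           # "[io]" is one character from the set
negated=$(matches '[^w]oo' 'a wood cross!')     # "[^w]" is one character other than "w"
line_start=$(matches '^the' 'the tongue')       # "^" anchors at the start of the line
line_end=$(matches 'fat\.$' 'he was short and fat.')  # "\." is a literal dot, "$" anchors at the end
interval=$(matches 'o\{2\}' 'tools')            # "\{2\}" is exactly two repetitions

echo "dot=$dot star=$star set=$set_hit negated=$negated"
echo "start=$line_start end=$line_end interval=$interval"
```

Only `negated` comes out as "no": the single "oo" in "a wood cross!" is preceded by "w", which is why line 8 is missing from the `'[^w]oo'` output in the session.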
II. Extended Regular Expression Metacharacters
<img src="https://s1.51cto.com/images/blog/201911/09/6a07d0597ea832b49889593b878078a2.png" alt="Regular expressions in Shell scripts" />
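The following sketch tries the usual extended metacharacters ("+", "?", "|", and "()" grouping) on strings taken from test.txt. It assumes these are among the metacharacters the image summarizes; the helper name `ere_matches` is hypothetical, and `grep -E` stands in for egrep.

```shell
# Use the C locale so that matching behaves predictably.
LC_ALL=C; export LC_ALL

# ere_matches PATTERN TEXT: print "yes" if the extended-syntax pattern
# matches TEXT, otherwise "no". (Hypothetical helper for this sketch.)
ere_matches() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo yes
    else
        echo no
    fi
}

plus=$(ere_matches 'wo+d' 'a wood cross!')              # "+" is one or more of the preceding character
optional=$(ere_matches 'be?t' 'I bet this place')       # "?" is zero or one of the preceding character
either=$(ere_matches 'of|is' 'The home of Football')    # "|" matches either alternative
group=$(ere_matches 'A(xyz)+C' 'AxyzxyzxyzxyzxyzC')     # "()" groups "xyz" so "+" repeats the whole group
group_miss=$(ere_matches 'A(xyz)+C' 'AC')               # the group must occur at least once

echo "plus=$plus optional=$optional either=$either group=$group group_miss=$group_miss"
```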
III. Text Processors