虽然sed已经很牛逼了,但是再牛逼也有自身的限制。gawk就是用来搞定sed不能搞定的问题。
gawk可以做以下几件事情:
定义变量来保存数据;
使用算术和字符串操作符来处理数据;
使用结构化编程概念,为数据处理增加逻辑;
提取数据文件中的数据元素进行格式化。
gawk命令格式:
gawk options program file1
还是直接看例子吧。
列操作
首先假设我们有这样一个文本数据data:
No.1 Google Gmail
No.2 Microsoft Windows
No.3 SAP ERP
No.4 Intel Core
No.5 Cisco Rout
输入命令并得出输出结果:
$ gawk '{print $1}' data
No.1
No.2
No.3
No.4
No.5
看到了吧,啥情况,这是输出文本所有以空格为分隔符的第一列。输出第二列当然就是这样写了。
$ gawk '{print $2}' data
Google
Microsoft
SAP
Intel
Cisco
如果命令是‘$0’,会出现啥?
$ gawk '{print $0}' data
No.1 Google Gmail
No.2 Microsoft Windows
No.3 SAP ERP
No.4 Intel Core
No.5 Cisco Router
很显然,如果输入‘$0’,那整个文本就输出了。
好,有了这个功能,那我们分析一些系统文件是不是就省事多了,比如passwd文件里面的内容打开以后,如果直接看,那貌似比较困难,现在有了gwak,来先输出一下第一列吧。
$ gawk -F: '{print $1}' /etc/passwd
root
daemon
bin
sys
sync
games
man
lp
mail
news
uucp
proxy
www-data
backup
list
irc
gnats
nobody
systemd-timesync
systemd-network
systemd-resolve
systemd-bus-proxy
syslog
......
看到这个结果就爽,不过要注意:这里用了一个参数F,这个参数的意思就是指定行中分隔数据字段的字段分隔符。F后面紧跟了个‘:’,意思就是以‘:’为分隔符。
gawk当然也可以用于管道数据处理:
$ echo "I have a pen" | gawk '{$4="apple"; print $0}'
I have a apple
这个命令的意思就是将第四个词改为“apple”,然后全部输出。注意:$4=”apple”这个地方必须是双引号,如果是单引号就只会输出“I have a”。
与sed一样,gawk也可以一行一行地输入脚本命令:
$ gawk '{
> $4="apple"
> print $0}'
apple
apple
因为我们之前啥也每输入,也就是说没有原始字符串,所以每次回车都替换第四个单词为apple,然后通过ctrl+D就可以结束运行了。
从文件中读取命令
新建脚本script
{print $1 "'s home direcotry is " $6}1
运行并得出结果:
$ gawk -F: -f script /etc/passwd
root's home direcotry is /root
daemon's home direcotry is /usr/sbin
bin's home direcotry is /bin
sys's home direcotry is /dev
sync's home direcotry is /bin
games's home direcotry is /usr/games
man's home direcotry is /var/cache/man
lp's home direcotry is /var/spool/lpd
mail's home direcotry is /var/mail
news's home direcotry is /var/spool/news
uucp's home direcotry is /var/spool/uucp
proxy's home direcotry is /bin
www-data's home direcotry is /var/www
backup's home direcotry is /var/backups
list's home direcotry is /var/list
irc's home direcotry is /var/run/ircd
gnats's home direcotry is /var/lib/gnats
nobody's home direcotry is /nonexistent
systemd-timesync's home direcotry is /run/systemd
systemd-network's home direcotry is /run/systemd/netif
systemd-resolve's home direcotry is /run/systemd/resolve
systemd-bus-proxy's home direcotry is /run/systemd
syslog's home direcotry is /home/syslog
_apt's home direcotry is /nonexistent
messagebus's home direcotry is /var/run/dbus
uuidd's home direcotry is /run/uuidd
lightdm's home direcotry is /var/lib/lightdm
whoopsie's home direcotry is /nonexistent
avahi-autoipd's home direcotry is /var/lib/avahi-autoipd
avahi's home direcotry is /var/run/avahi-daemon
dnsmasq's home direcotry is /var/lib/misc
colord's home direcotry is /var/lib/colord
speech-dispatcher's home direcotry is /var/run/speech-dispatcher
hplip's home direcotry is /var/run/hplip
kernoops's home direcotry is /
pulse's home direcotry is /var/run/pulse
rtkit's home direcotry is /proc
saned's home direcotry is /var/lib/saned
usbmux's home direcotry is /var/lib/usbmux
comac's home direcotry is /home/comac
sshd's home direcotry is /var/run/sshd
当然这个脚本也可以这么写:
{
text="'s home directory is "
print $1 text $6
}
注意:每个命令分别放到新的一行,有没有分号无所谓。
BEGIN,END关键字的使用
这两个关键字理解也简单,BEGIN就是在执行脚本前运行这个,END就是在执行脚本后运行这个。