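4.18 Removing a sentence containing a given word from a file

As long as the regular expression is right, removing a sentence that contains a given word is straightforward. This recipe works through one such problem.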

4.18.1 Getting ready

sed is the natural tool for text substitution. We can use it to replace each matching sentence with nothing (an empty string).

4.18.2 How to do it

First, create a file containing the text to be processed. For example:

  $ cat sentence.txt

  Linux refers to the family of Unix-like computer operating systems that use the Linux kernel. Linux can be installed on a wide variety of computer hardware, ranging from mobile phones, tablet computers and video game consoles, to mainframes and supercomputers. Linux is predominantly known for its use in servers. It has a server market share ranging between 20-40%. Most desktop computers run either Microsoft Windows or Mac OS X, with Linux having anywhere from a low of an estimated 1-2% of the desktop market to a high of an estimated 4.8%. However, desktop use of Linux has become increasingly popular in recent years, partly owing to the popular Ubuntu, Fedora, Mint, and openSUSE distributions and the emergence of netbooks and smart phones running an embedded Linux.

Our goal is to remove the sentence containing the phrase "mobile phones". The following sed command does this:

  $ sed 's/ [^.]*mobile phones[^.]*\.//g' sentence.txt

  Linux refers to the family of Unix-like computer operating systems that use the Linux kernel. Linux is predominantly known for its use in servers. It has a server market share ranging between 20-40%. Most desktop computers run either Microsoft Windows or Mac OS X, with Linux having anywhere from a low of an estimated 1-2% of the desktop market to a high of an estimated 4.8%. However, desktop use of Linux has become increasingly popular in recent years, partly owing to the popular Ubuntu, Fedora, Mint, and openSUSE distributions and the emergence of netbooks and smart phones running an embedded Linux.
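The same substitution is easier to inspect on a short input. The following is a minimal sketch that feeds a made-up three-sentence line (not part of sentence.txt) through the sed command used above; the variable names are illustrative only.

```shell
# Hypothetical three-sentence sample, used only for illustration.
sample='Linux is a kernel. It runs on mobile phones and servers. It is free.'

# Same substitution as in the recipe, reading from standard input.
result=$(printf '%s\n' "$sample" | sed 's/ [^.]*mobile phones[^.]*\.//g')

echo "$result"
# Prints: Linux is a kernel. It is free.
```

The middle sentence disappears together with its leading space, so the remaining sentences stay separated by a single space.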

4.18.3 How it works

Let's look at the sed expression 's/ [^.]*mobile phones[^.]*\.//g'.

The expression has the form 's/pattern/replacement/g'.

It replaces every occurrence that matches the pattern with the replacement string.
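The effect of the trailing g can be seen by running the same substitution with and without it. This is a small sketch on a made-up input line; the variable names are illustrative.

```shell
line='one two one two'

# Without g, only the first match on each line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/one/1/')

# With g, every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/one/1/g')

echo "$first_only"   # Prints: 1 two one two
echo "$all"          # Prints: 1 two 1 two
```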

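The pattern here is a regular expression that matches one sentence. In this file, every sentence after the first begins with a space, and sentences are separated by a period ("."). What we need to match therefore has the form: space + some text + the string to look for + some text + period. Apart from the period that terminates it, a sentence may contain any characters, so we use [^.]*, which matches any sequence of characters other than a period. The text to look for, "mobile phones", sits between two [^.]* expressions. Each matching sentence is replaced by the empty string (note that there is nothing between the final two slashes, //).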
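The pieces of the pattern can also be assembled one by one, which makes the "space + text + phrase + text + period" structure explicit. The sketch below wraps this in a hypothetical helper named remove_sentences (not part of sed) and assumes the phrase contains no "/" and no regular expression metacharacters. The input lines are made up for illustration.

```shell
# remove_sentences PHRASE: hypothetical helper that filters standard input.
# Assumes PHRASE contains no "/" and no regex metacharacters.
remove_sentences() {
    phrase=$1
    not_dot='[^.]*'                               # any run of non-period characters
    pattern=" ${not_dot}${phrase}${not_dot}\\."   # space + text + phrase + text + period
    sed "s/${pattern}//g"
}

# The matching sentence follows another sentence, so it begins with a space.
last=$(printf '%s\n' 'Linux is a kernel. It runs on mobile phones and servers. It is free.' \
    | remove_sentences 'free')
echo "$last"
# Prints: Linux is a kernel. It runs on mobile phones and servers.

# The matching sentence starts the line, so it has no leading space.
first=$(printf '%s\n' 'Many mobile phones run Linux. Servers run Linux too.' \
    | remove_sentences 'mobile phones')
echo "$first"
# Prints: Many Servers run Linux too.
```

The second result shows a limit of requiring a leading space: when the matching sentence begins the line, the match starts at the first space inside it, so the word "Many" is left behind. The recipe's input avoids this because the target sentence is not the first one in the file.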

4.18.4 See also

  • Section 4.6 covers the sed command.

  • Section 4.2 covers how to use regular expressions.