5 Regular expressions
*********************
Regular expressions are patterns used in selecting text. For example, the
'ed' command
g/STRING/
prints all lines containing STRING. Regular expressions are also used by
the 's' command for selecting old text to be replaced with new text.
In addition to specifying string literals, regular expressions can
represent classes of strings. Strings thus represented are said to be
matched by the corresponding regular expression. If it is possible for a
regular expression to match several strings in a line, then the left-most
match is the one selected. If the regular expression permits a variable
number of matching characters, the longest sequence starting at that point
is matched.
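The POSIX 'regcomp' and 'regexec' functions from '<regex.h>' follow the same left-most, longest rule, so they can illustrate it. This sketch assumes they behave like the matcher inside 'ed', which this section does not state. The helper name 'bre_find' is hypothetical. It compiles a pattern in basic syntax and reports the byte offsets of the first match.

```c
#include <regex.h>
#include <stdbool.h>

/* Hypothetical helper: compile PATTERN as a POSIX basic regular expression
   and report the offsets [*start, *end) of its first match in SUBJECT.
   Returns false if the pattern does not compile or nothing matches. */
static bool bre_find(const char *pattern, const char *subject,
                     int *start, int *end)
{
    regex_t re;
    regmatch_t m;

    if (regcomp(&re, pattern, 0) != 0)   /* 0: basic syntax, no flags */
        return false;
    bool found = regexec(&re, subject, 1, &m, 0) == 0;
    if (found) {
        *start = (int)m.rm_so;           /* left-most starting point */
        *end = (int)m.rm_eo;             /* longest extent from there */
    }
    regfree(&re);
    return found;
}
```

Called as 'bre_find("ab*", "xabbbyab", &s, &e)', it should set S to 1 and E to 5. The match starting at offset 1 is chosen over the one at offset 6, and from offset 1 'b*' consumes all three b's.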
An empty regular expression is equivalent to the last regular expression
processed. Therefore '/RE/s//REPLACEMENT/' replaces the first match of RE
in the next line containing RE with REPLACEMENT.
As a GNU extension, a regular expression /RE/ may be followed by the
suffix 'I', which makes 'ed' match RE in a case-insensitive manner. Note
that the suffix is evaluated when the regular expression is compiled;
thus it is invalid to specify it together with the empty regular
expression.
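One way to picture these two rules is to keep the last compiled expression around. Everything in this sketch is hypothetical: 'get_pattern', 'line_matches', and the two-slot buffer are illustrative names, not GNU ed's source. Modelling the 'I' suffix with the POSIX 'REG_ICASE' compile flag is also an assumption. Case folding is fixed when 'regcomp' runs, so an empty pattern can only reuse the previous expression as it was compiled. For that reason the sketch rejects 'I' on an empty pattern.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state: two compile buffers, so that a pattern which fails
   to compile leaves the previous regular expression usable. */
static regex_t slot[2];
static int cur = -1;                  /* index of the last good RE, -1 if none */

/* Hypothetical: select the regular expression for a command. An empty
   PATTERN reuses the last compiled one. ICASE models the 'I' suffix by
   passing REG_ICASE to regcomp, so it only takes effect at compile time
   and is rejected on an empty pattern. Returns 0 on success, -1 on error. */
static int get_pattern(const char *pattern, bool icase)
{
    if (pattern[0] == '\0') {
        if (icase || cur < 0)
            return -1;                /* 'I' with empty RE, or no previous RE */
        return 0;                     /* keep using slot[cur] as compiled */
    }
    int next = (cur == 0) ? 1 : 0;
    if (regcomp(&slot[next], pattern, icase ? REG_ICASE : 0) != 0)
        return -1;                    /* previous RE is left untouched */
    if (cur >= 0)
        regfree(&slot[cur]);
    cur = next;
    return 0;
}

/* Does the currently selected regular expression match LINE? */
static bool line_matches(const char *line)
{
    return cur >= 0 && regexec(&slot[cur], line, 0, NULL, 0) == 0;
}
```

In this model, '/hello/I' corresponds to 'get_pattern("hello", true)'. A following 's//HI/' corresponds to 'get_pattern("", false)', which reuses the case-insensitive expression unchanged.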
The following symbols are used in constructing regular expressions using
POSIX basic regular expression syntax:
'C'
Any character C not listed below, including '{', '}', '(', ')', '<',
and '>', matches itself.
'\C'
Any backslash-escaped character C, other than '{', '}', '(', ')', '<',
'>', 'b', 'B', 'w', 'W', '+', '?', '`', ''', and the digits '1' through
'9', matches itself.
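The literal rules above can be checked with POSIX 'regcomp' in basic syntax (no 'REG_EXTENDED'). This assumes it as a stand-in for the matcher in 'ed'. The helper 'bre_match' is a hypothetical name. In the C string literals used with it, each backslash of the pattern is doubled, so "1\\.5" is the pattern '1\.5'.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if PATTERN, compiled as a POSIX basic
   regular expression, matches anywhere in SUBJECT. */
static bool bre_match(const char *pattern, const char *subject)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return false;                 /* a compile error counts as no match */
    bool found = regexec(&re, subject, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```

With this helper, 'bre_match("a{2}", "a{2}")' is true and 'bre_match("a{2}", "aa")' is false, because an unescaped '{' is an ordinary character. 'bre_match("1\\.5", "125")' is false because '\.' matches only a period.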
'.'
Matches any single character.
'[CHAR-CLASS]'
Matches any single character in CHAR-CLASS. To include a ']' in
CHAR-CLASS, it must be the first character. A range of characters may
be specified by separating the end characters of the range with a '-',
e.g., 'a-z' specifies the lowercase letters (in the C locale). The following literal
expressions can also be used in CHAR-CLASS to specify sets of
characters:
[:alnum:] [:cntrl:] [:lower:] [:space:]
[:alpha:] [:digit:] [:print:] [:upper:]
[:blank:] [:graph:] [:punct:] [:xdigit:]
If '-' appears as the first or last character of CHAR-CLASS, then it
matches itself. All other characters in CHAR-CLASS match themselves.
Patterns in CHAR-CLASS of the form:
[.COL-ELM.]
[=COL-ELM=]
where COL-ELM is a "collating element" are interpreted according to
'locale'(5). See 'regex'(7) for an explanation of these constructs.
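The following checks a few bracket expressions with a hypothetical 'bre_match' helper built on POSIX 'regcomp'. That function is assumed to stand in for the matcher in 'ed'. The program never calls 'setlocale', so it runs in the C locale, where the range 'a-c' means exactly 'a', 'b', and 'c'.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if PATTERN (POSIX basic syntax) matches
   anywhere in SUBJECT. Without a setlocale call the program uses the
   "C" locale, which fixes the meaning of ranges such as a-c. */
static bool bre_match(const char *pattern, const char *subject)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return false;                 /* a compile error counts as no match */
    bool found = regexec(&re, subject, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```

For example, '[]x]' matches a ']' because the bracket comes first. '[x-]' and '[-x]' both match a '-' because the hyphen is first or last.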
'[^CHAR-CLASS]'
Matches any single character, other than newline, not in CHAR-CLASS.
CHAR-CLASS is defined as above.
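In the POSIX interface, a compile flag decides whether '[^...]' can match a newline. Under 'REG_NEWLINE' a non-matching list never matches a newline; without it, a newline is an ordinary character. Treating 'REG_NEWLINE' as the relevant mechanism is an assumption, since this section does not say whether 'ed' uses it. The helper 'bre_match_flags' is hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if PATTERN, compiled in POSIX basic syntax
   with the extra compile flags CFLAGS, matches anywhere in SUBJECT. */
static bool bre_match_flags(const char *pattern, const char *subject,
                            int cflags)
{
    regex_t re;
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return false;
    bool found = regexec(&re, subject, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```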
'^'
If '^' is the first character of a regular expression, then it anchors
the regular expression to the beginning of a line. Otherwise, it
matches itself.
'$'
If '$' is the last character of a regular expression, it anchors the
regular expression to the end of a line. Otherwise, it matches itself.
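The anchor rules can be checked with a hypothetical 'bre_match' helper built on POSIX 'regcomp' in basic syntax, assumed to behave like 'ed'. The same checks cover the literal reading of '^' and '$' elsewhere in a pattern.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if PATTERN (POSIX basic syntax) matches
   anywhere in SUBJECT. */
static bool bre_match(const char *pattern, const char *subject)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return false;                 /* a compile error counts as no match */
    bool found = regexec(&re, subject, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```

Here '^ab' matches "abc" but not "cab", and 'ab$' matches "cab" but not "abc". In 'a^b' and 'a$b' the symbols are in the middle, so they match the literal strings "a^b" and "a$b".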
'\(RE\)'
Defines a (possibly empty) subexpression RE. Subexpressions may be
nested. A subsequent backreference of the form '\N', where N is a
number in the range [1,9], expands to the text matched by the Nth
subexpression. For example, the regular expression '\(a.c\)\1' matches
the string 'abcabc', but not 'abcadc'. Subexpressions are ordered
relative to their left delimiter.
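POSIX 'regexec' reports the offsets of each subexpression in an array. The array is indexed in the same left-delimiter order, so it can reproduce the examples above. The helper 'bre_group' is hypothetical. Using the C library's matcher in place of the one in 'ed' is an assumption.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: match PATTERN (POSIX basic syntax) against SUBJECT
   and report the offsets [*start, *end) of group N, where N == 0 is the
   whole match and 1..9 are the subexpressions in left-delimiter order.
   Returns false on a compile error, no match, or a group that did not
   take part in the match. */
static bool bre_group(const char *pattern, const char *subject, size_t n,
                      int *start, int *end)
{
    regex_t re;
    regmatch_t m[10];                 /* whole match plus up to 9 groups */

    if (n > 9 || regcomp(&re, pattern, 0) != 0)
        return false;
    bool found = regexec(&re, subject, 10, m, 0) == 0 && m[n].rm_so != -1;
    if (found) {
        *start = (int)m[n].rm_so;
        *end = (int)m[n].rm_eo;
    }
    regfree(&re);
    return found;
}
```

In '\(\(a\)b\)' the outer group opens first. Against "ab", group 1 therefore spans offsets 0 to 2 and group 2 spans offsets 0 to 1.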
'*'
Matches zero or more repetitions of the regular expression immediately
preceding it. The regular expression can be either a single character
regular expression or a subexpression. If '*' is the first character
of a regular expression or subexpression, then it matches itself. The
'*' operator sometimes yields unexpected results. For example, the
regular expression 'b*' matches the empty string at the beginning of
'abbb' rather than the substring 'bbb': the left-most match takes
precedence over the longest one, and the only match starting at the
beginning of the string is empty.
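The 'abbb' example can be reproduced by asking 'regexec' for match offsets. The helper 'bre_find' is hypothetical, and POSIX 'regcomp' in basic syntax is assumed to stand in for 'ed'.

```c
#include <regex.h>
#include <stdbool.h>

/* Hypothetical helper: offsets [*start, *end) of the first match of
   PATTERN (POSIX basic syntax) in SUBJECT; false if none. */
static bool bre_find(const char *pattern, const char *subject,
                     int *start, int *end)
{
    regex_t re;
    regmatch_t m;

    if (regcomp(&re, pattern, 0) != 0)
        return false;
    bool found = regexec(&re, subject, 1, &m, 0) == 0;
    if (found) {
        *start = (int)m.rm_so;        /* an empty match has start == end */
        *end = (int)m.rm_eo;
    }
    regfree(&re);
    return found;
}
```

In basic syntax, the usual way to require at least one 'b' is 'bb*', which matches offsets 1 to 4 of "abbb". A leading '*', as in '*a', is an ordinary character.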
'\{N,M\}'
'\{N,\}'
'\{N\}'
Matches the single character regular expression or subexpression
immediately preceding it at least N and at most M times. If M is
omitted, then it matches at least N times. If the comma is also
omitted, then it matches exactly N times. If any of these forms occurs
first in a regular expression or subexpression, then it is interpreted
literally (i.e., the regular expression '\{2\}' matches the string
'{2}', and so on).
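The following checks interval expressions using match offsets from POSIX 'regexec'. The helper 'bre_find' is hypothetical. Only the 'N,M', 'N,', and 'N' forms applied to a preceding character or subexpression are exercised here.

```c
#include <regex.h>
#include <stdbool.h>

/* Hypothetical helper: offsets [*start, *end) of the first match of
   PATTERN (POSIX basic syntax) in SUBJECT; false if none. */
static bool bre_find(const char *pattern, const char *subject,
                     int *start, int *end)
{
    regex_t re;
    regmatch_t m;

    if (regcomp(&re, pattern, 0) != 0)
        return false;
    bool found = regexec(&re, subject, 1, &m, 0) == 0;
    if (found) {
        *start = (int)m.rm_so;
        *end = (int)m.rm_eo;
    }
    regfree(&re);
    return found;
}
```

For "baaaa", 'a\{2,3\}' stops after three a's (offsets 1 to 4), even though four are available.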
The following extensions to basic regular expression operators are
preceded by a backslash '\' to distinguish them from traditional 'ed'
syntax. They may be unavailable depending on the particular regex
implementation on your system.
'\<'
'\>'
Anchors the single character regular expression or subexpression
immediately following it ('\<') or immediately preceding it ('\>') to
the beginning or end, respectively, of a "word", i.e., in ASCII, a
maximal string of alphanumeric characters, including the underscore (_).
'\`'
'\''
Unconditionally matches the beginning ('\`') or end ('\'') of a line.
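The sketch assumes glibc's 'regcomp', which accepts these GNU operators in basic syntax. As noted above, other regex implementations may reject them. 'bre_match' is a hypothetical helper.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if PATTERN (POSIX basic syntax plus the GNU
   operators, as accepted by glibc) matches anywhere in SUBJECT. */
static bool bre_match(const char *pattern, const char *subject)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return false;                 /* unsupported operator: no match */
    bool found = regexec(&re, subject, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```

Here '\<the\>' finds the word "the" in "in the end" but not inside "other" or "bathe". '\`ab' and 'ab\'' succeed only at the very start and end of the subject string, respectively.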
'\?'
Optionally matches the single character regular expression or
subexpression immediately preceding it. For example, the regular
expression 'a[bd]\?c' matches the strings 'abc', 'adc' and 'ac'. If
'\?' occurs at the beginning of a regular expression or
subexpression, then it matches a literal '?'.
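The '\?' examples below include the one from the text. They run through a hypothetical 'bre_match' helper, which assumes a 'regcomp' that supports the GNU '\?' operator in basic syntax, as glibc's does.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if PATTERN (POSIX basic syntax with the GNU
   \? operator) matches anywhere in SUBJECT. */
static bool bre_match(const char *pattern, const char *subject)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return false;
    bool found = regexec(&re, subject, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```

A common use is 'colou\?r', which matches both "color" and "colour".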
'\+'
Matches the single character regular expression or subexpression
immediately preceding it one or more times. So the regular expression
'a\+' is shorthand for 'aa*'. If '\+' occurs at the beginning of a
regular expression or subexpression, then it matches a literal '+'.
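This quick check confirms that 'a\+' and 'aa*' select the same text, using match offsets from 'regexec'. 'bre_find' is hypothetical, and glibc-style support for '\+' in basic syntax is assumed.

```c
#include <regex.h>
#include <stdbool.h>

/* Hypothetical helper: offsets [*start, *end) of the first match of
   PATTERN (POSIX basic syntax with the GNU \+ operator) in SUBJECT. */
static bool bre_find(const char *pattern, const char *subject,
                     int *start, int *end)
{
    regex_t re;
    regmatch_t m;

    if (regcomp(&re, pattern, 0) != 0)
        return false;
    bool found = regexec(&re, subject, 1, &m, 0) == 0;
    if (found) {
        *start = (int)m.rm_so;
        *end = (int)m.rm_eo;
    }
    regfree(&re);
    return found;
}
```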
'\b'
Matches the empty string at the beginning or end of a word. Thus the
regular expression '\bhello\b' is equivalent to '\<hello\>'. However,
'\b\b' is a valid regular expression whereas '\<\>' is not.
'\B'
Matches the empty string inside a word.
'\w'
Matches any word-constituent character (letters, digits, and the
underscore).
'\W'
Matches any character that is not a word-constituent.
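The word operators '\b', '\B', '\w', and '\W' are checked below with a hypothetical 'bre_find' helper. Because these are GNU extensions, the helper assumes a glibc-compatible 'regcomp'. The last test also confirms that '\b\b' compiles and matches the empty string at a word boundary.

```c
#include <regex.h>
#include <stdbool.h>

/* Hypothetical helper: offsets [*start, *end) of the first match of
   PATTERN (POSIX basic syntax plus GNU word operators) in SUBJECT. */
static bool bre_find(const char *pattern, const char *subject,
                     int *start, int *end)
{
    regex_t re;
    regmatch_t m;

    if (regcomp(&re, pattern, 0) != 0)
        return false;
    bool found = regexec(&re, subject, 1, &m, 0) == 0;
    if (found) {
        *start = (int)m.rm_so;
        *end = (int)m.rm_eo;
    }
    regfree(&re);
    return found;
}
```

Here '\Bell\B' finds "ell" inside "hello" but not at the end of "bell". '\w\+' picks out "foo_1", because the underscore counts as a word-constituent.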