REGULAR EXPRESSIONS

A regular  expression,  on  the  one  hand,  is a string like any
other;  a  sequence  of  characters.   On the other hand, special
characters within the string  have  certain  functions which make
regular expressions useful when trying to match portions of other
strings.   In the following discussion  and  examples,  a  string
containing a regular expression will be called  the  ``pattern'',
and the string against which it is to  be  matched  is called the
``reference string''.  For example, regular expressions allow one
to search for:

"all strings ending with the letters 'ize'"
           or:
"all strings beginning with a number between 1 and 3 and ending in a comma".

In order  to  accomplish this, regular expressions co-opt the use
of some characters  to  have  special meaning.  They also provide
for these characters to lose their special meaning if the user so
desires.  The rules for regular expressions are:

			   RULES

 c    Any character c  matches  itself  unless  it  has  been
      assigned  other  special  meaning as listed below. Most
      special characters can be escaped (made to  lose  their
      special  meaning) by placing the character '\' in front
      of them. This doesn't apply to '{', which is non-special
      until  it  is  escaped.  Thus, although '*' normally has
      special meaning, the pattern '\*' matches a literal '*'.

      Example:

      The pattern

           acdef

      matches

           s83acdeffff or acdefsecs

      but not

           accdef or aacde1f

      That is, it will match any string that contains ``acdef''
      anywhere in the reference string.

      Example:

      Normally the characters '*' and '$' are  special,  but
      the pattern

           a\*bse\$

      matches them literally. That is, any  reference  string
      containing ``a*bse$'' as a substring will be flagged as
      a match.
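
      The two examples above can be tried out with  a  short
      sketch.  It  uses Python's re module as a stand-in for
      the matcher described here, which is an assumption: the
      two agree on literal characters and on '\' escapes, but
      not on every rule in this document. The helper contains()
      is a hypothetical name introduced for illustration.

```python
import re

def contains(pattern, reference):
    """Return True if the pattern matches anywhere in the reference string."""
    return re.search(pattern, reference) is not None

# A pattern of ordinary characters matches wherever it occurs as a substring.
literal_hits = [contains("acdef", s) for s in ("s83acdeffff", "acdefsecs")]
literal_misses = [contains("acdef", s) for s in ("accdef", "aacde1f")]

# Escaped '*' and '$' lose their special meaning and match themselves.
escaped_hit = contains(r"a\*bse\$", "xx a*bse$ yy")
escaped_miss = contains(r"a\*bse\$", "aaabse")
```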



 .     A period matches  any  character  except  the  newline
      character. This is known as the wildcard character.

      Example:

           The pattern

            ....

      will match any 4 consecutive characters in  the  reference
      string, none of which is a newline character.
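
      A minimal sketch of the wildcard, assuming Python's  re
      module behaves like the matcher described here for this
      rule (by default its '.' also refuses to match a newline).
      The variable names are illustrative only.

```python
import re

# Four periods need four consecutive characters, none of them a newline.
four = re.search(r"....", "ab\ncdefg")
matched_text = four.group()      # the first run of four non-newline characters

# Here no run of four non-newline characters exists, so nothing matches.
too_short = re.search(r"....", "abc\nde")
```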


 ^    If `^' appears at the beginning of the pattern then  it
      is said to ``anchor'' the match to the beginning of the
      line. That is, the reference string must start with the
      pattern following the `^'. If  this  character  appears
      anywhere other than at the beginning  of  the  pattern,
      then  it  is  no longer considered special, and matches
      itself as any non-special character would. Similarly if
      it starts the pattern but is escaped, it matches itself.

      Example:

      The pattern

           ^efghi

      Will match

           efghi or efghijlk

      but not

           abcefghi

      That is the pattern will  match  only  those  reference
      strings  starting  with  ``efghi''. Just containing the
      substring is not sufficient.
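
      The anchoring behaviour can be sketched as follows, again
      assuming Python's re module as a stand-in. One difference
      to keep in mind: in Python an unescaped '^' stays special
      even in the middle of a pattern, so only the leading and
      the escaped forms are shown.

```python
import re

# '^' at the start of the pattern ties the match to the start of the string.
pattern = r"^efghi"
anchored = {s: re.search(pattern, s) is not None
            for s in ("efghi", "efghijlk", "abcefghi")}

# An escaped caret is an ordinary character and matches a literal '^'.
escaped_caret = re.search(r"\^efghi", "x^efghi") is not None
```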


 $    Occurring at the end of the  pattern,  this  character
      ``anchors'' the pattern to the end of  the  line  (the
      reference string). A '$' occurring anywhere else in the
      pattern is regarded as non-special. Similarly, if it is
      at the end of the pattern but is  escaped,  it  is  not
      special.

      Example:

      The pattern

           efghi$

      Will match

           efghi or abcdefghi

      but not

           efghijkl

      That is the pattern will  match  only  those  reference
      strings ending with ``efghi''. Just containing the sub-
      string is not sufficient.
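
      A corresponding sketch for the end anchor, with Python's
      re module assumed as a stand-in. Note that Python keeps
      an unescaped '$' special everywhere in a pattern and also
      lets it match just before a trailing newline, which may
      differ from the matcher described here.

```python
import re

# '$' at the end of the pattern ties the match to the end of the string.
pattern = r"efghi$"
end_anchored = {s: re.search(pattern, s) is not None
                for s in ("efghi", "abcdefghi", "efghijkl")}

# An escaped dollar sign matches a literal '$' anywhere in the string.
escaped_dollar = re.search(r"efghi\$", "cost: efghi$ each") is not None
```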


 \<    This sequence in the pattern causes the one  character
      regular expression following it to match  only  at  the
      beginning of a ``word'': at the beginning of a line, or
      at a position just before a letter, digit or  underline
      character and just after a character which is  not  one
      of these.


 \>    Constrains the one-character regular expression following
      it to be at the end of a ``word'' as defined above.
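
      Python's re module has no \< or \>  sequences,  so  the
      sketch below only approximates them with \b, which matches
      at either edge of a word. Placed before a word character
      \b behaves like a start-of-word anchor, and placed after
      one it behaves like an end-of-word anchor. The sample
      strings are invented for illustration.

```python
import re

# Start of word: "cat" inside "concat" is skipped, "catalog" is found.
start_of_word = re.search(r"\bcat", "concat catalog")
start_index = start_of_word.start()

# End of word: "cat" inside "catalog" is skipped, the end of "concat" is found.
end_of_word = re.search(r"cat\b", "catalog concat")
end_index = end_of_word.start()

# No word in this string starts with "cat".
no_word_start = re.search(r"\bcat", "concat")
```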


 [string]

      One or more characters within  square  brackets.   This
      pattern  matches any single character within the brack-
      ets. The caret, '^', has a special meaning if it is the
      first  character  in the series: the pattern will match
      any character other than one in the list.

      Example:

           The pattern

           [^abc]

      Will match any character except 'a', 'b' or 'c'.

      To match a right bracket, ']', in the list it  must  be
      put first:

           []ab01]

      To include a literal caret, '^', in the list, place  it
      anywhere but first.

      In

           [ab^01]

      the caret loses its special meaning.


      The '-' character is special within square brackets. It
      denotes a range of characters (in the  ASCII  character
      set), and the range matches any single character within
      it.  '[a-z]' matches any lower case letter. The '-' can
      be made non-special by placing it first or last  within
      the square brackets.


      The characters '$', '*' and '.' are not special  within
      square brackets.


      Example:

           The pattern

           [ab01]

      matches a single occurrence of a character from the set
      'a', 'b', '0', '1'.

      Example:

           The pattern

           [^ab01]

      will match any single character other  than  'a',  'b',
      '0', '1'.


      Example :

           The pattern

           [a0-9b]

      matches any one of 'a', 'b' or a digit  between  0  and
      9 inclusive.

      Example :

           The pattern

           [^a0-9b.$]


      means any single character other than 'a', 'b', '.', '$'
      or a digit between 0 and 9 inclusive.
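
      The bracket rules above can be checked with a short sketch.
      It assumes Python's re module, which follows the same
      conventions for a leading '^', a leading ']', ranges, and
      a first or last '-'. The helper first_match() and the
      sample strings are hypothetical.

```python
import re

def first_match(pattern, reference):
    """Return the first substring matched by pattern, or None."""
    m = re.search(pattern, reference)
    return m.group() if m else None

results = {
    "negated":        first_match(r"[^abc]", "abcxyz"),      # first char not a, b or c
    "right_bracket":  first_match(r"[]ab01]", "xy]z"),       # ']' placed first is literal
    "literal_caret":  first_match(r"[ab^01]", "xy^z"),       # '^' not first is literal
    "range":          first_match(r"[a0-9b]", "xyz7"),       # 'a', 'b' or any digit
    "negated_range":  first_match(r"[^a0-9b.$]", "a1b.$q"),  # '.' and '$' are ordinary here
    "literal_hyphen": first_match(r"[a-]", "x-y"),           # '-' placed last is literal
}
```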

 *     An asterisk following a regular expression in the pat-
      tern   has   the   effect  of  matching  zero  or  more
      occurrences of that expression.

      Example:

           The pattern

           a*

      means zero or more occurrences of the character 'a'.


      Example:

           The pattern

           [A-Z]*

      means zero or more upper case letters.
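
      A minimal sketch of the asterisk, assuming Python's  re
      module as a stand-in. It uses re.match, which only tries
      the start of the string, to make the ``zero occurrences''
      case visible: the match succeeds but is empty.

```python
import re

# Three leading 'a's are consumed by 'a*'.
leading_a = re.match(r"a*", "aaab").group()

# With no leading 'a' the pattern still matches, using zero occurrences.
no_a = re.match(r"a*", "baaa").group()

# '[A-Z]*' consumes the leading run of upper case letters.
caps = re.match(r"[A-Z]*", "ABCdef").group()
```

      Because zero occurrences are allowed, a pattern such as
      a* on its own matches every reference string.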




 \{m\}

 \{m,\}

 \{m,n\}

      A one-character regular expression followed by  one  of
      these three constructions causes a range of occurrences
      of that regular expression to  be  matched.  If  it  is
      followed by \{m\}, where  m  is  a  non-negative  integer
      between 0 and 255 (inclusive), then exactly m occurrences
      of that regular expression are matched. If followed  by
      \{m,\}, then at least m occurrences are matched. Finally,
      if it is followed by \{m,n\} (where n is a  non-negative
      integer between 0 and 255 and where n > m), then between
      m and n occurrences (inclusive) of the  expression  are
      matched.

      Example:

           The pattern

           ab\{3\}

      would match any substring in the reference string of an
      'a' followed by exactly 3 'b's.

      Example:

           The pattern

           ab\{3,\}

      would match any substring in the reference string of an
      'a' followed by at least 3 'b's.


      Example:

           The pattern

           ab\{3,5\}

      would match any substring in the reference string of an
      'a' followed by at least 3 but at most 5 'b's.
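
      The three interval forms can be sketched  in  Python's  re
      module, with one assumption stated plainly: Python writes
      the braces without backslashes, so \{3\} above corresponds
      to {3} below, \{3,\} to {3,} and \{3,5\} to {3,5}.  The
      sample strings are invented for illustration.

```python
import re

# Exactly three 'b's after the 'a'; a fourth 'b' is left unmatched.
exactly = re.search(r"ab{3}", "xabbbby").group()

# At least three 'b's; all four available are taken.
at_least = re.search(r"ab{3,}", "xabbbby").group()

# Between three and five 'b's; only five of the seven are taken.
bounded = re.search(r"ab{3,5}", "abbbbbbb").group()

# Two 'b's are not enough for any of the three forms.
too_few = re.search(r"ab{3}", "abb")
```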


      Common Problems with Regular Expressions


 (1)  When matching a substring it is not  necessary  to  use
      the wildcard character to match the parts of the reference
      string preceding and following the substring.

      Example:

           The pattern

           abcd

      will match any reference string  containing  this  pat-
      tern. It is not necessary to use

            .*abcd.*

      as the pattern.


 (2)  In order to constrain a pattern to the entire reference
      string, use the construction:

           ^pattern$


 (3)  The easiest way to obtain  case  insensitivity  in  a
      regular expression is to use  the  '[]'  operator.  For
      example, a pattern to match the word ``hello'' regardless
      of the case of the letters would be:

            [Hh][Ee][Ll][Ll][Oo]
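
      The three points above can be illustrated with one short
      sketch, assuming Python's re module as a stand-in for the
      matcher described here. The sample strings and variable
      names are hypothetical.

```python
import re

# (1) Padding a pattern with '.*' changes nothing for a substring search.
bare = re.search(r"abcd", "xxabcdyy") is not None
padded = re.search(r".*abcd.*", "xxabcdyy") is not None

# (2) '^pattern$' requires the whole reference string to be the pattern.
whole = [re.search(r"^abcd$", s) is not None for s in ("abcd", "xxabcdyy")]

# (3) One bracket pair per letter gives a case-insensitive match.
hello = r"[Hh][Ee][Ll][Ll][Oo]"
hello_hits = [re.search(hello, s) is not None
              for s in ("hello", "HELLO", "say HeLLo", "help")]
```

      Python also offers an re.IGNORECASE flag for point  (3);
      whether the matcher described here has an equivalent is
      not stated, so the bracket form remains the portable one.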