regular_expressions
— | regular_expressions [2016/04/20 13:59] (current) – created - external edit 127.0.0.1 | ||
---|---|---|---|
+ | ====== Syntax of Regular Expressions ====== | ||
+ | ===== Simple matches ===== | ||
+ | |||
+ | Any single character matches itself, unless it is a meta-character with a special meaning described below. | ||
+ | |||
+ | A series of characters matches that series of characters in the target string, so the pattern " | ||
+ | |||
+ | You can cause characters that normally function as meta-characters or escape sequences to be interpreted literally by ' | ||
+ | |||
+ | Examples: | ||
+ | foobar | ||
+ | \^FooBarPtr | ||
+ | | ||
+ | ===== Escape sequences ===== | ||
+ | |||
+ | Characters may be specified using an escape-sequence syntax much like that used in C and Perl: ' | ||
+ | |||
+ | \xnn char with hex code nn | ||
+ | \x{nnnn} char with hex code nnnn (one byte for plain text and two bytes for Unicode) | ||
+ | \t tab (HT/TAB), same as \x09 | ||
+ | \n newline (NL), same as \x0a | ||
+ | \r carriage return (CR), same as \x0d | ||
+ | \f form feed (FF), same as \x0c | ||
+ | \a alarm (bell) (BEL), same as \x07 | ||
+ | \e escape (ESC), same as \x1b | ||
+ | |||
+ | Examples: | ||
+ | foo\x20bar | ||
+ | \tfoobar | ||
+ | |||
+ | ===== Character classes ===== | ||
+ | |||
+ | You can specify a character class by enclosing a list of characters in []; the class matches any one character from the list. | ||
+ | |||
+ | If the first character after the ' | ||
+ | |||
+ | Examples: | ||
+ | foob[aeiou]r | ||
+ | foob[^aeiou]r | ||
+ | |||
+ | Within a list, the ' | ||
+ | |||
+ | If you want ' | ||
+ | |||
+ | Examples: | ||
+ | [-az] matches ' | ||
+ | [az-] matches ' | ||
+ | [a\-z] | ||
+ | [a-z] matches all twenty-six lowercase letters from ' | ||
+ | [\n-\x0D] | ||
+ | [\d-t] | ||
+ | []-a] matches any char from ' | ||
+ | |||
+ | ===== Meta-characters ===== | ||
+ | |||
+ | Meta-characters are special characters which are the essence of Regular Expressions. There are different types of meta-characters, | ||
+ | |||
+ | |||
+ | ==== Line separators ==== | ||
+ | |||
+ | ^ start of line | ||
+ | $ end of line | ||
+ | \A start of text | ||
+ | \Z end of text | ||
+ | . any character in line | ||
+ | |||
+ | Examples: | ||
+ | ^foobar | ||
+ | foobar$ | ||
+ | ^foobar$ | ||
+ | foob.r | ||
+ | |||
+ | The ' | ||
+ | You may, however, wish to treat a string as a multi-line buffer, such that the ' | ||
+ | The \A and \Z are just like ' | ||
+ | |||
+ | ==== Predefined classes ==== | ||
+ | |||
+ | \w an alphanumeric character (including " | ||
+ | \W a non-alphanumeric character | ||
+ | \d a numeric character | ||
+ | \D a non-numeric character | ||
+ | \s any space (same as [ \t\n\r\f]) | ||
+ | \S a non-space character | ||
+ | |||
+ | You may use \w, \d and \s within custom character classes. | ||
+ | |||
+ | Examples: | ||
+ | foob\dr | ||
+ | foob[\w\s]r matches strings like ' | ||
+ | |||
+ | ==== Word boundaries ==== | ||
+ | |||
+ | \b Match a word boundary | ||
+ | \B Match a non-(word boundary) | ||
+ | |||
+ | A word boundary (\b) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W. | ||
+ | |||
+ | |||
+ | ==== Iterators ==== | ||
+ | |||
+ | Any item of a regular expression may be followed by another type of meta-character: an iterator. Using these meta-characters, you can specify the number of occurrences of the preceding character, meta-character, or subexpression. | ||
+ | |||
+ | < | ||
+ | * zero or more (" | ||
+ | + one or more (" | ||
+ | ? zero or one (" | ||
+ | {n} exactly n times (" | ||
+ | {n,} at least n times (" | ||
+ | {n,m} at least n but not more than m times (" | ||
+ | *? zero or more (" | ||
+ | +? one or more (" | ||
+ | ?? zero or one (" | ||
+ | {n}? exactly n times | ||
+ | {n,}? at least n times (" | ||
+ | {n,m}? at least n but not more than m times (" | ||
+ | </ | ||
+ | |||
+ | Digits in curly brackets of the form {n,m} specify the minimum number of times to match the item (n) and the maximum (m). The form {n} is equivalent to {n,n} and matches exactly n times. The form {n,} matches n or more times. There is no limit on the size of n or m, but large values use more memory and slow down regular expression execution. | ||
+ | |||
+ | If a curly bracket occurs in any other context, it is treated as a regular character. | ||
+ | |||
+ | Examples: | ||
+ | foob.*r | ||
+ | foob.+r | ||
+ | foob.? | ||
+ | fooba{2}r | ||
+ | fooba{2, | ||
+ | fooba{2,3}r matches strings like ' | ||
+ | |||
+ | A little explanation about " | ||
+ | |||
+ | ==== Alternatives ==== | ||
+ | |||
+ | You can specify a series of alternatives for a pattern using ' | ||
+ | Alternatives are tried from left to right, so the first alternative for which the entire expression matches is the one that is chosen. This means that alternatives are not necessarily greedy. For example: when matching foo|foot against ' | ||
+ | Also remember that ' | ||
+ | |||
+ | Examples: | ||
+ | foo(bar|foo) | ||
+ | |||
+ | |||
+ | ==== Subexpressions ==== | ||
+ | |||
+ | The bracketing construct ( ... ) may also be used to define regular expression subexpressions. | ||
+ | |||
+ | Subexpressions are numbered based on the left-to-right order of their opening parentheses. | ||
+ | The first subexpression has number ' | ||
+ | |||
+ | Examples: | ||
+ | (foobar){8, | ||
+ | foob([0-9]|a+)r matches ' | ||
+ | |||
+ | |||
+ | ==== Backreferences ==== | ||
+ | |||
+ | Meta-characters \1 through \9 are interpreted as backreferences. \<n> matches the text previously matched by subexpression #<n>. | ||
+ | |||
+ | Examples: | ||
+ | (.)\1+ | ||
+ | (.+)\1+ | ||
+ | (['" | ||
+ | |
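
Several of the constructs described on this page can be tried out directly. The following is a minimal sketch that uses Python's `re` module as a stand-in engine. That choice is an assumption: the page does not name the engine it documents, and regular expression flavors differ in details such as which escape sequences they accept. The sample strings (`abbbbc`, `barefoot`, and so on) are made up for illustration.

```python
import re

# Character classes: [aeiou] matches one listed character,
# [^aeiou] matches one character that is NOT in the list.
class_hit = re.fullmatch(r"foob[aeiou]r", "foobar") is not None
negated_hit = re.fullmatch(r"foob[^aeiou]r", "foobbr") is not None
negated_miss = re.fullmatch(r"foob[^aeiou]r", "foobar") is None

# Iterators: greedy forms take as much as possible,
# non-greedy forms (trailing '?') take as little as possible.
text = "abbbbc"
greedy = re.search(r"b+", text).group()
lazy = re.search(r"b+?", text).group()
bounded = re.search(r"b{2,3}", text).group()

# Alternatives are tried left to right, so the order matters.
first_alt = re.search(r"foo|foot", "barefoot").group()
swapped_alt = re.search(r"foot|foo", "barefoot").group()

# Subexpressions are numbered by their opening parenthesis.
sub = re.search(r"foob([0-9]|a+)r", "foobaar").group(1)

# Backreferences: \1 matches the text captured by subexpression 1.
repeated_char = re.search(r"(.)\1+", "aaaa").group()
repeated_run = re.search(r"(.+)\1+", "abab").group()

# Word boundaries: \b sits between a \w and a \W character
# (or at the start/end of the string).
whole_words = re.findall(r"\bfoo\b", "foo foobar")

print(greedy, lazy, bounded)
print(first_alt, swapped_alt)
print(sub, repeated_char, repeated_run, whole_words)
```

Comparing `first_alt` with `swapped_alt` shows why alternatives are "not necessarily greedy": the engine settles on the first alternative that lets the whole expression match, even when a later one would match more text.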
regular_expressions.txt · Last modified: 2016/04/20 13:59 by 127.0.0.1