Regular expressions are still quite useful in many situations. However, there are some differences in how they are written across different programs, so it’s a good idea to make a cheat sheet for reference!
Basic General Part
Table source: yirlin’s blog
\
: Escapes the next character, which can be a special character, an ordinary character, a backreference, or an octal escape. For example,n
matches the charactern
.\n
matches a newline character. The sequence\\
matches\
, and\(
matches(
.^
: Matches the start of the input string. If the Multiline property is set on the RegExp object,^
also matches\n
or\r
after positions.$
: Matches the end of the input string. If the Multiline property is set on the RegExp object,$
also matches\n
or\r
before positions.*
: Matches zero or more occurrences of the preceding subexpression. For example,zo*
can matchz
andzoo
.*
is equivalent to{0,}
.+
: Matches one or more occurrences of the preceding subexpression. For example,zo+
can matchzo
andzoo
, but notz
.+
is equivalent to{1,}
.?
: Matches zero or one occurrence of the preceding subexpression. For example,do(es)?
can matchdo
ordoes
indo
.?
is equivalent to{0,1}
.{n}
: n is a non-negative integer. Matches exactly n occurrences. For example,o{2}
cannot match theo
inBob
, but it can match allo
s infood
.{n,}
: n is a non-negative integer. Matches at least n occurrences. For example,o{2,}
cannot match theo
inBob
, but it can match allo
s infoooood
.{1,}
is equivalent to+
.{0,}
is equivalent to*
.{n,m}
: m and n are non-negative integers, where n <= m. Matches at least n occurrences and no more than m occurrences. For example,o{1,3}
will match the first threeo
s infooooood
.{0,1}
is equivalent to?
. Note that there should be no space between the comma and the two numbers.?
: When this character follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}
), it makes the matching pattern non-greedy. Non-greedy mode matches as few characters as possible in the searched string, while the default greedy mode matches as many characters as possible. For example, for the stringoooo
,o+?
will match a singleo
, ando+
will match allo
s..
: Matches any single character except\n
. To match any character including\n
, use the pattern[.\n]
.(pattern)
: Matches pattern and captures this match. Captured matches can be obtained from the generated Matches collection, in VBScript using SubMatches collection, in JScript using $0…$9 properties. To match round bracket characters, use\(
or\)
.(?:pattern)
: Matches pattern but does not capture the match result, i.e., a non-capturing group, which is useful when combining parts of a pattern with “or” characters. For example,industr(?:y|ies)
is a more concise expression thanindustry|industries
.(?=pattern)
: Positive lookahead, matches at the start position of any string that matches pattern. This is a non-capturing group, which means this match does not need to be captured for later use. For example,Windows (?=95|98|NT|2000)
can matchWindows 2000
inWindows 2000
, but it cannot matchWindows 3.1
inWindows 3.1
. Lookahead does not consume characters, so after a match occurs, the search for the next match starts immediately after the last match without starting from the character following the lookahead.(?!pattern)
: Negative lookahead, matches at the start position of any string that does not match pattern. This is a non-capturing group, which means this match does not need to be captured for later use. For example,Windows (?!95|98|NT|2000)
can matchWindows 3.1
inWindows 3.1
, but it cannot matchWindows 2000
inWindows 2000
. Lookahead does not consume characters, so after a match occurs, the search for the next match starts immediately after the last match without starting from the character following the lookahead.x|y
: Matches x or y. For example,z|food
can matchz
orfood
.(z|f)ood
then matcheszood
orfood
.[xyz]
: Character set. Matches any one of the characters contained within it. For example,[abc]
can matcha
inplain
.[^xyz]
: Negative character set. Matches any character not contained within it. For example,[^abc]
can matchp
inplain
.[a-z]
: Character range. Matches any character within the specified range. For example,[a-z]
can match any lowercase letter froma
toz
.[^a-z]
: Negative character range. Matches any character not within the specified range. For example,[^a-z]
can match any character that is not a lowercase letter froma
toz
.\b
: Matches a word boundary, i.e., the position between a word and a space. For example,er\b
can matcher
innever
, but it cannot matcher
inverb
.\B
: Matches a non-word boundary.er\B
can matcher
inverb
, but it cannot matcher
innever
.\cx
: Matches the control character specified by x. For example,\cM
matches a Control-M or carriage return. The value of x must be A-Z or a-z; otherwise, c is treated as an ordinaryc
character.\d
: Matches any digit character. Equivalent to[0-9]
.\D
: Matches any non-digit character. Equivalent to[^0-9]
.\f
: Matches a form feed. Equivalent to\x0c
and\cL
.\n
: Matches a newline character. Equivalent to\x0a
and\cJ
.\r
: Matches a carriage return character. Equivalent to\x0d
and\cM
.\s
: Matches any whitespace character, including spaces, tabs, form feeds, etc. Equivalent to[\f\n\r\t\v]
.\S
: Matches any non-whitespace character. Equivalent to[^\\f\\n\\r\\t\\v]
.\t
: Matches a tab character. Equivalent to\x09
and\cI
.\v
: Matches a vertical tab character. Equivalent to\x0b
and\cK
.\w
: Matches any word character, including underscores. Equivalent to[A-Za-z0-9_]
.\W
: Matches any non-word character. Equivalent to[^A-Za-z0-9_]
.\xn
: Matches n, where n is a two-digit hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. For example,\x41
matchesA
.\x041
is equivalent to\x04 &1
. Regular expressions can use ASCII encoding.\num
: Matches num, where num is a positive integer. A reference to the captured match. For example,(.)'\1
matches two consecutive identical characters.\n
: Identifies an octal escape value or a backreference. If\n
has at least n captured subexpressions before it, then n is a backreference. Otherwise, if n is an octal digit (0-7), then n is an octal escape value.\nm
: Identifies an octal escape value or a backreference. If\nm
has at least nm captured subexpressions before it, then nm is a backreference. If\nm
has at least n captured subexpressions, then n is a backreference followed by the letter m. If none of the above conditions are met, if both n and m are octal digits (0-7), then\nm
matches the octal escape value nm.\nml
: If n is an octal digit (0-3) and both m and l are octal digits (0-7), then it matches the octal escape value nml.\un
: Matches n, where n is a four-digit hexadecimal representation of a Unicode character. For example,\u00A9
matches the copyright symbol (?)`.