Regular expressions are still quite useful in many situations. However, there are some differences in how they are written across different programs, so it’s a good idea to make a cheat sheet for reference!

Common Basic Syntax

Table source: yirlin’s blog

  • \: Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, n matches the character n, while \n matches a newline character. The sequence \\ matches \, and \( matches (.
  • ^: Matches the start of the input string. If the Multiline property is set on the RegExp object, ^ also matches the position after \n or \r.
  • $: Matches the end of the input string. If the Multiline property is set on the RegExp object, $ also matches the position before \n or \r.
  • *: Matches zero or more occurrences of the preceding subexpression. For example, zo* can match z and zoo. * is equivalent to {0,}.
  • +: Matches one or more occurrences of the preceding subexpression. For example, zo+ can match zo and zoo, but not z. + is equivalent to {1,}.
  • ?: Matches zero or one occurrence of the preceding subexpression. For example, do(es)? can match do or does. ? is equivalent to {0,1}.
  • {n}: n is a non-negative integer. Matches exactly n occurrences. For example, o{2} cannot match the o in Bob, but it can match both o's in food.
  • {n,}: n is a non-negative integer. Matches at least n occurrences. For example, o{2,} cannot match the o in Bob, but it can match all the o's in foooood. {1,} is equivalent to +. {0,} is equivalent to *.
  • {n,m}: m and n are non-negative integers, where n <= m. Matches at least n occurrences and no more than m occurrences. For example, o{1,3} will match the first three o's in fooooood. {0,1} is equivalent to ?. Note that there must be no space between the comma and either number.
  • ?: When this character follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), it makes the matching pattern non-greedy. Non-greedy mode matches as few characters as possible in the searched string, while the default greedy mode matches as many characters as possible. For example, for the string oooo, o+? will match a single o, and o+ will match all four o's.
  • .: Matches any single character except \n. To match any character including \n, use a pattern such as [\s\S]. (Inside a character set, . is a literal period, so [.\n] matches only a period or a newline.)
  • (pattern): Matches pattern and captures the match. Captured matches can be retrieved from the resulting Matches collection: in VBScript through the SubMatches collection, in JScript through the $0…$9 properties. To match literal parentheses, use \( or \).
  • (?:pattern): Matches pattern but does not capture the match result, i.e., a non-capturing group. This is useful when combining parts of a pattern with the alternation character (|). For example, industr(?:y|ies) is a more concise expression than industry|industries.
  • (?=pattern): Positive lookahead. Matches at any position where the text that follows matches pattern. This is a non-capturing match, so it is not stored for later use. For example, Windows (?=95|98|NT|2000) can match the Windows in Windows 2000, but it cannot match the Windows in Windows 3.1. Lookahead does not consume characters: after a match occurs, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead.
  • (?!pattern): Negative lookahead. Matches at any position where the text that follows does not match pattern. This is a non-capturing match, so it is not stored for later use. For example, Windows (?!95|98|NT|2000) can match the Windows in Windows 3.1, but it cannot match the Windows in Windows 2000. Lookahead does not consume characters: after a match occurs, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead.
  • x|y: Matches x or y. For example, z|food can match z or food. (z|f)ood then matches zood or food.
  • [xyz]: Character set. Matches any one of the characters contained within it. For example, [abc] can match a in plain.
  • [^xyz]: Negative character set. Matches any character not contained within it. For example, [^abc] can match p in plain.
  • [a-z]: Character range. Matches any character within the specified range. For example, [a-z] can match any lowercase letter from a to z.
  • [^a-z]: Negative character range. Matches any character not within the specified range. For example, [^a-z] can match any character that is not a lowercase letter from a to z.
  • \b: Matches a word boundary, i.e., the position between a word character and a non-word character (such as a space) or the start or end of the string. For example, er\b can match er in never, but it cannot match er in verb.
  • \B: Matches a non-word boundary. er\B can match er in verb, but it cannot match er in never.
  • \cx: Matches the control character specified by x. For example, \cM matches a Control-M or carriage return. The value of x must be A-Z or a-z; otherwise, c is treated as an ordinary c character.
  • \d: Matches any digit character. Equivalent to [0-9].
  • \D: Matches any non-digit character. Equivalent to [^0-9].
  • \f: Matches a form feed. Equivalent to \x0c and \cL.
  • \n: Matches a newline character. Equivalent to \x0a and \cJ.
  • \r: Matches a carriage return character. Equivalent to \x0d and \cM.
  • \s: Matches any whitespace character, including spaces, tabs, form feeds, etc. Equivalent to [ \f\n\r\t\v].
  • \S: Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
  • \t: Matches a tab character. Equivalent to \x09 and \cI.
  • \v: Matches a vertical tab character. Equivalent to \x0b and \cK.
  • \w: Matches any word character, including underscores. Equivalent to [A-Za-z0-9_].
  • \W: Matches any non-word character. Equivalent to [^A-Za-z0-9_].
  • \xn: Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. For example, \x41 matches A, while \x041 is equivalent to \x04 followed by the character 1. This allows ASCII codes to be used in regular expressions.
  • \num: Matches num, where num is a positive integer. This is a backreference to a captured match. For example, (.)\1 matches two consecutive identical characters.
  • \n: Identifies an octal escape value or a backreference. If \n has at least n captured subexpressions before it, then n is a backreference. Otherwise, if n is an octal digit (0-7), then n is an octal escape value.
  • \nm: Identifies an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, then nm is a backreference. If \nm is preceded by at least n captured subexpressions, then n is a backreference followed by the literal digit m. If neither condition holds and both n and m are octal digits (0-7), then \nm matches the octal escape value nm.
  • \nml: If n is an octal digit (0-3) and both m and l are octal digits (0-7), then it matches the octal escape value nml.
  • \un: Matches n, where n is a four-digit hexadecimal representation of a Unicode character. For example, \u00A9 matches the copyright symbol (©).
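
A few of the entries above are easier to trust once you see them run. The following is only a minimal sketch using Python's built-in re module, which is my own choice for illustration (the table itself describes the VBScript/JScript flavor, and details can differ between engines). The variable names are made up for this example.

```python
import re

# Greedy vs. non-greedy quantifiers on the string "oooo".
greedy = re.search(r"o+", "oooo").group()
lazy = re.search(r"o+?", "oooo").group()

# {n,m}: o{1,3} stops after the first three o's.
bounded = re.search(r"o{1,3}", "fooooood").group()

# Lookahead does not consume characters, so only "Windows " is matched.
positive = re.search(r"Windows (?=95|98|NT|2000)", "Windows 2000")
negative = re.search(r"Windows (?!95|98|NT|2000)", "Windows 2000")

# Non-capturing group combined with alternation.
plural = re.fullmatch(r"industr(?:y|ies)", "industries")

# Backreference: (.)\1 finds two consecutive identical characters.
doubled = re.search(r"(.)\1", "bookkeeper").group()

# Word boundary: er\b matches in "never" but not in "verb".
in_never = re.search(r"er\b", "never")
in_verb = re.search(r"er\b", "verb")

# By default . does not match a newline, while [\s\S] does.
dot = re.search(r"a.b", "a\nb")
any_char = re.search(r"a[\s\S]b", "a\nb")

print(greedy, lazy, bounded, repr(positive.group()), doubled)
```

Here negative, in_verb, and dot are all None, because those patterns find no match. If you use another engine, it is worth re-running the same checks there, since that is exactly where flavors tend to disagree.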
