Regular expressions quick reference
Regular expressions are a powerful tool for finding and replacing text in a program, or at the command line. This page describes the most common regular expression symbols, and how to use them.
Description
Regular expressions (shortened as "regex") are special strings representing a pattern to be matched in a search operation. They are an important tool in a wide variety of computing applications, from programming languages like Java and Perl, to text processing tools like grep, sed, and the text editor vim. Below is an example of a regular expression from our regular expression term page.
The power of regular expressions comes from its use of metacharacters, which are special characters (or sequences of characters) used to represent something else. For instance, in a regular expression the metacharacter ^ means "not." So, while "a" means "match lowercase a", "^a" means "do not match lowercase a."
The tables below describe many standard components of regular expressions.
There are different so-called "flavors" of regex — Java, Perl, and Python have slightly different rules for regular expressions, for example. On this page, we stick to standard regex, and you should be able to use this reference for any implementation.
Anchors and Boundaries
Anchors and boundaries allow you to describe text in terms of where it's located. For instance, you might want to search for a certain word, but only if it's the first word on a line. Or you might want to look for a certain series of letters, but only if they appear at the very end of a word.
Metacharacter Sequence | Meaning | Example Expression | Example Match |
---|---|---|---|
^ | Start of string or line | ^abc | abc (appearing at start of string or line) |
$ | End of string, or end of line | xyz$ | xyz (appearing at end of string or line) |
\b | Word boundary | ing\b | matching (matches ing if it is at the end of a word) |
\B | NOT word-boundary | \Bing\B | stinger (matches ing if it is not at the beginning or end of the word) |
\< | Start of word | \<when | whenever (matches when only if it is at the beginning of a word) |
\> | End of word | ever\> | whenever (matches ever only if it is at the end of a word) |
Character Classes
When searching for text, it's useful to be able to choose characters based solely upon their classification. The fundamental classes of character are "word" characters (such as numbers and letters) and "non-word" characters (such as spaces and punctuation marks).
Metacharacter Sequence | Meaning | Example Expression | Example Match |
---|---|---|---|
. | Matches any single character except the newline character. | ab.def | abcdef, ab9def, ab=def |
\s | Matches a whitespace character (such as a space, a tab, a form feed, etc.) | abcd\se | abcd e, abcd(tab)e |
\S | NOT whitespace | \S\S\s\S | AB D, 99(tab)9 |
\w | A word character. A word character is a letter, a number, or an underscore. This set of characters may also be represented by the regex character set [a-zA-Z0-9_] | \w\{1,\}-\w\{1,} (see quantifiers, below) |
well-wishes, far-fetched |
\W | NOT a word character | \w\W\{1,\}\w | a,!-(?&;b, 9-5 |
Special Whitespace Characters
Metacharacter Sequence | Matches |
---|---|
\n | a newline |
\t | a tab |
\r | a carriage return |
\v | a vertical tab |
\f | a form feed |
Quantifiers
Quantifiers allow you to declare quantities of data as part of your pattern. For instance, you might need to match exactly six spaces, or locate every numeric string that is between four and eight digits in length.
Metacharacter Sequence | Meaning | Example Expression | Example Match |
---|---|---|---|
* | Zero or more of the preceding character | ca*t | cat, ct, caaaaaaaaaaaat |
character\{m\} | Exactly m occurrences of character | f\{3\} | fff |
character\{m,n\} | No fewer than m, but no more than n occurrences of character | g\{4,6\} | gggg, ggggg, gggggg |
character\{m,\} | At least m occurrences of character | h\{2,\} | hh, hhhhhhhh, and hhhhhhhhhhhhhh would match, but h would not |
Literal Characters and Sequences
Metacharacters are a powerful tool because they have special meaning, but sometimes they need to be matched literally. For instance, you might need to search for a dollar sign ("$") as part of a price list, or in a computer program as part of a variable name. Since the dollar sign is a metacharacter which means "end of line" in regex, you must escape it with a backslash to use it literally.
Metacharacter Sequence | Meaning | Example Expression | Example Match |
---|---|---|---|
\\ | Literal backslash | \\ | \ |
\^ | Literal caret | \^\{5\} | ^^^^^ |
\$ | Literal dollar sign | \$5 | $5 |
\. | Literal period | Yes\. | Yes. |
\* | Literal asterisk | typo\* | typo* |
\[ | Literal open bracket | [3\[] | 3, [ |
\] | Literal close bracket | \] | ] |
Character Sets and Ranges
A character set is an explicit list of the characters that may qualify for a match in a search. A character set is indicated by enclosing a set of characters in brackets ([ and ]). For instance, the character set [abz] will match any of the characters a, b, or z, or a combination of these such as ab, za, or baz.
Ranges are a type of character set which uses a dash between characters to imply the entire range of characters between them, as well as the beginning and end characters themselves. For instance, the range [e-h] would match any of the characters e, f, g, or h, or any combination of these, such as hef. The range [3-5] would match any of the digits 3, 4, or 5, or a combination of these such as 45.
When defining a range of characters, you can figure out the exact order in which they appear by looking at an ASCII (American Standard Code for Information Interchange) character table.
Metacharacter Sequence | Meaning | Example Expression | Example Match |
---|---|---|---|
[characters] | The characters listed inside the brackets are part of a matching-character set | [abcd] | a, b, c, d, abcd |
[^...] | Characters inside the brackets are a NON-matching set. Any character not inside the brackets is a matching character. | [^abcd] | Any occurrence of any character except a, b, c, d. For instance, when, zephyr, e, xyz |
[character-character] | Any character in the range between two characters, including the characters, is part of the set | [a-z] | Any lowercase letter |
[^character] | Any character that is NOT the listed character | [^A] | Any character except capital A |
Ranges can also be combined by concatenating. For instance: | [f-hAC-E3-5] | Matches any occurrence of an f, g, h, A, C, D, E, 3, 4, 5 | |
Ranges can also be modified with a quantifier. For instance: | [a-c0-2]* | Matches zero or more consecutive occurrences of a, b, c, 0, 1, 2. For example, ac1cb would match |
Grouping
Grouping allows you to treat another expression as a single unit.
Metacharacter Sequence | Meaning | Example Expression | Example Match |
---|---|---|---|
\(expression\) | expression will match as a group | \(ab\) | ab, abracadabra |
Grouped expressions can be treated as a unit just like a character. For example: | \(ab\)\{3\} | abababcdefg |
Extended regular expressions
ERE (extended regular expressions) are an extension of basic regular expressions.
EREs support additional quantifiers, do not require certain metacharacters to be escaped, and obey other special rules. If your application supports extended regex, consult your manual for their proper syntax.
Related commands
grep — Command-line tool for finding text which matches a regular expression.
sed — Command-line tool for filtering and transforming text using regular expressions.
vim — A powerful text editor.