Regex Cheatsheet
A complete regular expressions cheatsheet covering anchors, character classes, quantifiers, groups, lookarounds, flags, escape sequences, and common patterns like email, URL, and IPv4 validation.
Regular expressions (regex) are a compact syntax for matching, validating, and extracting patterns in text. They're supported — with minor syntax differences — across nearly every programming language, from JavaScript and Python to PHP and Java. This cheatsheet covers the core building blocks you'll reach for most often: anchors, character classes, quantifiers, groups, lookarounds, flags, and escape sequences, plus a set of ready-to-use patterns for common validation tasks.
How to Read This Cheatsheet
Each entry lists the pattern itself, what it describes, an example showing it in context, and what it matches in a sample string. Patterns are grouped by category.
Flavors and Compatibility
Most patterns here work across PCRE (PHP, Perl), JavaScript, Python's re module, and Java's Pattern class. A few features, such as the x (extended) flag and the \A/\Z anchors, are only available in some flavors; JavaScript, for example, supports neither. When in doubt, test your pattern in the engine you are targeting before relying on it in production.
Greedy vs. Lazy Matching
By default, quantifiers like *, +, and {n,m} are greedy — they match as much as possible. Add ? after any quantifier to make it lazy, matching as little as possible.
For example, given the string `<b>bold</b> and <i>italic</i>`:
- `<.*>` (greedy) matches the entire string, from the first `<` to the last `>`
- `<.*?>` (lazy) matches `<b>` only
This is one of the most common sources of unexpected regex behaviour.
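
The difference can be checked with a short sketch. Python's re module is used here purely for illustration; the variable names are not from the original.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the end of the string, then backtracks to the last ">".
greedy = re.search(r"<.*>", text).group()

# Lazy: .*? stops at the first ">" it can reach.
lazy = re.search(r"<.*?>", text).group()

# With the lazy form, findall returns each tag separately.
tags = re.findall(r"<.*?>", text)

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
print(tags)    # ['<b>', '</b>', '<i>', '</i>']
```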
Anchors
| Pattern | Description | Example | Matches |
|---|---|---|---|
| `^` | Start of string (or line with the `m` flag) | `^Hello` | "Hello" at start |
| `$` | End of string (or line with the `m` flag) | `world$` | "world" at end |
| `\b` | Word boundary | `\bcat\b` | "cat" but not "catch" |
| `\B` | Not a word boundary | `\Bcat\B` | "cat" inside "concatenate" |
| `\A` | Start of string (PCRE/Python, no multiline) | `\AHello` | "Hello" at very start |
| `\Z` | End of string (PCRE/Python, no multiline) | `world\Z` | "world" at very end |
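
A minimal sketch of how these anchors behave, assuming Python's re module (where the m flag is spelled re.MULTILINE). The sample strings are made up for illustration.

```python
import re

text = "Hello world\nHello again"

# Without re.MULTILINE, ^ matches only at the start of the whole string.
single = re.findall(r"^Hello", text)

# With re.MULTILINE, ^ also matches right after each newline.
multi = re.findall(r"^Hello", text, re.MULTILINE)

# \A is not affected by re.MULTILINE: it still matches only at the very start.
start_only = re.findall(r"\AHello", text, re.MULTILINE)

# \b requires a word/non-word transition on each side of "cat".
whole_word = re.findall(r"\bcat\b", "cat catch concat")

# \B requires word characters on both sides, so only the "cat" inside
# "concatenate" qualifies (it starts at index 7).
inside_pos = re.search(r"\Bcat\B", "cat concatenate").start()

print(single, multi, start_only)  # ['Hello'] ['Hello', 'Hello'] ['Hello']
print(whole_word, inside_pos)     # ['cat'] 7
```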
Character Classes
| Pattern | Description | Example | Matches |
|---|---|---|---|
| `.` | Any character except newline | `c.t` | "cat", "cut", "c4t" |
| `\d` | Any digit (0–9) | `\d+` | "123" in "abc123" |
| `\D` | Any non-digit | `\D+` | "abc" |
| `\w` | Word character (a-z, A-Z, 0-9, _) | `\w+` | "hello_123" |
| `\W` | Non-word character | `\W+` | " !@" |
| `\s` | Whitespace (space, tab, newline, etc.) | `\s+` | " " |
| `\S` | Non-whitespace | `\S+` | "hello" |
| `[abc]` | Any one of a, b, or c | `[aeiou]` | "a", "e", "i" |
| `[^abc]` | Any character NOT a, b, or c | `[^aeiou]` | "b", "c", "d" |
| `[a-z]` | Any character in range a to z | `[a-z]+` | "hello" |
| `[a-zA-Z0-9]` | Alphanumeric character | `[a-zA-Z0-9]+` | "Hello123" |
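
The sketch below tries a few of these classes in Python. One flavor detail worth knowing: for str patterns, Python's `\d`, `\w`, and `\s` are Unicode-aware by default, so `\w` also matches accented letters unless re.ASCII is passed. The sample strings are illustrative only.

```python
import re

sample = "Order #42: 3 apples, 12 pears_x"

digits = re.findall(r"\d+", sample)   # runs of digits
words = re.findall(r"\w+", sample)    # runs of word characters

# Negated class: every character that is neither a vowel nor whitespace.
consonants = re.findall(r"[^aeiou\s]", "hello world")

# Unicode-aware by default; re.ASCII restricts \w to [a-zA-Z0-9_].
accented = "na\u00efve caf\u00e9"
unicode_words = re.findall(r"\w+", accented)
ascii_words = re.findall(r"\w+", accented, re.ASCII)

print(digits)       # ['42', '3', '12']
print(words)        # ['Order', '42', '3', 'apples', '12', 'pears_x']
print(consonants)   # ['h', 'l', 'l', 'w', 'r', 'l', 'd']
print(ascii_words)  # ['na', 've', 'caf']
```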
Quantifiers
| Pattern | Description | Example | Matches |
|---|---|---|---|
| `*` | 0 or more (greedy) | `ab*` | "a", "ab", "abbb" |
| `+` | 1 or more (greedy) | `ab+` | "ab", "abbb" |
| `?` | 0 or 1 (optional) | `colou?r` | "color", "colour" |
| `{n}` | Exactly n times | `\d{4}` | "2024" |
| `{n,}` | n or more times | `\d{2,}` | "12", "1234" |
| `{n,m}` | Between n and m times | `\d{2,4}` | "12", "123", "1234" |
| `*?` | 0 or more (lazy) | `<.*?>` | First `<tag>` only |
| `+?` | 1 or more (lazy) | `".+?"` | First quoted string |
| `??` | 0 or 1 (lazy) | `colou??r` | Prefers "color" |
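
A small Python sketch of the bounded and optional quantifiers. It also shows that a lazy `??` only changes which alternative is tried first: the overall pattern can still match the longer spelling. Inputs are made up for illustration.

```python
import re

# Optional "u": matches both spellings, but not a doubled "u".
spellings = re.findall(r"colou?r", "color colour colouur")

# {2,4} is greedy: "12345" yields "1234", and the leftover "5" is too short.
bounded = re.findall(r"\d{2,4}", "1 12 123 12345")

# {2,} has no upper bound.
at_least_two = re.findall(r"\d{2,}", "1 12 12345")

# {4} with fullmatch accepts exactly four digits.
exact_ok = re.fullmatch(r"\d{4}", "2024") is not None
exact_too_long = re.fullmatch(r"\d{4}", "20245") is not None

# Lazy optional: a?? prefers to match nothing, a? prefers to match "a".
lazy_optional = re.match(r"a??", "a").group()
greedy_optional = re.match(r"a?", "a").group()

# The lazy form still matches "colour" when the rest of the pattern needs the "u".
lazy_still_matches = re.fullmatch(r"colou??r", "colour") is not None

print(spellings)     # ['color', 'colour']
print(bounded)       # ['12', '123', '1234']
print(at_least_two)  # ['12', '12345']
```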
Groups & References
| Pattern | Description | Example | Matches |
|---|---|---|---|
| `(abc)` | Capturing group | `(foo)+` | Captures "foo" |
| `(?:abc)` | Non-capturing group | `(?:foo)+` | Groups without capture |
| `(?P<name>abc)` | Named capturing group (Python/PCRE) | `(?P<year>\d{4})` | Captured as "year" |
| `(?<name>abc)` | Named capturing group (JS/PCRE) | `(?<year>\d{4})` | Captured as "year" |
| `\1` | Backreference to group 1 | `(\w+) \1` | "hello hello" |
| `\k<name>` | Named backreference | `\k<year>` | Refers to named group |
| `a\|b` | Alternation: match a or b | `cat\|dog` | "cat" or "dog" |
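
As a rough illustration in Python: the standard re module spells named groups `(?P<name>...)` and named backreferences `(?P=name)`, and does not accept the `\k<name>` form. The sample text below is invented.

```python
import re

# Named groups, Python spelling.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Released 2024-03")
year = m.group("year")
month = m.group("month")

# Numbered backreference: \1 must repeat whatever group 1 captured.
dup = re.search(r"\b(\w+) \1\b", "it was was fine")
repeated = dup.group(1)

# Named backreference, Python spelling: (?P=word) instead of \k<word>.
dup_named = re.search(r"\b(?P<word>\w+) (?P=word)\b", "it was was fine")
repeated_named = dup_named.group("word")

# Alternation.
animals = re.findall(r"cat|dog", "cat, dog, bird")

print(year, month)     # 2024 03
print(repeated)        # was
print(repeated_named)  # was
print(animals)         # ['cat', 'dog']
```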
Lookaheads & Lookbehinds
| Pattern | Description | Example | Matches |
|---|---|---|---|
| `(?=abc)` | Positive lookahead: followed by abc | `\d+(?= dollars)` | "100" in "100 dollars" |
| `(?!abc)` | Negative lookahead: NOT followed by abc | `\d+(?! dollars)` | "100" in "100 euros" |
| `(?<=abc)` | Positive lookbehind: preceded by abc | `(?<=\$)\d+` | "100" in "$100" |
| `(?<!abc)` | Negative lookbehind: NOT preceded by abc | `(?<!\$)\d+` | "100" in "USD 100" |
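
Negative lookarounds interact with backtracking in a way that often surprises: rather than failing, the engine may settle for a shorter match. The Python sketch below shows the effect and one possible tightening (a `\b` or an extra `\d` in the lookbehind); the tightened patterns are suggestions, not part of the table above.

```python
import re

# Positive lookahead: the context is checked but not included in the match.
amount = re.search(r"\d+(?= dollars)", "100 dollars").group()

# Negative lookahead pitfall: "100" is rejected, so \d+ backtracks to "10",
# which is followed by "0 dollars" and therefore passes the check.
partial = re.search(r"\d+(?! dollars)", "100 dollars").group()

# Requiring a word boundary after the digits prevents the partial match.
strict = re.search(r"\d+\b(?! dollars)", "100 dollars")
euros = re.search(r"\d+\b(?! dollars)", "100 euros").group()

# Positive lookbehind.
price = re.search(r"(?<=\$)\d+", "cost: $100").group()

# Negative lookbehind has the mirror-image pitfall: the match starts at the
# second digit, because that digit is preceded by "1", not "$".
tail = re.search(r"(?<!\$)\d+", "$100").group()

# Also excluding a preceding digit prevents that.
strict_behind = re.search(r"(?<![$\d])\d+", "$100")

print(amount, partial, euros)  # 100 10 100
print(price, tail)             # 100 00
print(strict, strict_behind)   # None None
```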
Flags / Modifiers
| Pattern | Description | Example |
|---|---|---|
| `i` | Case-insensitive matching | `/hello/i` matches "Hello" |
| `g` | Global: find all matches, not just first | `/\d+/g` finds all numbers |
| `m` | Multiline: `^` and `$` match at the start and end of each line | `/^\w+/m` |
| `s` | Dotall: `.` also matches newlines | `/hello.world/s` |
| `x` | Extended: allows whitespace and comments | Not available in JavaScript |
| `u` | Unicode mode | `/\p{L}+/u` |
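
The `/pattern/flags` notation above is JavaScript/PCRE style. As a sketch of the rough equivalents in Python's re module: flags are passed as constants, there is no global flag (re.findall and re.finditer return every match), and `\p{L}` is not supported by the standard re module.

```python
import re

# i -> re.IGNORECASE
case_insensitive = re.search(r"hello", "Hello", re.IGNORECASE) is not None

# g -> no flag in Python; findall returns all matches.
all_numbers = re.findall(r"\d+", "a1 b22 c333")

# m -> re.MULTILINE
line_starts = re.findall(r"^\w+", "one two\nthree four", re.MULTILINE)

# s -> re.DOTALL
across_newline = re.search(r"hello.world", "hello\nworld", re.DOTALL) is not None

# x -> re.VERBOSE: whitespace in the pattern is ignored, "#" starts a comment.
date_re = re.compile(
    r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    """,
    re.VERBOSE,
)
year_month = date_re.match("2024-03").groups()

print(all_numbers)  # ['1', '22', '333']
print(line_starts)  # ['one', 'three']
print(year_month)   # ('2024', '03')
```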
Escape Sequences
| Pattern | Description |
|---|---|
| `\n` | Newline (LF) |
| `\r` | Carriage return (CR) |
| `\t` | Horizontal tab |
| `\v` | Vertical tab |
| `\f` | Form feed |
| `\0` | Null character |
| `\uXXXX` | Unicode code point (JS/Java) |
| `\xHH` | Hex character code |
| `\\` | Literal backslash |
| `\.` | Literal dot (escape any metachar) |
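
A brief Python sketch of escaping in practice. Raw strings (`r"..."`) keep the backslashes intact for the regex engine, and re.escape can escape metacharacters in arbitrary text for you. The sample inputs are illustrative.

```python
import re

# An unescaped dot matches any character; an escaped dot matches only ".".
loose = re.search(r"3.14", "3x14") is not None
exact = re.search(r"3\.14", "3x14") is not None

# re.escape builds a pattern that matches the input text literally.
escaped = re.escape("3.14")
literal_ok = re.fullmatch(re.escape("1+1=2"), "1+1=2") is not None

# Tab, hex, and Unicode escapes inside a pattern.
fields = re.split(r"\t", "a\tb")
hex_and_unicode = re.fullmatch(r"\x41\u0042", "AB") is not None

# A literal backslash is written as two backslashes in the pattern.
has_backslash = re.search(r"\\", r"C:\temp") is not None

print(loose, exact)  # True False
print(escaped)       # 3\.14
print(fields)        # ['a', 'b']
```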
Common Patterns
| Pattern | Description | Example | Matches |
|---|---|---|---|
| | Email address: basic email validation | - | - |
| | URL: HTTP/HTTPS URL | - | - |
| | IPv4 address: four octets 0–255 | - | - |
| | Date (YYYY-MM-DD): ISO 8601 date | - | - |
| | Time (HH:MM): 24-hour time | - | - |
| | Hex color: CSS hex color | - | - |
| | Phone (US): US phone number | - | - |
| | Slug: URL-friendly string | - | - |
| | Username: alphanumeric + underscore | - | - |
| | Strong password: min 8 chars, upper, lower, digit, special | - | - |
| | HTML tag: opening HTML tag | - | - |
| | JWT token: JSON Web Token format | - | - |
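
As a hedged sketch, here is one plausible way to write a few of the simpler entries in Python. The expressions, the `PATTERNS` dictionary, and the `is_valid` helper are all assumptions made for illustration, not canonical definitions. They check format only: the date pattern, for instance, accepts 2024-02-31.

```python
import re

# Illustrative patterns only (assumptions). They are written without ^ and $
# because re.fullmatch already requires the whole string to match.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"

PATTERNS = {
    "ipv4": rf"(?:{OCTET}\.){{3}}{OCTET}",
    "iso_date": r"[0-9]{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01])",
    "time_24h": r"(?:[01][0-9]|2[0-3]):[0-5][0-9]",
    "hex_color": r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})",
    "slug": r"[a-z0-9]+(?:-[a-z0-9]+)*",
}


def is_valid(kind: str, value: str) -> bool:
    """Return True if value matches the named pattern in full."""
    return re.fullmatch(PATTERNS[kind], value) is not None


print(is_valid("ipv4", "192.168.0.1"))    # True
print(is_valid("ipv4", "256.1.1.1"))      # False
print(is_valid("time_24h", "24:00"))      # False
print(is_valid("slug", "my-first-post"))  # True
```

Using re.fullmatch rather than `$` avoids a Python-specific surprise: `$` also matches just before a trailing newline.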
Frequently Asked Questions
What's the difference between (abc) and (?:abc)?
Both group a sequence of characters so quantifiers and alternation apply to the whole group, but (abc) is a capturing group — it stores the matched text and makes it available via backreferences (\1) or in your match results. (?:abc) is a non-capturing group — it groups without storing anything, which is slightly faster and keeps your capture group numbering clean when you don't need the matched text.
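
The practical effect shows up in what the match functions return. A minimal Python sketch (inputs invented for illustration):

```python
import re

text = "foofoo bar"

# With a capturing group, findall returns the group's content
# (the last repetition), not the whole match.
captured = re.findall(r"(foo)+", text)

# With a non-capturing group, findall returns the whole match.
uncaptured = re.findall(r"(?:foo)+", text)

# Non-capturing groups do not take a number, so the host below is group 1.
url_re = re.compile(r"(?:https?)://([\w.]+)")
host = url_re.search("see https://example.com/docs").group(1)

print(captured)       # ['foo']
print(uncaptured)     # ['foofoo']
print(url_re.groups)  # 1
print(host)           # example.com
```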
How do lookaheads and lookbehinds differ from regular matching?
Lookarounds let you assert that a pattern exists (or doesn't) immediately before or after your match, without including it in the match itself. \d+(?= dollars) matches the digits in "100 dollars" but the match result is just "100" — the lookahead checks for " dollars" without consuming it. This is useful for validating context (like currency formatting) without capturing the context as part of your result.
What does ^ mean inside vs. outside square brackets?
Outside brackets, ^ is an anchor meaning "start of string" — ^Hello matches strings that begin with "Hello". Inside a character class [^abc], it negates the class — matching any character that is NOT a, b, or c. Context is everything.
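
A quick Python sketch of the two meanings, plus two ways to match a literal caret (escape it, or place it anywhere but first inside the class):

```python
import re

# Outside brackets: anchor. Only the "Hello" at the start matches.
anchored = re.findall(r"^Hello", "Hello, Hello")

# Inside brackets, in first position: negation.
negated = re.findall(r"[^abc]", "aXbYc")

# Literal caret: not first in the class, or escaped.
caret_in_class = re.findall(r"[a^]", "a^b")
caret_escaped = re.findall(r"\^", "2^3")

print(anchored)        # ['Hello']
print(negated)         # ['X', 'Y']
print(caret_in_class)  # ['a', '^']
print(caret_escaped)   # ['^']
```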
Why does my dot . not match newlines?
By default, . matches any character except a newline (\n). Enable dotall mode (the s flag in most engines, or re.DOTALL in Python) to make . match newlines too. Alternatively, use [\s\S] as a portable workaround that works in all engines.
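
In Python, for instance, the three cases look like this (a minimal sketch with a made-up two-line string):

```python
import re

text = "first\nsecond"

# Default: the dot refuses to cross the newline.
default_match = re.search(r"first.second", text)

# Dotall mode: the dot matches the newline too.
dotall_match = re.search(r"first.second", text, re.DOTALL)

# Flag-free workaround: [\s\S] matches any character, newline included.
workaround_match = re.search(r"first[\s\S]second", text)

print(default_match is None)         # True
print(dotall_match is not None)      # True
print(workaround_match is not None)  # True
```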
Why does my regex match more or less than I expect?
The most common causes are: using a greedy quantifier where a lazy one was needed, forgetting that . doesn't match newlines by default (use the s/dotall flag if it should), or missing the g/global flag so only the first match is returned. Testing incrementally — building up your pattern piece by piece — is the fastest way to catch these issues.
Is regex the right tool for parsing HTML or JSON?
Generally, no. Regex works well for flat, predictable text patterns, but HTML and JSON are structured, nested formats that regex struggles to parse correctly — especially with deeply nested tags or escaped quotes. Use a proper parser (a DOM library for HTML, JSON.parse or similar for JSON) instead. Regex is fine for simple, targeted extraction from already-parsed structured data, but not for parsing the structure itself.
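
The escaped-quote problem is easy to reproduce. In this Python sketch, a naive pattern (written here only to show the failure) stops at the first escaped quote, while the standard json module decodes the value correctly. The sample document is invented.

```python
import json
import re

raw = r'{"name": "Ada \"Countess\" Lovelace", "year": 1843}'

# Naive extraction: the lazy .*? stops at the first quote it sees,
# which is the escaped one inside the value.
naive = re.search(r'"name": "(.*?)"', raw).group(1)

# A real parser understands the escaping rules.
parsed = json.loads(raw)["name"]

print(naive)   # Ada \
print(parsed)  # Ada "Countess" Lovelace
```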
Are named capture groups supported everywhere?
Named groups ((?<name>...)) are supported in JavaScript (ES2018+), PCRE, Python (with the ?P<name> syntax), Java, and .NET. Older JavaScript engines and some legacy regex flavors don't support them — if you need broad compatibility, use numbered backreferences (\1, \2) instead and refer to groups by position.
What are common mistakes beginners make with regex?
The top three: (1) forgetting to escape metacharacters like . and + when you mean them literally; (2) writing greedy quantifiers when you need lazy ones, causing over-matching; (3) using regex for tasks better handled by a parser — parsing HTML or JSON with regex is notoriously fragile. Use a proper parser for structured formats.