Regex Cheatsheet

A complete regular expressions cheatsheet covering anchors, character classes, quantifiers, groups, lookarounds, flags, escape sequences, and common patterns like email, URL, and IPv4 validation.

Regular expressions (regex) are a compact syntax for matching, validating, and extracting patterns in text. They're supported — with minor syntax differences — across nearly every programming language, from JavaScript and Python to PHP and Java. This cheatsheet covers the core building blocks you'll reach for most often: anchors, character classes, quantifiers, groups, lookarounds, flags, and escape sequences, plus a set of ready-to-use patterns for common validation tasks.

How to Read This Cheatsheet

Each entry lists the pattern itself, what it describes, an example showing it in context, and what it matches in a sample string. Patterns are grouped by category.

Flavors and Compatibility

Most patterns here work across PCRE (PHP, Perl), JavaScript, Python's re module, and Java's Pattern class. A few features are only available in some flavors: JavaScript, for example, has neither the x (extended) flag nor the \A/\Z anchors, and Python spells named groups differently. When in doubt, test your pattern directly in the Regex Tester before relying on it in production.

Greedy vs. Lazy Matching

By default, quantifiers like *, +, and {n,m} are greedy — they match as much as possible. Add ? after any quantifier to make it lazy, matching as little as possible.

For example, given the string <b>bold</b> and <i>italic</i>:

<.*> (greedy) matches the entire string, from the first < to the last >
<.*?> (lazy) matches <b> only

This is one of the most common sources of unexpected regex behaviour.
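
The difference is easy to check in code. The following is a minimal Python sketch; the sample string with two HTML-style tags is an assumption chosen for illustration.

```python
import re

# Sample input (an assumption for this example).
text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs as much as it can, so the match runs to the last ">".
greedy = re.search(r"<.*>", text).group()

# Lazy: .*? stops at the first ">" that lets the pattern succeed.
lazy = re.search(r"<.*?>", text).group()

# With the lazy form, each tag is found separately.
all_tags = re.findall(r"<.*?>", text)

print(greedy)    # <b>bold</b> and <i>italic</i>
print(lazy)      # <b>
print(all_tags)  # ['<b>', '</b>', '<i>', '</i>']
```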

Anchors

Pattern	Description	Example	Matches
^	Start of string (or line with `m` flag)	^Hello	"Hello" at start
$	End of string (or line with `m` flag)	world$	"world" at end
\b	Word boundary	\bcat\b	"cat" but not "catch"
\B	Not a word boundary	\Bcat\B	"cat" inside "concatenate"
\A	Start of string, unaffected by the `m` flag (PCRE, Python, Java; not JavaScript)	\AHello	"Hello" at very start
\Z	End of string, unaffected by the `m` flag (PCRE, Python, Java; not JavaScript). PCRE and Java also allow one trailing newline after it	world\Z	"world" at very end
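
As a rough illustration of how anchors behave, here is a short Python sketch. The sample strings are assumptions; `re.MULTILINE` is Python's spelling of the `m` flag.

```python
import re

text = "Hello world\nHello again"

# Without MULTILINE, ^ matches only at the very start of the string.
starts_default = re.findall(r"^Hello", text)
# With MULTILINE, ^ also matches immediately after each newline.
starts_multiline = re.findall(r"^Hello", text, flags=re.MULTILINE)
# \A ignores MULTILINE and always means the start of the whole string.
starts_absolute = re.findall(r"\AHello", text, flags=re.MULTILINE)

words = "cat catch concatenate"
# \b needs a word boundary on both sides, so only the standalone word matches.
whole_word_positions = [m.start() for m in re.finditer(r"\bcat\b", words)]
# \B needs the absence of a boundary, so only the embedded "cat" matches.
embedded_positions = [m.start() for m in re.finditer(r"\Bcat\B", words)]

print(starts_default)        # ['Hello']
print(starts_multiline)      # ['Hello', 'Hello']
print(starts_absolute)       # ['Hello']
print(whole_word_positions)  # [0]
print(embedded_positions)    # [13]
```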

Character Classes

Pattern	Description	Example	Matches
.	Any character except newline	c.t	"cat", "cut", "c4t"
\d	Any digit (0–9)	\d+	"123" in "abc123"
\D	Any non-digit	\D+	"abc" in "abc123"
\w	Word character (a-z, A-Z, 0-9, _)	\w+	"hello_123"
\W	Non-word character	\W+	" !@"
\s	Whitespace (space, tab, newline, etc.)	\s+	" "
\S	Non-whitespace	\S+	"hello"
[abc]	Any one of a, b, or c	[aeiou]	"a", "e", "i"
[^abc]	Any character NOT a, b, or c	[^aeiou]	"b", "c", "d"
[a-z]	Any character in range a to z	[a-z]+	"hello"
[a-zA-Z0-9]	Alphanumeric character	[a-zA-Z0-9]+	"Hello123"
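
The sketch below exercises a few of these classes in Python on made-up sample text. One flavor-specific caveat worth knowing: in Python 3, `\d` on a `str` pattern also matches non-ASCII decimal digits unless `re.ASCII` is set, so the "0–9" description holds strictly only in ASCII mode.

```python
import re

text = "Order #42: 3 apples, 10 pears"

digits = re.findall(r"\d+", text)              # runs of digits
words = re.findall(r"[a-zA-Z]+", text)         # runs of ASCII letters
vowels = re.findall(r"[aeiou]", "regex")       # any single listed character
non_vowels = re.findall(r"[^aeiou]", "regex")  # any character not listed

# U+0663 is ARABIC-INDIC DIGIT THREE: a digit, but not one of 0-9.
unicode_digit = re.fullmatch(r"\d", "\u0663") is not None
ascii_only = re.fullmatch(r"\d", "\u0663", flags=re.ASCII) is not None

print(digits)         # ['42', '3', '10']
print(words)          # ['Order', 'apples', 'pears']
print(vowels)         # ['e', 'e']
print(non_vowels)     # ['r', 'g', 'x']
print(unicode_digit)  # True
print(ascii_only)     # False
```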

Quantifiers

Pattern	Description	Example	Matches
*	0 or more (greedy)	ab*	"a", "ab", "abbb"
+	1 or more (greedy)	ab+	"ab", "abbb"
?	0 or 1 (optional)	colou?r	"color", "colour"
{n}	Exactly n times	\d{4}	"2024"
{n,}	n or more times	\d{2,}	"12", "1234"
{n,m}	Between n and m times	\d{2,4}	"12", "123", "1234"
*?	0 or more (lazy)	<.*?>	First `<tag>` only
+?	1 or more (lazy)	".+?"	First quoted string
??	0 or 1 (lazy)	colou??r	"color", "colour" (tries omitting the "u" first)
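
Counted quantifiers are easy to misread when the pattern is not anchored. The Python sketch below, using made-up sample strings, shows that `\d{2,4}` happily matches the first four digits of a longer number unless word boundaries are added.

```python
import re

# {n,m} without anchors: "12345" still yields a 4-digit match ("1234").
runs = re.findall(r"\d{2,4}", "1 12 123 1234 12345")

# Adding \b on both sides rejects numbers that are too long.
years = re.findall(r"\b\d{4}\b", "Released 2019, updated 2024, build 12345")

# ? makes the preceding "u" optional.
spellings = re.findall(r"colou?r", "color colour colr")

print(runs)       # ['12', '123', '1234', '1234']
print(years)      # ['2019', '2024']
print(spellings)  # ['color', 'colour']
```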

Groups & References

Pattern	Description	Example	Matches
(abc)	Capturing group	(foo)+	Captures "foo"
(?:abc)	Non-capturing group	(?:foo)+	Groups without capture
(?P<name>abc)	Named capturing group (Python/PCRE)	(?P<year>\d{4})	Captured as "year"
(?<name>abc)	Named capturing group (JS/PCRE)	(?<year>\d{4})	Captured as "year"
\1	Backreference to group 1	(\w+) \1	"hello hello"
\k<name>	Named backreference (Python uses (?P=name) instead)	\k<year>	Refers to named group
a|b	Alternation: match a or b	cat|dog	"cat" or "dog"
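
Here is a minimal Python sketch of groups and backreferences. The sample strings are assumptions, and the syntax shown is Python's: named groups are written `(?P<name>...)` and named backreferences `(?P=name)`.

```python
import re

# Named groups can be read by name or by position.
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Published 2024-03-15")
year = date.group("year")
month = date.group(2)  # named groups are still numbered left to right

# \1 must repeat exactly the text that group 1 captured.
doubled = re.search(r"\b(\w+) \1\b", "it is is done").group(1)

# Python's named backreference form.
repeated = re.search(r"(?P<word>\w+) (?P=word)", "hello hello world").group("word")

# Alternation tries each branch at every position.
pets = re.findall(r"cat|dog", "dog chases cat")

print(year, month)  # 2024 03
print(doubled)      # is
print(repeated)     # hello
print(pets)         # ['dog', 'cat']
```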

Lookaheads & Lookbehinds

Pattern	Description	Example	Matches
(?=abc)	Positive lookahead: followed by abc	\d+(?= dollars)	"100" in "100 dollars"
(?!abc)	Negative lookahead: NOT followed by abc	\d+(?! dollars)	"250" in "250 euros" (but backtracking still matches "10" in "100 dollars")
(?<=abc)	Positive lookbehind: preceded by abc	(?<=\$)\d+	"100" in "$100"
(?<!abc)	Negative lookbehind: NOT preceded by abc	(?<!\$)\d+	"100" in "USD 100" (but "00" in "$100" still matches)
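
Negative lookarounds combined with a quantifier often surprise people, because the engine is free to backtrack to a shorter match that satisfies the assertion. The Python sketch below, on a made-up sample string, shows the effect and one possible fix using word boundaries.

```python
import re

text = "100 dollars, 250 euros, $75"

# Positive lookahead: digits that are followed by " dollars".
before_dollars = re.findall(r"\d+(?= dollars)", text)

# Naive negative lookahead: \d+ gives back a digit, so "10" matches
# inside "100 dollars" because "10" is followed by "0", not " dollars".
naive = re.findall(r"\d+(?! dollars)", text)

# Word boundaries force the whole number to be matched or nothing.
strict = re.findall(r"\b\d+\b(?! dollars)", text)

# Positive lookbehind: digits that come right after a dollar sign.
after_dollar_sign = re.findall(r"(?<=\$)\d+", text)

print(before_dollars)     # ['100']
print(naive)              # ['10', '250', '75']
print(strict)             # ['250', '75']
print(after_dollar_sign)  # ['75']
```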

Flags / Modifiers

Pattern	Description	Example
i	Case-insensitive matching	/hello/i matches "Hello"
g	Global: find all matches, not just first	/\d+/g finds all numbers
m	Multiline: `^`/`$` match line starts/ends	/^\w+/m
s	Dotall: `.` matches newline too	/hello.world/s
x	Extended: allows whitespace and comments	PCRE, Python, Java (not JavaScript)
u	Unicode mode	/\p{L}+/u
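
The slash-delimited examples above use JavaScript/PCRE notation. As a hedged sketch of how the same flags map onto Python, the snippet below uses the `re` module's constants; note that Python has no `g` flag, and the choice between `re.search` and `re.findall` plays that role instead. Sample strings are assumptions.

```python
import re

# No /g in Python: search returns the first match, findall returns all.
first = re.search(r"\d+", "a1 b22 c333").group()
every = re.findall(r"\d+", "a1 b22 c333")

# i flag -> re.IGNORECASE
insensitive = re.findall(r"hello", "Hello HELLO hello", flags=re.IGNORECASE)

# m flag -> re.MULTILINE
line_starts = re.findall(r"^\w+", "one two\nthree four", flags=re.MULTILINE)

# x flag -> re.VERBOSE: unescaped whitespace is ignored, # starts a comment.
date = re.compile(
    r"""
    (\d{4})  # year
    -
    (\d{2})  # month
    """,
    flags=re.VERBOSE,
)
parts = date.search("on 2024-03").groups()

print(first)        # 1
print(every)        # ['1', '22', '333']
print(insensitive)  # ['Hello', 'HELLO', 'hello']
print(line_starts)  # ['one', 'three']
print(parts)        # ('2024', '03')
```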

Escape Sequences

Pattern	Description
\n	Newline (LF)
\r	Carriage return (CR)
\t	Horizontal tab
\v	Vertical tab
\f	Form feed
\0	Null character
\uXXXX	Unicode code point (JS/Java)
\xHH	Hex character code
\\	Literal backslash
\.	Literal dot (escape any metachar)
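
To illustrate why escaping matters, the Python sketch below compares an unescaped and an escaped dot, and uses `re.escape` to escape arbitrary input before embedding it in a pattern. The sample strings are assumptions; raw strings (`r"..."`) are used so the backslashes reach the regex engine unchanged.

```python
import re

text = "v3.14 v3x14"

# Unescaped: . matches any character, so "3x14" matches too.
unescaped = re.findall(r"3.14", text)
# Escaped: \. matches only a literal dot.
escaped = re.findall(r"3\.14", text)

# re.escape backslash-escapes metacharacters such as $ . ( )
price = "$9.99 (approx.)"
found = re.search(re.escape(price), "Total: $9.99 (approx.) incl. tax") is not None

# \xHH addresses a character by its hex code: \x41 is "A".
hex_matches = re.findall(r"\x41", "ABA")

print(unescaped)    # ['3.14', '3x14']
print(escaped)      # ['3.14']
print(found)        # True
print(hex_matches)  # ['A', 'A']
```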

Common Patterns

Description	Example	Matches
Email address — basic email validation	—	—
URL — HTTP/HTTPS URL	—	—
IPv4 address — four octets 0–255	—	—
Date (YYYY-MM-DD) — ISO 8601 date	—	—
Time (HH:MM) — 24-hour time	—	—
Hex color — CSS hex color	—	—
Phone (US) — US phone number	—	—
Slug — URL-friendly string	—	—
Username — alphanumeric + underscore	—	—
Strong password — min 8 chars, upper, lower, digit, special	—	—
HTML tag — opening HTML tag	—	—
JWT token — JSON Web Token format	—	—
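
As a rough illustration of how validation patterns of this kind are typically applied, the Python sketch below defines a handful of hypothetical patterns. They are assumptions written for this example, not the cheatsheet's own patterns, and the `PATTERNS` dictionary and `is_valid` helper are likewise invented names. `re.fullmatch` is used so that the whole input must match, and `re.ASCII` keeps `\d` restricted to 0-9.

```python
import re

# Hypothetical patterns for illustration only (not the cheatsheet's originals).
PATTERNS = {
    "date": r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])",
    "time": r"(?:[01]\d|2[0-3]):[0-5]\d",
    "hex_color": r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})",
    "slug": r"[a-z0-9]+(?:-[a-z0-9]+)*",
    "ipv4": (
        r"(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}"
        r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
    ),
}


def is_valid(kind: str, value: str) -> bool:
    """Return True when the entire value matches the named pattern."""
    return re.fullmatch(PATTERNS[kind], value, flags=re.ASCII) is not None


print(is_valid("ipv4", "192.168.0.1"))  # True
print(is_valid("ipv4", "256.1.1.1"))    # False
print(is_valid("time", "24:00"))        # False
print(is_valid("date", "2024-02-30"))   # True
```

The last line shows the limits of this approach: the date pattern checks the shape of the string, not whether the day exists in that month. For that kind of check, a date-parsing library is a better fit than a longer regex.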

Frequently Asked Questions

What's the difference between `(abc)` and `(?:abc)`?

Both group a sequence of characters so quantifiers and alternation apply to the whole group, but (abc) is a capturing group — it stores the matched text and makes it available via backreferences (\1) or in your match results. (?:abc) is a non-capturing group — it groups without storing anything, which is slightly faster and keeps your capture group numbering clean when you don't need the matched text.
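
The practical difference shows up in the match results. In this Python sketch (sample string assumed), note in particular how `re.findall` changes what it returns once a capturing group is present.

```python
import re

text = "foofoo bar"

capturing = re.search(r"(foo)+", text)
non_capturing = re.search(r"(?:foo)+", text)

whole = capturing.group(0)            # the full match is the same either way
captured = capturing.groups()         # the last repetition of the group
nothing = non_capturing.groups()      # no groups were stored

# findall returns group contents when the pattern has a capturing group,
# and whole matches when it has none.
findall_capturing = re.findall(r"(foo)+", text)
findall_non_capturing = re.findall(r"(?:foo)+", text)

print(whole)                  # foofoo
print(captured)               # ('foo',)
print(nothing)                # ()
print(findall_capturing)      # ['foo']
print(findall_non_capturing)  # ['foofoo']
```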

How do lookaheads and lookbehinds differ from regular matching?

Lookarounds let you assert that a pattern exists (or doesn't) immediately before or after your match, without including it in the match itself. \d+(?= dollars) matches the digits in "100 dollars" but the match result is just "100" — the lookahead checks for " dollars" without consuming it. This is useful for validating context (like currency formatting) without capturing the context as part of your result.

What does `^` mean inside vs. outside square brackets?

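Outside brackets, ^ is an anchor meaning "start of string" (or start of line with the m flag), so ^Hello matches strings that begin with "Hello". As the first character inside a character class, as in [^abc], it negates the class, matching any character that is NOT a, b, or c. Anywhere else inside a class it is just a literal caret. Context is everything.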
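
A small Python sketch of the roles a caret can play, using made-up inputs:

```python
import re

# Outside brackets: anchor. Only the "a" at the start matches.
anchored = re.findall(r"^a", "abca")

# First inside brackets: negation. Everything except "a" matches.
negated = re.findall(r"[^a]", "abca")

# Elsewhere inside brackets: a literal caret character.
literal = re.findall(r"[a^]", "a^b")

print(anchored)  # ['a']
print(negated)   # ['b', 'c']
print(literal)   # ['a', '^']
```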

Why does my dot `.` not match newlines?

By default, . matches any character except a newline (\n). Enable dotall mode (the s flag in most engines, or re.DOTALL in Python) to make . match newlines too. Alternatively, use [\s\S] as a portable workaround that works in all engines.
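
In Python, the three behaviours described above look like this (a minimal sketch with an assumed two-line sample string):

```python
import re

text = "hello\nworld"

# Default: . refuses to match the newline, so there is no match.
default = re.search(r"hello.world", text)

# Dotall mode: . matches the newline as well.
dotall = re.search(r"hello.world", text, flags=re.DOTALL)

# Flag-free workaround: [\s\S] matches any character at all.
portable = re.search(r"hello[\s\S]world", text)

print(default is None)      # True
print(dotall is not None)   # True
print(portable is not None) # True
```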

Why does my regex match more or less than I expect?

The most common causes are: using a greedy quantifier where a lazy one was needed, forgetting that . doesn't match newlines by default (use the s/dotall flag if it should), or missing the g/global flag so only the first match is returned. Testing incrementally — building up your pattern piece by piece — is the fastest way to catch these issues.

Is regex the right tool for parsing HTML or JSON?

Generally, no. Regex works well for flat, predictable text patterns, but HTML and JSON are structured, nested formats that regex struggles to parse correctly — especially with deeply nested tags or escaped quotes. Use a proper parser (a DOM library for HTML, JSON.parse or similar for JSON) instead. Regex is fine for simple, targeted extraction from already-parsed structured data, but not for parsing the structure itself.
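
The escaped-quote problem is easy to reproduce. The Python sketch below uses an invented JSON document and a deliberately naive pattern; `json.loads` stands in for the "proper parser" mentioned above.

```python
import json
import re

# The document text is: {"title": "He said \"hi\"", "tags": ["a", "b"]}
raw = '{"title": "He said \\"hi\\"", "tags": ["a", "b"]}'

# Naive regex: stops at the first double quote, which is an escaped one.
naive = re.search(r'"title": "(.*?)"', raw).group(1)

# A real parser understands the escaping rules.
parsed = json.loads(raw)["title"]

print(naive)   # He said \
print(parsed)  # He said "hi"
```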

Are named capture groups supported everywhere?

Named groups ((?<name>...)) are supported in JavaScript (ES2018+), PCRE, Python (with the ?P<name> syntax), Java, and .NET. Older JavaScript engines and some legacy regex flavors don't support them — if you need broad compatibility, use numbered backreferences (\1, \2) instead and refer to groups by position.

What are common mistakes beginners make with regex?

The top three: (1) forgetting to escape metacharacters like . and + when you mean them literally; (2) writing greedy quantifiers when you need lazy ones, causing over-matching; (3) using regex for tasks better handled by a parser, such as HTML or JSON, where regex-based parsing is notoriously fragile.
