Regex Cheatsheet

A complete regular expressions cheatsheet covering anchors, character classes, quantifiers, groups, lookarounds, flags, escape sequences, and common patterns like email, URL, and IPv4 validation.

Regular expressions (regex) are a compact syntax for matching, validating, and extracting patterns in text. They're supported — with minor syntax differences — across nearly every programming language, from JavaScript and Python to PHP and Java. This cheatsheet covers the core building blocks you'll reach for most often: anchors, character classes, quantifiers, groups, lookarounds, flags, and escape sequences, plus a set of ready-to-use patterns for common validation tasks.

How to Read This Cheatsheet

Each entry lists the pattern itself, what it describes, an example showing it in context, and what it matches in a sample string. Patterns are grouped by category.

Flavors and Compatibility

Most patterns here work across PCRE (PHP, Perl), JavaScript, Python's re module, and Java's Pattern class. A few features, such as the x (extended) flag and the \A/\Z anchors, are only available in some flavors; JavaScript, for example, supports neither. When in doubt, test your pattern directly in the Regex Tester before relying on it in production.

Greedy vs. Lazy Matching

By default, quantifiers like *, +, and {n,m} are greedy — they match as much as possible. Add ? after any quantifier to make it lazy, matching as little as possible.

For example, given the string <b>bold</b> and <i>italic</i>:

  • <.*> (greedy) matches the entire string from first < to last >
  • <.*?> (lazy) matches <b> only

This is one of the most common sources of unexpected regex behaviour.
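
The difference can be checked directly. The snippet below is a minimal sketch using Python's re module (one of the flavors mentioned above), applied to the same sample string; the variable names are illustrative.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs as much as it can, then backs off to the last ">".
greedy = re.search(r"<.*>", text).group(0)

# Lazy: .*? stops at the first ">" that lets the match succeed.
lazy = re.search(r"<.*?>", text).group(0)

# Applied repeatedly, the lazy form returns each tag separately.
tags = re.findall(r"<.*?>", text)

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
print(tags)    # ['<b>', '</b>', '<i>', '</i>']
```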


Anchors

Pattern  Description                                          Example    Matches
^        Start of string (or line with m flag)                ^Hello     "Hello" at start
$        End of string (or line with m flag)                  world$     "world" at end
\b       Word boundary                                        \bcat\b    "cat" but not "catch"
\B       Not a word boundary                                  \Bcat\B    "cat" inside "concatenate"
\A       Start of string (PCRE/Python, ignores the m flag)    \AHello    "Hello" at very start
\Z       End of string (PCRE/Python, ignores the m flag)      world\Z    "world" at very end
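
As a rough illustration of how these anchors behave, here is a small Python sketch. The sample strings are made up for the example, and match positions are collected so the two "cat" cases can be told apart.

```python
import re

text = "Hello world\nHello again"

# Without re.MULTILINE, ^ only matches at the very start of the string.
starts_default = re.findall(r"^Hello", text)
# With re.MULTILINE, ^ also matches right after each newline.
starts_multiline = re.findall(r"^Hello", text, flags=re.MULTILINE)
# \A ignores re.MULTILINE and always means the start of the string.
starts_absolute = re.findall(r"\AHello", text, flags=re.MULTILINE)

words = "cat catch concatenate"
# \b needs a word boundary on both sides, so only the standalone "cat" matches.
whole_word_positions = [m.start() for m in re.finditer(r"\bcat\b", words)]
# \B needs the opposite: "cat" must sit inside a longer word.
inner_positions = [m.start() for m in re.finditer(r"\Bcat\B", words)]

print(len(starts_default), len(starts_multiline), len(starts_absolute))  # 1 2 1
print(whole_word_positions)  # [0]
print(inner_positions)       # [13]
```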

Character Classes

Pattern        Description                               Example         Matches
.              Any character except newline              c.t             "cat", "cut", "c4t"
\d             Any digit (0–9)                           \d+             "123" in "abc123"
\D             Any non-digit                             \D+             "abc"
\w             Word character (a-z, A-Z, 0-9, _)         \w+             "hello_123"
\W             Non-word character                        \W+             " !@"
\s             Whitespace (space, tab, newline, etc.)    \s+             " "
\S             Non-whitespace                            \S+             "hello"
[abc]          Any one of a, b, or c                     [aeiou]         "a", "e", "i"
[^abc]         Any character NOT a, b, or c              [^aeiou]        "b", "c", "d"
[a-z]          Any character in range a to z             [a-z]+          "hello"
[a-zA-Z0-9]    Alphanumeric character                    [a-zA-Z0-9]+    "Hello123"

Quantifiers

Pattern  Description              Example     Matches
*        0 or more (greedy)       ab*         "a", "ab", "abbb"
+        1 or more (greedy)       ab+         "ab", "abbb"
?        0 or 1 (optional)        colou?r     "color", "colour"
{n}      Exactly n times          \d{4}       "2024"
{n,}     n or more times          \d{2,}      "12", "1234"
{n,m}    Between n and m times    \d{2,4}     "12", "123", "1234"
*?       0 or more (lazy)         <.*?>       First <tag> only
+?       1 or more (lazy)         ".+?"       First quoted string
??       0 or 1 (lazy)            colou??r    Tries "color" first; still matches "colour"
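
The counted and lazy quantifiers above can be tried out with a short Python sketch. The sample inputs are invented for illustration, and \b is added to the counted examples so that each number is considered as a whole.

```python
import re

numbers = "7 42 2024 12345"

# {n}, {n,} and {n,m} bound how many repetitions are allowed.
exactly_four = re.findall(r"\b\d{4}\b", numbers)
two_or_more = re.findall(r"\b\d{2,}\b", numbers)
two_to_four = re.findall(r"\b\d{2,4}\b", numbers)

# ? makes the preceding token optional, so both spellings match.
spellings = re.findall(r"colou?r", "color colour")

# Lazy +? stops at the first closing quote instead of the last one.
first_quoted = re.search(r'".+?"', 'say "hi" and "bye"').group(0)

print(exactly_four)  # ['2024']
print(two_or_more)   # ['42', '2024', '12345']
print(two_to_four)   # ['42', '2024']
print(spellings)     # ['color', 'colour']
print(first_quoted)  # "hi"
```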

Groups & References

Pattern          Description                                      Example            Matches
(abc)            Capturing group                                  (foo)+             Captures "foo"
(?:abc)          Non-capturing group                              (?:foo)+           Groups without capture
(?P<name>...)    Named capturing group (Python/PCRE)              (?P<year>\d{4})    Captured as "year"
(?<name>...)     Named capturing group (JS/PCRE)                  (?<year>\d{4})     Captured as "year"
\1               Backreference to group 1                         (\w+) \1           "hello hello"
\k<name>         Named backreference (Python uses (?P=name))      \k<year>           Refers to named group
a|b              Alternation: match a or b                        cat|dog            "cat" or "dog"
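
Here is a brief sketch of named groups and backreferences in Python. Python's re module uses the (?P<name>...) form for named groups and (?P=name) for named backreferences; the sample strings are illustrative.

```python
import re

# Named groups, Python syntax: read them back by name on the match object.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2024-05")
year = m.group("year")
month = m.group("month")

# Numbered backreference: \1 must repeat exactly what group 1 captured.
doubled = re.search(r"\b(\w+) \1\b", "say hello hello world").group(1)

# Named backreference in Python: (?P=name).
same_digit_pair = re.search(r"(?P<d>\d)(?P=d)", "a12b33c").group(0)

print(year, month)      # 2024 05
print(doubled)          # hello
print(same_digit_pair)  # 33
```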

Lookaheads & Lookbehinds

Pattern     Description                                 Example            Matches
(?=abc)     Positive lookahead: followed by abc         \d+(?= dollars)    "100" in "100 dollars"
(?!abc)     Negative lookahead: NOT followed by abc     \d+(?! dollars)    "100" in "100 euros" (in "100 dollars" it backtracks and matches "10")
(?<=abc)    Positive lookbehind: preceded by abc        (?<=\$)\d+         "100" in "$100"
(?<!abc)    Negative lookbehind: NOT preceded by abc    (?<!\$)\d+         "100" in "€100" (in "$100" it still matches "00")
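
Negative lookarounds interact with backtracking in a way that often surprises people. The Python sketch below shows the four lookarounds on made-up inputs, including a partial match from the plain negative lookahead and one possible fix using \b.

```python
import re

# Positive lookahead: digits followed by " dollars"; the suffix is not consumed.
ahead = re.findall(r"\d+(?= dollars)", "100 dollars, 200 euros")

# Negative lookahead alone can backtrack: \d+ gives up a digit and "10" passes.
naive_not_ahead = re.findall(r"\d+(?! dollars)", "100 dollars")
# Adding \b forces the whole number to be considered.
strict_not_ahead = re.findall(r"\b\d+\b(?! dollars)", "100 dollars, 200 euros")

# Positive lookbehind: digits right after a literal "$".
behind = re.findall(r"(?<=\$)\d+", "$100 and 200")
# Negative lookbehind plus \b: whole numbers not preceded by "$".
not_behind = re.findall(r"(?<!\$)\b\d+", "$100 and 200")

print(ahead)             # ['100']
print(naive_not_ahead)   # ['10']
print(strict_not_ahead)  # ['200']
print(behind)            # ['100']
print(not_behind)        # ['200']
```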

Flags / Modifiers

Flag  Description                                  Example
i     Case-insensitive matching                    /hello/i matches "Hello"
g     Global: find all matches, not just first     /\d+/g finds all numbers
m     Multiline: ^/$ match line starts/ends        /^\w+/m
s     Dotall: . matches newline too                /hello.world/s
x     Extended: allows whitespace and comments     Not available in JavaScript
u     Unicode mode                                 /\p{L}+/u
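
The /pattern/flags notation above is JavaScript-style. In Python the same ideas are expressed with constants passed to the re functions, roughly as sketched here. Python has no g flag; choosing re.findall over re.search plays that role. The sample inputs are illustrative.

```python
import re

# i: case-insensitive matching.
ci = re.search(r"hello", "Say Hello", flags=re.IGNORECASE).group(0)

# g has no Python equivalent: re.search returns the first match, re.findall all of them.
all_numbers = re.findall(r"\d+", "3 apples, 12 pears")

# m: ^ matches at the start of every line.
line_starts = re.findall(r"^\w+", "alpha one\nbeta two", flags=re.MULTILINE)

# s: . also matches "\n".
across_lines = re.search(r"hello.world", "hello\nworld", flags=re.DOTALL) is not None

# x: whitespace and # comments inside the pattern are ignored.
date = re.compile(
    r"""
    (\d{4})  # year
    -
    (\d{2})  # month
    """,
    flags=re.VERBOSE,
)
year, month = date.search("2024-05").groups()

print(ci, all_numbers, line_starts, across_lines, year, month)
```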

Escape Sequences

Pattern   Description
\n        Newline (LF)
\r        Carriage return (CR)
\t        Horizontal tab
\v        Vertical tab
\f        Form feed
\0        Null character
\uXXXX    Unicode code point (JS/Java)
\xHH      Hex character code
\\        Literal backslash
\.        Literal dot (escape any metachar)

Common Patterns

  • Email address: basic email validation
  • URL: HTTP/HTTPS URL
  • IPv4 address: four octets 0–255
  • Date (YYYY-MM-DD): ISO 8601 date
  • Time (HH:MM): 24-hour time
  • Hex color: CSS hex color
  • Phone (US): US phone number
  • Slug: URL-friendly string
  • Username: alphanumeric + underscore
  • Strong password: min 8 chars, upper, lower, digit, special
  • HTML tag: opening HTML tag
  • JWT token: JSON Web Token format
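
As a sketch of how a few of these validations might be written in Python: the patterns below are illustrative assumptions, not the cheatsheet's own entries, and the helper is_valid is a hypothetical name. They check the shape of the input only.

```python
import re

# Illustrative patterns (assumptions, not the cheatsheet's originals).
ISO_DATE = re.compile(r"[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])")
TIME_24H = re.compile(r"([01][0-9]|2[0-3]):[0-5][0-9]")
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})")

# One octet is 0-255; an address is four octets joined by literal dots.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")


def is_valid(pattern: re.Pattern, value: str) -> bool:
    """Return True only if the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None


print(is_valid(IPV4, "192.168.0.1"))     # True
print(is_valid(IPV4, "256.1.1.1"))       # False
print(is_valid(TIME_24H, "24:00"))       # False
print(is_valid(ISO_DATE, "2024-02-30"))  # True: shape only, not calendar validity
```

Using fullmatch rather than search avoids having to wrap every pattern in ^ and $. Calendar rules such as days per month are better left to a date library.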

Frequently Asked Questions

What's the difference between (abc) and (?:abc)?

Both group a sequence of characters so quantifiers and alternation apply to the whole group, but (abc) is a capturing group — it stores the matched text and makes it available via backreferences (\1) or in your match results. (?:abc) is a non-capturing group — it groups without storing anything, which is slightly faster and keeps your capture group numbering clean when you don't need the matched text.
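
In Python the difference is easy to see with re.findall, which returns the captured text when a pattern has a capturing group and the whole match when it has none. This is a small sketch with made-up input.

```python
import re

text = "foofoo bar"

# Capturing group: findall returns the group's text (its last repetition).
captured = re.findall(r"(foo)+", text)
# Non-capturing group: findall returns the whole match.
uncaptured = re.findall(r"(?:foo)+", text)

# Non-capturing groups keep numbering tidy: the number stays group 1.
m = re.search(r"(?:v|version )(\d+)", "version 12")
major = m.group(1)

print(captured)    # ['foo']
print(uncaptured)  # ['foofoo']
print(major)       # 12
```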

How do lookaheads and lookbehinds differ from regular matching?

Lookarounds let you assert that a pattern exists (or doesn't) immediately before or after your match, without including it in the match itself. \d+(?= dollars) matches the digits in "100 dollars" but the match result is just "100" — the lookahead checks for " dollars" without consuming it. This is useful for validating context (like currency formatting) without capturing the context as part of your result.

What does ^ mean inside vs. outside square brackets?

Outside brackets, ^ is an anchor meaning "start of string" — ^Hello matches strings that begin with "Hello". Inside a character class [^abc], it negates the class — matching any character that is NOT a, b, or c. Context is everything.
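
Here is a short Python sketch of both meanings, plus the case where ^ appears inside brackets but not in first position, where it is just a literal caret.

```python
import re

# Outside brackets: anchor. Matches only when the string starts with "Hello".
anchored = re.search(r"^Hello", "Hello there") is not None
not_anchored = re.search(r"^Hello", "Oh, Hello") is not None

# Inside brackets, first position: negation. Matches runs of non-vowels.
non_vowels = re.findall(r"[^aeiou]+", "education")

# Inside brackets, not first: a literal "^".
literal_caret = re.findall(r"[a^]", "a^b")

print(anchored, not_anchored)  # True False
print(non_vowels)              # ['d', 'c', 't', 'n']
print(literal_caret)           # ['a', '^']
```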

Why does my dot . not match newlines?

By default, . matches any character except a newline (\n). Enable dotall mode (the s flag in most engines, or re.DOTALL in Python) to make . match newlines too. Alternatively, [\s\S] is a widely portable workaround: it matches any character, newlines included, without needing a flag.
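
The three behaviours can be compared side by side in Python, in a minimal sketch with an invented two-line string:

```python
import re

text = "start\nend"

# Default: . refuses to cross the newline, so there is no match.
default = re.search(r"start.end", text) is not None
# re.DOTALL: . matches the newline as well.
dotall = re.search(r"start.end", text, flags=re.DOTALL) is not None
# [\s\S] matches any character without needing a flag.
workaround = re.search(r"start[\s\S]end", text) is not None

print(default, dotall, workaround)  # False True True
```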

Why does my regex match more or less than I expect?

The most common causes are: using a greedy quantifier where a lazy one was needed, forgetting that . doesn't match newlines by default (use the s/dotall flag if it should), or missing the g/global flag so only the first match is returned. Testing incrementally — building up your pattern piece by piece — is the fastest way to catch these issues.

Is regex the right tool for parsing HTML or JSON?

Generally, no. Regex works well for flat, predictable text patterns, but HTML and JSON are structured, nested formats that regex struggles to parse correctly — especially with deeply nested tags or escaped quotes. Use a proper parser (a DOM library for HTML, JSON.parse or similar for JSON) instead. Regex is fine for simple, targeted extraction from already-parsed structured data, but not for parsing the structure itself.
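
To make the escaped-quote problem concrete, here is a small Python sketch comparing a naive regex with the standard json parser. The sample document and the naive pattern are invented for illustration.

```python
import json
import re

# A JSON document whose title contains escaped quotes.
raw = '{"title": "He said \\"hi\\"", "tags": ["a", "b"]}'

# Naive regex: the lazy group stops at the first quote, which is an escaped one.
naive = re.search(r'"title": "(.*?)"', raw).group(1)

# A real parser handles escaping and nesting.
parsed = json.loads(raw)

print(naive)            # He said \
print(parsed["title"])  # He said "hi"
print(parsed["tags"])   # ['a', 'b']
```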

Are named capture groups supported everywhere?

Named groups ((?<name>...)) are supported in JavaScript (ES2018+), PCRE, Python (with the ?P<name> syntax), Java, and .NET. Older JavaScript engines and some legacy regex flavors don't support them — if you need broad compatibility, use numbered backreferences (\1, \2) instead and refer to groups by position.

What are common mistakes beginners make with regex?

The top three: (1) forgetting to escape metacharacters like . and + when you mean them literally; (2) writing greedy quantifiers when you need lazy ones, causing over-matching; (3) using regex for tasks better handled by a parser — parsing HTML or JSON with regex is notoriously fragile. Use a proper parser for structured formats.
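
The first mistake is quick to demonstrate in Python. When the text to match comes from outside the program, re.escape can do the escaping; the sample strings below are illustrative.

```python
import re

text = "version 1.5 and 1x5"

# Unescaped: . matches any character, so "1x5" matches too.
loose = re.findall(r"1.5", text)
# Escaped: \. matches only a literal dot.
strict = re.findall(r"1\.5", text)

# re.escape escapes the metacharacters in a string meant to be matched literally.
user_input = "a+b (draft)"
found = re.search(re.escape(user_input), "see a+b (draft) for details") is not None
# Without escaping, "+" and "(...)" are read as regex syntax and the literal text is missed.
found_unescaped = re.search(user_input, "see a+b (draft) for details") is not None

print(loose)   # ['1.5', '1x5']
print(strict)  # ['1.5']
print(found, found_unescaped)  # True False
```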