QUANTIFIERS

Greedy vs lazy

Quantifiers grab as much as possible. Sometimes that is too much.

Take the text say "hi" and "bye" and try to match each quoted part with the pattern ".*". You get ONE match: "hi" and "bye" - the .* ran greedily to the last quote in the line. Every quantifier is greedy by default: it first takes as much as it can, then backtracks only as far as the rest of the pattern needs in order to match.
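To see the greedy behavior described above, here is a minimal sketch using Python's re module. The choice of Python and the variable names are assumptions for illustration; any backtracking regex flavor should behave the same way on this input.

```python
import re

# The sample line from the text, with two quoted parts.
text = 'say "hi" and "bye"'

# Greedy: .* runs to the end of the line, then backs up to the LAST quote.
greedy = re.findall(r'".*"', text)

print(greedy)  # ['"hi" and "bye"']
```

One match comes back instead of two, because the first quote pairs with the final quote in the line.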

Two fixes. The lazy quantifier ".*?" stops at the first closing quote - appending ? to any quantifier makes it take as little as possible. Or describe the content precisely: "[^"]*" (any number of non-quote characters between quotes). The negated-set version is usually faster, because it can never run past a quote and so has little to backtrack over, and it says exactly what a quoted string contains.
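Both fixes can be tried side by side. This is a small sketch in Python's re module (an assumed choice of language); the variable names are made up for the example.

```python
import re

text = 'say "hi" and "bye"'

# Fix 1: lazy quantifier. .*? takes as little as possible,
# so it stops at the first closing quote.
lazy = re.findall(r'".*?"', text)

# Fix 2: negated set. [^"]* cannot cross a quote at all.
negated = re.findall(r'"[^"]*"', text)

print(lazy)     # ['"hi"', '"bye"']
print(negated)  # ['"hi"', '"bye"']

# The ? suffix works on other quantifiers too: a+? takes one "a" at a time.
one_at_a_time = re.findall(r'a+?', 'aaa')
print(one_at_a_time)  # ['a', 'a', 'a']
```

On this input the two fixes give the same matches; they differ in how much work the engine does to get there.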

Backtracking has a dark side, and switching to lazy does not cure it. A quantifier nested inside another quantifier, like (.*)* or (a+)+, can force a backtracking engine to try an exponential number of paths before it gives up. On the wrong input that hangs the program - a denial-of-service bug called catastrophic backtracking (ReDoS). Some flavors offer atomic groups or possessive quantifiers to defuse it, but the portable cure is a precise, non-overlapping pattern like the negated set shown above. See OWASP: ReDoS.
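The blow-up can be observed safely on short inputs. The sketch below assumes Python's re module, which is a backtracking engine. The helper time_failed_match and the pattern names NESTED and FLAT are hypothetical, made up for this example. The input lengths are kept small on purpose: each extra "a" roughly doubles the work for the nested pattern, so do not raise n much further.

```python
import re
import time

NESTED = re.compile(r"(a+)+$")  # quantifier inside a quantifier
FLAT = re.compile(r"a+$")       # accepts the same strings, no nesting


def time_failed_match(pattern, n):
    """Return the seconds spent failing to match n 'a's followed by 'b'."""
    text = "a" * n + "b"
    start = time.perf_counter()
    result = pattern.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing 'b' makes a match impossible
    return elapsed


if __name__ == "__main__":
    # Timings vary by machine, so none are claimed here. The point is the
    # trend: the nested column grows quickly with n, the flat one does not.
    for n in (12, 15, 18):
        nested_s = time_failed_match(NESTED, n)
        flat_s = time_failed_match(FLAT, n)
        print(f"n={n:2d}  nested={nested_s:.6f}s  flat={flat_s:.6f}s")
```

Both patterns accept and reject the same strings; only the nested one gives the engine many different ways to split the run of a's before it can conclude there is no match.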

PRACTICE - 2 DRILLS
DRILL 1/2 - Quoted strings

Match each quoted string separately, quotes included.

/ /
say "hi" and "bye" now
must match: "\"hi\"" "\"bye\""
a "single" one
must match: "\"single\""
no quotes here
must match nothing
DRILL 2/2 - HTML tags

Match each HTML tag separately, angle brackets included.

/ /
<b>hi</b>
must match: "<b>" "</b>"
a <a href="x">link</a>
must match: "<a href=\"x\">" "</a>"
no tags here
must match nothing
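To check your own answers away from the page, a small harness along these lines may help. It is a sketch in Python, and everything in it is an assumption for illustration: the function name check, the names QUOTE_CASES and TAG_CASES, and the choice to compare matches in order. The test lines and expected matches are copied from the two drills above.

```python
import re

# (line, expected matches) pairs transcribed from the drills.
QUOTE_CASES = [
    ('say "hi" and "bye" now', ['"hi"', '"bye"']),
    ('a "single" one', ['"single"']),
    ('no quotes here', []),
]
TAG_CASES = [
    ('<b>hi</b>', ['<b>', '</b>']),
    ('a <a href="x">link</a>', ['<a href="x">', '</a>']),
    ('no tags here', []),
]


def check(pattern, cases):
    """Return (line, found) for every line whose matches differ from expected."""
    regex = re.compile(pattern)
    failures = []
    for line, expected in cases:
        found = [m.group(0) for m in regex.finditer(line)]
        if found != expected:
            failures.append((line, found))
    return failures


if __name__ == "__main__":
    # A deliberately greedy attempt, to show what a failure report looks like.
    for line, found in check(r'".*"', QUOTE_CASES):
        print("FAIL:", line, "->", found)
```

An empty list from check means the pattern passes every line in that drill. The greedy attempt shown fails only the first quoted-string line, where it swallows both strings in one match.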