A pedant that hangs out in the dark corner-cases of the web.

Thursday, May 22, 2008

Update: Visual Studio's NIH RegEx Syntax

It looks like this topic came up on Coding Horror a couple of years ago, back in the VS2005 days, and drew a reply from Microsoft:

Hey Jeff (and assorted follow-up posters),

I'm the lead program manager for the team that owns editing and the find/replace dialog in Visual Studio. Our team agrees with your post :)

It is a very oddball regex syntax, and as best we can tell it comes from Visual C++ 2.0. We did want to add additional support for .NET 2.0-style regular expressions in the Visual Studio 2005 release, but unfortunately due to time pressures it didn't make the final list of features. We were able to make a number of bug fixes to the existing engine though, to give some improvement over VS 2003.

We do keep this on our list of things we want to fix. Ideally at some point we'll actually build in a nifty little extensibility point so you can wire up any regex engine you want for searches.

Thanks for the feedback!

Neil Enns
Lead Program Manager
Microsoft Visual Studio
— Neil on July 14, 2006 11:15 AM

So, take heart! A Microsoft team member agreed about the stupidity of the NIH RegEx syntax in 2006, so I'm sure we'll see this fixed any day now.

What we need, as pointed out by another commenter, is another pattern option: Regular regular expressions.

Instead, the closest we currently have is a couple of VS add-ons that replace the Find and Replace dialog. They offer normal regex syntax, an unfamiliar UI, and broken English, and they support VS 2005 but no Express editions.

Tuesday, May 06, 2008

Visual Studio's NIH RegEx Syntax

Here's a quick phrasebook for Visual Studio's NIH RegEx syntax:

VS Editor RegEx Syntax Real RegEx Syntax* Meaning
{} () tagged / captured submatch
() (?:) non-capturing submatch
(?=) lookahead assertion
~() (?!) negative lookahead assertion / prevent match
(?<=) lookbehind assertion
(?<!) negative lookbehind assertion
(?>) nonbacktracking (greedy) subexpression
< \b(?=\w) start of word
> \b(?<=\w) end of word
\< < matches < character
\> > matches > character
(<|>) \b word boundary
~(<|>) \B not a word boundary
? zero-or-one quantifier
?? minimal zero-or-one quantifier
@ *? minimal zero-or-more quantifier
\@ @ matches @ character
# +? minimal one-or-more quantifier
\# # matches # character
^n {n} match n times quantifier
\^ \^ matches ^ character
{n,} match at least n times quantifier
{m,n} match between m and n times (inclusive) quantifier
{n,}? minimally match at least n times quantifier
{m,n}? minimally match between m and n times (inclusive) quantifier
\(w,n) (replacement expression) left-pad captured group n to w characters
\(-w,n) (replacement expression) right-pad captured group n to w characters
\g \a alert / bell
\h [\b] backspace
\: : matches : character
:i ([a-zA-Z_$][a-zA-Z0-9_$]*) identifier
:q (("[^"]*")|('[^']*')) quoted string
:h ([0-9A-Fa-f]+) hexadecimal number (not including any prefix, e.g. 0x or \x or \u)
:n ((\d+\.\d*)|(\d*\.\d+)|(\d+)) rational number
:w (\p{L}+) letters
:b [ \t] space or tab (like \s without \n or \v)
:z \d+ integer (one or more decimal digits)
:a \w word / alphanumeric character
[^:a] \W non-word / non-alphanumeric character
:c \p{L} letter character (like \w without digits or _)
:d \d decimal digit
[^:d] \D non-decimal-digit character
:U \p{U} matches Unicode character category U
\p{IsBlock} matches characters in Unicode named block Block
[^:U] \P{U} does not match Unicode character category U
\P{IsBlock} does not match characters in Unicode named block Block
:Al \p{L} letter
:Nu \d decimal digit
:Pu \p{P} punctuation character
:Wh \s whitespace character
[^:Wh] \S non-whitespace character
:Bi ? bidirectional character
:Ha \p{IsHangulJamo} Korean Hangul and combining Jamos
:Hi \p{IsHiragana} hiragana character
:Ka \p{IsKatakana} katakana character
:Id ? ideographic characters, such as Han and kanji

*This is the syntax supported by everything else, including the .NET System.Text.RegularExpressions library. For some reason, Microsoft decided to create a new syntax just for the Visual Studio editor! But they just don't have the resources to implement "your favorite standard".
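Standard replacement strings have no equivalent of the VS padding tokens \(w,n) and \(-w,n); with most regex libraries you would do the padding in a replacement callback instead. A minimal Python sketch follows. pad_group is a hypothetical helper, and it assumes the VS editor pads with spaces and never truncates a group that is already wider than w.

```python
import re


def pad_group(pattern: str, text: str, group: int, width: int) -> str:
    # Hypothetical helper emulating the VS replacement tokens:
    #   width > 0  ~ \(w,n):  pad on the left with spaces (right-justify)
    #   width < 0  ~ \(-w,n): pad on the right with spaces (left-justify)
    # Assumption: a group already wider than abs(width) is left untouched.
    # Simplification: each whole match is replaced by the padded group alone.
    def repl(m):
        value = m.group(group) or ""  # a group that did not participate pads as empty text
        return value.rjust(width) if width >= 0 else value.ljust(-width)

    return re.sub(pattern, repl, text)


# Right-justify every number in a 4-character field.
print(pad_group(r"(\d+)", "id 7, id 42", 1, 4))
```

Here str.rjust and str.ljust do the work that the VS tokens do inside the replacement expression itself.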

Anything that's not on the list should work the same way in NIH RegEx patterns.
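Since most rows of the phrasebook are token-for-token substitutions, converting a VS editor pattern can be done mechanically. The following is a rough Python sketch; vs_to_standard and its lookup tables are hypothetical names, only a subset of the rows is covered, and, in line with the note above, anything it does not recognize is copied through unchanged. It assumes that ^ followed by digits is the ^n repeat quantifier unless it starts the pattern, and it copies character classes verbatim, so [^:d] is not translated.

```python
import re

# Rough sketch: translate a SUBSET of the VS editor syntax into standard syntax.
# Unrecognized text is copied through unchanged; character classes are copied verbatim.

# Escapes whose meaning differs (others such as \^, \(, \1 pass through as-is).
_ESCAPES = {"@": "@", "#": "#", ":": ":", "<": "<", ">": ">", "g": r"\a", "h": r"[\b]"}

# Colon shorthands, emitted as NON-capturing groups so \1, \2 ... keep their numbering.
_SHORTHANDS = {
    ":i": r"(?:[a-zA-Z_$][a-zA-Z0-9_$]*)",
    ":q": "(?:\"[^\"]*\"|'[^']*')",
    ":z": r"\d+",
    ":d": r"\d",
    ":a": r"\w",
    ":b": r"[ \t]",
}

# Single characters whose meaning differs in the VS editor.
_SINGLE = {
    "{": "(",           # tagged expression -> capturing group
    "}": ")",
    "(": "(?:",         # grouping -> non-capturing group
    "@": "*?",          # minimal zero-or-more
    "#": "+?",          # minimal one-or-more
    "<": r"\b(?=\w)",   # start of word
    ">": r"\b(?<=\w)",  # end of word
}


def vs_to_standard(pattern: str) -> str:
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        nxt = pattern[i + 1] if i + 1 < len(pattern) else ""
        if ch == "\\" and nxt:
            out.append(_ESCAPES.get(nxt, ch + nxt))
            i += 2
        elif ch == "[":
            # Copy the character class through its closing bracket (or to the end).
            end = pattern.find("]", i + 1)
            end = len(pattern) - 1 if end == -1 else end
            out.append(pattern[i : end + 1])
            i = end + 1
        elif ch == "~" and nxt == "(":
            out.append("(?!")  # prevent match -> negative lookahead
            i += 2
        elif i > 0 and (rep := re.match(r"\^([0-9]+)", pattern[i:])):
            out.append("{" + rep.group(1) + "}")  # ^n -> {n}
            i += len(rep.group(0))
        elif ch + nxt in _SHORTHANDS:
            out.append(_SHORTHANDS[ch + nxt])
            i += 2
        else:
            out.append(_SINGLE.get(ch, ch))
            i += 1
    return "".join(out)


# Example: a VS-style "identifier = integer" search, run with Python's re module.
std = vs_to_standard("{:i} = {:z}")
match = re.search(std, "let count = 42;")
print(std, match.groups())
```

Emitting the colon shorthands as non-capturing groups is a deliberate choice: the table's expansions such as :i wrap themselves in capturing parentheses, which would shift the numbers of any tagged-group back-references. The Unicode rows such as :Al are left out because Python's standard re module does not support \p{...}.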