Geeks With Blogs
Joaquin Jares Just another geek's blog

3 years ago, I wrote this post: http://regexlib.com/REDetails.aspx?regexp_id=1135.

It contained a .Net regular expression that would match the highest level of matching braces, something that is needed more often than not when parsing languages. It will also return null if no matching braces are found. Finally, as an added bonus, it will ignore braces that are escaped with a backslash.

Soon after I posted it, someone commented on it being impossible, because Finite Automatas can’t count. I didn’t want to fight it (and I didn’t know better), so I simply made a wrong statement that would divert the discussion. I simply knew that it worked back then. I’m wiser now, and I know the answer.

FAs are used to parse regular languages and as such, they can’t count. That’s absolutely true. That means that you can’t keep track of what you’ve seen by using a FA. So, if you need to match a symmetrical phrase (aabbccbbaa), you can’t. That’s what Push-down automatas are used for.

Problem is, .Net regular expressions don’t work as a regular language. When you start using backtracking and lookahead, you’re automatically out of the regular language domain. And then, Finite Automatas can’t be used anymore.

So, if you need to count, you can always use a backtrack. You can use them to search for a palindrome (http://aspn.activestate.com/ASPN/Cookbook/Rx/Recipe/326097) and you can use them to search for matching braces.

As an added bonus, here’s how you do different braces with the regex:

Curly ({}): (?<!\\)\{(\\\{|\\\}|[^\{\}]|(?<!\\)\{.*(?<!\\)\})*(?<!\\)\}

Parenthesis (()): (?<!\\)\((\\\(|\\\)|[^\(\)]|(?<!\\)\(.*(?<!\\)\))*(?<!\\)\)

Square ([]): (?<!\\)\[(\\\[|\\\]|[^\[\]]|(?<!\\)\[.*(?<!\\)\])*(?<!\\)\]

Chevrons (<>): (?<!\\)\<(\\\<|\\\>|[^\<\>]|(?<!\\)\<.*(?<!\\)\>)*(?<!\\)\>

Hope you find it useful!

Posted on Thursday, July 24, 2008 1:07 AM Language | Back to top


Comments on this post: Matching Braces with Regex

# re: Matching Braces with Regex
Requesting Gravatar...
I've just tested your regex pattern (the square braces specifically), using PHP's preg_match_all. The regex does catch alot, but it's actually too greedy, as it returned unbalanced results for text that had multiple levels of braces.

I'm still looking into what piece of the regex is off, but just FYI in case you can find it first.

If you contact me directly via email, I'll give you my test case.

Additionally, you misspelled "Matching" in your article title (which alone, almost made me skip your article entirely in Google's results).

Cheers!
Left by Jon Langevin on May 12, 2009 1:50 AM

# re: Matching Braces with Regex
Requesting Gravatar...
Thanks for the title thing... I haven't noticed. I corrected it. Also, there's a better way of doing this in .Net, that doesn't really apply to PHP. I'll contact you for the test case via mail.
Left by Osno on May 12, 2009 2:04 AM

# re: Matching Braces with Regex
Requesting Gravatar...
What if the input string contains backslashes?
Left by Brian Keller on Aug 26, 2009 11:16 PM

# re: Matching Braces with Regex
Requesting Gravatar...
If they're used as escapes, that's covered in the sample. If not, then you probable should remove all the occurrences of this: (?<!\\), as that means "not prepended by a slash".
Left by Osno on Aug 27, 2009 6:22 AM

Your comment:
 (will show your gravatar)


Copyright © Joaquin Jares | Powered by: GeeksWithBlogs.net