Regular Expressions and Square Brackets

Square brackets ( [ and ] ) in regular expressions can be confusing if you don't understand how they work as reserved characters. They are reserved characters twice, in two completely different ways:

1) As a set of characters to match, as in [A-Za-z], which matches any single upper or lower case letter from a to z
2) Used for negation, as in [^x], which matches any single character that is NOT x. The caret, ^, means "not" when it is the first character inside the square brackets.
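
Here is a minimal sketch of both uses in C#. The class name BracketUses, the helper AllMatches, and the sample strings are made up for illustration; only Regex.Matches comes from .NET itself.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class BracketUses
{
    // Hypothetical helper: collects the text of every match of pattern in input.
    public static List<string> AllMatches(string input, string pattern)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            found.Add(m.Value);
        }
        return found;
    }

    public static void Main()
    {
        // Use 1: a set. [A-Za-z]+ matches runs of ASCII letters.
        Console.WriteLine(string.Join(",", AllMatches("abc 123 XyZ", "[A-Za-z]+")));  // abc,XyZ

        // Use 2: a negated set. [^x]+ matches runs of anything except 'x'.
        Console.WriteLine(string.Join(",", AllMatches("axbxxc", "[^x]+")));  // a,b,c
    }
}
```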

This is what makes questions like this tricky:


Hi guys,

Hoping someone can help me out with this

I have a string "Setup.exe [sometext] [someOtherText]"

I need to be able to extract the text between the [ and ]
Also I'm never going to know how many " [ ] " I'm going to have

Would anyone be able to help me out with this?
Thanks in advance

Here's the answer:

Since square brackets are reserved for these purposes, you'll need to escape them when you want to match a literal [ or ]. Note that escaping rules can vary from one regex engine to another. In C#, try this:

using System;
using System.Text.RegularExpressions;

class Test
{
    public static void Main()
    {
        string s = "Setup.exe [sometext] [someothertext]   [and more text here]";
        // Escaped brackets on the outside; the named group captures what is between them.
        string pattern = @"\[(?<myMatch>([^\]])*)\]";

        Regex r = new Regex(pattern);
        MatchCollection m = r.Matches(s);

        // One match per bracket pair, however many there are.
        foreach (Match match in m)
        {
            Console.WriteLine(match.Groups["myMatch"].Value);
        }
    }
}


Let's translate the regular expression \[(?<myMatch>([^\]])*)\]

First, the Groups syntax. In C# regular expressions, you can give your groups a name. The (?<name>...) form shown here is the .NET spelling; other regex engines may write named groups differently or not support them at all, so check your engine's documentation. You name a group like this:
(?<variableName>REGEX GOES HERE)
You don't have to do it this way. You can just group your regular expressions with plain ( ) and index into the groups by number. Your pick.
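
To make the "your pick" concrete, here is a small sketch that reads the same capture both ways. The class GroupAccess, the methods ByName and ByIndex, and the group name inner are hypothetical names chosen for this example; both methods return an empty string when nothing matches.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupAccess
{
    // Reads the first bracketed text through a named group.
    public static string ByName(string input)
    {
        Match m = Regex.Match(input, @"\[(?<inner>[^\]]*)\]");
        return m.Success ? m.Groups["inner"].Value : "";
    }

    // Same capture with a plain ( ) group, read back by index 1.
    // Index 0 is always the whole match, brackets included.
    public static string ByIndex(string input)
    {
        Match m = Regex.Match(input, @"\[([^\]]*)\]");
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(ByName("Setup.exe [sometext]"));   // sometext
        Console.WriteLine(ByIndex("Setup.exe [sometext]"));  // sometext
    }
}
```

Names tend to pay off once a pattern has several groups, since adding a group later does not shift the ones you already rely on.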

- We start matching at a square bracket. Because square brackets are reserved characters (see #1 and #2 above), we'll escape it with a backslash to say "No really, I mean a square bracket, not a reserved character. Look, I'm escaping it to say I mean it!" This is the beginning "\[".
- Next, we use the (?<myMatch> syntax as a convenience.
- Now we get to the heart of it: ([^\]])*. This says: keep going until you hit a closing square bracket. This is use of square brackets #2 above: negation. The ] inside the set is escaped as well, so it is read as a literal bracket rather than as the end of the set. The * means "any number of those". For example, if you have the regexp [^a]* and the string "bobcat" you'd get "bobc". Make sense? I think of it as "everything up to, but not including".
- Finally, end the madness with a square bracket. Again: no, really, I'm escaping it with a "\" to say I REALLY DO mean the character "]"
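
The walkthrough above can be checked with a short sketch. It runs the "bobcat" example first, then applies the full pattern to the string from the question. The class BracketWalkthrough and the method Extract are hypothetical names used only here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class BracketWalkthrough
{
    // Returns the text inside each [ ... ] pair, in order of appearance.
    public static List<string> Extract(string input)
    {
        var parts = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\[(?<myMatch>([^\]])*)\]"))
        {
            parts.Add(m.Groups["myMatch"].Value);
        }
        return parts;
    }

    public static void Main()
    {
        // [^a]* against "bobcat": the first match stops just before the 'a'.
        Console.WriteLine(Regex.Match("bobcat", "[^a]*").Value);  // bobc

        // The full pattern, applied to the string from the question.
        foreach (string part in Extract("Setup.exe [sometext] [someOtherText]"))
        {
            Console.WriteLine(part);  // sometext, then someOtherText
        }
    }
}
```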

