Square brackets ([ and ]) in regular expressions can be confusing if you don't understand how they work as reserved characters. They are reserved characters twice, in two different ways:
1) As a set of characters to match, as in [A-Za-z], which matches any single upper- or lower-case letter from a to z
2) For negation, as in [^x], which matches any single character that is NOT x. The caret, ^, as the first character inside square brackets, means "not".
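Here is a small sketch of both uses, plus an escaped bracket for contrast. The class name BracketDemo and the sample strings are made up for illustration; only Regex.Match and Regex.IsMatch from System.Text.RegularExpressions are used.

```csharp
using System;
using System.Text.RegularExpressions;

class BracketDemo
{
    public static void Main()
    {
        // Use #1: a set of characters. [A-Za-z]+ matches a run of letters.
        Console.WriteLine(Regex.Match("abc123", "[A-Za-z]+").Value);  // abc

        // Use #2: negation. [^x]+ matches a run of anything except 'x'.
        Console.WriteLine(Regex.Match("abxcd", "[^x]+").Value);       // ab

        // Escaped: \[ means a literal opening square bracket, not the start of a set.
        Console.WriteLine(Regex.IsMatch("a[b", @"\["));               // True
    }
}
```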
This is what makes questions like this tricky:
Hi guys,
Hoping someone can help me out with this
I have a string "Setup.exe [sometext] [someOtherText]"
I need to be able to extract the text between the [ and ]
Also I'm never going to know how many " [ ] " I'm going to have
Would anyone be able to help me out with this?
Thanks in advance
Here's the answer: Since square brackets are reserved for these purposes, you'll need to escape them with a backslash. Note that the escaping rules can vary from regexp engine to regexp engine. In C#, try this:
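using System;
using System.Text.RegularExpressions;

class Test
{
    public static void Main()
    {
        string s = "Setup.exe [sometext] [someothertext] [and more text here]";

        // Literal "[", then a named group holding everything that is not "]", then literal "]".
        string pattern = @"\[(?<myMatch>([^\]])*)\]";
        Regex r = new Regex(pattern);

        // Matches() returns every bracketed chunk, however many there are.
        MatchCollection m = r.Matches(s);
        foreach (Match match in m)
        {
            // Print only the text between the brackets.
            Console.WriteLine(match.Groups["myMatch"].Value);
        }
    }
}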
Let's translate the regular expression \[(?<myMatch>([^\]])*)\]
First, the Groups syntax. In C# regular expressions, you can give your groups a name. The syntax for named groups differs between regex engines, and some don't support them at all, so check before relying on it elsewhere. You "name" a group like this:
(?<variableName>REGEX GOES HERE)
You don't have to do it this way; you can just group parts of your regular expression with ( ) and index into them by number. Your pick.
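As a rough sketch of the "index into them" option, the same extraction can be written with a plain numbered group. The class name NumberedGroupDemo and the helper Extract are hypothetical names chosen for this example; the pattern is a slightly simplified variant of the one above, with a single unnamed group.

```csharp
using System;
using System.Text.RegularExpressions;

class NumberedGroupDemo
{
    // Returns the text between each pair of square brackets, read from group index 1.
    public static string[] Extract(string input)
    {
        MatchCollection matches = Regex.Matches(input, @"\[([^\]]*)\]");
        string[] results = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            // Groups[0] is the whole match including brackets; Groups[1] is the first ( ) group.
            results[i] = matches[i].Groups[1].Value;
        }
        return results;
    }

    public static void Main()
    {
        foreach (string text in Extract("Setup.exe [sometext] [someOtherText]"))
        {
            Console.WriteLine(text);
        }
    }
}
```

Named groups tend to read better once a pattern has several groups; with only one group, the numeric index is usually just as clear.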
- We start matching at an opening square bracket. Because square brackets are reserved characters (see #1 and #2 above), we'll escape it with a backslash to say "No really, I mean a square bracket, not a reserved character. Look, I'm escaping it to say I mean it!" This is the beginning "\[".
- Next, we use the (?<myMatch> syntax as a convenience.
- Now we get to the heart of it: ([^\]])*. This says: keep going until you hit a closing square bracket. This is use of square brackets #2 above: negation. The ] inside the set is escaped as well, so it is read as the character to exclude and not as the end of the set. The * means "any number of those", including none. For example, if you have the regexp [^a]* and the string "bobcat" you'd get "bobc". Make sense? I think of it as "everything up to, but not including".
- Finally, end the madness with a closing square bracket. Again: no, really, I'm escaping it with a "\" to say I REALLY DO mean the character "]".
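The "everything up to, but not including" reading is easy to check on its own. This is a minimal sketch; the class name UpToDemo and the helper UpTo are invented for the example, and the inputs other than "bobcat" are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

class UpToDemo
{
    // Returns the first match of the given pattern in the input.
    public static string UpTo(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        // Everything up to, but not including, the first 'a'.
        Console.WriteLine(UpTo("bobcat", "[^a]*"));             // bobc

        // Same idea with an escaped ']' as the character to stop at.
        Console.WriteLine(UpTo("sometext] tail", @"[^\]]*"));   // sometext

        // "Any number" includes zero: the string starts with 'a', so the match is empty.
        Console.WriteLine(UpTo("apple", "[^a]*").Length);       // 0
    }
}
```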