chris carter's web log


The Ultimate Regular Expression Tutorial

Posted on April 30, 2008

Step one.  Admit to yourself that you know nothing about regular expressions, and that they're more of a black art than anything.

Step two.  google.com.  Search for how to split a single Pascal-cased word into multiple words.

Step three. Let the cursing begin.

Step four.  Write your own regex to solve your problem.  It's just a tool, right?

Step five.  More cursing.

Step six.  Note the time: several hours have passed and you have nothing to show for it.  This is the f*ck it stage.

Step seven.  Write plain ol' C# in 10 minutes using SnippetCompiler to get the job done.

Step eight.  When it comes to work that has to get done for $$, always skip to step seven.

The Super Duper PascalCased Word Splitter

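// Needs: using System; using System.Collections.Generic;
public static IEnumerable<string> SplitPascal(string pascalCasedWord){
	// Nothing to split: end the sequence instead of yielding a null word
	// (and then falling through to the loop below).
	if (String.IsNullOrEmpty(pascalCasedWord))
		yield break;

	List<char> buffer = new List<char>();
	for(int i=0;i<pascalCasedWord.Length;i++){
		char c = pascalCasedWord[i];
		if (Char.IsUpper(c)){
			// A new word starts at an uppercase letter that follows a
			// non-uppercase character, so a run of capitals is never split.
			if(buffer.Count > 0 && !Char.IsUpper(buffer[buffer.Count-1])){
				yield return new String(buffer.ToArray());
				buffer.Clear();
			}
		}
		buffer.Add(c);
	}
	// Flush the last word.
	if(buffer.Count > 0)
		yield return new String(buffer.ToArray());
}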

Usage:

List<string> words = new List<string>(SplitPascal("AgencySystemComponent"));
Debug.Assert(words.Count == 3, "should have 3 words but had " + words.Count);
Debug.Assert(words[0] == "Agency", "first word should be Agency but was " + words[0]);
Debug.Assert(words[1] == "System", "second word should be System but was " + words[1]);
Debug.Assert(words[2] == "Component", "third word should be Component but was " + words[2]);

Any regular expression advice is appreciated.
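
For what it's worth, here is one regex-based sketch that should behave the same way as the loop above. It is not what I ended up using, and the names PascalSplitter, Boundary and SplitPascalRegex are made up for illustration. The idea is to split on a zero-width position that has a non-uppercase character behind it and an uppercase letter ahead of it, using the Unicode category Lu so it lines up with Char.IsUpper.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PascalSplitter
{
    // Zero-width boundary: the previous character is NOT an uppercase letter
    // (\P{Lu}) and the next character IS one (\p{Lu}). Nothing is consumed,
    // so Regex.Split keeps every character of the input.
    private static readonly Regex Boundary = new Regex(@"(?<=\P{Lu})(?=\p{Lu})");

    // Hypothetical regex counterpart of SplitPascal.
    public static string[] SplitPascalRegex(string pascalCasedWord)
    {
        // Regex.Split would return a single empty string for "", so guard
        // here to match the loop version, which yields nothing.
        if (String.IsNullOrEmpty(pascalCasedWord))
            return new string[0];

        return Boundary.Split(pascalCasedWord);
    }
}
```

Calling PascalSplitter.SplitPascalRegex("AgencySystemComponent") should give back Agency, System and Component. Like the loop, it leaves a run of capitals such as "XMLParser" in one piece, since no capital in it follows a non-capital.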

Comments

Will Asrari

lol. this post reminds me of when I posted about developing on an XP Home Premium machine. I still wonder why there are at least a billion different variations of the valid e-mail address validation expression.
