chris carter's web log



The Ultimate Regular Expression Tutorial

Posted on April 30, 2008

Step one: Admit to yourself that you know nothing about regular expressions, and that it's more of a black art than anything.

Step two: Google "how to split a single PascalCased word into multiple words."

Step three: Let the cursing begin.

Step four: Write your own regex to solve the problem. It's just a tool, right?

Step five: More cursing.

Step six: Note the time. Several hours have passed and you have nothing. This is the f*ck-it stage.

Step seven: Write plain ol' C# in 10 minutes using SnippetCompiler to get the job done.

Step eight: When the work has to get done for $$, always skip to step seven.

The Super Duper PascalCased Word Splitter

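using System;
using System.Collections.Generic;

// Splits a PascalCased word such as "AgencySystemComponent" into its words.
// A new word starts at an uppercase letter that follows a non-uppercase
// character, so a run of capitals (e.g. "HTMLParser") is kept together.
public static IEnumerable<string> SplitPascal(string pascalCasedWord){
	// Nothing to split: end the sequence without yielding any words.
	if (String.IsNullOrEmpty(pascalCasedWord))
		yield break;

	List<char> buffer = new List<char>();
	for(int i=0;i<pascalCasedWord.Length;i++){
		char c = pascalCasedWord[i];
		// Word boundary: an uppercase letter after a lowercase letter, digit, etc.
		if (Char.IsUpper(c) && buffer.Count > 0 && !Char.IsUpper(buffer[buffer.Count-1])){
			yield return new String(buffer.ToArray());
			buffer.Clear();
		}
		buffer.Add(c);
	}
	// Emit the last word.
	if(buffer.Count > 0)
		yield return new String(buffer.ToArray());
}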

Usage:

List<string> words = new List<string>(SplitPascal("AgencySystemComponent"));
Debug.Assert(words.Count == 3, "should have 3 words but had " + words.Count);
Debug.Assert(words[0] == "Agency", "first word should be Agency but was " + words[0]);
Debug.Assert(words[1] == "System", "second word should be System but was " + words[1]);
Debug.Assert(words[2] == "Component", "third word should be Component but was " + words[2]);

Any regular expression advice is appreciated.
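For what it's worth, here is a sketch of one regex that appears to express the same rule as the loop: split before an uppercase letter that follows a non-uppercase character. It uses .NET's Regex.Split with a zero-width pattern built from a lookbehind (\P{Lu}, "not an uppercase letter") and a lookahead (\p{Lu}, "an uppercase letter"). The class name PascalSplitter and the method name SplitPascalRegex are hypothetical, and \P{Lu} may not agree with Char.IsUpper on every exotic Unicode character, so treat it as a starting point rather than a drop-in replacement.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class; the name is only for illustration.
public static class PascalSplitter
{
    // Zero-width split point: the previous character is not an uppercase
    // letter (\P{Lu}) and the next one is (\p{Lu}). The lookbehind cannot
    // match at the start and the lookahead cannot match at the end, so the
    // result never contains empty leading or trailing pieces.
    private static readonly Regex Boundary = new Regex(@"(?<=\P{Lu})(?=\p{Lu})");

    public static IEnumerable<string> SplitPascalRegex(string pascalCasedWord)
    {
        // Match the loop version: no words for null or empty input.
        if (String.IsNullOrEmpty(pascalCasedWord))
            return new string[0];
        return Boundary.Split(pascalCasedWord);
    }
}
```

Usage mirrors the loop version: new List<string>(PascalSplitter.SplitPascalRegex("AgencySystemComponent")) gives Agency, System and Component, and "HTMLParser" comes back as a single word, just as it does with the loop. The Regex is built once in a static field so repeated calls don't re-parse the pattern.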