23 October 2007

Some Useful Regular Expressions

I've been sifting through a lot of strings lately and experimenting with the C#/.NET System.Text.RegularExpressions namespace. I have been extracting string values from PDF files and then using those strings to make database calls.

The challenge has been cleaning up the strings, since the PDF software I am using doesn't do any cleanup for me when it passes a string back to C#.

The expressions I have been using are as follows*:

Regular Expression: @"^[!""#$%&'()*+,-./:;?@[\\\]_`{|}~0-9]+"
Purpose: Remove non-alpha characters from the start of a string.

Regular Expression: @"[!""#$%&'()*+,-./:;?@[\\\]_`{|}~0-9]+$"
Purpose: Remove non-alpha characters from the end of a string.
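
A minimal sketch of how these two patterns could be applied one after the other with Regex.Replace. The class name EdgeTrimmer and the method TrimNonAlpha are invented for illustration; the patterns are copied unchanged from above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EdgeTrimmer
{
    // Hypothetical helper names; only the two patterns come from the post.
    private static readonly Regex Leading =
        new Regex(@"^[!""#$%&'()*+,-./:;?@[\\\]_`{|}~0-9]+");
    private static readonly Regex Trailing =
        new Regex(@"[!""#$%&'()*+,-./:;?@[\\\]_`{|}~0-9]+$");

    // Strip the leading run first, then the trailing run.
    public static string TrimNonAlpha(string input)
    {
        string withoutLeading = Leading.Replace(input, "");
        return Trailing.Replace(withoutLeading, "");
    }

    public static void Main()
    {
        Console.WriteLine(TrimNonAlpha("123abc!!")); // prints "abc"
    }
}
```

Note that the space character is not in either character class, so whitespace at the edges stops the match: "--Widget 9000++" comes back as "Widget " with the trailing space intact. Calling Trim() on the result would be one way to deal with that, if it matters.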

Regular Expression: @"[^ -~’]"
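Purpose: Remove unprintable characters from a string. (This keeps only printable ASCII, space through ~, plus the ’ character, so other non-ASCII characters are removed as well.)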

Regular Expression: "[^0-9]"
Purpose: Remove non-numerical characters from a string. (via AnimaOnline)
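
The last two patterns are negated character classes, so replacing every match with an empty string keeps only the characters listed inside the brackets. A small sketch, assuming hypothetical helper names (CharFilter, KeepPrintable, KeepDigits) that are not from my original code:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharFilter
{
    // Matches anything outside the range space (0x20) to tilde (0x7E),
    // other than the right single quotation mark.
    private static readonly Regex NonPrintable = new Regex(@"[^ -~’]");

    // Matches anything that is not an ASCII digit.
    private static readonly Regex NonDigit = new Regex("[^0-9]");

    public static string KeepPrintable(string input)
    {
        return NonPrintable.Replace(input, "");
    }

    public static string KeepDigits(string input)
    {
        return NonDigit.Replace(input, "");
    }

    public static void Main()
    {
        // The tab and the NUL character both fall outside space..tilde.
        Console.WriteLine(KeepPrintable("Total\tdue\u0000: 42")); // prints "Totaldue: 42"
        Console.WriteLine(KeepDigits("(555) 123-4567"));          // prints "5551234567"
    }
}
```

Because tabs and line breaks sit below the space character, KeepPrintable removes them too, which can glue words together as in the first example.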


For a useful regular expression tool, check out RegExLib.com, which has a convenient .NET tester.

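* The @ in front of a string marks it as a verbatim string literal: backslashes are not treated as C# escape sequences, and a double quote inside the string is written as "".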